% Discount -- a C implementation of the Markdown markup language
% Jessica L. Parsons (orc@pell.portland.or.us)
%

#Discount

>%class:tincup%
>Discount is free software released under the terms of a
>[BSD-style](COPYRIGHT.html) license.
>
>If you find it useful, please consider
>making a contribution to help support onward development.
>
>[donate?](https://www.paypal.com/donate/?hosted_button_id=E9G5WQXEFNA8U)

##download

[Discount 2.2.7d](discount-2.2.7d.tar.bz2), released 22-Aug-2023

##description

This is my implementation of [John Gruber][]'s [Markdown][] text to html
language. There's not much here that differentiates it from any of the
existing Markdown implementations except that it's written in C instead
of one of the vast flock of scripting languages that are fighting it out
for the Perl crown.

**Markdown** provides a library that gives you formatting functions
suitable for marking down entire documents or lines of text, a
command-line program that you can use to mark down documents
interactively or from a script, and a tiny (3 programs so far) suite of
example programs that show how to fully utilize the markdown library.

My markdown also does, by default, various [smartypants][]-style
substitutions.

###The program

The `markdown` program is a trivial compiler that reads in a markdown
file and writes out a html document or -- if you use the `-d` flag --
an outline showing the parse tree. It does have a few options;

`-d`
:   is, as previously mentioned, the flag that makes markdown produce
    a parse tree instead of a html document.

`-F <flags>`
:   sets various [flags](#flags) that change how markdown works.
    The flags argument is a somewhat less than obvious bitmask --
    for example, `-F 0x4` tells `markdown` to **not** do the
    [smartypants][] translations on the output. (there are cases --
    like running the [test suite][] -- where this is a useful feature.)

`-f <flags>`
:   sets various [flags](#flags) that change how markdown works.
    Unlike `-F`, these are the names of the flags (you can get a list of
    the supported flags with the `-f?` option; supported flags + synonyms
    with `-Vf?`) optionally prefixed with `no` or `-` to turn them off.
    To reuse the example to disable [smartypants][], you'd do `-f nopants`
    ("pants" is a synonym for "smarty" == smartypants.)

`-o file`
:   tells markdown to write the output to *`file`*

`-V`
:   tells you a markdown version number and how the package
    was configured. For example

        $ markdown -V
        markdown: discount 2.2.2 TAB=8 DEBUG

    tells you that this is markdown 2.2.2, and that the package was
    configured with support for sensible tabs & debugging malloc.

`-VV`
:   is like `-V`, except it also returns the current values of many of
    the flags that can be set with `-f` or `-F`.

###The library

There are 17 public functions in the [markdown][] library, broken
into three categories:

####Input functions

1. `MMIOT *mkd_in(FILE *f, int flags)` reads a markdown input file
   and returns a MMIOT containing the preprocessed document.
   (which is then fed to `markdown()` for final formatting.)

2. `MMIOT *mkd_string(char *bfr, int size, int flags)` reads the
   markdown input file that's been written into `bfr` and
   returns a preprocessed blob suitable for feeding to `markdown()`.
   This function exists because [annotations][] uses `mmap()` to
   access message files instead of traditional file i/o. (If
   you're going to port Markdown to an AS/400, this function *is*
   the droid you've been looking for.)

####"Big Picture"-style processing functions

1. `int markdown(MMIOT *doc, FILE *out, int flags)` formats a document
   (created with `mkd_in()` or `mkd_string()`) and writes the
   resulting HTML document to `out`.

2. `int mkd_line(char *bfr, int size, char **out, int flags)` allocates a
   buffer, then formats the text string into that buffer.
   The differences from `markdown()` are that it doesn't support quoting,
   footnotes ("reference links",) multiple paragraphs, lists, code
   sections, or pure html sections.

3. `int mkd_generateline(char*bfr, int size, FILE *out, int flags)`
   formats the text string and writes the resulting HTML fragment to `out`.
   It is exactly like `mkd_line()` except that it writes the output to
   a `FILE*`.

####Fine-grained access to the internals

1. `int mkd_compile(MMIOT *doc, int flags)` takes a document created by
   `mkd_in()` or `mkd_string()` and compiles it into a tree of block
   elements.

2. `int mkd_generatehtml(MMIOT *doc, FILE *out)` generates html from
   a compiled document.

2. `int mkd_document(MMIOT *doc, char **text)` returns (in `text`) a
   pointer to the compiled html document, and (in the return code)
   the size of that document.

2. `int mkd_css(MMIOT *doc, char **out)` allocates a buffer and populates
   it with any style blocks found in the document.

2. `int mkd_generatecss(MMIOT *doc, FILE *out)` prints any style blocks in
   the document.

3. `int mkd_toc(MMIOT *doc, char **out)` allocates a buffer, populates it
   with a table of contents, assigns it to out, and returns the length of
   the buffer.

   To get a table of contents, you must `compile()` the document
   with the `MKD_TOC` flag (described below)

3. `int mkd_generatetoc(MMIOT *doc, FILE *out)` writes a table of contents
   to _out_; other than writing to a `FILE*`, it operates exactly like
   `mkd_toc()`

3. `int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title)` prints
   a block structure diagram of a compiled document.

4. `void mkd_cleanup(MMIOT *doc)` releases the MMIOT allocated for the
   document.

####[Document header](id:document-header) access functions

1. `char *mkd_doc_title(MMIOT *doc)` returns the `% title` line.
2. `char *mkd_doc_author(MMIOT *doc)` returns the `% author(s)` line.
3. `char *mkd_doc_date(MMIOT *doc)` returns the `% date` line.

####[Url callback](id:callback) functions

1. `void mkd_e_url(MMIOT*, char* (callback)(char*,int,void*))`
   sets up a callback function that is called whenever [discount]
   processes a `[]()` or `<link>` construct. The callback function
   is passed a pointer to the url, the size of the url, and a data
   pointer (null or supplied by `mkd_e_data()`)

2. `void mkd_e_flags(MMIOT*, char *(callback)(char*,int,void*))`
   sets up a callback to provide additional arguments to the tags
   generated by `[]()` and `<link>` constructs. If, for instance,
   you wanted to add `target="_blank"` to every generated url, you
   could just make a callback function that returned that string.

3. `void mkd_e_code(MMIOT*, char *(callback)(char*,int,void*))`
   sets up a callback to format the contents of a code block.

3. `void mkd_e_free(char *, void*)` is called to free any allocated
   memory returned by the url or flags callbacks.

4. `void mkd_e_data(MMIOT*, void*)` assigns a callback data area to
   the url & flags callbacks.

[The flags argument](id:flags) in
`markdown()`, `mkd_text()`, `mkd_in()`, `mkd_string()`,
`mkd_compile()`, and `mkd_generatehtml()` is a mask of the following
flag bits:

[Flag](id:mkd_flags)    | Action
------------------------|------------------------------------------
MKD_NOLINKS             | Don't do link processing, block `<a>` tags
MKD_NOIMAGE             | Don't do image processing, block `<img>`
MKD_NOPANTS             | Don't run `smartypants()`
MKD_NOHTML              | Don't allow raw html through **AT ALL**
MKD_STRICT              | Disable `SUPERSCRIPT`, `RELAXED_EMPHASIS`
MKD_TAGTEXT             | Process text to go inside an html tag; no emphasis or html expansion & embedded html will be stripped out.
MKD_NO_EXT              | Don't allow pseudo-protocols
MKD_CDATA               | Generate code for xml `<![CDATA[...]]>`
MKD_NOSUPERSCRIPT       | No `A^B`
MKD_NORELAXED           | Emphasis happens _everywhere_
MKD_NOTABLES            | Don't process [PHP Markdown Extra] tables.
MKD_NOSTRIKETHROUGH     | Forbid `~~strikethrough~~`
MKD_TOC                 | Do table-of-contents processing
MKD_1_COMPAT            | Compatibility with MarkdownTest_1.0
MKD_AUTOLINK            | Make `http://foo.com` a link even without `<>`s
MKD_SAFELINK            | Paranoid check for link protocol
MKD_NOHEADER            | Don't process document headers
MKD_TABSTOP             | Expand tabs to 4 spaces
MKD_NODIVQUOTE          | Forbid `>%class%` blocks
MKD_NOALPHALIST         | Forbid alphabetic lists
MKD_NODLIST             | Forbid definition lists
MKD_EXTRA_FOOTNOTE      | Enable [PHP Markdown Extra]-style [footnotes] (warning: not the later version of multiple-paragraph ones.)
MKD_NOSTYLE             | Don't extract `<style>` blocks and set them aside for printing via `mkd_style()`.

[Class blocks](id:classblock)
:   A blockquote with a first line of `> %class%` will become
    `<div class="class">` instead of a `<blockquote>`.

[Tables](id:tables)
:   [PHP Markdown Extra]-style tables are supported;

        aaa | bbbb
        -----|------
        hello|sailor

    becomes the following table:

    aaa | bbbb
    -----|------
    hello|sailor

    And much of the rest of the current table syntax (alignment,
    handling of orphan columns) follows the [PHP Markdown Extra] spec.

[Document Headers](id:headers)
:   [Pandoc]-style document headers are supported; if the first three
    lines in the document begin with a `%` character, they are taken
    to be a document header in the form of

        % Document title
        % Document author
        % Document date

    and can be retrieved by the [library functions](id:document_header)
    `mkd_doc_title()`, `mkd_doc_author()`, and `mkd_doc_date()`.

    Note that I implement Pandoc document headers as they were documented
    in 2008; any Pandoc changes since then will not be reflected in my
    implementation.

[Fenced code blocks](id:fencedcodeblocks)
:   If called with the `MKD_FENCEDCODE` option, [Pandoc]-style
    fenced code blocks are supported; blocks of code wrapped in `~~~`
    lines are treated as code just as if they were indented the traditional
    4 spaces. Github-flavored-markdown fenced code blocks (blocks wrapped
    in backtick lines) are also supported.

    Both of these formats support the github-flavored-markdown
    class extension where you can put a word at the end of the opening
    backtick line and have the block given that class.

[Embedded LaTeX (mathjax)](id:latex)
:   If called with the `MKD_LATEX` option, text wrapped in
    `$$`...`$$`, `\[`...`\]`, and `\(`...`\)` is passed unchanged
    (except for encoding `<`, `>`, and `&`) to the output for
    processing by a LaTeX renderer.

    This collides with how Markdown escapes '[', ']', '(', and ')' --
    if discount is called with `MKD_LATEX`, `\(` and `\[` will only
    map to `(` and `[` if corresponding `\)` or `\]`s are **not**
    found in the same paragraph.
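
The "passed unchanged except for encoding `<`, `>`, and `&`" rule for LaTeX spans can be pictured with a small sketch. This is *not* Discount's own code: `encode_math_span()` is a hypothetical helper written only to illustrate the rule, and it assumes the caller has already located the span and supplies an output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Discount API): copy `len` bytes
 * of a math span from `src` into `dst`, replacing only <, > and & with
 * their html entities and leaving every other byte untouched.
 * Returns the number of bytes written (not counting the trailing NUL),
 * or -1 if the `cap`-byte buffer `dst` is too small.
 */
static long encode_math_span(const char *src, size_t len, char *dst, size_t cap)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);

    if (cap == 0)
        return -1;              /* no room even for the NUL */

    for (size_t i = 0; i < len; i++) {
        const char *rep = NULL; /* entity to emit, if any */
        size_t n;

        switch (src[i]) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:                 break;
        }

        n = rep ? strlen(rep) : 1;
        if (used + n + 1 > cap) /* keep one byte for the NUL */
            return -1;

        if (rep)
            memcpy(dst + used, rep, n);
        else
            dst[used] = src[i];
        used += n;
    }

    dst[used] = '\0';
    return (long)used;
}
```

Calling it on the nine bytes of `a<b & c>d` with a large enough buffer leaves `a&lt;b &amp; c&gt;d` in `dst`; backslashes, braces, and everything else a LaTeX renderer cares about go through as-is.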

[Github checkbox list items](id:checkbox)
:   If configured with the `--github-checkbox` flag, discount will
    understand [github-style checkboxes](https://github.github.com/gfm/#task-list-items-extension-)
    and generate checkboxes using either html entities
    (`--github-checkbox` w/o an argument) or `<input>` elements
    (`--github-checkbox=input`)

###How standard is it?

When I run the [standard test suite (version 1.0.3)][test suite] from
daringfireball, `MarkdownTest.pl` reports:

>     $ MARKDOWN_FLAGS=0x20004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
>     Amps and angle encoding ... OK
>     Auto links ... OK
>     Backslash escapes ... OK
>     Blockquotes with code blocks ... OK
>     Code Blocks ... OK
>     Code Spans ... OK
>     Hard-wrapped paragraphs with list-like lines ... OK
>     Horizontal rules ... OK
>     Inline HTML (Advanced) ... OK
>     Inline HTML (Simple) ... OK
>     Inline HTML comments ... OK
>     Links, inline style ... OK
>     Links, reference style ... OK
>     Links, shortcut references ... OK
>     Literal quotes in titles ... OK
>     Markdown Documentation - Basics ... OK
>     Markdown Documentation - Syntax ... OK
>     Nested blockquotes ... OK
>     Ordered and unordered lists ... OK
>     Strong and em together ... OK
>     Tabs ... OK
>     Tidyness ... OK
>
>
>     22 passed; 0 failed.

When I run the [old standard test suite][old test suite] from
daringfireball, `MarkdownTest.pl` reports:

>     $ MARKDOWN_FLAGS=0x22004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
>     Amps and angle encoding ... OK
>     Auto links ... OK
>     Backslash escapes ... OK
>     Blockquotes with code blocks ... OK
>     Hard-wrapped paragraphs with list-like lines ... OK
>     Horizontal rules ... OK
>     Inline HTML (Advanced) ... OK
>     Inline HTML (Simple) ... OK
>     Inline HTML comments ... OK
>     Links, inline style ... OK
>     Links, reference style ... OK
>     Literal quotes in titles ... OK
>     Markdown Documentation - Basics ... OK
>     Markdown Documentation - Syntax ... OK
>     Nested blockquotes ... OK
>     Ordered and unordered lists ... OK
>     Strong and em together ... OK
>     Tabs ... OK
>     Tidyness ... OK
>
>
>     19 passed; 0 failed.

Most of the "how to get standards compliant" changes that went in
were cleaning up corner cases and blatant misreading of the spec,
but there were two places where I had to do a horrible hack to
get compliant:

1. To pass the **Hard-wrapped paragraphs with list-like lines** test,
   I had to modify `mkd_compile()` so that it would have top-level
   paragraphs absorb adjacent list items, but I had to retain the
   old (and, IMO, _correct_) behavior of a new list forcing a block
   break within indented (quoted, inside lists) blocks.

2. To pass the **Markdown Documentation - Syntax** test in
   MarkdownTest 1.0, I had to change the behavior of code blocks
   from "preserve trailing whitespace" to "preserve trailing
   whitespace *unless* it's the first line in the block." From
   version 1.3.3 on, this is no longer the default, but the flag
   `MKD_1_COMPAT` (0x2000) turns it on again for testing purposes.

###Does this markdown treat tabs as 4 spaces?

By default, yes, it does. The habit of compensating for broken
editors that give no way to indent except for tabbing by setting
tabstops to 4 is so intertwined with this language that treating
tabs properly would be the moral equivalent of dropping nuclear
devices into the testsuite.

But if you use a proper tabstop (8 characters), you can configure
markdown with **`--with-tabstop`** and it will expand tabs to 8
spaces. If you've configured your markdown like this (`markdown -V`
will report **`TAB=`**8) and you need to mark up text from other
sources, you can set the input flag **`MKD_TABSTOP`** to revert
those documents back to the icky standard 4-space tab.

##Source Code

> To build discount, untar your selected tarball, cd into the
> directory it creates, then do `configure.sh` to generate
> your Makefiles. After doing this, a `make` should give you
> a functional stack of programs and libraries.
>
> Discount builds, for me, on MacOS 10.12, FreeBSD 4.8, NetBSD 8, Minix 3,
> and Debian Linux (dunno which version, but it's a `systemd` joint that's
> running a 3.10 kernel.) It may build on SLS Linux and Windows with
> mingw, but I'm not sure about that.

* [[Version 2.2.7d](discount-2.2.7c.tar.bz2)](id:v2.2.7d) One more
    maintenance release to fix a few more bugs:

    * markdown extra footnotes were slightly broken -- two adjacent
      footnotes (`[^1][^2]`) were being treated as a regular old
      hyperlink because the code I put in originally was a huge old
      monster bodge.
    * change the description for -fstrict.
    * add a fistful of test cases for the new! improved! Markdown.pl
      compatible ``
    * rework the tag handler to more closely adhere to Markdown.pl's
      observed behavior; also treat incomplete tags as actual tags in
      compatibility mode (-fstrict)
    * fix weird behavior on freebsd 4.8 w/ gcc 2.95.4; if an
      \#ifdef ... #else ... #endif wraps the end of a `if ()` it pukes
      up an error about a syntax error in a macro; pull those tests out
      and conditionally #define a macro instead
    * the behavior of tags-starting-with-alpha-/-! was not like that of
      the reference implementation; fixed (in a very script-kiddy leaky
      fashion which I can't resolve with a runtime flag thanks to the
      32 bit flags field in the 2.x series :-(

* [[version 2.2.7b](discount-2.2.7b.tar.bz2)](id:v2.2.7b) _Another_
    maintenance release to fix a bug in the Makefile; I didn't have
    the proper dependencies for the pandoc_headers tool, so if a
    parallel make was tried it would fall over complaining about
    missing objects when it tried to link everything together.

* [[version 2.2.7](discount-2.2.7.tar.bz2)](id:v2.2.7) A maintenance
    release to clean up a buffer overflow when generating label names
    (backported from the new v3 under [slow] development)

* [[version 2.2.6](discount-2.2.6.tar.bz2)](id:v2.2.6) should have been
    released last fall, but I was too busy working on breaking the
    published interface so I could expand the flag structure to permit
    more than 32 feature flags. _\*sigh\*_

    It's not a huge release; it fixes a few bugs, tweaks a few things,
    and adds a couple of features:

    =bugs=
        * deal with cached text before handling fenced code blocks
          (because otherwise it would accumulate the contents of things
          between the blocks and spit them out afterwards.)
        * don't give a code block class if it's just an empty string
        * mkd2html wasn't passing pgm to hoptusage()
        * mkd2html wasn't properly handling two arguments for input &
          output files. While I was at it, add the -G option to use
          gfm_in() to read the input file in (and preserve newlines)
        * push the fencedcode block handler up to the toplevel scanner so
          it will handle blank lines inside the fence (it was splitting
          the code block up into paragraphs and treating them as text.)

    =features=
        * add an anchor format callback, plus add the -x option to the
          markdown program to squash toc anchors to (close to) github
          compatibility
        * add an external code block formatter to the markdown program
        * clean up all references to flags; define them EVERYWHERE as
          mkd_flag_t
        * support external code formatters

    =tweaks=
        * configure to 1. use the modern standard uintXX_t types for
          DWORD, WORD, BYTE if possible and 3.
          if a program is defined via an environment variable (like CC)
          try to use that variable instead of doing a path search in
          AC_PROG
        * make a test program to examine pandoc header elements
        * patch the cmakefile so that it has an option to only generate
          a library
        * set the Plan9 CFLAGS to the two extensions (instead of having
          them be part of the CC macro) and wipe out any CFLAGS (like -T,
          which tells the Plan9 posix C compiler to be stupid) that
          already exists
        * the Makefile macros a bit so people can wedge in their own
          compilers (BUILD) and linkers (LINK) during a build
        * when dumping the parse tree, say which header is being dumped
          -- h1..h6 vs just header

* [[version 2.2.4](discount-2.2.4.tar.bz2)](id:v2.2.4) fixes a couple of
    small buglets, adds a couple of new features, and tweaks the html5
    support module to chase the html5 standard:

    1. (**bugfix**) when splitting a line, null terminate the new line;
       also don't inherit the dle from the previous line
    2. (**bugfix**) fix the broken footnote code; allow multi-paragraph
       footnotes, not just one line of footnote.
    3. (**feature**) support github checkbox list items (static
       configuration option)
        a. `--github-checkbox` does the checkbox with html check entities
        b. `--github-checkbox=input` does them with html `<input>`
           elements (set to disabled so that people can't check and
           uncheck willy-nilly)
    4. (**feature**) on systems that support it, use the 'destructor'
       attribute on `mkd_shlib_destructor()` so it will run
       automatically when the library unloads.
    5. (**feature**) add `--cxx-binding` option to `configure.sh` to
       generate a `mkdio.h` that's got an `extern "C"` wrapper around it.
    6. (**tweak**) delete `HGROUP` from the list of html5 elements,
       because the w3c working group decided to
       [punt it](http://lists.w3.org/Archives/Public/public-html-admin/2013Apr/0003.html)

    The bugfix for not null-terminating a split line was reported by
    Github user [fCorleone](https://github.com/fCorleone), who was
    running an input mangler program
    ([afl](http://lcamtuf.coredump.cx/afl/)) to stress discount by
    feeding random garbage to it. It saw that the `splitline()`
    function (used when breaking out embedded chunks of html) was not
    null-terminating the split line chunks, and was throwing errors on
    that. (this is something that `clang` on modern macos catches if
    you use the `-fsanitize=address` option.)

    I never actually use footnotes, so I'd been running for about a
    decade assuming that footnotes were one or maybe two lines, but
    github user [somasis](https://github.com/somasis) pointed out that
    this was wrong wrong wrong! Ooops; at least it was fixable w/o much
    pain and suffering.

    `HGROUP` was caught by (no longer registered) github user
    Crypto-Anarchist in their own branch of discount, so I cherry-picked
    that changeset and pulled it back into the mainline discount.

* [[version 2.2.3a](discount-2.2.3a.tar.bz2)](id:v2.2.3a) 2.2.3 has a
    configuration glitch (not properly testing for the existence of
    S_ISSOCK, et seq) which 2.2.3a corrects.

    I also pulled the plug on the single use of `alloca()` in `theme.c`
    (Windows MinGW doesn't support it properly(?) and in this one case
    I'm better off just `malloc()`ing the offending thing and letting it
    be garbage collected when the program finishes.)

* [[version 2.2.3](discount-2.2.3.tar.bz2)](id:v2.2.3) I turn around and
    blink, and suddenly 13 months have gone by while I slowly tested
    various beta versions of 2.2.3, and now it's 2018 and a small
    collection of cosmetic, portability, and build tweaks, plus some
    actually bad bugs have been cleaned up:

    1. Have `tools/branch` be a no-op if there's no git that I can use
       to check for my SCCS status.
    3. Add user-contributed cmake support
    4. check for non-null T(link) in the safelink function; rename
       paranoia.t to safelink.t, add a couple more tests for safelinks
    5. tweak the safelink detection code to allow more types of url
       fragments
    6. when finding the installer, check that -s works (doesn't work on
       Minix 3?)
    7. Correct makepage & theme to use the new set_flag() return scheme
       (0==success, !0==pointer to bogus flag)
    8. if an unknown flag was in the middle of a comma-delimited flag
       string (like -flatex,bogus,footnote), the markdown program would
       incorrectly report the first flag as unknown (because set_flag
       returned 0 on error, 1 on success and the strtok() of the flag
       string had already replaced the commas up to that point with
       nulls.) Change it so that set_flag returns null on successful
       processing and a pointer to the offending flag on an unknown one.
    9. Tweak the install rule to install a GNU-style .pc file iff
       @MK_PKGCONFIG@ is not defined as #
    10. Hand-resolve a conflict in the msvc Makefile
    12. Add in paranoid list splitting [EXPLICITLIST] (the default
        behavior many many versions ago before I realized it wasn't the
        standard) (and take the last of the 32-bit flag mask for it)
        between adjacent ordered and unordered lists.
    13. Create a utility function [`notspecial()`] for theme & mkd2html
        -- check a filename to see if it's a special file that doesn't
        need to be deleted or to have a .html suffix added to it. It
        only works on machines that have a stat() system call + the
        S_ISFIFO, S_ISCHR, and S_ISSOCK macros, otherwise it thinks that
        nothing is special.
    14. Add a README for the utilities in the tools subdirectory
    15. Add the git branch name into the version string if we're not on
        the master branch.
    16. Add support for NMAKE and Visual C++ toolset. (courtesy of
        Martin Hofmann (tin-pot) fork of Discount).
    17. Add a 'dirty' flag to the Document structure & set it whenever a
        callback changes (github issue #136) so that the next
        mkd_compile will regenerate the document.
    18. Tweak superscript handling to be able to superscript something
        wrapped in html
    19. Tweak the `show_flags()` function so that if it's called
        verbosely it will show synonyms for named flags. (Calling it
        verbosely is done by the `V` option, which is overloaded for
        verbosity when listing flags.)
    20. Process html blocks in compile (as well as in compile_document(); `compile_document()` needs to handle `