This is my implementation of John Gruber’s Markdown text to html language. There’s not much here that differentiates it from any of the existing Markdown implementations except that it’s written in C instead of one of the vast flock of scripting languages that are fighting it out for the Perl crown.
Markdown provides a library that gives you formatting functions suitable for marking down entire documents or lines of text, a command-line program that you can use to mark down documents interactively or from a script, and a tiny (1 program so far) suite of example programs that show how to fully utilize the markdown library.
My markdown also does, by default, various smartypants-style substitutions.
The markdown program is a trivial compiler that reads in a markdown
file and writes out an html document or, if you use the -d flag,
an outline showing the parse tree. It does have a few options:
-d is, as previously mentioned, the flag that makes markdown
produce a parse tree instead of an html document.
-F <flags> sets various flags that change
how markdown works. The flags argument is a somewhat less
than obvious bitmask; for example, -F 0x4 tells markdown
to not do the smartypants translations on the output.
(There are cases, like running the test suite, where
this is a useful feature.)
-o file tells markdown to write the output to file.
-V tells you a markdown version number and how the package
was configured. For example
$ markdown -V
markdown: discount 1.0.0 DL_TAG HEADER TAB=8
tells you that this is markdown 1.0.0, and that the package
was configured with support for definition lists,
pandoc document headers, and sensible
tabs.
There are 14 public functions in the markdown library, broken into three categories:
MMIOT *mkd_in(FILE *f, int flags) reads a markdown input file
and returns a MMIOT containing the preprocessed document,
which is then fed to markdown() for final formatting.
MMIOT *mkd_string(char *bfr, int size, int flags) reads the
markdown input file that’s been written into bfr and
returns a preprocessed blob suitable for feeding to markdown().
This function exists because annotations uses mmap() to
access message files instead of traditional file i/o. (If
you're going to port Markdown to an AS/400, this function is
the droid you've been looking for.)
int markdown(MMIOT *doc, FILE *out, int flags) formats a document
(created with mkd_in() or mkd_string()) and writes the
resulting HTML document to out.
int mkd_text(char *bfr, int size, FILE *out, int flags) formats the
text string and writes the resulting HTML fragment to out.
Unlike markdown(), it doesn’t support quoting,
footnotes (“reference links”), multiple paragraphs, lists, code
sections, or pure html sections.
int mkd_compile(MMIOT *doc, int flags) takes a document created by
mkd_in() or mkd_string() and compiles it into a tree of block
elements.
int mkd_generatehtml(MMIOT *doc, FILE *out) generates html from
a compiled document.
int mkd_document(MMIOT *doc, char **text) returns (in text) a
pointer to the compiled html document, and (in the return code)
the size of that document. MKD_QUOT expansions are not done
to this document — if you want those expansions, you must use
mkd_generatehtml().
int mkd_style(MMIOT *doc, FILE *out) prints any style blocks in
the document.
int mkd_generatetoc(MMIOT *doc, FILE *out) writes a table of contents
to out; to get a table of contents, you must mkd_compile() the document
with the MKD_TOC flag (described below).
int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title) prints
a block structure diagram of a compiled document.
void mkd_cleanup(MMIOT *doc) releases the MMIOT allocated for the
document.
char *mkd_doc_title(MMIOT *doc) returns the % title line.
char *mkd_doc_author(MMIOT *doc) returns the % author(s) line.
char *mkd_doc_date(MMIOT *doc) returns the % date line.
The flags argument in
markdown(), mkd_text(), mkd_in(),
mkd_string(), mkd_compile(), and
mkd_generatehtml() is a mask of the following flag bits:
MKD_NOLINKS: don’t allow “<a” or expand “[][]” into a link.
MKD_NOIMAGE: don’t allow “<img” or expand “![][]” into
an IMG tag.
MKD_NOHTML: block embedded html, replacing <’s with &lt;.
MKD_NOPANTS
MKD_QUOT: expand "s into &quot;s.
MKD_NOHEADER
MKD_TABSTOP
MKD_NO_EXT
MKD_STRICT
MKD_TOC (<h1>, <h2>, etc will include an id="name" argument.)
Smartypants substitutions:
``text'' is translated to “text”.
"double-quoted text" becomes “double-quoted text”
'single-quoted text' becomes ‘single-quoted text’
don't is “don’t.” as well as anything-else’t.
(But foo'tbar is just foo'tbar.)
it's is “it’s,” as well as anything-else’s
(except not foo'sbar and the like.)
(tm) becomes ™
(r) becomes ®
(c) becomes ©
1/4th becomes ¼th. Ditto for 1/4 (¼), 1/2 (½),
3/4ths (¾ths), and 3/4 (¾).
... becomes …
. . . also becomes …
-- becomes —
- becomes –, but A-B remains A-B.
A^B becomes A<sup>B</sup>.
My markdown was written so I could replace the fairly gross homemade
text to html prettifier that I wrote for annotations, so I've extended
it in a few ways; I've put support for paragraph centering in
so that I don’t have to hand enter the <center> and </center> tags,
I've added support for specifying image sizes, and I've written a
not-earthshatteringly-horrible markup extension for definition lists.
To center a paragraph, frame it with -> and <-.
->this is a test<-
produces
<center>this is a test</center>
To specify an image size, add a =widthxheight field to the image tag:
![dust mite](http://dust.mite =150x150)
produces
<img src="http://dust.mite" height=150 width=150 alt="dust mite">
To mark up a definition list, wrap the label in = characters, then put the body of the list
item on the next line, indented 4 spaces.
=hey!=
    This is a definition list
produces
<dt>hey!</dt> <dd>This is a definition list</dd>
A definition list label is just a regular line of markdown code, so you can put links and images into it.
In discount 1.2.3, the definition list syntax has been
extended so that you can define sequential <dt> blocks by doing
=tag1=
=tag2=
    data.
which generates
<dt>tag1</dt>
<dt>tag2</dt>
<dd>data.</dd>
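The =label= rule above can be sketched in a few lines of C. This is only an illustration of the syntax as described here, not the code inside Discount: dl_label() and dl_render() are names made up for this example, and the sketch assumes one input line per array entry, no nesting, and no markdown processing inside the label or body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Discount's isdefinition()): if `line` has the
 * form =label=, copy the label into `label` and return 1; otherwise 0. */
int dl_label(const char *line, char *label, size_t size)
{
    size_t len;

    assert(line != NULL && label != NULL && size > 0);
    len = strlen(line);
    if (len < 3 || line[0] != '=' || line[len - 1] != '=')
        return 0;
    if (len - 2 >= size)
        return 0;                       /* label would not fit */
    memcpy(label, line + 1, len - 2);
    label[len - 2] = '\0';
    return 1;
}

/* Render =label= lines as <dt> and 4-space-indented lines as <dd>.
 * Any other line is skipped in this sketch. Output is truncated if it
 * does not fit; returns the length of the string left in `out`. */
size_t dl_render(const char *lines[], int count, char *out, size_t size)
{
    char label[128];
    size_t used = 0;
    int i;

    assert(lines != NULL && out != NULL && size > 0);
    out[0] = '\0';
    for (i = 0; i < count; i++) {
        int n = 0;

        if (dl_label(lines[i], label, sizeof label))
            n = snprintf(out + used, size - used, "<dt>%s</dt>\n", label);
        else if (strncmp(lines[i], "    ", 4) == 0)
            n = snprintf(out + used, size - used, "<dd>%s</dd>\n", lines[i] + 4);
        if (n < 0 || (size_t)n >= size - used)
            return strlen(out);         /* error or truncated: stop here */
        used += (size_t)n;
    }
    return used;
}
```

Feeding it the three lines =tag1=, =tag2=, and an indented data. yields two <dt> elements followed by one <dd>, which matches the sequential-label example.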
Ordered lists with alphabetic labels (enabled by --enable-alpha-list
during configuration) are supported in the same way that numeric ordered
lists are:
a. first item
b. second item
generates an ordered list with alphabetic labels.
I wanted to be able to apply styles inline without having
to manually enter the <span class="xxx">…</span>
html. So I redid the [][] code to support some new
“protocols” within my markdown:
class:name produces <span class="name">…</span>
id:name produces <a id="name">…</a>
raw:text writes text verbatim to the output. The protocol
was inspired by a short thread on the markdown mailing list
about someone wanting to embed LaTeX inside <!-- --> and
finding, to their distress, that markdown mangled it.
Passing text through in comments seems to be a path to unreadable madness, so I didn’t want to do that. This is, to my mind, a better solution.
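To make the three pseudo-protocols concrete, here is a rough sketch of the dispatch they imply. It is not the code inside Discount: render_pseudo() is a hypothetical name, it assumes the link text and link target have already been split apart, and it does no html escaping at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: given link text and a link target, handle the
 * class:, id:, and raw: pseudo-protocols described above.
 * Returns 1 if the target used a pseudo-protocol and the result fit in
 * `out`, 0 otherwise (an ordinary link, or output too long). */
int render_pseudo(const char *text, const char *target, char *out, size_t size)
{
    int n;

    assert(text != NULL && target != NULL && out != NULL && size > 0);
    if (strncmp(target, "class:", 6) == 0)
        n = snprintf(out, size, "<span class=\"%s\">%s</span>", target + 6, text);
    else if (strncmp(target, "id:", 3) == 0)
        n = snprintf(out, size, "<a id=\"%s\">%s</a>", target + 3, text);
    else if (strncmp(target, "raw:", 4) == 0)
        n = snprintf(out, size, "%s", target + 4);   /* verbatim, link text unused */
    else {
        out[0] = '\0';                               /* not a pseudo-protocol */
        return 0;
    }
    return n >= 0 && (size_t)n < size;
}
```

With the text "hi" and the target class:note, this writes <span class="note">hi</span>; a target of raw:\alpha writes \alpha and nothing else.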
Markdown will also pick out <style>…</style> blocks and set them aside for printing
via mkd_style().
A blockquote with a first line of > %class% will become
<div class="class"> instead of a <blockquote>.
When I run the standard test suite from daringfireball,
MarkdownTest.pl reports:
$ MARKDOWN_FLAGS=0x0204 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK
19 passed; 0 failed.
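Since the flags are a bitmask, a value like the 0x0204 used for the test run is two separate bits OR'ed together (assuming MARKDOWN_FLAGS takes the same kind of mask as -F). The sketch below just splits a mask into its bits. The only bit value this page states is 0x4 (no smartypants translations); the macro name SKETCH_NO_SMARTYPANTS and both function names are local to this example, and the sketch deliberately does not guess what 0x0200 means.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The one bit value given in the text: -F 0x4 disables the smartypants
 * translations. The macro name is local to this sketch. */
#define SKETCH_NO_SMARTYPANTS 0x0004

/* Return nonzero if every bit in `bit` is set in `mask`. */
int flag_is_set(int mask, int bit)
{
    return (mask & bit) == bit;
}

/* Write the individual bits of the low 16 bits of `mask` as hex values
 * separated by spaces. Returns the number of bits written. */
int flag_bits(int mask, char *out, size_t size)
{
    int count = 0;
    unsigned bit;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (bit = 1; bit <= 0x8000u; bit <<= 1) {
        size_t used = strlen(out);
        int n;

        if (((unsigned)mask & bit) == 0)
            continue;
        n = snprintf(out + used, size - used, "%s0x%04x", count ? " " : "", bit);
        if (n < 0 || (size_t)n >= size - used) {
            out[used] = '\0';           /* drop the partial entry and stop */
            break;
        }
        count++;
    }
    return count;
}
```

For 0x0204 this reports the bits 0x0004 and 0x0200, and flag_is_set(0x0204, SKETCH_NO_SMARTYPANTS) is true, so that test run had the smartypants translations turned off.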
Most of the “how to get standards compliant” changes that went in were cleaning up corner cases and blatant misreading of the spec, but there were two places where I had to do a horrible hack to get compliant:
I had to change mkd_compile() so that it would have top-level
paragraphs absorb adjacent list items, but I had to retain the
old (and, IMO, correct) behavior of a new list forcing a block
break within indented (quoted, inside lists) blocks.
Does this markdown treat tabs as 4 spaces? By default, yes, it does. The habit of compensating for broken editors that give no way to indent except for tabbing by setting tabstops to 4 is so intertwined with this language that treating tabs properly would be the moral equivalent of dropping nuclear devices into the testsuite.
But if you use a proper tabstop (8 characters), you can configure
markdown with --with-tabstop and it will expand tabs to 8
spaces. If you've configured your markdown like this (markdown -V
will report TAB=8) and you need to mark up text from other
sources, you can set the input flag MKD_TABSTOP to revert those
documents back to the icky standard 4-space tab.
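The difference between the two tabstops is easy to see in code. This is a minimal sketch of tab expansion, not the routine Discount uses: expand_tabs() is a made-up name, and it assumes plain single-byte text where every character is one column wide.

```c
#include <assert.h>
#include <stddef.h>

/* Expand each tab in `in` to spaces, advancing to the next multiple of
 * `tabstop` columns. Columns restart at 0 after a newline. The result is
 * truncated if it does not fit; returns the length written to `out`. */
size_t expand_tabs(const char *in, int tabstop, char *out, size_t size)
{
    size_t len = 0;   /* characters written so far */
    size_t col = 0;   /* current column on the current line */

    assert(in != NULL && out != NULL && size > 0 && tabstop > 0);
    for (; *in != '\0' && len + 1 < size; in++) {
        if (*in == '\t') {
            /* always emit at least one space, then pad to the tabstop */
            do {
                out[len++] = ' ';
                col++;
            } while (col % (size_t)tabstop != 0 && len + 1 < size);
        } else {
            out[len++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[len] = '\0';
    return len;
}
```

With a tabstop of 4, "a<TAB>b" comes out as a, three spaces, b; with a tabstop of 8 the same input gets seven spaces. A leading tab becomes 4 or 8 spaces respectively, which is exactly the case where markdown's idea of a code block depends on the setting.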
To build discount, untar your selected tarball, cd into the directory it creates, then do
configure.sh
to generate your Makefiles. After doing this, a
make
should give you a functional stack of programs and libraries.
Discount builds, for me, on SLS Linux, MacOS 10.5, FreeBSD 4.8, and RHEL3. It may build on Windows with mingw, but I'm not sure about that.
version 1.3.0 documents the 1.3.0 changes, so that they aren’t simply little magic cookies left in the code as obstacles for the unwary.
version 1.3.0pre4 corrects a bunch of edge cases and misfeatures, and adds in a small raft of new experimental features, such as:
A mkd_xhtmlpage() function suggested by Tim Channon.
mkd_xhtmlpage() takes
a markdown document and attempts to build a complete xhtml page out of
it, inserting the contents of any headers and
style blocks in the correct places.
version 1.2.10 corrects a tiny
bug in [title](url) and ![title](url) links, where if the [title]
was empty the tag wouldn’t be expanded. Admittedly, it’s a good idea
to actually put a meaningful title in a link or image, but there’s no
reason to actually require it and I've cleaned up the code to reflect
that.
version 1.2.9, which includes:
The MKD_NOHTML flag, which blocks all embedded html and
forces the user to use the markup language for doing markup.
Changes to RELAXED_EMPHASIS so that it only applies to underscores
in the middle of words. This breaks the emphasis rules for things
like a_b
(or, if you're describing code, things like size_t or time_t
or …) which may not be a problem for you if you're writing
about code.
A fix so that # FOO won’t generate
<h1> FOO</h1>
Version 1.2.9 was originally version 1.2.8, except that 1.2.8 was incorrectly published after git didn’t push the 1.2.8 changes back to my master repository. It pushed the VERSION file, which was something, but not the actual code changes.
version 1.2.7 contains two bugfixes and one new feature:
A bug with strget() not properly returning EOF
because it was returning a char or unsigned int instead
of an int (should be a signed int, but that way lies C99
madness.) This bug was reported by A.S.Bradbury, who discovered
it when trying to build markdown with a 64 bit version of gcc.
A bug in ![]() and/or
[](), where the content parser stops parsing 4 (for IMG) or 5
(for HTML) characters before the end of the embedded content.
This bug was reported by Christian Herenz, who has been finding
many many boundary condition bugs in the way ![]() and []()
(don’t) work.
The configuration option --enable-superscript
turns on fancypants substitutions that translate word^word
into word<sup>word</sup>. I'm not sure if it’s the most
useful thing in the world, but I have found cases where I'd
like to be able to get superscripts generated just by themselves.
(and since I don’t code in PASCAL any more, I don’t need to
worry about tripping over a dereferencing ^ as much as I
used to worry about it.)
version 1.2.6 contains patches
to work around some of the broken features in Xcode on macos 10.5.
The AC_SUB function in configure.inc has been reworked to not
do echo -n (because that one spits out ^? when I echo an empty
string,) and I've had to write my own echo because the builtin
echo that the Xcode /bin/sh (actually the gnu clone shell) uses
no longer accepts -n as the “do not put a newline at the end of
this line, k?” switch.
There’s also another tiny bugfix: I discovered that if I had a
line of the form [link][] text text [link][]: ..., it would
get treated like a footnote despite obviously not being one.
This release corrects the isfootnote() function in markdown.c
so it won’t do that anymore.
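The distinction that fix draws can be sketched as follows. This is not the isfootnote() from markdown.c, just an illustration of the rule that a footnote line has to begin with its [label] followed directly by a colon; looks_like_footnote() is a hypothetical name, the allowance of up to three leading spaces comes from Gruber's syntax description, and labels containing nested brackets are not handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check (not Discount's isfootnote()): return 1 only if
 * `line` starts, after at most three spaces, with a non-empty [label]
 * that is immediately followed by a colon. */
int looks_like_footnote(const char *line)
{
    const char *p = line;
    const char *close;
    int spaces = 0;

    assert(line != NULL);
    while (*p == ' ' && spaces < 3) {
        p++;
        spaces++;
    }
    if (*p != '[')
        return 0;
    close = strchr(p, ']');             /* first closing bracket ends the label */
    return close != NULL && close > p + 1 && close[1] == ':';
}
```

A line like [link]: http://example.com passes, while [link][] text text [link][]: ... does not, because the first closing bracket is followed by another bracket rather than a colon.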
version 1.2.5 fixes a small
bug inside <url>’s, where the url would be parsed as if it was
text instead of being treated as code.
version 1.2.4 fixes a bug
which caused builds to fail if --enable-dl-tag was not set;
I was combining code that required WITH_DL_TAG and code that
didn’t, so an isdefinition() call crept into unDL_TAG territory.
To fix that bug, I moved the test for WITH_DL_TAG into isdefinition()
and worked the code so that isdefinition() always returns 0 if the
code was configured without --enable-dl-tag.
version 1.2.3 fixes two
bugs in the markdown program, fixes one bug inside the markdown
library, adds support for testing mkd_text() inside the markdown
program (-t text processes text with mkd_text(),) and adds
support for sequential <dt> blocks with sequential =tag= labels.
version 1.2.2 fixes two bugs;
[] sections.
version 1.2.1 cleans up a few more memory
leaks, adds a new function (mkd_document(), which returns a pointer
to the compiled document,) adds the new configuration option --enable-amalloc,
which builds against a debugging memory allocator, and, finally, it
rips into the guts of the second-pass of the compiler and rearranges
how * and ** are compiled to (hopefully) guarantee that they will
never generate incorrect XML.
The “rip into the guts” part adds a half-pass to the second pass of
the compiler; htmlify() now writes into a linked list of text and
emphasis tokens, which are then flattened into the output document
by the new function emblock() which attempts to match emphasis
blocks up properly instead of the old-style “generate <strong> and
<em> blindly as we roll on down the line.”
version 1.2.0 has two subtle output changes,
plus a bugfix that steps on a memory leak in codeblock().
The output changes (which caused the minor# to go from 1 to 2) are
The contents of <html tag>s are now parsed as code, not as
text. This keeps <a href="linky"> from being written as
<a href=“linky”>, and may thus confuse source
that was written with \"’s to get around the smartypants
extensions.
For <email@address> automatic links, ensure that
the human-readable part of the link doesn’t include a mailto:,
just the mail address.
Older versions of the code are still available.