Discount

description

This is my implementation of John Gruber’s Markdown text to html language. There’s not much here that differentiates it from any of the existing Markdown implementations except that it’s written in C instead of one of the vast flock of scripting languages that are fighting it out for the Perl crown.

Markdown provides a library that gives you formatting functions suitable for marking down entire documents or lines of text, a command-line program that you can use to mark down documents interactively or from a script, and a tiny (1 program so far) suite of example programs that show how to fully utilize the markdown library.

My markdown also does, by default, various smartypants-style substitutions.

The program

The markdown program is a trivial compiler that reads in a markdown file and writes out an html document or, if you use the -d flag, an outline showing the parse tree. It does have a few options.

The library

There are 14 public functions in the markdown library, broken into four categories:

Input functions

  1. MMIOT *mkd_in(FILE *f, int flags) reads a markdown input file and returns an MMIOT containing the preprocessed document (which is then fed to markdown() for final formatting).

  2. MMIOT *mkd_string(char *bfr, int size, int flags) reads the markdown input file that’s been written into bfr and returns a preprocessed blob suitable for feeding to markdown(). This function exists because annotations uses mmap() to access message files instead of traditional file i/o. (If you're going to port Markdown to an AS/400, this function is the droid you've been looking for.)

“Big Picture”-style processing functions

  1. int markdown(MMIOT *doc, FILE *out, int flags) formats a document (created with mkd_in() or mkd_string()) and writes the resulting HTML document to out.

  2. int mkd_text(char *bfr, int size, FILE *out, int flags) formats the text string and writes the resulting HTML fragment to out. Unlike markdown(), it doesn’t support quoting, footnotes (“reference links”), multiple paragraphs, lists, code sections, or pure html sections.

Fine-grained access to the internals

  1. int mkd_compile(MMIOT *doc, int flags) takes a document created by mkd_in() or mkd_string() and compiles it into a tree of block elements.

  2. int mkd_generatehtml(MMIOT *doc, FILE *out) generates html from a compiled document.

  3. int mkd_document(MMIOT *doc, char **text) returns (in text) a pointer to the compiled html document, and (in the return code) the size of that document. MKD_QUOT expansions are not done to this document — if you want those expansions, you must use mkd_generatehtml().

  4. int mkd_style(MMIOT *doc, FILE *out) prints any style blocks in the document.

  5. int mkd_generatetoc(MMIOT *doc, FILE *out) writes a table of contents to out; to get a table of contents, you must mkd_compile() the document with the MKD_TOC flag (described below).

  6. int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title) prints a block structure diagram of a compiled document.

  7. void mkd_cleanup(MMIOT *doc) releases the MMIOT allocated for the document.

Document header access functions

  1. char *mkd_doc_title(MMIOT *doc) returns the % title line.
  2. char *mkd_doc_author(MMIOT *doc) returns the % author(s) line.
  3. char *mkd_doc_date(MMIOT *doc) returns the % date line.

The flags argument in markdown(), mkd_text(), mkd_in(), mkd_string(), mkd_compile(), and mkd_generatehtml() is a mask of the following flag bits:

MKD_NOLINKS
do not allow “<a” or expand “[][]” into a link.
MKD_NOIMAGE
do not allow “<img” or expand “![][]” into an IMG tag.
MKD_NOHTML
Disable all embedded html by replacing all <’s with &lt;.
MKD_NOPANTS
disable SmartyPants processing.
MKD_QUOT
expand "s into &quot;s.
MKD_NOHEADER
do not process the document header, but treat it like regular text.
MKD_TABSTOP
use the standard markdown tabstop of 4 characters, regardless of what the package is configured for.
MKD_NO_EXT
do not process pseudo-protocols.
MKD_STRICT
Disable relaxed emphasis and superscripts.
MKD_TOC
Generate table-of-contents headers (each generated <h1>, <h2>, etc. will include an id="name" attribute.)
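
Since the flags argument is a bit mask, options are combined with bitwise OR and tested with bitwise AND. The sketch below only illustrates that idea: the EX_ names and their numeric values are hypothetical stand-ins, not the values the library's header defines, so use the real MKD_ constants in actual code.

```c
/* Hypothetical flag bits, for illustration only. The real MKD_*
 * constants come from the library's header and may have other values. */
enum {
    EX_NOLINKS = 0x01,
    EX_NOIMAGE = 0x02,
    EX_NOPANTS = 0x04,
    EX_TOC     = 0x08
};

/* Return nonzero if every bit in `flag` is set in `mask`. */
int ex_has_flag(int mask, int flag)
{
    return (mask & flag) == flag;
}

/* Build a mask that turns off both links and SmartyPants processing. */
int ex_plain_mask(void)
{
    return EX_NOLINKS | EX_NOPANTS;
}
```

With the real constants, the same pattern would be something like passing MKD_NOLINKS|MKD_NOPANTS as the flags argument.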

Smartypants substitutions

  1. `` text '' is translated to “text”.
  2. "double-quoted text" becomes “double-quoted text”
  3. 'single-quoted text' becomes ‘single-quoted text’
  4. don't becomes “don’t”, as does anything-else’t. (But foo'tbar is just foo'tbar.)
  5. it's becomes “it’s”, as does anything-else’s (but not foo'sbar and the like.)
  6. (tm) becomes ™
  7. (r) becomes ®
  8. (c) becomes ©
  9. 1/4th becomes ¼th. Ditto for 1/4 (¼), 1/2 (½), 3/4ths (¾ths), and 3/4 (¾).
  10. ... becomes …
  11. . . . also becomes …
  12. -- becomes —
  13. - becomes – , but A-B remains A-B.
  14. A^B becomes A<sup>B</sup>.
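
To give a feel for how a few of these substitutions could work, here is a minimal table-driven sketch covering only (tm), (r), (c), ... and --. It is not the library's own code: the function name ex_smarty, the rule table, and the choice to emit HTML entities are all assumptions made for the example, and the context-sensitive rules (quotes, apostrophes, fractions, single dashes) are left out.

```c
#include <stddef.h>
#include <string.h>

/* One substitution rule: literal input text and its replacement. */
struct ex_rule {
    const char *from;
    const char *to;
};

/* A handful of rules from the list above; emitting HTML entities
 * is an assumption of this sketch. */
static const struct ex_rule ex_rules[] = {
    { "(tm)", "&trade;"  },
    { "(r)",  "&reg;"    },
    { "(c)",  "&copy;"   },
    { "...",  "&hellip;" },
    { "--",   "&mdash;"  },
};

/* Copy `in` to `out`, applying the first matching rule at each position.
 * Returns 0 on success, -1 if `out` (outsz bytes) is too small. */
int ex_smarty(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    size_t nrules = sizeof ex_rules / sizeof ex_rules[0];

    if (outsz == 0)
        return -1;
    while (*in) {
        const char *piece = NULL;
        size_t take = 1, put = 1, i;

        for (i = 0; i < nrules; i++) {
            size_t n = strlen(ex_rules[i].from);
            if (strncmp(in, ex_rules[i].from, n) == 0) {
                piece = ex_rules[i].to;
                take = n;
                put = strlen(piece);
                break;
            }
        }
        if (used + put + 1 > outsz)   /* keep room for the terminator */
            return -1;
        if (piece)
            memcpy(out + used, piece, put);
        else
            out[used] = *in;
        used += put;
        in += take;
    }
    out[used] = '\0';
    return 0;
}
```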

Language extensions

My markdown was written so I could replace the fairly gross homemade text to html prettifier that I wrote for annotations, so I've extended it in a few ways; I've put support for paragraph centering in so that I don’t have to hand enter the <center> and </center> tags, I've added support for specifying image sizes, and I've written a not-earthshatteringly-horrible markup extension for definition lists.

Paragraph centering
To center a paragraph, frame it with -> and <-.
->this is a test<-

produces

<center>this is a test</center>
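
The centering rule is simple enough to sketch as a single-line check. The function below is a hypothetical illustration (ex_center is not part of the library) and copies the framed text verbatim instead of running it through the markdown formatter as the real implementation would.

```c
#include <stdio.h>
#include <string.h>

/* If `line` is framed by "->" and "<-", write <center>...</center> to
 * `out` and return 1; otherwise copy the line unchanged and return 0.
 * Returns -1 if `out` (outsz bytes) is too small. */
int ex_center(const char *line, char *out, size_t outsz)
{
    size_t len = strlen(line);
    int n;

    if (len >= 4 && strncmp(line, "->", 2) == 0
                 && strcmp(line + len - 2, "<-") == 0) {
        /* Copy only the text between the two markers. */
        n = snprintf(out, outsz, "<center>%.*s</center>",
                     (int)(len - 4), line + 2);
        return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
    }
    n = snprintf(out, outsz, "%s", line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```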
Specifying image sizes
An image size is defined by adding an additional =widthxheight field to the image tag:
 ![dust mite] (http://dust.mite =150x150)

produces

 <img src="http://dust.mite" height=150 width=150 alt="dust mite">
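
One way the =widthxheight field could be picked apart is with sscanf(). This is a sketch under stated assumptions, not the library's parser: the name ex_parse_size is invented here, and rejecting zero or negative sizes and trailing characters is a choice made for the example.

```c
#include <stdio.h>

/* Parse a "=WIDTHxHEIGHT" field such as "=150x150".
 * Returns 1 and fills *w and *h on success, 0 otherwise. */
int ex_parse_size(const char *field, int *w, int *h)
{
    int width, height, used = 0;

    /* %n records how many characters were consumed, so trailing
     * junk after the height can be detected. */
    if (sscanf(field, "=%dx%d%n", &width, &height, &used) != 2)
        return 0;
    if (field[used] != '\0' || width <= 0 || height <= 0)
        return 0;
    *w = width;
    *h = height;
    return 1;
}
```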
Definition lists
To mark up a definition list, left-justify the label and frame it with = characters, then put the body of the list item on the next line, indented 4 spaces.
=hey!=
    This is a definition list

produces

<dt>hey!</dt>
<dd>This is a definition list</dd>

A definition list label is just a regular line of markdown code, so you can put links and images into it.

In discount 1.2.3, the definition list syntax has been extended so that you can define sequential <dt> blocks by doing

=tag1=
=tag2=
    data.
which generates
<dt>tag1</dt>
<dt>tag2</dt>
<dd>data.</dd>
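
The label and body rules can be sketched as a small line classifier. Everything named ex_ below is hypothetical, and the sketch copies text verbatim, whereas a real label is a regular line of markdown and would be formatted as such.

```c
#include <stdio.h>
#include <string.h>

enum ex_kind { EX_OTHER, EX_LABEL, EX_BODY };

/* Classify one input line: "=text=" flush left is a label,
 * four leading spaces mark a body line. */
enum ex_kind ex_classify(const char *line)
{
    size_t len = strlen(line);

    if (len >= 3 && line[0] == '=' && line[len - 1] == '=')
        return EX_LABEL;
    if (len > 4 && strncmp(line, "    ", 4) == 0)
        return EX_BODY;
    return EX_OTHER;
}

/* Write the <dt> or <dd> element for a label or body line into `out`.
 * Returns 0 on success, -1 for other lines or if `out` is too small. */
int ex_emit(const char *line, char *out, size_t outsz)
{
    size_t len = strlen(line);
    int n;

    switch (ex_classify(line)) {
    case EX_LABEL:   /* strip the framing '=' characters */
        n = snprintf(out, outsz, "<dt>%.*s</dt>", (int)(len - 2), line + 1);
        break;
    case EX_BODY:    /* strip the four-space indent */
        n = snprintf(out, outsz, "<dd>%s</dd>", line + 4);
        break;
    default:
        return -1;
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Feeding it =tag1=, =tag2= and an indented data line in turn gives the two <dt> elements followed by the <dd>, matching the sequential form shown above.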
alpha lists

Ordered lists with alphabetic labels (enabled by --enable-alpha-list during configuration) are supported in the same way that numeric ordered lists are:

a. first item
b. second item

generates

  1. first item
  2. second item
New pseudo-protocols for [] links

I wanted to be able to apply styles inline without having to manually enter the <span class="xxx"></span> html. So I redid the [][] code to support some new “protocols” within my markdown:

class:name
The label will be wrapped by <span class="name"></span>
id:name
The label will be wrapped by <a id="name"></a>
raw:text

Text will be written verbatim to the output. The protocol was inspired by a short thread on the markdown mailing list about someone wanting to embed LaTeX inside <!-- --> and finding, to their distress, that markdown mangled it.

Passing text through in comments seems to be a path to unreadable madness, so I didn’t want to do that. This is, to my mind, a better solution.
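
As a rough illustration of the three pseudo-protocols, the dispatch could look like the sketch below. The function ex_pseudo and its signature are assumptions made for this example, not the library's interface, and no escaping of the label or name is attempted.

```c
#include <stdio.h>
#include <string.h>

/* Expand a [label](target) link whose target uses one of the
 * pseudo-protocols described above.
 * Returns 1 if a pseudo-protocol was handled, 0 if `target` is an
 * ordinary link, and -1 if `out` (outsz bytes) is too small. */
int ex_pseudo(const char *label, const char *target, char *out, size_t outsz)
{
    int n;

    if (strncmp(target, "class:", 6) == 0)
        n = snprintf(out, outsz, "<span class=\"%s\">%s</span>",
                     target + 6, label);
    else if (strncmp(target, "id:", 3) == 0)
        n = snprintf(out, outsz, "<a id=\"%s\">%s</a>", target + 3, label);
    else if (strncmp(target, "raw:", 4) == 0)
        n = snprintf(out, outsz, "%s", target + 4);  /* text goes out verbatim */
    else
        return 0;
    return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
}
```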

Style blocks
accept <style></style> blocks and set them aside for printing via mkd_style().
Class blocks
A blockquote with a first line of > %class% will become <div class="class"> instead of a <blockquote>.

How standard is it?

When I run the standard test suite from daringfireball, MarkdownTest.pl reports:

$ MARKDOWN_FLAGS=0x0204 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK


19 passed; 0 failed.

Most of the “how to get standards compliant” changes that went in were cleaning up corner cases and blatant misreading of the spec, but there were two places where I had to do a horrible hack to get compliant:

  1. To pass the Hard-wrapped paragraphs with list-like lines test, I had to modify mkd_compile() so that it would have top-level paragraphs absorb adjacent list items, but I had to retain the old (and, IMO, correct) behavior of a new list forcing a block break within indented (quoted, inside lists) blocks.
  2. To pass the Markdown Documentation – Syntax test, I had to change the behavior of code block from “preserve trailing whitespace” to “preserve trailing whitespace unless it’s the first line in the block.”

Does this markdown treat tabs as 4 spaces?

By default, yes, it does. The habit of compensating for broken editors that give no way to indent except for tabbing by setting tabstops to 4 is so intertwined with this language that treating tabs properly would be the moral equivalent of dropping nuclear devices into the testsuite.

But if you use a proper tabstop (8 characters), you can configure markdown with --with-tabstop and it will expand tabs to 8 spaces. If you've configured your markdown like this (markdown -V will report TAB=8) and you need to mark up text from other sources, you can set the input flag MKD_TABSTOP to revert those documents back to the icky standard 4-space tab.
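
To make the difference between the two tabstop settings concrete, here is a minimal tab expander for a single line of text. It is a sketch, not the library's code: ex_expand_tabs is a made-up name, and it assumes a tab advances to the next multiple of the tabstop, the way a terminal would, instead of always inserting a fixed number of spaces.

```c
#include <stddef.h>

/* Expand tabs in the single line `in` to spaces, so that each tab
 * advances to the next multiple of `tabstop` columns.
 * Returns the output length, or -1 if `out` (outsz bytes) is too
 * small or `tabstop` is not positive. */
int ex_expand_tabs(const char *in, int tabstop, char *out, size_t outsz)
{
    size_t col = 0;

    if (tabstop <= 0 || outsz == 0)
        return -1;
    for (; *in; in++) {
        if (*in == '\t') {
            /* Number of spaces needed to reach the next tab stop. */
            size_t pad = (size_t)tabstop - col % (size_t)tabstop;
            if (col + pad + 1 > outsz)
                return -1;
            while (pad--)
                out[col++] = ' ';
        } else {
            if (col + 2 > outsz)
                return -1;
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return (int)col;
}
```

Calling it with a tabstop of 4 and then 8 on the same line shows how an indented code block in one convention can fall short of, or overshoot, the indent expected in the other.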

Source Code

To build discount, untar your selected tarball, cd into the directory it creates, then do configure.sh to generate your Makefiles. After doing this, a make should give you a functional stack of programs and libraries.

Discount builds, for me, on SLS Linux, MacOS 10.5, FreeBSD 4.8, and RHEL3. It may build on Windows with mingw, but I'm not sure about that.

Archived releases

Older versions of the code are still available.

Trivia

  1. This document is generated from markdown source.
  2. I'm tracking markdown’s code bloat. Grab some popcorn, sit down, and enjoy watching a tiny tarball grow to gargantuan size.