Jeff Atwood summarizes Markdown and its problems perfectly:
- Markdown is as close to a perfect plain-text language for formatting that I’ve seen.
- It’s got a few minor, but gnawing, flaws in its specification.
- It’s got a few more flaws in its implementation (which is both clever and robust, but more the former than the latter).
- It’s got many alternative implementations — too many, and too fragmenting.
- It needs a more solid reference implementation, more involvement and more direction; most could be gained by rallying more help and by moving towards the bazaar end of the spectrum, but also needs a different tinge of strong leadership. (Markdown is incredibly “opinionated”, but not so much “release early-release often”.)
What Jeff does not mention is that it’s hard to write a parser for it. I know because I’ve been trying to write a small Perl 6 parser for it in the past few days. (One of the big improvements in Perl 6 is that regexes are basically really good tools for parsing — you write more or less BNF with heavily modified regex syntax and can cart along side-effects, and it’s good enough to parse Perl 6 itself. You might say that in Perl 6, jwz’s Razor is proven untrue.)
I think Markdown deserves at least this much: someone who knows the language (including corner cases like a code block inside three layers of nested blockquotes) and who knows parsers should sit down and design a good production for it. The only thing better than a prescriptive heap of source code is a prescriptive algorithm, which is what it comes down to.
Also, John Gruber needs to be more forthcoming about whether the known flaws (whose validity he has acknowledged) should be corrected outright or whether the corrections should be optional.
~
Appendix: The current state of my Perl 6 Markdown parser. Public domain, for all I care; I’m sure there are better ways to arrange the parsing, but this is where I’ve gotten. I know that this doesn’t even get to most of the interesting block stuff, like blockquotes and nesting them, but that kind of recurring prefix carrying state is also something that’s hard to specify in a parser, which is why it’s not here yet.
use v6;
grammar Markdown {
    token TOP {
        <blox>*
    }
    regex blox {
        [<block_construct>|[<inx>+?]] [\n|$]
    }
    regex block_construct {
        <blank_line> | <link_label_spec> | <header_spec>
    }
    regex header_spec {
        <setext_header_spec> | <atx_header_spec>
    }
    regex setext_header_spec {
        ^^ <header_text> \n <[=\-]>+ $$
    }
    regex atx_header_spec {
        # Strip leading space from header? spec not clear
        # ^^ \# ** 1..6 <[\ \t]>+ <header_text> \#* $$
        ^^ \# ** 1..6 <header_text> \#* $$
    }
    regex header_text {
        [.*?]
    }
    regex blank_line {
        ^^\s*$$
    }
    regex link_label_spec {
        ^^\s ** 0..3 <link_label> ':'
        <[\ \t]>+ <link_label_spec_url> \s*
        <link_label_spec_title>? $$
    }
    regex link_label_spec_url {
        ('<' $<url>=[\N*?] '>') |
        ($<url>=[\N*?])
    }
    regex link_label_spec_title {
        ['\'' $<title>=[.*?] '\''] |
        ['"' $<title>=[.*?] '"'] |
        ['(' $<title>=[.*?] ')']
    }
    token link_label {
        '[' $<label>=[.+?] ']'
    }
    regex inx {
        <inline_construct> | <atom>
    }
    token inline_construct {
        <link> | <inline_pres>
    }
    token inline_pres {
        <em>
    }
    token inx_link_title {
        <inline_pres>
    }
    regex link {
        '[' $<inner>=[<inx>+?] ']' ' '? [<link_direct>|<link_deferred>]
    }
    token link_direct {
        '(' .+? [$<title>=[' "' <inx_link_title> '"']]? ')'
    }
    token link_deferred {
        <link_label>
    }
    token em {
        (<[*_]> ** 1..2) $<inner>=[<inx>+?] $0
    }
    token atom {
        ['\\' $<ch>=<[*#.\\\-]> | $<ch>=.]
    }
}
my $testcase = "x*y[z#\\#] [a]å*
__ab__c
[a]: /foo (title)
[b]: /foo
[c]: /bar \"title\"
[d]: /bar 'title'
d*ef*";
$testcase = "abc
===
### bfx";
my $match = Markdown.parse($testcase);
say $match.perl; # print the AST
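The grammar above leaves out the recurring blockquote prefix. As a rough illustration of the state involved, here is a minimal sketch in Python rather than Perl 6, so that it stands apart from the grammar. The names `QUOTE_PREFIX`, `strip_quote_levels` and `classify` are made up for this example, and the rules are simplifying assumptions: one quote level is up to three spaces, a `>` and one optional space, and a code line is whatever remains if it starts with four spaces or a tab. Lazy continuation lines and everything else about block structure are ignored.

```python
import re

# Assumption: one blockquote level is up to three spaces, '>', one optional space.
QUOTE_PREFIX = re.compile(r" {0,3}> ?")


def strip_quote_levels(line):
    """Peel leading '>' markers off a line; return (depth, remaining text)."""
    depth = 0
    while True:
        m = QUOTE_PREFIX.match(line)
        if m is None:
            return depth, line
        depth += 1
        line = line[m.end():]


def classify(line):
    """Return (depth, kind, text), where kind is 'code' or 'text'.

    A line counts as code when, after the quote markers are removed,
    it is indented by four spaces or one tab (a simplification).
    """
    depth, rest = strip_quote_levels(line)
    if rest.startswith("    "):
        return depth, "code", rest[4:]
    if rest.startswith("\t"):
        return depth, "code", rest[1:]
    return depth, "text", rest
```

Calling `classify("> > >     print(1)")` treats the line as code at quote depth 3, which is the corner case mentioned earlier. The hard part that this sketch does not solve is the bookkeeping between lines: a block parser has to compare each line's depth with the previous one to decide where a blockquote opens, continues or closes.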
Have you seen http://eigenclass.org/R2/writings/fast-extensible-simplified-markdown-in-ocaml ?
By Anon · 2010.01.01 04:37
Not that exactly, but I’ve seen that there already are a bunch of parsers. I just have no idea if they share a common production and which of them are more complete.
By Jesper · 2010.01.01 12:06