The Future of Markdown

codinghorror · October 25, 2012, 12:00am

Markdown is a simple little humane markup language based on time-tested plain text conventions from the last 40 years of computing.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2012/10/the-future-of-markdown.html

Ile · October 25, 2012, 12:00am

I don’t know Markdown fully, but in the example there was the text _code_, which I thought and hoped would parse into underlined text. It didn’t; it was italicized.

Would it be possible to make _text_ parse into underlined text? I think that’s what the two underscores represent very well.

Pvorb · October 25, 2012, 12:00am

Please consider talking to John MacFarlane, the creator of Pandoc (http://johnmacfarlane.net/pandoc/README.html), which adds some nice features (e.g. tables, definition lists) to the traditional Markdown format.

Yjmbo · October 25, 2012, 12:00am

I would strongly recommend a hard look at asciidoc (http://www.methods.co.nz/asciidoc/) as a mature (10 year+) and extensible text markup format, and as a mature bunch of code to process it.

If you leave out most of the features, you can make it look like Markdown, too.

Lilleyt · October 25, 2012, 12:00am

I’m reminded of the guy who decides that there should be one standard because there are n divergent implementations. So he goes and writes his own. Now there are n+1 divergent implementations.

Of course, I understand that he’s talking about a process to go along with a blessed and convergent implementation, but I think making an implementation the goal of this standardization process may be premature. It’s so easy for people to get caught up in getting something done that everyone simply moves in their own direction again and the convergence doesn’t happen.

I’m not saying that’s what he’s calling for, but I’d be careful. Rather than an implementation, I think what’s needed is the will of the major consumers (sounds like you’re on board, so that’s a good start), a good sense of compromise, a willingness to recognize the desirable features which have evolved to address actual shortcomings of John’s original spec, and a discipline about preventing feeping creaturism.

Most of all, however, I would say that there needs to be a concrete and formal grammar. This should be the goal and distillation of all of that process. Tests yes, of course, hand in hand with it, but a formal grammar which eliminates all ambiguity in the language (and therefore in the hopefully many standards-compliant implementations).

I would propose formalizing the language in a Parsing Expression Grammar (PEG). There’s great tooling available (even in JS), PEGs are very comprehensible, and in fact it’s been done more than once already. What’s lacking is a blessed PEG and implementations of the same spec in multiple languages.
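
To make the PEG idea concrete, here is a minimal sketch in Python of the ordered-choice style of parsing that a PEG describes. The three-rule grammar in the comment is an assumption made up for illustration; it is not taken from Pandoc, peg-markdown, or any published Markdown grammar, and it handles only `*emphasis*` and `**strong**`.

```python
import html

# Toy grammar, invented for this sketch:
#   Inline   <- (Strong / Emphasis / Char)*
#   Strong   <- "**" (!"**" .)+ "**"
#   Emphasis <- "*"  (!"*"  .)+ "*"
#   Char     <- .


def _delimited(text, pos, delim, tag):
    """Match delim, one or more characters, delim; return (html, new_pos) or None."""
    if not text.startswith(delim, pos):
        return None
    start = pos + len(delim)
    end = text.find(delim, start)
    if end == -1 or end == start:  # no closing delimiter, or empty body
        return None
    inner = html.escape(text[start:end])
    return f"<{tag}>{inner}</{tag}>", end + len(delim)


def _strong(text, pos):
    return _delimited(text, pos, "**", "strong")


def _emphasis(text, pos):
    return _delimited(text, pos, "*", "em")


def _char(text, pos):
    # Fallback rule: always succeeds, consuming exactly one character.
    return html.escape(text[pos]), pos + 1


def parse_inline(text):
    """Apply Inline: at each position try the alternatives in a fixed order."""
    pos = 0
    out = []
    while pos < len(text):
        # Ordered choice: the first rule that matches wins, so there is
        # never more than one possible parse for a given input.
        for rule in (_strong, _emphasis, _char):
            result = rule(text, pos)
            if result is not None:
                fragment, pos = result
                out.append(fragment)
                break
    return "".join(out)
```

Because the alternatives are tried in a fixed order, an unclosed delimiter such as the `*` in `2 * 3` simply falls through to the `Char` rule and stays literal. That determinism is the property being argued for here; a real grammar would need many more rules.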

I can’t help with any of those things, but I can help with a couple technical observations.

In my book, kramdown is the current best-of-breed. A spec needn’t be quite so ambitious, but I find support for element attributes and basic table syntax to be essential.
Pandoc has the tightest and most complete implementation, albeit in Haskell. A good start would be to lift the PEG from it. There is one other PEG floating around, but I couldn’t name it off the top of my head and it’s not as rich as Pandoc’s.

FyodorS · October 25, 2012, 12:00am

Interestingly enough, the Markdown standardisation topic just took off today on the Markdown discussion list. It is happening here: http://markdown.github.com

I personally believe that writing raw Markdown markup is not for everyone. That’s why we are building a true WYSIWYM editor for Markdown. http://www.texts.io/

Olpersonality · October 25, 2012, 12:00am

What about reStructuredText?

Tim_Post · October 25, 2012, 12:00am

I don’t think tables are a good fit for markdown. I’ve worked with Pandoc and Asciidoc as well as most wiki variants like Creole and Mediawiki. Using text to describe a table structure horribly sucks. If you want people to make tables, allow the basic HTML needed for them to do it.

I would really love to see a single unified implementation that could just become a part of popular languages. If you use Ruby, Python, PHP, Perl or whatever else you have in your toolbox, there are libraries to handle markdown, with many behaving differently in subtle ways (like the gotchas you list).

A simple testable implementation would encourage those languages to just natively support it, giving Markdown even more adoption.

By the way, if you use Markdown for lots of stuff (leanpub, project documentation, etc) I really recommend seeking out and installing Markdownpad. It’s like Notepad, but uses Markdown with a preview that can be easily customized by anyone that knows simple CSS. Just associate it with the ‘.md’ file type and off you go. I think it prints nicely too, but I’m not sure, I haven’t owned a printer in over a decade.

Thegload · October 25, 2012, 12:00am

I’m in, Jeff!
We run a large business-to-consumer site that is open to the public and needs to be secure.

Markdown is perfect for injection prevention and, most of all, repurposing of content (i.e. applying different style interpretations, text versions of content, and publishing to HTML, PDF or other formats).

I’ve done dozens of CMS projects and am totally over the quirks of html based editors, the security issues they can cause and the inability to repurpose the generated content.

We currently use markdown although it can be a little tricky to get clients into it. If we had time we’d be doing this ourselves, but I’d love to be in the process.

Looking forward to a unified Markdown standard with a good wysiwyg editor view.

Languagehacker · October 25, 2012, 12:00am

Hi Jeff,

Referring to “Wikipedia” as a lightweight markup language is both a misnomer and slightly inaccurate.

The correct terminology for the markup language that Wikipedia supports is wikitext. Wikipedia is one of numerous sites that use MediaWiki as a platform. MediaWiki is capable of parsing wikitext.

The crazy thing is that “lightweight” markup languages are only lightweight in terms of ease of use. Parsing SGML is context-free, whereas wikitext and markdown are both context-sensitive, and therefore more complex to write parsers for.

CodeC · October 25, 2012, 12:00am

I agree with you a lot Jeff, you can’t really avoid Markdown. Although I’m hesitant to agree that it needs standardization.

You say it’s a humane markup language, well if it’s meant to be human-friendly then that doesn’t exactly lend itself to standardization. Different people have different preferences. I’m no expert by any means, but that’s just my initial reaction.

For any beginners reading this, you might want to check out my introduction to Markdown that I wrote on my blog:

http://codeconquest.com/learn-markdown-youll-thank-yourself-later/

Ncrow · October 25, 2012, 12:00am

The owners of a spec aren’t its creators. It’s the users. You don’t need Gruber’s stamp any more than you need mine.

It’s easy to get confused. The creators of open source projects are often intimately involved with the interests of the users. But it’s not the case here. The active stakeholders seem aligned. That’s all you need. Is there any real dissent besides Gruber’s inactivity?

(Users include both sides of the fence - people implementing the translation and people writing in Markdown)

Psychoticmeow · October 25, 2012, 12:00am

I tried to love Markdown and Textile, but as a web developer it just doesn’t make sense to write articles with it. I’m often confused about the syntax for writing links and images, and if you want to write about HTML you’ve got to escape your HTML snippets.

For this reason I wrote my own text formatter. It’s basically HTML with a lighter syntax, and as in so-called “lightweight” markup languages, paragraph tags are implied.

http://nbsp.io/development/doccy-a-mid-weight-markup-language

That article mentions Symphony, a CMS framework, but it’s not tied to that at all, the source code is at:

https://github.com/rowan-lewis/doccy

It’s also written in a more sophisticated way than your usual text formatter; by parsing the input and building the output using an XML DOM, you’re guaranteed that the output is sane.
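
The general idea of building output through a DOM rather than by string concatenation can be sketched as follows. This is not Doccy's code; it is a small Python illustration using the standard library's `xml.etree.ElementTree`, and the function name `render_paragraphs` and the wrapping `div` are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def render_paragraphs(source):
    """Turn blank-line-separated blocks into <p> elements under one <div>."""
    root = ET.Element("div")
    for block in source.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        paragraph = ET.SubElement(root, "p")
        # Assigning to .text stores raw text; the serializer escapes
        # &, < and > itself, so stray markup cannot unbalance the output.
        paragraph.text = block
    return ET.tostring(root, encoding="unicode")
```

Every tag in the result was opened and closed by the serializer, so the output is well-formed whatever the input contains, which is the "sane output" guarantee described above.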

jjnguy · October 25, 2012, 12:00am

People who are suggesting alternatives or saying that Markdown is too difficult are missing the point.

We ARE going to use Markdown. We like it.

I think it’s a great idea to standardize Markdown. My site uses it for our user submitted content and I think standardizing it is just what the doctor ordered.

I’d love to help, where do I sign up?

JohnM · October 25, 2012, 12:00am

A better spec is sorely needed. There have been calls for this for years on the markdown-discuss mailing list, but never any uptake. I wish you luck in persuading John Gruber, who has been resistant even to requests for informal clarifications of the spec. Here is his most recent contribution to the markdown-discuss list, after a long period of absence:

http://www.mail-archive.com/markdown-discuss@six.pairlist.net/msg02703.html

In any case, I am willing to help out. I have written three markdown implementations: pandoc (in Haskell, using parser combinators, http://johnmacfarlane.net/pandoc), peg-markdown (in C, using a PEG grammar, https://github.com/jgm/peg-markdown), and lunamark (in lua, using lpeg, https://github.com/jgm/lunamark). So I know quite a bit about parsing markdown, and particularly about what would have to be settled in a more determinate spec.

To start, here is a list of some big (not edge-case) questions that the current syntax description leaves open:

http://johnmacfarlane.net/babelmark2/faq.html#what-are-some-big-questions-that-the-markdown-spec-does-not-answer

The list includes links to a tool which will show you the output of a bunch of different implementations, so you can see how they differ. Further up in the FAQ you’ll find a longer list of divergences between various implementations (including lots of bugs).

Nick · October 25, 2012, 12:00am

@FyodorS:

Bless you! I was going to post that the world is in desperate need of a WYSIWYG Markdown editor, but I see you have that covered.

Thanks –

trent_mick · October 25, 2012, 12:00am

I’m in, with what I can.

I wrote https://github.com/trentm/python-markdown2
It is basically a straight port of Gruber’s Markdown.pl – i.e. it is regex based. Plus, like most processors, it adds a number of extras/extensions.
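
For readers unfamiliar with what "regex based" means here, the following is a minimal Python sketch of the substitution style. The two patterns are assumptions written for illustration in the spirit of that approach; they are not copied from python-markdown2 or Markdown.pl, and `render_span` is a hypothetical name.

```python
import html
import re

# Strong is substituted before emphasis so that ** is not consumed as two
# single * delimiters. Both patterns require a non-space character right
# after the opener and right before the closer.
_STRONG = re.compile(r"(\*\*|__)(?=\S)(.+?)(?<=\S)\1")
_EM = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1")


def render_span(text):
    """Escape HTML, then rewrite strong and emphasis spans with regexes."""
    escaped = html.escape(text, quote=False)
    escaped = _STRONG.sub(r"<strong>\2</strong>", escaped)
    return _EM.sub(r"<em>\2</em>", escaped)
```

With these patterns `render_span("_code_")` gives `<em>code</em>`, and `render_span("snake_case_name")` gives `snake<em>case</em>name`. The second result is the kind of corner case where a sequence of substitutions can surprise writers, and where a shared test suite would pin down the intended behaviour.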

If this effort bears fruit, perhaps the most useful part of my implementation would be the test suite that I use: https://github.com/trentm/python-markdown2/tree/master/test/tm-cases

Some of those tests are specific to markdown2 (e.g. those tagged with “extra” in the .tags files). If helpful I could easily separate out those tests that are for core Markdown functionality to a separate repo.

The other “*-cases” dirs in https://github.com/trentm/python-markdown2/tree/master/test are copies of (likely old versions of) test suites from other Markdown projects.

Good luck,
Trent Mick (@trentmick, github.com/trentm)

Cory_R · October 25, 2012, 12:00am

I understand why you made the announcement, but part of me thinks you would have been better off to work with all of your partners in private to produce something quickly. People are already trying to tell you how this needs to be done (including me, I guess). The only thing worse than work produced by a committee is when the committee solicits input from the general public.

I do like that you are starting with Gruber’s basic description. If I were running this, I would limit the scope of the initial release to what Gruber describes as much as possible. The smallest possible spec with reference implementation should be all you need for v1.0.

Good luck. You’re probably going to piss off as many people as you please with what you eventually release.

AdamP · October 25, 2012, 12:00am

I feel dirty pimping my project, but I think it’s relevant enough and sets out one angle of my interest in Markdown: I created a Chrome/Firefox/Thunderbird extension called Markdown Here (MDH) that lets you write your email in Markdown and then render it before sending. Check it out: https://github.com/adam-p/markdown-here – I wrote it because I wanted it to exist, and it’s actually pretty sweet.

I’m very ambivalent about the prospect of a new Markdown spec.

In favour of a new spec:

- Experience with one MD dialect is irritatingly non-transferable.
  - I work on projects on both Github and Bitbucket. I'm pretty comfortable with GFM at this point, but I struggle to figure out Bitbucket's dialect (seems to have undocumented backtick-fences? but not syntax name? unlike GFM there's no clear description of it?).
- Even dialects have dialects.
  - In MDH I use the JS renderer Marked (https://github.com/chjj/marked) -- specifically in GFM mode (it was the best GFM-supporting JS lib I could find). But even it doesn't exactly implement GFM's mods (line breaks, tables).

Concern about a new spec:

- I have trouble believing that any single spec can encompass enough of the MD extensions to "win". And I think winning is probably necessary, or else the "n+1 specs" objection is compelling.
- Maybe some kind of extensibility can be built into the spec? Cool, but complex. (Standardized flags indicating what table format to use? Editor symbols to quickly show users what table format is available?)
  - Would such extensibility really gain us enough/anything over what we have now? (I don't mean to be glib with that question. I think it might gain us a lot.)
  - Tangent: If MD starts getting used a lot for extracted code docs, there's going to be a push for Javadoc-ish extensions.

With the help of a user I, uh, added TeX math formula support to MDH. Like so: “$-b \pm \sqrt{b^2 - 4ac} \over 2a$”. So I’ve done my part to dirty the dialect waters. I don’t feel good about this.

I think that one of the issues is that there’s a tension between “MD as primarily markup” and “MD as primarily plaintext”. Two examples of this, from GFM:

- Fenced code blocks. Indented blocks of code look pretty nice when reading plaintext. Fenced code doesn't look as nice, and even less so when you specify the language. But writing/pasting 4-space indented code is more of a hassle, and it's not clear how to specify the language.
- GFM line breaks. Gruber's original spec basically discarded single linebreaks -- this allowed MD writers to maintain 80-char lines without breaking flow when rendering. In contrast, GFM interprets a single linebreak as a <br>. I don't like this GFM change, but I also don't like the original spec's "two spaces at the end of the line to get a <br>".
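
The two line-break policies can be put side by side in a short Python sketch. It covers a single paragraph only, and the function name `render_paragraph` and the `gfm_breaks` flag are hypothetical names chosen for the example rather than options of any real renderer.

```python
import html


def render_paragraph(text, gfm_breaks=False):
    """Render one paragraph under either line-break policy.

    gfm_breaks=False: a newline is kept as plain whitespace unless the
    line ends in two spaces, in which case it becomes <br>.
    gfm_breaks=True: every newline inside the paragraph becomes <br>.
    """
    lines = text.split("\n")
    parts = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        hard_break = line.endswith("  ")  # the two-trailing-spaces rule
        parts.append(html.escape(line.rstrip(), quote=False))
        if is_last:
            continue
        if hard_break or gfm_breaks:
            parts.append("<br>\n")
        else:
            parts.append("\n")
    return "<p>" + "".join(parts) + "</p>"
```

Under the original rule `render_paragraph("one\ntwo")` keeps the newline as ordinary whitespace, which a browser displays as a space, so text hard-wrapped at 80 columns still flows. With `gfm_breaks=True` the same input gains a `<br>` between the two lines.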

Those examples might seem pretty minor, but the more we extend MD, the less plaintext-readable it becomes. Probably. (I don’t mean for that statement to be shrill. I love many of the extensions. And maybe it’s okay that extensions get somewhat less plaintext-readable if the base stays clean. Artificially constraining MD’s growth sure won’t work, anyway.)

KeithT · October 25, 2012, 12:00am

Ile: I think the reason this is shown in italics rather than underlined is that underlined text looks like a hyperlink. Underlining has been suggested and rejected on Stack Overflow.