The Future of Markdown

Hi Jeff,

Referring to “Wikipedia” as a lightweight markup language is both a misnomer and slightly inaccurate.

The correct terminology for the markup language that Wikipedia supports is wikitext. Wikipedia is one of numerous sites that use MediaWiki as a platform. MediaWiki is capable of parsing wikitext.

The crazy thing is that “lightweight” markup languages are only lightweight in terms of ease of use. Parsing SGML is context-free, whereas wikitext and markdown are both context-sensitive, and therefore more complex to write parsers for.

I agree with you a lot, Jeff: you can’t really avoid Markdown. I’m hesitant to agree that it needs standardization, though.

You say it’s a humane markup language. Well, if it’s meant to be human-friendly, then that doesn’t exactly lend itself to standardization. Different people have different preferences. I’m no expert by any means, but that’s just my initial reaction.

For any beginners reading this, you might want to check out my introduction to Markdown that I wrote on my blog:

http://codeconquest.com/learn-markdown-youll-thank-yourself-later/

The owners of a spec aren’t its creators. It’s the users. You don’t need Gruber’s stamp any more than you need mine.

It’s easy to get confused. The creators of open source projects are often intimately involved with the interests of the users. But it’s not the case here. The active stakeholders seem aligned. That’s all you need. Is there any real dissent besides Gruber’s inactivity?

(Users include both sides of the fence - people implementing the translation and people writing in Markdown)

I tried to love Markdown and Textile, but as a web developer it just doesn’t make sense to write articles with it. I’m often confused about the syntax for writing links and images, and if you want to write about HTML you’ve got to escape your HTML snippets.

For this reason I wrote my own text formatter. It’s basically HTML with a lighter syntax and, as in so-called “lightweight” markup languages, paragraph tags are implied.

http://nbsp.io/development/doccy-a-mid-weight-markup-language

That article mentions Symphony, a CMS framework, but it’s not tied to that at all. The source code is at:

https://github.com/rowan-lewis/doccy

It’s also written in a more sophisticated way than your usual text formatter; by parsing the input and building the output using an XML DOM, you’re guaranteed that the output is sane.
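To illustrate the DOM-building idea in general terms, here is a minimal Python sketch. It is not Doccy's code; the function name `paragraphs_to_html` and the blank-line paragraph rule are assumptions made for the example. The point is only that when output is assembled as a tree and then serialized, escaping and closing tags are handled by the serializer rather than by string concatenation.

```python
import xml.etree.ElementTree as ET


def paragraphs_to_html(text: str) -> str:
    """Turn blank-line-separated text into <p> elements via a DOM tree.

    Hypothetical helper for illustration only: the tree is serialized
    at the end, so every tag is closed and special characters in the
    text are escaped by the serializer.
    """
    root = ET.Element("div")
    for chunk in text.split("\n\n"):
        chunk = chunk.strip()
        if chunk:
            paragraph = ET.SubElement(root, "p")
            paragraph.text = chunk  # stored as text, never parsed as markup
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(paragraphs_to_html("Use <b> for bold.\n\nTom & Jerry"))
```

Because the input is stored as text nodes, a stray `<b>` or `&` in an article about HTML comes out escaped instead of corrupting the page.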

People who are suggesting alternatives or saying that Markdown is too difficult are missing the point.

We ARE going to use Markdown. We like it.

I think it’s a great idea to standardize Markdown. My site uses it for our user submitted content and I think standardizing it is just what the doctor ordered.

I’d love to help, where do I sign up?

A better spec is sorely needed. There have been calls for this for years on the markdown-discuss mailing list, but never any uptake. I wish you luck in persuading John Gruber, who has been resistant even to requests for informal clarifications of the spec. Here is his most recent contribution to the markdown-discuss list, after a long period of absence:

http://www.mail-archive.com/markdown-discuss@six.pairlist.net/msg02703.html

In any case, I am willing to help out. I have written three markdown implementations: pandoc (in Haskell, using parser combinators, http://johnmacfarlane.net/pandoc), peg-markdown (in C, using a PEG grammar, https://github.com/jgm/peg-markdown), and lunamark (in Lua, using lpeg, https://github.com/jgm/lunamark). So I know quite a bit about parsing markdown, and particularly about what would have to be settled in a more determinate spec.

To start, here is a list of some big (not edge-case) questions that the current syntax description leaves open:

http://johnmacfarlane.net/babelmark2/faq.html#what-are-some-big-questions-that-the-markdown-spec-does-not-answer

The list includes links to a tool which will show you the output of a bunch of different implementations, so you can see how they differ. Further up in the FAQ you’ll find a longer list of divergences between various implementations (including lots of bugs).

@FyodorS:

Bless you! I was going to post that the world is in desperate need of a WYSIWYG Markdown editor, but I see you have that covered.

Thanks –

I’m in, with what I can.

I wrote https://github.com/trentm/python-markdown2
It is basically a straight port of Gruber’s Markdown.pl – i.e. it is regex based. Plus, like most processors, it adds a number of extras/extensions.
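For readers who have not seen a regex-based processor, the general shape is a series of substitutions applied one after another. The sketch below is a toy, not code from markdown2 or Markdown.pl; the function name `tiny_markdown_inline` and the three patterns are assumptions chosen to keep the example short.

```python
import re


def tiny_markdown_inline(text: str) -> str:
    """Apply a few inline rules as successive regex substitutions.

    Toy illustration only. Order matters: strong (**) must run before
    emphasis (*), or the emphasis rule would consume the double stars.
    """
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)
    return text


if __name__ == "__main__":
    print(tiny_markdown_inline("**bold**, *em*, [link](http://example.com)"))
```

The dependence on rule order is one reason regex-based ports tend to disagree on edge cases.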

If this effort bears fruit, perhaps the most useful part of my implementation would be the test suite that I use: https://github.com/trentm/python-markdown2/tree/master/test/tm-cases

Some of those tests are specific to markdown2 (e.g. those tagged with “extra” in the .tags files). If helpful I could easily separate out those tests that are for core Markdown functionality to a separate repo.

The other “*-cases” dirs in https://github.com/trentm/python-markdown2/tree/master/test are copies of (likely old versions of) test suites from other Markdown projects.

Good luck,
Trent Mick (@trentmick, github.com/trentm)

I understand why you made the announcement, but part of me thinks you would have been better off to work with all of your partners in private to produce something quickly. People are already trying to tell you how this needs to be done (including me, I guess). The only thing worse than work produced by a committee is when the committee solicits input from the general public.

I do like that you are starting with Gruber’s basic description. If I were running this, I would limit the scope of the initial release to what Gruber describes as much as possible. The smallest possible spec with reference implementation should be all you need for v1.0.

Good luck. You’re probably going to piss off as many people as you please with what you eventually release.

I feel dirty pimping my project, but I think it’s relevant enough and sets out one angle of my interest in Markdown: I created a Chrome/Firefox/Thunderbird extension called Markdown Here (MDH) that lets you write your email in Markdown and then render it before sending. Check it out: https://github.com/adam-p/markdown-here – I wrote it because I wanted it to exist, and it’s actually pretty sweet.

I’m very ambivalent about the prospect of a new Markdown spec.

In favour of a new spec:

  • Experience with one MD dialect is irritatingly non-transferable.
    • I work on projects on both Github and Bitbucket. I'm pretty comfortable with GFM at this point, but I struggle to figure out Bitbucket's dialect (seems to have undocumented backtick-fences? but not syntax name? unlike GFM there's no clear description of it?).
  • Even dialects have dialects.
    • In MDH I use the JS renderer Marked (https://github.com/chjj/marked) -- specifically in GFM mode (it was the best GFM-supporting JS lib I could find). But even it doesn't exactly implement GFM's mods (line breaks, tables).

Concern about a new spec:

  • I have trouble believing that any single spec can encompass enough of the MD extensions to "win". And I think winning is probably necessary, or else the "n+1 specs" objection is compelling.
  • Maybe some kind of extensibility can be built into the spec? Cool, but complex. (Standardized flags indicating what table format to use? Editor symbols to quickly show users what table format is available?)
    • Would such extensibility really gain us enough/anything over what we have now? (I don't mean to be glib with that question. I think it might gain us a lot.)
    • Tangent: If MD starts getting used a lot for extracted code docs, there's going to be a push for Javadoc-ish extensions.

With the help of a user I, uh, added TeX math formula support to MDH. Like so: “$-b \pm \sqrt{b^2 - 4ac} \over 2a$”. So I’ve done my part to dirty the dialect waters. I don’t feel good about this.

I think that one of the issues is that there’s a tension between “MD as primarily markup” and “MD as primarily plaintext”. Two examples of this, from GFM:

  1. Fenced code blocks. Indented blocks of code look pretty nice when reading plaintext. Fenced code doesn't look as nice, and even less so when you specify the language. But writing/pasting 4-space indented code is more of a hassle, and it's not clear how to specify the language.
  2. GFM line breaks. Gruber's original spec basically discarded single linebreaks -- this allowed MD writers to maintain 80-char lines without breaking flow when rendering. In contrast, GFM interprets a single linebreak as a <br>. I don't like this GFM change, but I also don't like the original spec's "two spaces at the end of the line to get a <br>".
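The difference between the two line-break rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not either implementation: `render_paragraph` and its `hard_breaks` flag are hypothetical names, and the function handles a single paragraph only.

```python
import html


def render_paragraph(text: str, hard_breaks: bool = False) -> str:
    """Render one paragraph under either line-break rule.

    hard_breaks=False: a single newline stays a plain newline unless
                       the line ends in two or more spaces.
    hard_breaks=True:  every single newline becomes a <br>.
    """
    lines = text.split("\n")
    parts = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        explicit_break = line.endswith("  ")  # the invisible trailing spaces
        parts.append(html.escape(line.rstrip()))
        if not is_last:
            parts.append("<br>\n" if (hard_breaks or explicit_break) else "\n")
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    wrapped = "a long sentence that was\nhard-wrapped by the author"
    print(render_paragraph(wrapped))
    print(render_paragraph(wrapped, hard_breaks=True))
```

The same hard-wrapped source renders as one flowing line in the first mode and as two lines in the second, which is the incompatibility being described.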

Those examples might seem pretty minor, but the more we extend MD, the less plaintext-readable it becomes. Probably. (I don’t mean for that statement to be shrill. I love many of the extensions. And maybe it’s okay that extensions get somewhat less plaintext-readable if the base stays clean. Artificially constraining MD’s growth sure won’t work, anyway.)

Ile: I think the reason this is shown in italics rather than underlined is that underlined text looks like a hyperlink. Underlining has been suggested and rejected on Stack Overflow.

Hey Jeff,

I think a standardized grammar and (executable) specs would be a great thing. Having everyone rally around the common goal, bringing consistency to Markdown processing and representation across web/desktop/mobile applications would be really nice.

The challenge is that Markdown can be used in a number of contexts. Perhaps a small safe subset for blog comments, vs. writing a book with something like Leanpub, where they support a variety of Kramdown extensions.

Given the variety of extensions, the scenario sounds a lot like trying to become the W3C of Markdown implementations. A big effort, but I suspect a really good outcome. Thanks for taking this on. :)

Nathan.

“automatic return-based linebreaks (on)” – Please, no. Markdown already includes syntax for lists and code blocks. There are very few other occasions where you need a hard line break. Currently markdown works both for people who like to hard-wrap their text and for people who don’t. It’s best to keep it that way. The proposed change would radically change how most existing markdown documents are rendered. And why? Because some new users are surprised that hard breaks are treated as spaces? The same users are surprised when indented paragraphs are treated as code. No matter what the rules are, some users will be surprised by them.

Saying it’s a choice with a default doesn’t help. That just fragments markdown into many variants, so that you can’t be sure that markdown that works fine on one site will render the same on another. It would be best to reduce fragmentation rather than fostering it.

This is exactly what I wanted to do a few months ago. Sadly, I was all alone, and my coding skills weren’t good enough to write a Markdown parser w/o thinking about it. :)

So here’s what I would like to see:

  • Solid documentation (how that all works, edge-cases),
  • Code that makes you cry tears of joy (very easy to read and, later on, to extend or port),
  • Default behaviour must be XSS safe (not as it is in PHP port of Markdown)!
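As a rough idea of what "safe by default" could mean, here is a small Python sketch. It is an assumption-laden illustration, not a complete sanitizer and not how any existing port works: the names `safe_text`, `safe_link` and `SAFE_SCHEMES` are made up for the example, and the allowed-scheme list is a guess at a sensible default.

```python
import html
from urllib.parse import urlsplit

# Assumed whitelist; the empty string covers relative links.
SAFE_SCHEMES = {"http", "https", "mailto", ""}


def safe_text(text: str) -> str:
    """Escape raw input so embedded tags are shown, not executed."""
    return html.escape(text)


def safe_link(label: str, url: str) -> str:
    """Emit a link only when the URL scheme is on the whitelist."""
    url = url.strip()
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SAFE_SCHEMES:
        return html.escape(label)  # drop the link, keep the label text
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True),
                                        html.escape(label))


if __name__ == "__main__":
    print(safe_text("<script>alert(1)</script>"))
    print(safe_link("click me", "javascript:alert(1)"))
```

A real default would need more than this (attribute handling, obfuscated schemes, an opt-in path for trusted raw HTML), but escaping first and whitelisting link targets covers the two most obvious holes.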

P.S. In what language will it be implemented?

reStructuredText has already been mentioned. It has a single mature definition.

Personally, I had no opinion on which was better, till I needed to mark up a poem in Markdown. How does Markdown represent text with
explicit
line breaks
(like this)? Why, with invisible white space! That was a poor design decision.

Unfortunately, I think that reStructuredText is set to be the Betamax of lightweight markup languages to Markdown’s VHS: technically superior, but eventually eclipsed.

David Jeter is overrated (http://www.youtube.com/watch?v=EbRo_anmIDc)

You have my axe!

The one feature I truly miss in some Markdown implementations is the ability to specify code block language. For example on Github, I can name the language right after the opening backtick fence:

alert('woohoo');

And the code block will be syntax highlighted for Javascript.

Oops, can’t edit comment. Just wanted to add that Posterous also had that feature but the syntax was different.

#!javascript
alert('woohoo');

I really wish there was only one way to do it all over the place.
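Until there is one way to do it, a processor that wants to accept both styles has to normalize them. The Python sketch below shows one possible approach, assuming the two forms described above: a backtick fence with the language name after it, and a `#!language` first line. The function name `split_language` is hypothetical, and the fence is built with `"`" * 3` only to keep literal backtick runs out of the example.

```python
import re
from typing import Optional, Tuple

FENCE = "`" * 3
FENCE_OPEN = re.compile(r"^`{3}\s*([\w+-]+)?\s*$")


def split_language(block: str) -> Tuple[Optional[str], str]:
    """Return (language, code) for either language-hint style.

    Illustrative only: a real "#!/bin/sh" first line would be misread
    as a language hint by the second rule.
    """
    lines = block.strip("\n").split("\n")
    match = FENCE_OPEN.match(lines[0])
    if match and len(lines) >= 2 and lines[-1].strip() == FENCE:
        return match.group(1), "\n".join(lines[1:-1])
    if lines[0].startswith("#!") and len(lines[0]) > 2:
        return lines[0][2:].strip(), "\n".join(lines[1:])
    return None, "\n".join(lines)


if __name__ == "__main__":
    fenced = FENCE + "javascript\nalert('woohoo');\n" + FENCE
    print(split_language(fenced))
    print(split_language("#!javascript\nalert('woohoo');"))
```

Both inputs reduce to the same language and code pair, which is all a syntax highlighter needs.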

After reading one of the first comments, I digress: I wish /this/ was italic rather than *this*, but I know that won’t come out of this.