I think a standardized grammar and (executable) specs would be a great thing. Having everyone rally around the common goal, bringing consistency to Markdown processing and representation across web/desktop/mobile applications would be really nice.
The challenge is that Markdown can be used in a number of contexts. Perhaps a small safe subset for blog comments, vs. writing a book with something like Leanpub, where they support a variety of Kramdown extensions.
Given the variety of extensions, the scenario sounds a lot like trying to become the W3C of Markdown implementations. A big effort, but I suspect a really good outcome. Thanks for taking this on.
“automatic return-based linebreaks (on)” – Please, no. Markdown already includes syntax for lists and code blocks. There are very few other occasions where you need a hard line break. Currently markdown works both for people who like to hard-wrap their text and for people who don’t. It’s best to keep it that way. The proposed change would radically change how most existing markdown documents are rendered. And why? Because some new users are surprised that hard breaks are treated as spaces? The same users are surprised when indented paragraphs are treated as code. No matter what the rules are, some users will be surprised by them.
Saying it’s a choice with a default doesn’t help. That just fragments markdown into many variants, so that you can’t be sure that markdown that works fine on one site will render the same on another. It would be best to reduce fragmentation rather than fostering it.
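To make the difference concrete, here is a rough sketch of the two behaviours being argued over. It is not taken from any real Markdown implementation; the function name `render_paragraph` and the `hard_wrap` flag are made up for illustration, and HTML escaping is left out to keep it short.

```python
def render_paragraph(lines, hard_wrap=False):
    """Render the source lines of one paragraph to HTML.

    hard_wrap=False: classic behaviour. A newline becomes a space unless the
    line ends with two or more spaces (the explicit hard-break syntax).
    hard_wrap=True: the proposed behaviour. Every newline becomes <br />.
    """
    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        explicit_break = line.endswith("  ")
        out.append(line.rstrip())
        if not is_last:
            out.append("<br />\n" if (hard_wrap or explicit_break) else " ")
    return "<p>" + "".join(out) + "</p>"


source = ["Roses are red,", "violets are blue."]
print(render_paragraph(source))                  # <p>Roses are red, violets are blue.</p>
print(render_paragraph(source, hard_wrap=True))  # same text, with <br /> between the lines
```

The same hard-wrapped source comes out as one flowing paragraph in the first call and as two visual lines in the second, which is why flipping the default would change how existing documents render.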
This is exactly what I wanted to do a few months ago. Sadly, I was all alone, and my coding skills weren’t good enough for me to write a Markdown parser without even thinking about it.
So here’s what I would like to see:
- Solid documentation (how that all works, edge-cases),
- Code that makes you cry tears of joy (very easy to read and, later on, to extend or port),
- Default behaviour must be XSS safe (not as it is in the PHP port of Markdown)!
P.S. In what language will it be implemented?
reStructuredText has already been mentioned. It has a single mature definition.
Personally, I had no opinion on which was better, till I needed to mark up a poem in Markdown. How does Markdown represent text with
(like this)? Why, with invisible white space! That was a poor design decision.
Unfortunately, I think that reStructuredText is set to be the Betamax of lightweight markup languages to Markdown’s VHS: technically superior, but eventually eclipsed.
The one feature I truly miss in some Markdown implementations is the ability to specify the code block language. For example, on GitHub I can put the language name right after the three backticks that open a code block.
Oops, can’t edit comment. Just wanted to add that Posterous also had that feature but the syntax was different.
I really wish there was only one way to do it all over the place.
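For reference, the GitHub syntax in question is a fence of three backticks with the language name right after the opening fence. Below is a small sketch of how a processor might pull that language tag out. The name `extract_code_blocks` is made up for illustration, and the closing-fence rule is simplified compared to what real implementations do.

```python
import re

# Opening fence: three or more backticks, then an optional language name.
FENCE_OPEN = re.compile(r"^(`{3,})\s*([\w+#.-]*)\s*$")


def extract_code_blocks(text):
    """Return (language, code) pairs for each backtick-fenced block in text."""
    blocks = []
    fence = None   # the backtick run that opened the current block, if any
    language = ""
    body = []
    for line in text.splitlines():
        if fence is None:
            match = FENCE_OPEN.match(line)
            if match:
                fence, language, body = match.group(1), match.group(2), []
        elif line.strip() == fence:
            # Simplification: the closing fence must equal the opening one.
            blocks.append((language or None, "\n".join(body)))
            fence = None
        else:
            body.append(line)
    return blocks


FENCE = "`" * 3  # built this way only to keep literal fences out of the sample
sample = f"Some text\n{FENCE}ruby\nputs 'hi'\n{FENCE}\nMore text\n{FENCE}\nplain\n{FENCE}\n"
print(extract_code_blocks(sample))  # [('ruby', "puts 'hi'"), (None, 'plain')]
```

A block without a language tag comes back with `None`, so a renderer could fall back to plain text or to automatic detection.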
After reading one of the first comments, I digress: I wish /this/ was italic rather than *this*, but I know that won’t come out of this.
“Bless you! I was going to post that the world is in desperate need of a WYSIWYG Markdown editor, but I see you have that covered.”
Tim Post already mentioned it, but I want to call it out: http://www.markdownpad.com/
I use it virtually every day. Actively developed too: it auto-updates itself quite regularly.
I never knew what I was using on reddit until I read this article. I’m definitely a fan and hope to see it in some of my favourite technologies like Meteor and StackOverflow.
Just to continue what @DaGrevis said… I would love for the default implementation NOT to accept HTML, and instead to render whatever tags are used as plain text. This is because in most cases you don’t want a random user typing in a [script] tag or anything like [a onclick=“evil code”].
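The "render tags as plain text" default can be sketched with nothing but Python's standard library. The function name `neutralize_html` is hypothetical, and this only shows the escaping step, not a full Markdown pipeline.

```python
import html


def neutralize_html(user_text):
    """Escape HTML-significant characters so typed tags display as plain text."""
    return html.escape(user_text, quote=True)


print(neutralize_html('<script>alert("hi")</script>'))
# &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

One caveat: Markdown itself uses `>` for blockquotes and `<...>` for automatic links, so a real implementation would more likely escape while rendering each parsed piece than run this over the raw source. Link targets such as `javascript:` URLs would also need a separate check.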
And then there’s Reddit flavor. You forgot about the most important one!
I personally tried to implement a PegJS-based parser for Markdown (see links below). However, the PegJS-generated result is huge, about 10+MB. That is partly because my version of the parser is certainly not finished, not perfect, and there are a lot of ways to optimize it, and partly because PegJS by David Majda renders rules in a plain way, with no operator functions or anything like that. The latter is good for speed, but very bad for parser size. So, while I haven’t finished the Markdown parser, about a year ago I started to refactor the PegJS implementation to have a function for each operator and to improve scoping and so on. The resulting parser generator may produce parsers that run a bit slower than the original, but they will weigh far fewer bytes: everything is in a minimalistic style, unused operators are excluded, and so on. I am still working on it, writing code in small portions; I can see a good end, but it is still in progress.
So, here are two facts I have for now:
- It is hard to represent some complex rules of Markdown in PEG, like blockquote-in-list-followed-by-block-of-code, but it is achievable, since there is a good enough implementation in C++ by Ali Rantakari (however, it also fails in some complex variations).
- The current version of PegJS is not a very good match for it, at least for now (or maybe I am very wrong in the way I am implementing the parser). It will be almost impossible to include the parser in mobile applications and so on.
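To illustrate what "a function for each operator" means, here is a tiny sketch in Python. It is not PegJS output and not code from the fork mentioned above; all names are invented, and the rule at the end is a toy, not real Markdown emphasis. Each PEG operator is written once as a small function, and rules are built by combining them, instead of expanding every rule into its own block of inline code.

```python
# A parser is a function (text, pos) -> new position, or None on failure.

def literal(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def sequence(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    # PEG ordered choice: the first alternative that matches wins.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def zero_or_more(parser):
    def parse(text, pos):
        while True:
            end = parser(text, pos)
            if end is None or end == pos:  # stop on failure or empty match
                return pos
            pos = end
    return parse


def not_followed_by(parser):
    # PEG negative lookahead: succeeds without consuming any input.
    def parse(text, pos):
        return pos if parser(text, pos) is None else None
    return parse


def any_char(text, pos):
    return pos + 1 if pos < len(text) else None


# Toy rule:  emphasis <- "*" (!"*" .)* "*"
star = literal("*")
emphasis = sequence(star, zero_or_more(sequence(not_followed_by(star), any_char)), star)

print(emphasis("*hello* world", 0))  # 7
print(emphasis("no stars", 0))       # None
```

Every rule pays for an extra function call per operator, which is the speed cost mentioned above, but the operator code exists only once no matter how many rules the grammar has.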
My version of PegJS parser for Markdown, in progress: https://github.com/shamansir/mdown-parse-pegjs
My customized version of PegJS, intended to produce very compact parsers using the powers of functional code, in progress: https://github.com/shamansir/pegjs
C++/PEG GUI-oriented implementation of Markdown parser: http://hasseg.org/peg-markdown-highlight/
Useful links on parsing Markdown: https://github.com/shamansir/mdown-parse-pegjs#sources
MDTest to test your Markdown parser for compatibility with the spec: http://git.michelf.com/mdtest/
Some thoughts regarding parsing Markdown in general and its improvements.
Markdown became a geeky language, an easier version of LaTeX, reduced in functionality in favor of writing speed. However, “geeks” does not mean programmers only; it also covers designers and even literary authors. As a result, the language spec need not require all these programming-language marks and so on in the plain version; maybe it should detect the language by itself, using an approach similar to SO’s. Or it should have no special syntax for it and use HTML comments to mark the language, since they are supported everywhere. (I know that copying code with 4 spaces in front of it is inconvenient, but I think that is easily solved in any modern code editor, and doing it any other way than John recommended at the start breaks Markdown compatibility.)
And I agree, there is a huge “want” to include tasty features in Markdown, but there should be a very strict selection of such features, because it is very hard to make all of them still look lovely. So someone (like John), and only he, should say “I said so and so it will be”. Or including features should be based on votes. Or, even better, a plugin-like system may save us all, if there is a central plugin repository (say, “Markdown Flavors”) with parsers/PEGs for every programming language, where including one is as easy as including a script tag or a header file in your document. And, of course, there should be a central distribution site, where all tests run every second for the main implementation itself, plus a CDN for every parser/plugin, and so on…
BTW, Mou is the best Markdown editor for Mac OS for me; it handles almost all of the list-in-list-in-blockquote problems.
This sounds great, and I’m really glad to see Pandoc is already in this discussion. I work in educational technology and also with researchers in academia, and Pandoc is really something I’ve been recommending to all researchers frustrated with Word and other WYSIWYG word processors. Having tables and footnotes makes it an ideal markup for researchers.
I’d just like to point out that this probably isn’t the best idea…
IF you can get Markdown’s parents to agree and mention you in their page, you have a chance. Otherwise, as others have pointed out, you’re just the 15th standard where only 14 existed before.
Also, if you do try to do this thing, I’d advise making several Markdown “profiles”. The “basic” profile would cover the current Markdown without any additions (as you’ve said), while the “extended” profile would add all the new features and bells and whistles. I’m in the camp that thinks that basic Markdown is too limiting and needs more features (like tables, colors, fonts, etc.).
There are many different use-cases for a language like Markdown, and trying to make a “one size fits all” solution rarely works.
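As a rough sketch of how such profiles could be expressed, assume a profile is nothing more than a named set of features. The feature names below are only illustrative (the extended ones are the examples from this comment), and none of this comes from an existing spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    features: frozenset


# "basic": the original Markdown feature set, nothing added.
BASIC = Profile(
    "basic",
    frozenset({"emphasis", "links", "lists", "code_blocks", "blockquotes", "headers"}),
)
# "extended": everything in basic plus the extras.
EXTENDED = Profile("extended", BASIC.features | {"tables", "colors", "fonts"})


def unsupported(used_features, profile):
    """Return the features a document uses that the profile does not cover."""
    return sorted(set(used_features) - profile.features)


print(unsupported(["links", "tables"], BASIC))     # ['tables']
print(unsupported(["links", "tables"], EXTENDED))  # []
```

A site could then state which profile it accepts, and a document could be checked against that profile before anyone is surprised by how it renders.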
Fully supportive of this idea. Taking it a tiny bit further, I think this is ideal for a W3C Community Group. A couple of other sponsors (not required to be W3C members) and the work can be done jointly there, with the potential of taking it further if needed when the spec is done.
http://www.w3.org/community/groups/proposed/#markdown if you would like to see it developed this way.
I think this is a really important step. What has most concerned me about the markdown clones I have seen springing up is the feature creep. So I have been in favor of a spec to lock down (clarify) the essential features of markdown for a while, as well as a road-map or guidance on how extension should be developed to play nicely with the core features.
I would love to help.