Parsing: Beyond Regex

I've blogged ad nauseam about how much I love Regular Expressions, but even the mighty regular expression has limits. As noted in Daniel Cazzulini's blog:

This is a companion discussion topic for the original blog entry at:

Full-scale parsing is not cheap, but I admit I have not tested it to see how expensive it really is.
There is no VB.NET parser for the moment :slight_smile: and yes, I did know this was coming :wink: On the other hand, I was “challenged” by a few guys in the comment section of the CodeProject article to write more about how to write a parser for a language, so VB.NET might be a good language for describing the basics :wink: Actually, since the attributed grammar for C# could be reused to a certain extent for VB.NET, it might end up being relatively easy to generate a parser: all you need to do is express the EBNF syntax of VB.NET in Coco-R and reuse some of the hand-written code, and Coco-R will generate the parser for you. Then you need to add the same code I added for the C# parser (colorizing and formatting), but that’s the easy part, since all of the heavy lifting was already done by the Coco-R guys…

Thanks for the kind words.

Drazen, can you comment on the performance profile of this technique, and how you’re using it? Any FAQs or other stuff we should know?

And-- you knew this was coming-- is there a VB.NET parser?


HTML is not a regular language. Using regular expressions to ‘parse’ it is a really bad idea.

Just use a parser
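
To make the point concrete, here is a minimal sketch in Python (my choice of language, not something from the original post) that runs a regex and the standard-library `html.parser` over the same nested markup. The sample string and the `DepthTracker` class are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: one <div> nested inside another.
SAMPLE = "<div>outer <div>inner</div> tail</div>"

# A non-greedy regex stops at the first closing tag, so it cuts the
# outer element short and never sees " tail".
regex_match = re.search(r"<div>(.*?)</div>", SAMPLE).group(1)


class DepthTracker(HTMLParser):
    """Groups text by the <div> nesting depth at which it appears."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text_by_depth = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        self.text_by_depth.setdefault(self.depth, []).append(data)


tracker = DepthTracker()
tracker.feed(SAMPLE)
tracker.close()

print(repr(regex_match))      # the truncated regex capture
print(tracker.text_by_depth)  # text grouped by nesting depth
```

The regex has no way to count how many `<div>` tags are still open, which is exactly what "not a regular language" means in practice. The parser keeps that count for free.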

It’s amazing how deep the complexity goes, from the physical CPU to assembler to regexes to full-blown compilers. This article says that regex does not solve every problem, and the thing is that even parsing a regex itself takes some smarter logic. If anyone is interested in the inner workings of regex, check this small regex parser concept.
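
For anyone who wants to see the idea without following a link, here is a rough sketch of a recursive-descent parser for a tiny regex subset (literal characters, grouping with parentheses, `|`, and the `*`, `+`, `?` operators). It is not the concept linked above, just an illustration in Python; the function name `parse_regex` and the tuple-based tree shape are my own assumptions.

```python
def parse_regex(pattern):
    """Parse a tiny regex subset into a nested-tuple syntax tree.

    Grammar (illustrative only):
        alternation := concat ('|' concat)*
        concat      := repeat*
        repeat      := atom ('*' | '+' | '?')*
        atom        := '(' alternation ')' | literal character
    """
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def advance():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        return ch

    def parse_alternation():
        branches = [parse_concat()]
        while peek() == "|":
            advance()
            branches.append(parse_concat())
        return branches[0] if len(branches) == 1 else ("alt", *branches)

    def parse_concat():
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(parse_repeat())
        return items[0] if len(items) == 1 else ("cat", *items)

    def parse_repeat():
        node = parse_atom()
        while peek() is not None and peek() in "*+?":
            node = ("repeat", advance(), node)
        return node

    def parse_atom():
        ch = advance()
        if ch == "(":
            node = parse_alternation()
            if peek() != ")":
                raise ValueError(f"missing ')' at position {pos}")
            advance()
            return node
        if ch in "*+?":
            raise ValueError(f"nothing to repeat at position {pos - 1}")
        return ("char", ch)

    tree = parse_alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


print(parse_regex("a(b|c)*d"))
```

Notice that the parentheses force the parser to call itself recursively, which is the "smarter logic" part: balanced nesting is the one thing a regex cannot describe about its own syntax.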
