Parsing: Beyond Regex

I've blogged ad nauseam about how much I love Regular Expressions, but even the mighty regular expression has limits. As noted in Daniel Cazzulini's blog:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2005/04/parsing-beyond-regex.html

Full-scale parsing is not cheap, but I admit I have not tested it to see how expensive it really is.
There is no VB.NET parser for the moment :) and yes, I did know this was coming ;) On the other hand, I was “challenged” by a few guys in the comment section of the CodeProject article to write more about how to write a parser for a language, so VB.NET might be a good language for describing the basics ;) Actually, considering that the attributed grammar for C# could to a certain extent be reused for VB.NET, in the end it might be relatively easy to generate a parser: all you need to do is express the EBNF syntax of VB.NET in Coco/R, reuse some of the hand-written code, and Coco/R will generate the parser for you. Then you need to add the same code I added for the C# parser (colorizing and formatting), but that’s the easy part; all of the heavy lifting was already done by the Coco/R guys…

Thanks for the kind words.

Drazen, can you comment on the performance profile of this technique, and how you’re using it? Any FAQs or other stuff we should know?

And-- you knew this was coming-- is there a VB.NET parser?

COCO/R for VB.NET

http://www.ssw.uni-linz.ac.at/coco/

HTML is not a regular language. (See: http://en.wikipedia.org/wiki/Regular_language for information) Using regular expressions to ‘parse’ it is a really bad idea.

http://htmlparsing.icenine.ca/ for even more information

Just use a parser
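To make the point concrete, here is a minimal sketch in Python (my choice of language, not something from the original post). It assumes a made-up snippet with one `div` nested inside another. The non-greedy regex stops at the first closing tag, while the standard library's `html.parser` can keep a depth counter, which is exactly the kind of counting a regular language cannot express. The class name `DivDepth` and its attributes are hypothetical names for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# The non-greedy match ends at the FIRST </div>, cutting the outer div short.
match = re.search(r"<div>(.*?)</div>", html)
regex_result = match.group(1)


class DivDepth(HTMLParser):
    """Tracks div nesting depth and records the text seen at each depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.text_by_depth = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        # Group each text chunk under the depth at which it appeared.
        self.text_by_depth.setdefault(self.depth, []).append(data)


parser = DivDepth()
parser.feed(html)
parser.close()

print(repr(regex_result))         # the truncated regex capture
print(parser.max_depth)           # deepest div nesting seen
print(parser.text_by_depth)       # text grouped by nesting depth
```

A greedy `.*` would fix this one input but break as soon as two sibling divs appear, so the regex cannot be patched into correctness; the parser version keeps working because it tracks state.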

It’s amazing how deep the complexity goes, from the physical CPU to assembler to regexes to full-blown compilers. This article states that regex does not solve every problem, and the thing is that even parsing a regex itself requires some smarter logic. If anyone is interested in the inner workings of regex, check this small regex parser concept.
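As a rough illustration of why parsing a regex needs more than a regex, here is a small recursive-descent sketch in Python. It is not the parser referenced above; it is my own toy, and it assumes a tiny subset of the syntax: single-character literals, `|`, `*`, and parentheses. The function name `parse_regex` and the tuple-based tree shape are made up for this example. Parentheses can nest arbitrarily deep, so the parser has to recurse.

```python
def parse_regex(pattern):
    """Parse a tiny regex subset (literals, |, *, parentheses) into nested tuples."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        # alternation := concatenation ('|' concatenation)*
        nonlocal pos
        node = concatenation()
        while peek() == "|":
            pos += 1
            node = ("alt", node, concatenation())
        return node

    def concatenation():
        # concatenation := repetition*   (stops at '|', ')' or end of input)
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(repetition())
        if not items:
            return ("empty",)
        node = items[0]
        for item in items[1:]:
            node = ("cat", node, item)
        return node

    def repetition():
        # repetition := atom '*'*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():
        # atom := literal | '(' alternation ')'
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = alternation()  # recursion handles nested groups
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return node
        if ch is None or ch == "*":
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        return ("lit", ch)

    tree = alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


print(parse_regex("a(b|c)*"))
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which: `*` binds tighter than concatenation, which binds tighter than `|`. A real regex engine would go on to compile this tree into an automaton, which this sketch does not attempt.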
