To Compile or Not To Compile

I am currently in the middle of a way-overdue refactoring of MhtBuilder, which uses regular expressions extensively. I noticed that I had sort of mindlessly added RegexOptions.Compiled all over the place. It says "compiled" so it must be faster, right? Well, like so many other things, that depends:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2005/03/to-compile-or-not-to-compile.html

I have seen a factor-of-3 performance improvement for compiled regular expressions over uncompiled ones. I think the gain is greatest when the text you are matching against is very long compared to the pattern.

Well, 3x is definitely the kind of performance increase that would seriously tip the scales in favor of compilation! As always, measurement is critical.
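
Since the numbers reported in this thread vary so much, here is a minimal sketch of how one might measure the difference on their own data. The pattern, the generated HTML, and the `RegexTiming` class are all assumptions made up for illustration; none of them come from MhtBuilder. Construction time is deliberately included in the measurement, because that is where `RegexOptions.Compiled` pays its up-front cost.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Hypothetical pattern: captures the value of href="..." attributes.
    const string Pattern = @"href=""([^""]+)""";

    // Builds a synthetic HTML string containing exactly `links` anchors.
    public static string BuildSampleHtml(int links)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < links; i++)
        {
            sb.Append("<p>text <a href=\"http://example.com/")
              .Append(i)
              .Append("\">link</a></p>\n");
        }
        return sb.ToString();
    }

    // Times regex construction plus one full scan of the input.
    public static (int Matches, TimeSpan Elapsed) Time(RegexOptions options, string input)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(Pattern, options);
        int count = regex.Matches(input).Count; // Count forces all matches to be found
        sw.Stop();
        return (count, sw.Elapsed);
    }

    public static void Main()
    {
        string html = BuildSampleHtml(2000);

        var interpreted = Time(RegexOptions.None, html);
        var compiled = Time(RegexOptions.Compiled, html);

        Console.WriteLine($"interpreted: {interpreted.Matches} matches, {interpreted.Elapsed.TotalMilliseconds} ms");
        Console.WriteLine($"compiled:    {compiled.Matches} matches, {compiled.Elapsed.TotalMilliseconds} ms");
    }
}
```

A single run like this is noisy, so repeat it several times, and try it with your real patterns and input sizes before drawing any conclusion.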

For this particular app, I doubt it matters; I may be running regex against 100 KB of HTML, but that’s utterly dwarfed by the time it will take me to HTTP GET all the files referenced in that HTML (to build the MHT), so it’s negligible in terms of overall runtime. If I were writing a straight parser or code colorizer, I’d look harder at compilation.

I think you pretty much got it right. RegexOptions.Compiled reminds me a lot of Perl’s study(). In an interpreted language, that might be very helpful; in a compiled one, I’m not so sure.

In my own perf testing of an app that uses regex extensively, I got a 2x improvement in performance by compiling to an assembly. If you’re only calling the regex a few times, then it’s probably not worth it. But if you are processing a LOT of data with regex, then it probably is. In my case this change took 30 seconds off a 2-minute process.

Correct me if I’m wrong, but isn’t keeping compiled Regex objects in static fields a small optimization?
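
For readers unsure what that pattern looks like, here is a minimal sketch. The `LinkExtractor` class and its pattern are hypothetical, made up for illustration. The idea is that the `Regex` is constructed (and compiled) once when the class is first used, then reused by every call; `Regex` instances are immutable and safe to share across threads. Whether this actually helps in a given app still needs to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Built once per process and reused by every call below.
    // The pattern is an illustrative assumption, not a robust HTML parser.
    private static readonly Regex HrefRegex =
        new Regex(@"href=""([^""]+)""", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns the captured href values in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        string html = "<a href=\"style.css\">x</a> <a HREF=\"page.html\">y</a>";
        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```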

Thanks a lot for this information…
I was trying to parse 4 GB text files as streams, a megabyte at a time, and got a 6x improvement… but initially I didn’t even want to try a compiled regex without some concrete data on whether it would perform better.

Note that the caching behavior changed between .NET 1.1 and .NET 2.0. In 2.0, not all regular expressions are cached.
See this excellent blog entry for more details:
http://blogs.msdn.com/bclteam/archive/2006/10/19/regex-class-caching-changes-between-net-framework-1-1-and-net-framework-2-0-josh-free.aspx

Just to add my (small) experience regarding the speed of compiled regex: I found that if you have large source strings, it’s the only way to go.

By large, I mean anything from a few hundred KB of text upward.

I am doing a file preparation app and had a regex hanging up. Worked with small files, but on larger files, it looked like it wasn’t doing anything for 10 minutes or so.

Before tossing the code in the garbage (I need to handle large files, so that was a real deal breaker), I tried RegexOptions.Compiled.

Using the same files, file processing was pretty much instant.

So in my particular case, it really made a difference.

Well, this post is old news, but I just wanted to add my two cents. The process of creating compiled regular expressions is painstaking at best. If you need to compile your regular expressions to an assembly that is, say, CLS-compliant, strong-named, etc., you might like to take a look at this tool that I put together. It makes the management of compilation a snap.

http://blog.devstone.com/aaron/archive/2006/08/22/1766.aspx

http://blog.devstone.com/aaron/archive/2006/08/18/1762.aspx

It’s already saved me a ton of time. I have full source and compiled EXE for download on the site. There are also more features forthcoming.

Probably best to use the later version here - the download link works:
http://blog.devstone.com/aaron/2008/12/23/NETRegularExpressAssemblyBuilderToolV2003.aspx