Java vs. .NET RegEx performance

Carlos made a huge mistake in his article. He added RegexOptions.Compiled. This causes IL to be emitted at runtime, causing a huge slowdown. Take this out and .NET wins easily.

Hey, would somebody save me a lot of reading time and summarize the results?
Thanks.

I know this is an old blog, but you don’t tell us whether you were using the -server or the -client JVM.

Run the test with the -server flag, which is much faster than the client JVM.

Athlon 64 3800+ (2.4 GHz)
.NET 170ms

P4 3.2 (3.2 GHz), w/hyperthreading
.NET 185ms

Athlon XP 2400+ (1.7 GHz), dual proc
.NET 250ms

Pentium M (Centrino) 1.2 GHz
.NET 320ms

Four years later:

Core 2 Duo E8500 @ 3.5 GHz
.NET 100 ms

Well, your "four years later" numbers are missing the Java result. Also, were you using the -client or the -server JVM?

Also, 100 ms is a very short time. Java (especially the server JVM) has a slower startup time. The benchmark should run for at least 5 seconds.
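One way to act on this is to run an untimed warm-up pass first and then time a run long enough to last several seconds. The sketch below is only an illustration: the pattern, the input string, and the iteration counts are made up and are not taken from the original test.

```java
import java.util.regex.Pattern;

public class WarmupBenchmark {

    // Runs the regex `iterations` times and returns how many times it matched.
    // Returning the count keeps the JIT from discarding the loop as dead code.
    static int runMatches(Pattern pattern, String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(input).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical pattern and input, chosen only for illustration.
        Pattern pattern = Pattern.compile("[a-z]+\\d+");
        String input = "order abc123 shipped";

        // Untimed warm-up so the JIT has a chance to compile the hot path.
        runMatches(pattern, input, 100_000);

        // Timed run; raise the iteration count until this takes several seconds.
        long start = System.nanoTime();
        int hits = runMatches(pattern, input, 5_000_000);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("%d hits in %.3f s%n", hits, seconds);
    }
}
```

Timing inside the process like this leaves JVM launch time out of the number, and the warm-up pass is meant to reduce the effect of JIT compilation on the timed run.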

I don’t get the same result. Java regex is twice as fast for me.

http://kingrazi.blogspot.com/2008/05/shootout-c-net-vs-java-benchmarks.html

Razi,
You said in your own post that the code was not properly ported. Benchmarks need to be based on relatively identical source code.

Ak,
All of the measurements for this article are self-generated by the code, so startup time is not a factor. And as Jeff said, he tried orders of magnitude more iterations and they scaled as expected.

On the post before that: I think that rather than showing the performance of Java vs. .NET, he was showing the performance of the CPU, specifically about 10% higher clock frequency but nearly 100% better performance compared to the P4.

I know this is a bit late, but since this page appears in the Google search results, I’d like to add a comment.

From what I can see, the code is not directly comparable. I did the following to make things more equal:

  • Because the .NET version does not have a matcher object, I assume that it benefits from reusing the same matcher. To correct for this, I changed the Java code to also reuse the Matcher object, instead of initialising it inside the loop.

  • I also moved the timing outside the loop, so that it only counts the total time.

  • I assume that you ran the Java code with the -server parameter, because that improves the speed considerably.
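For anyone who wants to try the same changes, here is a minimal sketch of the first two: one Matcher reused through reset(), and a single timer around the whole loop. The pattern, inputs, and class name are hypothetical stand-ins, not the code from the article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherReuseBenchmark {

    // Counts all matches over `iterations` passes through `inputs`,
    // reusing one Matcher instead of creating a new one per input.
    static int countMatches(Pattern pattern, String[] inputs, int iterations) {
        Matcher matcher = pattern.matcher(""); // created once, outside the loop
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            for (String input : inputs) {
                matcher.reset(input); // point the existing Matcher at the new input
                while (matcher.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical pattern and inputs, for illustration only.
        Pattern pattern = Pattern.compile("\\d+");
        String[] inputs = { "a1b22c333", "no digits here" };
        int iterations = 1_000_000;

        // One timer around the whole loop, so only the total time is counted.
        long start = System.nanoTime();
        int matches = countMatches(pattern, inputs, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(matches + " matches in " + elapsedMs + " ms");
    }
}
```

The third point is a launch option rather than a code change: the class would be started with java -server MatcherReuseBenchmark on a JVM that still distinguishes client and server modes.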

So K, the stuff you say you did makes a lot of sense. From what I read, I don’t believe Jeff’s tests could be expected to yield meaningful results, and most of the other posts above did more to muddle than to clarify the issue.

But now, before the suspense kills us all, what kinda results did you get? And are you able to compare to .NET on the same box?

I’ve just run the test you posted, Jeff. I only measured the total time, and here are the results for 1,000,000 iterations:

.NET 14.126 secs
Java 20.422 secs

My specs are:

1.86 GHz with 2 GB RAM

(I’m a Java and .NET programmer, so I’d like to think I’m not biased!)