The Sad Tragedy of Micro-Optimization Theater

codinghorror · January 29, 2009, 12:00am

I'll just come right out and say it: I love strings. As far as I'm concerned, there isn't a problem that I can't solve with a string and perhaps a regular expression or two. But maybe that's just my lack of math skills talking.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/01/the-sad-tragedy-of-micro-optimization-theater.html

Kenneth · January 30, 2009, 12:00am

Meatballs…groovy…classic early Bill Murray

CW11 · January 30, 2009, 12:00am

My comment about compiler optimization is that there is no automatic compiler conversion from string concatenation to using StringBuilder as some suspected was the reason for the differences being negligible. For the simple example I gave, I also would not expect the compiler to unroll the 100000 iteration for-loop to generate a single optimized string.

DennisF · January 30, 2009, 12:00am

CW,

I apologize, CW, I misunderstood.

D_Lamblin · January 30, 2009, 12:00am

I expected versions 1 and 3 to perform identically, but it seems version 1 has newlines and version 3 doesn’t, so there’s a 2 to 4 byte difference there. Could that have affected the timing a tad?
Otherwise I accept the overall conclusion, and though I love regular expressions I sort of think version 1 is more readable (though a regexp may be more flexible to maintain through a config file or something).

Justin · January 30, 2009, 12:00am

Yes, you should avoid the obvious beginner mistakes of string concatenation, the stuff every programmer learns their first year on the job. But after that, you should be more worried about the maintainability and readability of your code than its performance. And that is perhaps the most tragic thing about letting yourself get sucked into micro-optimization theater – it distracts you from your real goal: writing better code.

Define better?? In the one vein of web-apps using strings, you can say that maintainability and readability are more important than performance, but sometimes (a lot of the time) that’s just dead wrong.

You’re drawing a broad generalization from a very narrow viewpoint. It’s simply not true. Would you put readability ahead of performance when you’re trying to build a real-time system? Or a computer vision system? Sometimes performance does matter, and your blanket statement completely ignores these kinds of scenarios.

codinghorror · January 30, 2009, 12:00am

Build a whole page with +=, Jeff, and get back to us.

This is my entire point-- nobody builds entire webpages with naive string concatenation. But they sure do build webpages made of lots of page fragments which can, in fact, be built with naive string concatenation with no ill effects whatsoever.

But what if you’re doing nothing but small bits of string concatenation, dozens to hundreds of times – as in most web apps? Then you might develop a nagging doubt, as I did, that lots of little Shlemiels could possibly be as bad as one giant Shlemiel.

Is everything OK, Dennis? You come across as angry and belligerent, and I don’t think that’s your intent.

codinghorror · January 30, 2009, 12:00am

Programmers from C/C++ backgrounds would concatenate over and over again to the same variable reference assuming that it would work like strcat/strncat, which actually DID operate on a mutable buffer

note that even in C, it’s still pretty easy to encounter this problem by doing this the wrong way with strcat

http://www.joelonsoftware.com/articles/fog0000000319.html

Lodle · January 30, 2009, 12:00am

I’m not a C# person (C++ for me), but can’t you do this in C# (you can in C/C++), and is it faster? (Again, taking into account what the article is about: you shouldn’t really be worrying about this in the first place.)

string s =
@"<div class=""user-action-time"">{0}{0}</div>
<div class=""user-gravatar32"">{0}</div>
<div class=""user-details"">{0}<br/>{0}</div>";
return String.Format(s, st());
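
For anyone who wants to try the idea, here is a minimal sketch, assuming a hypothetical `St()` helper that stands in for `st()` and shortened markup. It only shows that `String.Format` accepts the same `{0}` placeholder several times with a single argument. Whether it is any faster would have to be measured.

```csharp
using System;

static class FormatSketch
{
    // Hypothetical stand-in for the st() helper used in the snippet above.
    static string St() => "x";

    public static string Build()
    {
        // Verbatim string literal: "" is an escaped double quote,
        // and the line break inside the literal is kept as written.
        string s =
@"<div class=""a"">{0}{0}</div>
<div class=""b"">{0}<br/>{0}</div>";

        // {0} can appear any number of times; every occurrence
        // is filled from the same single argument.
        return String.Format(s, St());
    }

    static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

One difference worth keeping in mind: the helper is evaluated once here, so the output only matches a five-argument call if the helper returns the same value every time.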

DennisF · January 30, 2009, 12:00am

Is everything OK, Dennis? You come across as angry and belligerent, and I don’t think that’s your intent.

Gosh, I guess you must be right.

Come on, Jeff. The truth is that I am angry at some of these posts. You seem to be on some sort of anti-efficiency kick, and it is an offensive message. It offends me as a professional in this field. Platforms like .NET and Java bring us a lot of goodness, and when used appropriately you can achieve very impressive efficiencies of both development and runtime. You seem to be absolutely set on undermining those efficiencies though, in ways that are often truly devastating to performance.

I still have some nagging suspicion that one day you’re going to post HA HA! The last 6 months of posts were all from opposite land. Those of you who agreed with any of them please hit the books. Or maybe you’re just trying to head off stack overflow competition: Go ahead, don’t dispose those connections! Premature optimization! Concatenate those strings! Don’t worry, you can always just buy more hardware.

Dale_Harvey · January 30, 2009, 12:00am

People DO build large strings in loops. That’s the entire point of the advice to avoid concatenation in loops: it isn’t to avoid the 0.0001ms cost of concatenation vs. StringBuilder on a 20-character string, it’s to avoid the quadratic slowdown when appending to the result of each iteration.

str = "<div>";
for (i = 0; i < 1000000; i++)
    str += "somedatainhere";

I mean, Steve H just posted it: it’s a 400% difference. That’s a pretty huge speedup, and the reason why no one builds large strings with concatenation.
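
A rough sketch of the two approaches side by side, assuming C# and made-up names (`LoopSketch`, `Concat`, `Build`). It is not a benchmark; it only shows that both produce the same text, while the naive version allocates a new string and re-copies everything accumulated so far on every pass, so the total characters copied grow roughly with the square of the iteration count.

```csharp
using System;
using System.Text;

static class LoopSketch
{
    // Naive version: strings are immutable, so each += allocates a new
    // string and copies the whole accumulated result plus the new piece.
    public static string Concat(string piece, int count)
    {
        string str = "<div>";
        for (int i = 0; i < count; i++)
            str += piece;
        return str;
    }

    // StringBuilder version: appends into a growable internal buffer
    // and materializes the string once at the end.
    public static string Build(string piece, int count)
    {
        var sb = new StringBuilder("<div>");
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }

    static void Main()
    {
        // A small count keeps the naive version quick; the gap between
        // the two only becomes noticeable at much larger counts.
        string a = Concat("somedatainhere", 1000);
        string b = Build("somedatainhere", 1000);
        Console.WriteLine(a == b);
    }
}
```

To see the difference yourself, wrap each call in a `System.Diagnostics.Stopwatch` and raise the count step by step.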

charles8 · January 30, 2009, 12:00am

@Dennis Forbes
Memory is freeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

I have 3 words for you performance nuts. Allocate. Allocate. Allocate. Who cares! If you’re using C then STOP. But if you have to … DON’T CALL FREE. It’s crazy. When you exit the app the memory returns anyway … Premature idiots!

All that matters is that code is readable but… don’t use comments because supposedly the code IS the best documentation you need, since you are a programmer … but wait …

If you’re a programmer, give up, be a scripter. It’s more practical!

Algorithms are for nutters! Who even knows what an algorithm is anyway? That’s craziness! Be a software developer.

Steve · January 30, 2009, 12:00am

We’re constantly building them, merging them, processing them, or dumping them out to a HTTP stream.

Which is why I use Perl for Web development.

dean_nolan2 · January 30, 2009, 12:00am

I didn’t expect there to be much difference but did think StringBuilder would be the winner.

You’re right about avoiding micro-optimization.

You should avoid basic mistakes like what Jeff said and make your code readable!

Christopher · January 30, 2009, 12:00am

Just out of curiosity, if you increase the size of the string to be formed, does the time grow linearly for each method you used?

Niyaz_PK · January 30, 2009, 12:00am

The problem with micro-optimization is almost always that people try to micro-optimize the wrong stuff.
There are always places to optimize. Right?

Use this simple formula:

Optimizability (I mean the desirability of optimizing, whatever) = Price saved by optimizing / Price of optimizing

So generally speaking, micro-optimization theatre is bad, while micro-optimization itself can help you very much. Again, it all depends on the application. For example, when I was developing a chess program, I was trying to optimize every single bit of it. You optimize small operations in the move generation or move validation functions and you may be able to squeeze an extra ply or two out of your program. The same goes for high-traffic web applications like SO. You’d better know where to optimize first.

Another thing: I know nothing about softball other than the wild guess that it must be a really soft ball.

DavidA · January 30, 2009, 12:00am

Personally, I would use the technique that is most maintainable. By that I mean, which one is easiest to make a small change to (add another part to be concatenated or inserted, or remove an existing part).

This is closely related to, but not the same as, the most readable. So I’d pick Format, slightly over Concat or StringBuilder, for that reason. It’s easy to maintain and also very readable, as the code looks the closest to the final output.

Catto · January 30, 2009, 12:00am

Hey Now Jeff,

It just doesn’t matter!

Coding Horror Fan,
Catto

Daniel · January 30, 2009, 12:00am

Every decent programmer knows that string concatenation, while fine in small doses, is deadly poison in loops.

You mean, every decent programmer working with immutable strings on garbage-collected, memory-managed languages?

I never had a problem with it in C. I allocated the estimated space (and checked for overflow if the estimate wasn’t guaranteed to be correct), kept a pointer to the null-terminator, and just copied.

Zsolt · January 30, 2009, 12:00am

You might want to check the actual compiled bytecode to see what you’re actually measuring. The Java compiler, at least, is smart enough to figure out you’re doing a bunch of concatenations and automatically optimizes it to use StringBuilder - I’m pretty sure the C# compiler is smart in this respect too. This could be the reason you got the results above.