The Sad Tragedy of Micro-Optimization Theater

Jeff,

I have to echo what others have said: check the bytecode. The string.Concat test and simple concatenation look suspiciously close. It would not surprise me to find that when the compiler sees a lot of concatenation in a single statement, it replaces it with a call to string.Concat. The other question is, how does simple concatenation fare if it is broken out over multiple statements? I suspect that you might see a true optimization using any of the tested methods over that.
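The distinction this comment draws can be sketched in runnable form. Java is used here for illustration (the C# compiler folds a single-statement `+` chain into one `String.Concat` call; javac does the analogous rewrite into a single optimized concatenation, while `+=` across statements re-copies the intermediate string each time). Class and method names are mine:

```java
public class ConcatDemo {
    // One statement: the compiler emits a single optimized concatenation,
    // so there is only one allocation for the final result.
    static String oneStatement(String a, String b, String c) {
        return a + b + c;
    }

    // Broken out over multiple statements: each += builds and copies a new
    // intermediate string, which is the case worth benchmarking separately.
    static String multiStatement(String a, String b, String c) {
        String s = a;
        s += b;
        s += c;
        return s;
    }
}
```

Both return the same result; only the number of intermediate copies differs, which is why the single-statement benchmark can look "suspiciously close" to string.Concat.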

Never underestimate the ability of geeks in large groups to miss the freakin point. :slight_smile:

~L

Good to know, since I usually just use printf

Oh wait, that was 10+ years ago; now I’m all about StringBuilder and appending string.Format() - my intuitive guess for the best trade-off between readability and performance.
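That habit looks something like the sketch below, shown in Java for a runnable illustration (StringBuilder plus String.format is the direct analogue of the C# pair the comment names; the method and names are made up for the example):

```java
public class ProfileText {
    // One StringBuilder for the whole output, String.format for each
    // readable piece: the readability/performance trade-off described above.
    static String profileLine(String user, int answers) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%s has posted %d answers", user, answers));
        sb.append('\n');
        return sb.toString();
    }
}
```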

Of course, I don’t use any of these when building web pages; unless you are doing MVC, I’m not sure it would come up that often…

Interesting post.

(That wasn’t a Meatballs reference, was it? It just doesn’t matter… because all the really good-looking girls would still go out with the guys from Mohawk.)

The thing that struck me most about the code samples is that you’re building up HTML in C# strings. Surely your MVC framework provides a templating language which would be a far more readable way to generate HTML?

Perhaps this is just a toy example that you used to illustrate the problem of micro-optimization and is not how you actually generate the HTML on stackoverflow?

This post is utter crap. Of course it is almost as fast since there is no difference in speed whether you are allocating 4 bytes or 2 gigabytes.

Try running that abhorrent code through perfmon.

I agree String.Format is the most readable, but why so many arguments? I’d have expected someone of your calibre to have used:

string s =
@"<div class=""user-action-time"">{0}{0}</div>
<div class=""user-gravatar32"">{0}</div>
<div class=""user-details"">{0}<br/>{0}</div>";
return String.Format(s, st());

I am astounded by all these commenters who think String.Format() is more readable than String.Concat(). Um… when you’re trying to understand what a line of code does, wouldn’t the function whose name SAYS EXACTLY what you’re doing be more readable than any other function, period?

Anyway, I was thinking about this recently myself, and I figured I probably can’t go wrong performance-wise by using the function that Microsoft created specifically for the purpose of concatenation. Does it have the very best performance for incredibly large strings? Who knows… but it stands to reason that it’s not far off.

it all depends on the string functions you are using… i had exactly this problem before from using strcat from the C++ library. it doesn’t keep track of the lengths of the strings anyway, since it works with a plain null terminated array of chars. the fix was simple, keep track of the length and do the concatenation by hand, making sure to reallocate the string at regular intervals based on the length of the existing string and the string to be added. the result saved several minutes in the worst cases.

all it was doing was taking numerical values from memory and creating a string like this, one gmegabuf(i) = some_value;\r\n at a time:

gmegabuf(0) = 0.1;
gmegabuf(1) = 0.222222;
gmegabuf(2) = -0.303; etc…

but because it was up to a million values (in actual use it only ever reached 250k or so…) the performance impact by each strcat call finding the end of the string was significant.

optimising away library calls is only a waste of time if you don’t understand the original call. luckily i could step through the strcat code using MSVS and identify the problem myself…

bottom line, if you don’t understand the implementation do some testing first. don’t just guess.
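The fix described above (track the length yourself instead of letting each strcat rescan the whole string) maps directly onto a growable buffer that knows its own length. A Java sketch of the same dump, with StringBuilder playing that role; the pre-size constant is a rough guess, not anything from the original code:

```java
public class MegabufDump {
    // strcat-style appending rescans the string on every call, which is
    // quadratic over a million lines. A builder that tracks its own length
    // and grows amortized (like the hand-rolled reallocation described
    // above) makes each append O(length of the appended piece).
    static String dump(double[] values) {
        StringBuilder sb = new StringBuilder(values.length * 24); // rough per-line guess
        for (int i = 0; i < values.length; i++) {
            sb.append("gmegabuf(").append(i).append(") = ")
              .append(values[i]).append(";\r\n");
        }
        return sb.toString();
    }
}
```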

personally i wouldn’t have made the same assumption about the gc that you did… if it’s well made it wouldn’t have the problem you described: it could, for example, save up a few old strings and then deallocate them together, or mark the space as free for the next string, keeping the memory overhead small without impacting performance too greatly. there are no references around that force the gc to leave them there (at least judging by this code alone).

i would say look into malloc implementations, you’ll gain some knowledge you will find valuable for thinking through situations like this one… then again, you don’t like memory allocation, so what’s the point? it’s not like every program you ever write is dependent on it

While I mostly agree with the posts suggesting readability should come first, I have to say it completely depends. I work at an NLP house, and so pushing, pulling, chopping, building, comparing and counting extremely large quantities of strings and characters is what we do all day long. While readability is desirable, sometimes you simply have to go with the best performance and be clear in the coding style and commenting.

A comment that it just doesn’t matter when the difference between format and concat is over 10% (on a small test set) wouldn’t fly here.

I’m definitely spending too much time on stackoverflow, I read the post and started looking for the vote up button…

Wise words from the Man who is worried about whether or not a SQL Query runs 60 or 100 ms :slight_smile:

But I’ll prefer String.Format for the exact reason Karl pointed out: It’s the most readable. It’s essentially only a (powerful) wrapper around StringBuilder, but as said, more readable for very little extra price.

Wow, for once a post I wholeheartedly agree with. What Jeff said, times two.

Ignore optimization until you have a performance problem, and then make no assumptions about where it is - measure it in your application! The problem is almost never where you think it will be.

To quote computer scientist Michael Jackson, “The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

To the guy above who was saying that the 300ms difference is hugely significant… if that’s such a big impact on your application’s performance, maybe you shouldn’t run your application 100,000 times in a loop before delivering the result to the user? Did you really just miss that the difference was per 10^5 iterations, and that the actual maximum difference for one group of concatenations was 3 microseconds?

If your application is performance critical, measure where the performance is going and then tighten up that code or better still change algorithms - don’t fiddle with trivialities because they look easy!

Jeff,
Great post and good conclusions. Pre-optimization is truly the root of evil in software development projects. What you don’t discuss in detail is when this potential optimization should take place. If you are observing performance that isn’t acceptable, then investigate. If you are tinkering with the code before you know there is a problem, then that’s where the damage is done. Assumption Driven Development is a horrid bane on our industry.

If you wonder why IT software projects are chronically late and over budget, just read the comments to this post. It’s an interesting insight into how developers think about their jobs.

I think you know nothing about performance.

There are good reasons why tools like FxCop warn you about using +=. In larger organizations you will never know who will use your code and how often it is used. For what reason do you think the StringBuilder was designed? If you learn to program correctly you don’t need to worry.

But what should I expect from someone who needs extra print stylesheets instead of creating a proper website?

It’s certainly reasonable to avoid preoptimization, but it’s still no excuse to consciously decide to use a potentially very slow algorithm simply for the sake of laziness or readability when a better option is only a few extra minutes of work.

I ran into this problem the other day when I had created a simple procedure that needed to determine if any member of one list was contained in a second list. Being lazy and figuring, oh… the lists will only ever contain a few dozen or maybe a hundred entries, I simply did the following:

iterate through list a
{
iterate through list b
{
if item from a = item from b return true
}
}
return false

worked fine until the one fateful day when a case came up with 50,000,000 entries in list a and b… all unequal. oops!!! The lesson is: use the algorithm that best suits the worst case condition, not the typical condition. (or at least make sure the worst case doesn’t completely break down)
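The "few extra minutes of work" option the commenter passed up is the standard one here: put one list in a hash set, then probe it once per element of the other list, turning O(n × m) into O(n + m). A hedged Java sketch (names mine, not from the original code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Overlap {
    // Build a hash set from one list (O(m)), then do one O(1) expected-time
    // lookup per element of the other list (O(n)). For the 50,000,000-entry
    // worst case above, this degrades gracefully instead of blowing up.
    static <T> boolean sharesMember(List<T> a, List<T> b) {
        Set<T> lookup = new HashSet<>(b);
        for (T item : a) {
            if (lookup.contains(item)) {
                return true;
            }
        }
        return false;
    }
}
```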

Well, now I’m all sorts of nervous. I never thought about string cat’ing as a possible killer. How could the first example loop be rewritten so as to not anger the gods?

How much of this can we leave up to the compiler or interpreter?

Performance aside, if you ever plan on localizing your interface you’ll want to avoid += in user facing strings like the plague. String.Format is generally much better in that case because it allows your translators to rearrange your tokens to their correct positions for each language.
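The localization point can be made concrete with positional format specifiers. Java's String.format is used here for a runnable illustration (C#'s {0}-style placeholders are positional in the same way); the "translated" pattern is hypothetical:

```java
public class Localization {
    // Positional specifiers (%1$s, %2$d) let a translator reorder the
    // arguments without any code change; += concatenation hard-codes
    // the English word order into the program.
    static String newMessages(String pattern, String name, int count) {
        return String.format(pattern, name, count);
    }

    public static void main(String[] args) {
        String en = "%1$s has %2$d new messages";  // English argument order
        String xx = "%2$d new messages for %1$s";  // hypothetical translation, reordered
        System.out.println(newMessages(en, "Ana", 3)); // Ana has 3 new messages
        System.out.println(newMessages(xx, "Ana", 3)); // 3 new messages for Ana
    }
}
```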

Best comments to this post are Mecki, Dennis “angry and belligerent” Forbes, and Dan Covill…

My own thoughts are something of a mesh: the difference between the string.Replace method and the StringBuilder method is nearly 30%. To extend the example with an additional half-baked scenario: say your app is taking 1.3s to return. In that case, using StringBuilder will improve your app’s speed by 25%. But then, you’d have to actually measure your app and decide which fruit is the lowest-hanging…

@Steve
I understand your dilemma, but that case, if it really wasn’t expected, is a change in requirements, and the onus should be put back on the user/customer to provide complete requirements with all possible scenarios.

If you think you need to explore all the potential worst cases, then you will never get done writing that software. Where do you draw the line? You say you draw the line from your own assumptions. I say draw the line from the information provided in the requirements. If that information is incomplete, then that’s a requirements/communication issue, not something to be remedied by making assumptions in the development cycle.