The Sad Tragedy of Micro-Optimization Theater

I think Knuth still has it best, and most succinctly:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

If this code presents a bottleneck, then optimise it. If not, go and have a look at the code that is.

(Those tests don’t seem to be at all scientific - what was the variance on the results? How do the results depend on input size? Did you run the test a few times first to warm the cache? What exactly did you measure - are the other factors dominating the string cost?)

Great post, but you missed something that DOES make a difference… the size of the strings being concatenated.

StringBuilder wins hugely once you have megabytes of text being added multiple times.

I think the original warnings, which led to the advice to use StringBuilder, were directed towards those old-school Basic programmers who used to write readable single-line statements something like (I’ve seen it in stuff I inherited):

string s = @"<div class=""user-action-time"">";
s = s + st();
s = s + st();
s = s + "</div>";
s = s + @"<div class=""user-gravatar32"">";
s = s + st();
s = s + "</div>";

which has to reallocate space for s on each statement, and leaves a load of garbage for the GC to clean up later.
There may be conditionals or further calculations interspersed, which would prevent this being turned into a single multiline statement, but it would be a good candidate for the StringBuilder.Append() approach.
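
For comparison, a rough sketch of the same block using StringBuilder.Append (st() being the same helper as above); one buffer grows in place instead of a fresh string being allocated for every statement:

StringBuilder sb = new StringBuilder(256);
sb.Append(@"<div class=""user-action-time"">");
sb.Append(st());
sb.Append(st());
sb.Append("</div>");
sb.Append(@"<div class=""user-gravatar32"">");
sb.Append(st());
sb.Append("</div>");
string s = sb.ToString();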

@Stephen Touset: that was the first thing that came to my mind, too!

And by the way, as a Java dev, I can trust HotSpot to do these micro-optimizations for me (like using StringBuilder instead of concatenation).

wow… i wish you’d put this on SO…

When testing performance in a managed environment, it is never enough to look at simply the memory overhead, or how many operations you can achieve per second.

If you ran those tests with multiple threads on multiple cores (as you would be on a web server), you’d see a large difference over time between the tests that create lots of objects and those that don’t. The cost of our code, for the most part, needs to include the GC cost. Two threads ‘doing the right thing’ with regard to object creation will run into each other over time (regardless of the collection strategy the GC uses) - that small temporary object (or 1000 of them) you just created? Guess what… they are now in gen 1…
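
One rough way to see what this comment is getting at (not from the post; the helper and its parameters are hypothetical) is to watch collection counts while the same worker runs on several threads:

using System;
using System.Diagnostics;
using System.Threading;

static class GcPressure
{
    // Runs the same worker on several threads and reports elapsed time plus
    // how many gen 0 / gen 1 collections the work triggered.
    public static void Measure(Action worker, int threadCount)
    {
        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);

        var threads = new Thread[threadCount];
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() => worker());
            threads[i].Start();
        }
        foreach (var t in threads)
            t.Join();
        sw.Stop();

        Console.WriteLine("{0}ms, gen 0 collections: {1}, gen 1 collections: {2}",
            sw.ElapsedMilliseconds,
            GC.CollectionCount(0) - gen0Before,
            GC.CollectionCount(1) - gen1Before);
    }
}

Comparing the collection counts for a concatenation-heavy worker against a StringBuilder one gives a view the raw milliseconds alone don’t.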

In 1964 I worked in Management Sciences, a marketing assistance function of the Burroughs Corporation. Your blog made my ears ring with the memory of my boss’s voice: “Instrument first, then optimize!” If you don’t know where the time is going, how do you know you’re optimizing the right thing?

I remember a defense system where we found that 90% of the CPU time was being used by the string extraction routine, and that 90% of that time the length of the string being extracted was 1!! A simple library change (extract single character) worked wonders!

I don’t think readability and performance are a trade-off. You can, and should, have both.

Dan Covill
San Diego

The String.Replace version is actually wrong. But then I expect that from Jeff.

Well said, but your conclusion, Jeff, is treacherous, because it may not apply to real-life scenarios.

What happens if those strings are larger? Say, appending some tags to a nearly complete page you’re about to render? Or maybe the performance characteristics change after a while due to memory fragmentation?

I have mainly a C++ background, where you have to think more about handling memory in general. There are lots of possibilities to reduce the risk of performance degradation due to bad programming practices: string copy-on-write, realloc’s, memory-pooling,…

But everything falls apart if you mix it with bad code and loads of data.

All I can say is, as soon as your program gets more complex and unmaintainable, it’s far more important to profile your complete codebase in a realistic scenario. (I love GlowCode for such tasks.)

But yeah, for a website, use += and be done with it :wink:

There’s one other technique and I’m wondering how it would compare: throwing all your strings into an array and joining them at the end.
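
Presumably something like this (st() standing in for the same helper the post uses):

string[] parts =
{
    @"<div class=""user-action-time"">",
    st(),
    st(),
    "</div>"
};
string result = String.Join(String.Empty, parts);   // or String.Concat(parts)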

hi Jeff,

Great post. Personally I use String.Format, because it’s the easiest to maintain and spot errors in, IMO.

I guess you could also do some crap like:
string str = String.Concat(new string[] { @"<div class=""user-action-time"">", st(), st(), "<br />" });

Given it’s an ASP.NET application, the fastest must be to write to the output stream directly, without instantiating new strings, I guess, like the Control.Render(HtmlTextWriter) method? (Or does that not matter?)
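
A sketch of what writing straight to the output might look like, assuming a custom server control (the class name and the st() stub are placeholders; the HtmlTextWriter overload is the standard Control.Render signature):

using System.Web.UI;

public class UserActionControl : Control
{
    protected override void Render(HtmlTextWriter writer)
    {
        // Markup goes straight to the response output; no intermediate
        // concatenated string is ever built.
        writer.Write(@"<div class=""user-action-time"">");
        writer.Write(st());
        writer.Write(st());
        writer.Write("</div>");
    }

    private static string st()
    {
        return "…";   // stand-in for the post's st() helper
    }
}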

I was wondering about this scenario:

var sb = new StringBuilder();
for (int i = 0; i < 123456; i++)
{
    sb.Append(@"<div class=""user-action-time"">");
    sb.Append(st());
    sb.Append(st());
    // bla bla…
}
return sb.ToString();

That will of course use more memory, but isn’t the .ToString() method where the StringBuilder spends most of its execution time?
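
One way to check is to time the Append loop and the final ToString() call separately (st() as in the post; System.Text and System.Diagnostics assumed in scope):

var sb = new StringBuilder();
var sw = Stopwatch.StartNew();
for (int i = 0; i < 123456; i++)
{
    sb.Append(@"<div class=""user-action-time"">");
    sb.Append(st());
}
sw.Stop();
Console.WriteLine("Append loop: {0}ms", sw.ElapsedMilliseconds);

sw.Reset();
sw.Start();
string result = sb.ToString();
sw.Stop();
Console.WriteLine("ToString():  {0}ms", sw.ElapsedMilliseconds);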

Why has no one asked what is going on inside st()?

(True as it likely is that st() or the real-world equivalents thereof dominate the processing time requirements.)

Unless you happen to write PL/SQL (or equivalent) in a multi-hundred-GB system, in which case you have to worry about performance every second of every day.

Now what was that about saying never?

I may be wrong here, but I think you’re testing the wrong thing. Try the code below; it shows the relative performance of each method over a large number of iterations. In your example above, each concatenation was a known-cost operation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace StringOptimization
{

class Program
{
    const int ManyIterations = 2000;
    const int FewIterations = 1000;

    static void Time(Action action, string name) 
    { 
        Stopwatch sw = Stopwatch.StartNew(); 
        action(); 
        sw.Stop(); 
        Console.WriteLine("{0}: {1}ms", name, sw.ElapsedMilliseconds); 
        GC.Collect(); GC.WaitForPendingFinalizers(); 
    }

    static void Concatenation(int iterations, int stringLength)
    {
        string result = "";
        for (int currentIteration = 0; currentIteration < iterations; currentIteration++)
        {
            result += "".PadRight(stringLength);
        }
    }

    static void StringBuild(int iterations, int stringLength)
    {
        StringBuilder sb = new StringBuilder();
        for (int currentIteration = 0; currentIteration < iterations; currentIteration++)
        {
            sb.Append("".PadRight(stringLength));
        }
        string result = sb.ToString();
    }

    static void Main(string[] args)
    {
        Time(() => Concatenation(FewIterations, 1), "Concatenation FewIterations:ShortString");
        Time(() => Concatenation(ManyIterations, 1), "Concatenation ManyIterations:ShortString");
        Time(() => Concatenation(FewIterations, 1000), "Concatenation FewIterations:LongString");
        Time(() => Concatenation(ManyIterations, 1000), "Concatenation ManyIterations:LongString");

        Time(() => StringBuild(FewIterations, 1), "StringBuild FewIterations:ShortString");
        Time(() => StringBuild(ManyIterations, 1), "StringBuild ManyIterations:ShortString");
        Time(() => StringBuild(FewIterations, 1000), "StringBuild FewIterations:LongString");
        Time(() => StringBuild(ManyIterations, 1000), "StringBuild ManyIterations:LongString");

        Console.ReadLine();
    }
}

}

So-called micro-optimizations aren’t the problem; the problem is premature optimizations. If you’ve profiled your code and found that the bottleneck is in the construction of the page (as opposed to the fetching of the data that goes on the page), and the page is constructed primarily through string concatenation, it’s absolutely appropriate to try to find faster ways of doing that.

Optimization is fundamentally about applying knowledge you have about a performance problem that a generic function or compiler can’t figure out for itself.
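
A trivial illustration of that point: if you happen to know roughly how big the output will be, you can tell StringBuilder up front, which no generic default can do for you (the 16 KB figure, rowCount, and RenderRow are made up for the example):

// The compiler can't know the final size of the page, but you might.
// Preallocating skips the repeated internal buffer growth a default-sized
// StringBuilder would go through on its way to 16 KB.
var sb = new StringBuilder(16 * 1024);
for (int i = 0; i < rowCount; i++)        // rowCount: however many rows you render
{
    sb.Append(RenderRow(i));              // RenderRow: hypothetical per-row helper
}
string page = sb.ToString();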

And there it is again. My favorite no-brainer:

“Does it matter if I’m 300 ms faster or not? The user won’t even notice.”

Right… the user won’t notice if your application is wasting 300 ms more on something that could be done without those additional 300 ms. And of course your application is the only one ever running at any time on the user’s machine or on a server, right? No wait… there is multi-tasking. Actually, a server may run your application 100 times in parallel to satisfy 100 HTTP requests. But it may not have 100 cores, only 2. So 50 copies must run on the same core. And each of them wastes 300 ms while running on that core… hmmm, 50 * 300 = 15,000 ms - oh wait, that’s actually 15 SECONDS! Yeah, the user certainly won’t notice if a web page opens 15 seconds faster or slower, right? RIGHT? Or maybe, just maybe, he will.

A single snowflake can’t do any harm. Millions of them become an avalanche and bury whole villages.

If you look at single, tiny, isolated microbenchmarks, any optimization seems a waste of time. I can optimize parts of the Linux kernel to be three times faster, and nobody in the whole world will ever notice. But if I optimize 500 different functions to become only 5% faster, the overall gain for the kernel as a whole may be more than noticeable. It may be so much that I don’t even need a benchmark; I can see it’s faster with my own eyes.

Jeff-
I question whether you’d get performance gains by pulling any new calls out of the loop. I suspect new StringBuilder(256) carves 256+ bytes of memory off the heap. Does pulling this out of the loop and just clearing the contents each iteration make a difference (it would in C++…), or is this just another silly micro-optimization?

To keep with the spirit of the test, the object would have to maintain a private string buffer itself…
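
A sketch of what the hoisted version might look like (Use() and iterations are placeholders, not anything from the post):

var sb = new StringBuilder(256);          // allocated once, outside the loop
for (int i = 0; i < iterations; i++)
{
    sb.Length = 0;                        // reset the contents, keep the buffer
    sb.Append(@"<div class=""user-action-time"">");
    sb.Append(st());
    sb.Append("</div>");
    Use(sb.ToString());                   // hypothetical consumer of the result
}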

Doh, I should have put the results up. Many iterations of a long string is 8678ms vs. 19ms. Seems one hell of a difference to me.

Concatenation FewIterations:ShortString: 4ms
Concatenation ManyIterations:ShortString: 9ms
Concatenation FewIterations:LongString: 2196ms
Concatenation ManyIterations:LongString: 8678ms

StringBuild FewIterations:ShortString: 1ms
StringBuild ManyIterations:ShortString: 1ms
StringBuild FewIterations:LongString: 10ms
StringBuild ManyIterations:LongString: 19ms

Generally speaking, the big time sinks are things like reading from disk or large database transactions. Another big one is using too many small functions to break apart strings piecemeal.

So the biggest ways to optimize:
(1) Use as few database transactions as possible (within reason). Also, optimizing the database tables you read often can make your programs much faster.
(1b) However, use a database over file transactions, as a mature database is nearly always faster.
(2) Read or write all file data at once (with good functions; let the system optimize - it generally does well in mature languages).
(3) Use regular expressions wherever it makes sense to - these suckers are FAST! Of course, don’t use them for simple finds, as they are still slower than a single function (see the sketch below).
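
To illustrate item (3), a minimal sketch: a compiled Regex for a genuinely structured match, versus a plain IndexOf for a simple find where the regex machinery only adds overhead (the patterns and input here are made up):

using System;
using System.Text.RegularExpressions;

class RegexVersusIndexOf
{
    // Compile once, reuse many times - this is where regexes pay off.
    static readonly Regex UserDiv =
        new Regex(@"<div class=""user-(action-time|gravatar32)"">", RegexOptions.Compiled);

    static void Main()
    {
        string html = @"<div class=""user-action-time"">5 minutes ago</div>";

        // Structured match: the regex expresses this far more compactly than
        // hand-rolled parsing code would.
        Console.WriteLine(UserDiv.IsMatch(html));

        // Simple find: a plain IndexOf beats spinning up the regex engine.
        Console.WriteLine(html.IndexOf("user-action-time", StringComparison.Ordinal) >= 0);
    }
}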

String concatenation would not even be on the radar. It’s just too minor to be an issue. (Unless of course you have Schlemiel loops, but that’s a different problem).

++Daniel

It can make a huge difference in C++.

I changed a function that built some XML that gets absolutely hammered from:
CString tmp;
tmp.Format(L"</%s>\r\n", elementName);

to:
wchar_t endElem[1000];
BuildBufferFromPieces(endElem, L"</", elementName, L">\r\n");

CPU usage dropped from 60% to about 15% (as I recall) – this is a very heavily threaded server app doing tons of work. The fact that there was no constructor, no heap allocation, etc, etc was probably involved.

Of course I didn’t optimize until I started profiling and found out I was spending a ton of time in this function.

But the point is, it can matter if you’re coding close to the metal.

I would expect .NET to optimize this for me already (like Java does). Your performance tests lead me to believe that optimizations are happening under the covers.