There Ain't No Such Thing as the Fastest Code

I was tickled to see that James Hague chose The Zen of Assembly Language Programming as one of five memorable books about programming. I wholeheartedly agree. Even if you never plan to touch a lick of assembly code in your entire professional career, this book is a fantastic and thoroughly useful read. I was a mere Visual Basic programmer when I found this book (along with The Zen of Code Optimization), picked it up on a lark, and I could barely put it down. It's that good.

This is a companion discussion topic for the original blog entry at:

I’m glad that I’m being educated in a world where such obsessions are not as vital. But still, it’s not about the craft of optimizing, it’s about understending what happens behinde your average code. Understanding that probably will make your day-to-day code more efficient.

The geek in me loves this sort of thing, but the reality is that doing the wrong thing quickly is worse that doing the right thing slowly.

Optimizing word-counting programs is fine, but figuring out why the user wants to know the number of words and making sure you are solving the problem (aka filling the need) is more important that coming up with the answer, even the right one.


I’ve had this book in my shelf since it was published. It was then intresting reading, and it still is.

Game of Life optimization challenge/examples are great in fact that they show optimization is much more than just fidling few asm cycles off here’n’there. Optimization often is about finding new way how to look your data.

Wow! That takes me back. I remember every time an article or book by Mr. Abrash came out, I would carefully read every word. I think I still have my copy of Zen. Good stuff!

I loved the chapters on the cycle eaters, especially the DRAM refresh. The rest of the book is absolutely win as well.

Now… who here downloaded that all by hand? If you did, shame on you… :slight_smile:

I have the fastest code for ading two numbers 3 5
’print 8’

Wow!!! I know I just told a great joke.

Faster than “print 8” is “x = 8”

(the requirements described adding two numbers, nothing about emitting the result)

or even “”

(strictly the requirements didn’t even say the result should be stored, so it can be optimized away)

What’s the saying? Debugging is twice as hard as writing the program in the first place, so if you write a program as cleverly as possible, buy definition you’re not smart enough to debug it.

Beating the C compiler at writing ASM code has been my main occupation for the last 10 years. The real fun began when I got away from x86 and started working on a “Man’s” processor - the ARM: Lots of registers, free barrel shifts, conditional execution…Fertile ground for true geeks to go optimization crazy :slight_smile:



“Case in point about C compilers vs hand written assembly. The best C code ran a particular benchmark in 2:30 seconds, while the equivalent assembly code (which took a week to write) runs in 15 seconds. Considering that we’ve been running the code on thousands of machines for the last 8 years, that week of optimising has proved to be priceless.”

That is an astounding job, and completely relevant in your case. The main drawbacks there are that:

  1. The code is now much harder to understand instantly
  2. Algorithm-level optimizations are less likely to occur

In the case where there are no algorithm-level optimizations left to put in, it makes sense to cement your current algorithm by hand-tooling the code, moving to lower-level languages to facilitate hand-tooling, etc. However, hand-tooling a poorly-fit algorithm might result in a 10x improvement, while getting a better-fit algorithm might result in a 100x improvement.

So, again, hand-optimizing is great when you know the algorithm is solid, the implementation bug-free, and the code you are hand-tooling is a bottleneck in the application (from the user’s perspective). Presumably this is the case in your example. If these three are not true, hand-tooling, thunking down to lower-level languages, and otherwise obfuscating your code is a bad move.

For a counter example: a constraints-satisfaction engine I work on saw a 120x performance improvement (22 minutes to 9 seconds) in large part due to moving from C to C++. Why? Because the underlying code had been exhaustively point optimized, but the algorithm it used was not optimal (and, frankly, managing the kinds of data structures and collections which was required for the more advanced algorithm would have been a maintenance nightmare in C). Perhaps we should have spent several years implementing the improved algorithm in Assembly (for each platform) instead, and then just decided to never modify it again. But since that particular improvement we have tweaked the algorithm several times, resulting in an additional 50% gain (ie, it is now down to 6 seconds), and know that we can gain another 200% by re-implementing it in Java.

The point of the story is: algorithm optimizations by far trump code optimizations and language optimizations. To the extent that lower-level optimizations make it harder for algorithm optimizations to occur (primarily by obfuscating the code), they are, as Knuth stated, the root of all evil.

At least in my line of work, algorithm advances aren’t a matter of coming across something someone just published, but rather determining how a particular published algorithm could be squeezed into our problem domain without destroying its efficiencies; there’s little confidence that the answer we have today is even the best answer out there for the state of the art today, much less for tomorrow, and as a result there is a definite likelihood that a new approach will pop forth which blows today’s implementation away, tomorrow. It’s already happened several times, and so I’ve learned to keep obfuscations safely tucked away so they don’t trip us up when the next algorithm advance shows up.

Definitionally, there must be a “fastest code”*, given a specific task and a specific environment (because otherwise we have the awkward problem of infinitely fast code, if it can always get faster by at least one cycle).

Given the difficulty of Iestablishing/i it, and the much more important fact that we almost never know how close we are, it’s not unreasonable to Iact/i like there’s no such thing, though.

  • Or “equally fastest codes”.

From a purely theoretical POV, this statement is quite true as well. There is no general way of proving that a given program is the fastest implementation of an algorithm. And while it’s possible to prove this non-generically about individual programs, this is horrendously difficult to do for real computer architectures, and your time is definitely better spent elsewhere.

Of course, all that is assuming the algorithm is what you want in the first place. Real programs are composed of many different algorithms, possibly subdivided, so sometimes the entire algorithm is the problem, not the implementation.

There is no such thing as fastest, it is about whether it maintains its efficiency when n is infinitely big.

People … don’t confuse Big O notation with optimization. Big O doesn’t tell the whole story; it’s meant to be used to compare algorithm, not implementation, when n is sufficiently large. For small n, performance will vary wildly based on implementation (and optimization).

I could write 10 different forms of an O(n) algorithm that, on the same data, take anywhere from 1 second to 1000 seconds to run. The gist is that no matter what optimizations I do, it doesn’t matter if the underlying algorithm is still O(n)

I can easily write an O(n) algorithm that performs better than an O(log n), but probably only for small n. Big O notation is the simplification of a mathematical formula to express magnitude to make the overall comparison easier.

My O(n) could really be a x = 12 + n/20
My O(log n) could really be x = 9999 + 2000 log n

@ Joseph - your pipeline example fits squarely into this. The branch failure of binary search is a constant per iteration that is, in it’s own way, just an implementation detail. BST’s are still superior when n gets large enough. As you indicate, for low n, the linear search was far more effective.

To be clear, the philosophy and pitfalls of optimization-- and in particular the human element-- are always the foremost topic in the Abrash books. Read some of it and you’ll see what I mean.

I had a quick look at Abrash’s chapter on optimisation. It seems that he has some great technical skills but lacks business skills.

The top 10 list that was given makes perfect sense if you consider that programmer time is the largest cost component of writing software.

He derides the people who thought about making a business out of software as “…focusing on means, and have forgotten about ends.”

In contrast in my whole career there have only been one or two cases where performance needed to be improved.


To download this to current dir on linux I did:
wget -r -nd -np -l1 -A ‘*.pdf’ a href=""

And no, I can never remember the wget parameters either:
a href=""

As for optimization, I would say that it’s more important
to try and use existing logic (libs etc.) as they will likely already
be optimized, or if not can be optimized independently of your program.
Now, don’t assume long implemented logic is well optimized.
Personally I’ve found lots of low hanging fruit in various
linux utilities and libraries for example. It was easier to
fix the generic logic, than incorporate the faster implementations
into my programs.

I’m always amazed at all the coders who claim that performance is old school, and that optimisation is a worthless, extinct art. A lot of posters above demonstrate this attitude. Meanwhile, the company I work for sells crap old embedded hardware with software which looks and runs amazingly fast compared to the competition on newer rigs. When I view their recruitment pages, it all makes sense. Needless to say, my employer makes a killing on margins, for a modest extra they pay for their engineers.

Case in point about C compilers vs hand written assembly. The best C code ran a particular benchmark in 2:30 seconds, while the equivalent assembly code (which took a week to write) runs in 15 seconds. Considering that we’ve been running the code on thousands of machines for the last 8 years, that week of optimising has proved to be priceless. That is a 10 fold speed increase in one particular benchmark alone. And the code doesn’t live in isolation, I’m sure that the rest of the untouched code is faster since this routine consumes less memory, hence it consumes less CPU cache, hence it leaves more room in L1 cache for other code.

Win-win for everyone. And the more people which claim optimisation is worthless, well, that just means that I can charge more since my particular expertise is almost impossible to find. And I’ll take the big $$$, thank you very much. Yeah, I agree with y’all, optimisation is crap. Dont bother learning it.