@no-fun:
“Case in point about C compilers vs. hand-written assembly: the best C code ran a particular benchmark in 2:30 (two and a half minutes), while the equivalent assembly code (which took a week to write) runs it in 15 seconds. Considering that we’ve been running the code on thousands of machines for the last 8 years, that week of optimising has proved to be priceless.”
That is an astounding job, and completely relevant in your case. The main drawbacks there are that:
- The code is now much harder to understand at a glance
- Algorithm-level optimizations are less likely to occur
When there are no algorithm-level optimizations left to apply, it makes sense to cement your current algorithm by hand-tooling the code, moving to lower-level languages to facilitate hand-tooling, and so on. However, hand-tooling a poorly-fit algorithm might yield a 10x improvement, while switching to a better-fit algorithm might yield a 100x improvement.
So, again, hand-optimizing is great when you know the algorithm is solid, the implementation bug-free, and the code you are hand-tooling is a bottleneck in the application (from the user’s perspective). Presumably this is the case in your example. If these three are not true, hand-tooling, thunking down to lower-level languages, and otherwise obfuscating your code is a bad move.
For a counterexample: a constraint-satisfaction engine I work on saw a nearly 150x performance improvement (22 minutes down to 9 seconds), in large part from moving from C to C++. Why? Because the underlying code had been exhaustively point-optimized, but the algorithm it used was not optimal (and, frankly, managing the data structures and collections that the more advanced algorithm required would have been a maintenance nightmare in C). Perhaps we should have spent several years implementing the improved algorithm in assembly (for each platform) instead, and then just decided never to modify it again. But since that particular improvement we have tweaked the algorithm several times, yielding an additional 50% gain (i.e., it is now down to 6 seconds), and we know we can gain another 200% by re-implementing it in Java.
The point of the story is: algorithm optimizations by far trump code optimizations and language optimizations. To the extent that lower-level optimizations make it harder for algorithm optimizations to occur (primarily by obfuscating the code), they are, as Knuth stated, the root of all evil.
At least in my line of work, algorithm advances aren’t a matter of stumbling across something someone just published, but of figuring out how a published algorithm can be squeezed into our problem domain without destroying its efficiencies. There’s little confidence that the answer we have today is even the best answer for the current state of the art, much less for tomorrow’s, so there is a real likelihood that a new approach will appear and blow today’s implementation away. It has already happened several times, and I’ve learned to keep obfuscations safely tucked away so they don’t trip us up when the next algorithm advance arrives.