The First Rule of Programming: It's Always Your Fault

Leo: “All too often the problem is that the bug is our own fault and is due to our misunderstanding of how a component works. In those cases it’s easy to see why we assume the bug is somewhere else: We look through our code over and over and everything looks perfect, but only because we have an incorrect view of how it should be.”

In that case, it sounds like the other programmer wrote incomplete or misleading documentation – a common rookie mistake. I know the slashdot mentality is “we don’t need no steenkin’ documentation!”, but the truth is that all of the best programmers I’ve ever known were also the best at documenting their code.

I maintain that incomplete or misleading (to a person) documentation is a bug of no less severity than incomplete or misleading (to a machine) source code.

Who’s the god of computer science? Knuth – advocate of Literate Programming, not coincidentally. You don’t see him sending out “oh, it doesn’t work in that case, but you just misunderstood what I’m saying…” letters. When it doesn’t work like he claimed, he sends you money. That’s why he’s Knuth!

It’s interesting to read some of these responses; humility doesn’t come easily to some, it would seem.

The lesson isn’t that there are never bugs in “their” code; of course there are, and anyone who has worked with technology long enough has run into their share of them. The real lesson is a message about statistics, our own fallibility, and our need to validate our accusations before leveling them.

Great post, Jeff.

On a related note, a mantra that I have to repeat over and over again to fellow developers, and to myself for that matter, is “don’t guess - work backwards from the symptom.” In other words, determine what conditions could be directly causing the symptom of the problem, then figure out what could be causing that condition, and so on. Seems perfectly obvious, but it’s amazing to me how often this gets forgotten.

You should see some of the terribly written tools I have to deal with. Oracle Web Services Manager, for example, elides whitespace before verifying digital signatures, which makes it broken for signed content that contains whitespace and follows the specification. Then if you try to use their extension mechanism to do it yourself, you find they pretty-print all messages internally, which also screws up signed content. Working with this tool is like being a pinball bouncing from bug to bug to bug.
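For anyone wondering why touching whitespace matters, here is a minimal sketch of the failure mode. It is not Oracle's code or API: it uses an HMAC from the Python standard library as a stand-in for a real digital signature, and the key, the message, and the function names are all made up for illustration.

```python
import hashlib
import hmac

KEY = b"demo-shared-secret"  # hypothetical key, for illustration only


def sign(payload: bytes) -> bytes:
    """Sign the exact bytes of the payload (HMAC stands in for a signature)."""
    return hmac.new(KEY, payload, hashlib.sha256).digest()


def verify_exact(payload: bytes, signature: bytes) -> bool:
    """Verify against the bytes exactly as received."""
    return hmac.compare_digest(sign(payload), signature)


def verify_after_stripping(payload: bytes, signature: bytes) -> bool:
    """Hypothetical buggy verifier: removes all whitespace before checking."""
    stripped = b"".join(payload.split())
    return hmac.compare_digest(sign(stripped), signature)


message = b"<amount> 100 </amount>"
signature = sign(message)

print(verify_exact(message, signature))            # True
print(verify_after_stripping(message, signature))  # False
```

The signature covers bytes, not meaning, so any step that rewrites the message before verification (stripping or pretty-printing alike) invalidates content that was signed correctly.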

Well, it seems you never used Delphi 2007. I challenge you to use it even for a single day without finding a bug.

At least most of its bugs don’t affect the final product, unlike Oracle, which has those annoying internal errors.

I once took over responsibility for a system built and maintained by two fanatical, rabid, Microsoft-hating VB 6 developers (hey, I guess they had to pay the bills somehow) 8-).

For a couple of years before I was brought in, they had conditioned the business owner of the application to believe that almost every bug was a bug in Windows 2000 (which was part of the reason I was brought in to er…well…babysit). There was always a ‘workaround’ that would take 2 or 3 days to meticulously craft to ‘FIX’ the crappy Microsoft OS…at least as was conveyed to me by our business partner (I found out later that everything took 2 or 3 days because of their addiction to Everquest).

Within hours of my first day working with my 2 new best friends, our business partner came in, after getting his ear chewed off by an angry customer, and explained the problem. Literally the first words spoken by one of them were “F’ing Windows 2000…it’s such a POS”, without even trying to reproduce or debug it.

My suggesting that we should do the work necessary to prove that it wasn’t our code first before casting aspersions at the OS was met with…well…let’s just say, they looked at me like I should be wearing a dunce cap and standing in the corner with my Microsoft fanboy-self.

Of course it was a bug in our code; arguably an odd, edge-case, probably-never-ever-happened-before bug, but a bug in our code. The next 30 bugs that were reported over the course of the next 2 months (it was really, really buggy code) were NOT Microsoft related either.

Lucky for me, one of the two of them quit (because he could no longer tolerate my draconian ways…like implementing source control and bug tracking), so I was eventually forced to let only one of them go (the daily 6 hour Everquest excursions only made it easier).

To this day, whenever I hear a developer quickly conclude that it was a Microsoft OS issue, I really, really have to fight the urge to run screaming from the room.

I think it’s unfair to cite studies conducted in 1973 and 1984 as scientific proof that it’s very often the programmer’s fault. 20+ years ago is ancient history for programming; in 1973 the first microcomputer hadn’t even been invented yet.

It’s true that more people are using system and development software these days, and because of a larger user base the quality may be better. But I think it’s more important that system and development software is more complicated these days. The average programmer is building on a mountain of pre-existing code, and it’s hard to believe every line of it is 100% right (in that it functions correctly, and the correct functionality is explained in some sort of document).

I still agree that it’s almost always the programmer’s fault, just that a study done before I was even born is hardly proof.

I’ve found that the best way to deal with weird, obscure, unfiguroutable (yes, it’s a word now) bugs is to get up and walk away.

99% of the time, if I get up, walk away, get a drink, go grab a snack or something, and come back, I find the bug I was dealing with almost instantly.

I love all the folks that reply to a generalized concept post like yours with anecdotes of how one time (presumably at band camp) it wound up being the tool’s fault, therefore the general assertion doesn’t hold true.

Well, I also know of someone’s grandpa who smoked, ate and drank excessively, yet made it to 100, but I wouldn’t recommend it to everyone.

Bottom line: I’ve read a lot of forum posts that start with “bug in …” and 99.9% of the time it’s not a bug!

“So what do you do when you are alone?”

Source control (frequently used), unit tests, and binary search are your friends. When all else fails, go back and find the change that introduced the bug.
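A rough sketch of what the binary search looks like, assuming you have an ordered list of revisions, the oldest one is known good, the newest is known bad, and the bug stays once it appears. The names `first_bad`, `history` and `fails_test` are hypothetical; in real life `is_bad` would check out the revision and run your unit test.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first is good,
    the last is bad, and every revision after the first bad one is bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # bug already present: look earlier
        else:
            lo = mid  # still good: look later
    return revisions[hi]


# Hypothetical history of 100 revisions where the bug arrived in r73.
history = [f"r{n}" for n in range(1, 101)]


def fails_test(rev: str) -> bool:
    return int(rev[1:]) >= 73


print(first_bad(history, fails_test))  # r73
```

With 100 revisions that is roughly seven test runs instead of up to a hundred, which is why frequent, small commits pay off. Tools such as git bisect automate the same idea.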

I think it was Arnold Glasow who first said “A good leader takes more than their fair share of the blame and gives more than their share of the credit.”

There’s a bit of madness down both paths. We recently had a bug that was related to Firefox and Vista. We spent four maddening days scraping our mental fingernails against the black box that is the Aero layer. Our bug was finally fixed… by Microsoft releasing Vista SP1. I gained a lot of knowledge along the way during our bug-hunt, but I still don’t know exactly why it was fixed. And since we’re happy it’s fixed at all, I no longer have the resources to chase the bug without pulling overtime. C’est la vie.

That having been said, it’s still worth chasing such bugs. At the end of the day, I only have control over one code base: mine. I can fix bugs by changing what I control, or I can hope that forces outside my sphere of influence will do it for me. My customers appreciate it if I do the former.

I wholeheartedly agree with this post, as I always assume that a problem in the system is a problem with my code. But I recently went through a very draining process of rooting out a bug when using linked servers in SQL Server 2005 64-bit to talk to Oracle 10g. After a 3 week process of rewriting code, testing and performing research, it turned out that there was a bug in SQL Server. Environmental bugs do occur, but they are the exception.

I love your blog; it’s clear you’re dedicated to the craft, and that is what I have always aimed for.

To constantly learn and hone your skills, to work with others to create applications that you can be proud of, let go.

But it takes constant focus to make sure we release bug-free, extremely clear-cut applications that do what the people who requested them asked us to do…

Sadly, most of my experience has been with companies that cared more about quantity and speed than quality, which usually ended up biting them in the rear.

Let’s keep at it, and never give up on quality.

Reading Jeff’s blogs makes me smile (heck, even the last one on the rechargeable batteries got me a little excited). It takes some courage to admit one’s faults, but I think it shows strength of character. I’d always prefer a programmer who was less skilled but honest over one who was more skilled but failed to take responsibility, lied about or vehemently denied the sources of bugs, claimed things were tested thoroughly etc.

The second rule of programming is YOU DON’T TALK ABOUT PROGRAMMING.

Great post!

This post reminds me of one of my favourite quotations:

"To err is human–and to blame it on a computer is even more so."
Robert Orben

I love these kinds of posts, if only to remind myself that I can do better. I also love reading the comments, like “this one time it didn’t work” etc. My experience has been (much like the select guy) that it doesn’t work the way you think it should because it wasn’t designed for that. So much of programming is internalizing bizarre concepts invented elsewhere, and when we fail to drink the koolaid we assume it’s the other guy’s fault. Just like a carpenter, when the saw isn’t cutting right we either have to sharpen the saw (learn more) or try something else. Too many times we blame the wood and the saw.

Having worked on, among other things, support, I am always amazed when a caller’s theory about why his code doesn’t work is b) there’s a bug in our software, c) there’s a bug in the operating system, d) there’s a problem with the hardware. Notice the conspicuous lack of (a).

Then again, sometimes it is (b). Sometimes. Very sometimes.