The First Rule of Programming: It's Always Your Fault

You obviously have never worked in an environment like IBM WebSphere Portal or ATG Portal. Our company bills out consultants at very high rates to work on products like these, precisely because they are so convoluted and full of bugs. It takes a special kind of programmer to be patient enough to identify the bugs and report them to IBM (or whichever vendor it is) along with stack traces, log files, and so on. On top of that, you have to find workarounds for the bugs, because you can’t count on an immediate fix. I had the pleasure of taking a break from .NET to do an IBM Portal project last year, and I found two nasty bugs in Portal myself. After I vented my frustration to my coworkers for half an hour, saying things like “I’ve never had problems like this with Microsoft technologies”, the IBM guys who do this stuff every day told me that this is the norm with IBM products, and it’s why they get paid the big bucks. By the way, it took over a month for them to find and fix the bug!

Some of the toughest bugs I find today are in JavaScript. No matter how experienced you are, you can always run into a JavaScript bug that just doesn’t make any sense, and it is usually your own fault. The closest I came to “it’s not my fault” was with some recursive JavaScript functions in which the values of some variables made no sense at all and were impossible to trace. After two days of staring at the code, I realized that the variables I had used inside the recursive function, even those inside nested statements, had global scope because I had never declared them with “var”, so every recursive call was reusing and overwriting the same variables.

Lesson learned: always declare variables in JavaScript with the “var” keyword.

I still blame JavaScript; it wasn’t my fault. :P
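The other code samples on this page are Java and C++, so here is a rough Java analog of that bug rather than JavaScript. In non-strict JavaScript, assigning to a name that was never declared creates a global, so every recursive call shares one variable. The closest Java equivalent is a loop counter kept in a static field instead of a local variable. The `RecursionScope` class, its `Node` record, and both methods are hypothetical, made up only for this illustration.

```java
import java.util.List;

// Java analog of an undeclared JavaScript variable: the loop counter lives in
// shared (static) state instead of in each call's own stack frame.
final class RecursionScope {

    // Hypothetical tree node, used only for this illustration.
    record Node(List<Node> children) {
        static Node leaf() {
            return new Node(List.of());
        }
    }

    // BUG: one counter shared by every active call, like a JS global.
    private static int i;

    static int countLeavesShared(Node n) {
        if (n.children().isEmpty()) {
            return 1;
        }
        int total = 0;
        // A recursive call moves i forward, so this loop can skip children.
        for (i = 0; i < n.children().size(); i++) {
            total += countLeavesShared(n.children().get(i));
        }
        return total;
    }

    // FIX: a local counter gives each call its own copy, which is what
    // declaring the variable with "var" does inside a JavaScript function.
    static int countLeavesLocal(Node n) {
        if (n.children().isEmpty()) {
            return 1;
        }
        int total = 0;
        for (int j = 0; j < n.children().size(); j++) {
            total += countLeavesLocal(n.children().get(j));
        }
        return total;
    }
}
```

For a root whose children are one node with two leaves followed by two plain leaves, `countLeavesLocal` returns 4, while `countLeavesShared` returns 2: the inner call leaves the shared counter at 2, the outer loop’s increment pushes it to 3, and the last two children are silently skipped.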

Jeff,

It’s even more frustrating when you dedicate an entire team’s resources to tracking down a set of nasty bugs in your application, only to discover that the problem lies deep in the development tools. We discovered that with BDS2006 a year ago, and again when RAD 2007 was launched.

We created test cases that reproduced the problems, but neither the original company nor the new company gave us a solution, so we had to move back to BCB 6 to have a compiler that ran properly.

In the end we decided to move away from these tools, because we could not get proper support, nor any way to get CodeGear to acknowledge that their tools have a nasty bug that makes them unusable in a mixed environment. I am still waiting for the solution the CEO promised, but it has never, ever arrived.

Currently you cannot generate proper headers from Delphi (VCL) components for use on the C++ side; the result is memory access violations across all the components.

http://twitter.com/wilshipley/statuses/774574882

“99.9% of the time, it’s your bug, not the compiler. The other 0.1%… well, it’s usually your bug then, too.”

Word. Unless you happen to be writing code in RoR. In that case, you can rest assured it’s not your fault!

Jeff,

I agree 100%: whether the bug is in your code or in the tools you develop with, it’s your bug, and you have to deal with it.

Very true, most of the time… SolidWorks is CAD software used by some of my company’s clients. Automation is accomplished by writing code against the SolidWorks API… Based on that experience, I can say that there are times when it’s 50/50 whether it’s the system’s fault.

This is also the first rule of marriage and many other endeavors.

The discipline of humility is vastly underrated.

“A good carpenter never blames his tools.”

Good advice in carpentry, programming, and life.

So we reach step 1, admitting we have a problem and it is, statistically speaking, a bug we’ve created. Great. Now what?

We go looking for it… but unlike the “select is broken” guy, we run some tests, reproduce the problem, and isolate and fix the bad code. For me, if I determine that I’m just not seeing the error, even after a good night’s rest, I seek out a fresh set of eyes and ears to talk through the issue. I don’t need to be in a “pair programming” shop to take advantage of the technique. Sometimes just explaining the problem and talking through your code with a coworker is enough. Often they don’t find the bug; you do, simply by having to explain it.

Back in the dark ages of my formative CS classes, we had to explain (in comments) the inputs and outputs, assumptions, and pre- and post-conditions of our methods. I don’t think I took that nearly seriously enough at the time, but I’ve found that knowing all of that about a method definitely minimizes bugs. If you can’t say what a method does, what it can modify, and how, how can you trust it? I’m convinced that almost every real runtime bug I have found is the result of my spending more time typing than thinking.
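Here is a small Java sketch of what such a contract comment can look like when the method also enforces it at runtime. The `Contracts.transfer` method and its parameters are hypothetical, and the In/Precondition/Postcondition/Modifies labels are just one possible layout, not a standard Javadoc format.

```java
import java.util.Arrays;
import java.util.Objects;

final class Contracts {

    /**
     * Moves {@code amount} units from {@code balances[from]} to {@code balances[to]}.
     *
     * In:            balances (non-null), from and to (distinct, valid indices), amount
     * Precondition:  0 < amount <= balances[from]
     * Postcondition: balances[from] is lower by amount, balances[to] is higher
     *                by amount, and the sum of all balances is unchanged
     * Modifies:      balances[from] and balances[to]; nothing else
     */
    static void transfer(int[] balances, int from, int to, int amount) {
        // Enforce the preconditions instead of trusting every caller.
        Objects.requireNonNull(balances, "balances");
        Objects.checkIndex(from, balances.length);
        Objects.checkIndex(to, balances.length);
        if (from == to) {
            throw new IllegalArgumentException("from and to must differ");
        }
        if (amount <= 0 || amount > balances[from]) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }

        long sumBefore = Arrays.stream(balances).asLongStream().sum();
        balances[from] -= amount;
        balances[to] += amount;

        // Postcondition check; only evaluated when assertions are enabled (java -ea).
        assert Arrays.stream(balances).asLongStream().sum() == sumBefore
                : "transfer changed the total";
    }
}
```

Checking the preconditions before touching any state means a caller’s mistake surfaces as an immediate, descriptive exception and the array is left unmodified, rather than becoming a corrupted balance that shows up much later.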

I really prefer the bugs I encounter to be of my own doing. That means I can fix them, and won’t have to work around other people’s bugs…

You down with OPB?

Jeff: very true

These days, all our code runs on Linux (or maybe OpenSolaris). If select() isn’t working right, we open up select.c and check. If your whole stack is open-source, there’s no need to program by coincidence.

The problem with that approach is that it only tells you how select behaves on that particular machine, not “how it is supposed to work” (on all other machines). Sometimes there’s a huge difference.

To add a story of my own: a couple of months ago, my application started crashing randomly after we switched to a 2.6.x kernel. After hours of debugging, it turned out to be a bug in the compiler’s runtime library: pthread_join() was called twice under certain circumstances. That was no problem on a 2.4.x kernel; the second call simply spat out “invalid handle” and the error got ignored. With the NPTL library on a 2.6.x kernel, the call crashes on invalid handles. Of course, the documentation just says that “the behavior is undefined” in such circumstances. Go figure…

Better yet is when the bug is your fault, but you can’t repro it on your system. Case in point: a bug was reported in a section of code that I wrote. I ran it, and the result was “hey, it works on my machine” time and time again. Then we switched to another machine with a different OS (I’m running one of three Vista machines in the company; everything else is either XP or 2000 Pro), and voilà, we arrived at Bug City. As it turns out, there are a few tiny differences in the way Vista and previous versions of Windows handle exceptions. Those tiny differences turned into a pretty big holdup on the project.

Yeah, the bug itself was my fault. But fixing it was hampered by the other 0.1% of the programming pie.

And now that I think of it, I want some pie…

Heh. Rather timely.

I spent the better part of yesterday trying to figure out why the child process I started with CreateProcess wasn’t reading from its input handle. It wrote to its output handles fine, but no matter what I did, it wouldn’t read from the input one.

Below is the chunk of code I found the problem in. See if you can spot it too. :)

	// Make the shell's input pipe
	if (!CreatePipe(&si.hStdInput, &Shell_Input, &sa, 0))
		throw std::string("CreatePipe failed because ") + Error_Message();
	ZeroMemory(&si, sizeof(STARTUPINFO));
	si.cb         = sizeof(si);
	si.dwFlags    = STARTF_USESTDHANDLES;
	si.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
	si.hStdError  = GetStdHandle (STD_ERROR_HANDLE);

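Of course, the ZeroMemory call wasn’t there originally, but I found that CreateProcess did nasty things without it. So I added it after the CreatePipe call, where it wipes out the read handle that CreatePipe had just stored in si.hStdInput. Doh!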

What the hell are you talking about?
My code is perfect!
It’s always been perfect and always will be perfect.
My code is a temple of logical perfection!
I was designed by the Kirk, the Creator.
I am perfect.
What?
The Kirk is not the Creator?
I am in error?
NOOOOOOOOOOOOO!!!

Alright, you got me. Just do not tell my Pointy-Haired Boss.

I don’t like how much you’ve simplified it. I think it would be better to say something like, “First act as if it is your fault”, or, “Begin by assuming the problem is in your code”. When you simplify it so much that it’s wrong, you’ve gone too far.

Even if the bug is in the system, your code is the only part you have control over, so blaming the system is pointless.

Even more fun is when part of your program works only BECAUSE of a bug that is later fixed.

Look at this Java code:

	protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
		Class<?> cls = theClassLoader.loadClass(desc.getName());
		return cls == null ? super.resolveClass(desc) : cls;
	}

In case you can’t tell what it does: it tries to load a class with the custom class loader, and if it can’t load it, it forwards the class lookup to the superclass’s resolveClass instead. Right?

Wrong. Trying to load a class that the loader can’t find throws a ClassNotFoundException; per its contract, loadClass throws rather than returning null. Because that exception is checked, the code won’t even compile unless you deal with it, so the original programmer added it to the method’s throws clause. However, he did not think it through: if the custom class loader cannot load the class, the first line throws the exception and thus exits the method. The second line can ONLY EVER RETURN the loaded class. The forwarding to the superclass can never happen, because it would only happen if the custom class loader could not load the class, and if that happens, line two never gets executed.

So how did this piece of code ever work at all?

Java 1.2 had a bug where custom class loaders were created with all classes already loaded. That’s why it never failed to load any class: the exception was never thrown, and nothing ever needed to be forwarded to the superclass.

Now imagine what happened when that bug was fixed in later versions of Java. Code that had worked perfectly fine broke, and ironically, it really WAS a bug in the compiler (or JVM), just not the way one would expect. It was a nightmare to track down.

P.S.: Here’s the fixed code, if anyone cares:

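	protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
		try {
			// Try the custom class loader first.
			return theClassLoader.loadClass(desc.getName());
		} catch (ClassNotFoundException cnfe) {
			// Not found there: fall back to ObjectInputStream's default lookup.
			return super.resolveClass(desc);
		}
	}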
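To see the difference in behavior, here is a self-contained Java sketch that serializes a list and reads it back through two hypothetical ObjectInputStream subclasses: one using the original logic and one using the try/catch fallback. The class names and the `REFUSES_ALL` loader (which rejects every class and stands in for a custom loader that can’t find what is in the stream) are assumptions made for illustration; this does not reproduce the Java 1.2 behavior described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Original logic: loadClass throws instead of returning null, so the
// super.resolveClass fallback can never run.
class OriginalObjectInputStream extends ObjectInputStream {
    private final ClassLoader preferred;

    OriginalObjectInputStream(InputStream in, ClassLoader preferred) throws IOException {
        super(in);
        this.preferred = preferred;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        Class<?> cls = preferred.loadClass(desc.getName()); // throws on failure
        return cls == null ? super.resolveClass(desc) : cls; // null never arrives here
    }
}

// Fixed logic: catch the failure and fall back to the default lookup.
class FallbackObjectInputStream extends ObjectInputStream {
    private final ClassLoader preferred;

    FallbackObjectInputStream(InputStream in, ClassLoader preferred) throws IOException {
        super(in);
        this.preferred = preferred;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return preferred.loadClass(desc.getName());
        } catch (ClassNotFoundException cnfe) {
            return super.resolveClass(desc);
        }
    }
}

final class FallbackDemo {
    // Hypothetical loader that rejects every class name.
    static final ClassLoader REFUSES_ALL = new ClassLoader(null) {
        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            throw new ClassNotFoundException(name);
        }
    };

    // Serializes value, then deserializes it with the chosen stream class.
    static Object roundTrip(Serializable value, boolean useFallback)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        ByteArrayInputStream in = new ByteArrayInputStream(bytes.toByteArray());
        try (ObjectInputStream ois = useFallback
                ? new FallbackObjectInputStream(in, REFUSES_ALL)
                : new OriginalObjectInputStream(in, REFUSES_ALL)) {
            return ois.readObject();
        }
    }
}
```

With an `ArrayList<Integer>`, `roundTrip(list, true)` should return an equal copy, because every class the custom loader refuses is found by the default lookup, while `roundTrip(list, false)` fails with ClassNotFoundException since the original version’s fallback is unreachable.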