Exception-Driven Development

If you're waiting around for users to tell you about problems with your website or application, you're only seeing a tiny fraction of all the problems that are actually occurring. The proverbial tip of the iceberg.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/04/exception-driven-development.html

This post feels like the old cliché: I found a hammer, and now everything looks like a nail.

First, I echo the comments here about the apparent misunderstanding or misuse of TDD. TDD is not premature bug fixing.

Second, not all bugs can be caught. There are entire categories of bugs that don’t cause an application to fail, but still hinder or even preclude a user from successfully executing a task.
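To illustrate the category of bug described above, here is a minimal sketch of a defect that never raises an exception, so an error log would stay empty while users are still affected. The `paginate` function and the order IDs are made up for this illustration.

```python
def paginate(items, page, page_size):
    """Return one page of items (pages are numbered from 1)."""
    start = (page - 1) * page_size
    # Bug: the "- 1" silently drops the last item of every page.
    end = start + page_size - 1
    return items[start:end]


orders = ["A-100", "A-101", "A-102", "A-103"]

# Walk through both pages the way a user would.
shown = [order for page in (1, 2) for order in paginate(orders, page, 2)]

# No exception is raised, yet half of the orders never appear.
missing = [order for order in orders if order not in shown]
print("shown:", shown)
print("missing:", missing)
```

Nothing here would reach an exception log; only a test or an observant user would notice the missing orders.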

There is ALWAYS pressure to get your application out. And besides the economic pressures for getting into production, there are soft reasons. As creative beings, we get a huge amount of satisfaction from seeing our creations running and being used (like proud parents or starving artists). Also, some feel like they are serving the greater good - providing solutions for mundane and/or repetitive tasks or for highly critical processes or whatever.

There is a spectrum of tools/approaches for dealing with errors in our software. We need to appreciate them all - from system inception to system maintenance - and embrace a balanced approach, using each of them in an effective and timely manner.

There are different types of errors.

Would I wait for a user to find out that a button doesn’t do anything? Maybe.

Would I wait for the user to find out that when they delete an order they are actually deleting another customer’s orders, dropping 3 tables, and removing the primary key? Hardly.

I think Iain Holder and Matt Lentzner are absolutely right. In no way does EDD replace TDD, but I do buy Jeff’s argument that EDD is uniquely valuable. So use both. TDD requires an up-front time investment, but saves user frustration, improves overall design, and minimizes maintenance and refactoring. EDD requires (practically) no up-front time and provides valuable feedback on fixing bugs that do slip through to production.

So I do buy what I think Jeff intended to say. You can write tests until you’re blue in the face, but you’ll never guarantee that 100% of the hours are valuably spent and you can’t guarantee you’re going to stop 100% of the bugs. In a time-constrained, rapid development environment, this will feel to management like wasting time. Thus EDD provides the next layer of defense: you’ve tested all you can with the hours you have, now it’s time to push the bird out of the nest.

The caveat is that this doesn’t work for all development cycles. If you’re not going to release the next patch for three months, relying on EDD is going to murder you. But in web development, there’s no good excuse to wait that long. Be rapid or GTFO.

While there are many other valid reasons to practice TDD, as a pure bug fixing mechanism it’s always seemed far too much like premature optimization for my tastes

I wish people would stop trying to use Knuth’s saying in completely meaningless ways - testing or fixing bugs is not optimization, premature or otherwise.

Obviously you haven’t seen my code; my software has no bugs :)

I actually email myself every error that happens on my website, www.postjobfree.com.
That way I know almost immediately if there is any problem.
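The "email myself every error" idea can be sketched roughly as follows. This is not the commenter's actual setup (which isn't described); it is a minimal Python illustration that hooks uncaught exceptions and turns each one into an email message. The names `build_error_email` and `install_error_mailer`, and the `send` callback, are assumptions made up for this sketch.

```python
import sys
import traceback
from email.message import EmailMessage


def build_error_email(exc_type, exc_value, exc_tb, sender, recipient):
    """Turn an exception into an email message (nothing is sent here)."""
    # Collapse whitespace so the subject header never contains a newline.
    summary = " ".join(str(exc_value).split())
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[site error] {exc_type.__name__}: {summary}"
    msg.set_content("".join(traceback.format_exception(exc_type, exc_value, exc_tb)))
    return msg


def install_error_mailer(send, sender, recipient):
    """Route every uncaught exception through `send`, then the default hook."""

    def hook(exc_type, exc_value, exc_tb):
        try:
            send(build_error_email(exc_type, exc_value, exc_tb, sender, recipient))
        finally:
            # Still print the traceback to stderr as Python normally would.
            sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
```

In a real deployment `send` might be a small function wrapping `smtplib.SMTP(...).send_message`, with your own host and addresses; keeping it as a parameter means the hook can be exercised without a mail server. A web framework would usually need its own error handler rather than `sys.excepthook`, since frameworks tend to catch exceptions per request.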

First! I’m so proud.

Isn’t WER just another way of waiting around for users to tell you about problems with your website or application?

Isn’t shipping software as fast as possible just another way of fail[ing] to see and address the issue before they [get] around to telling me?

WOW, great post…
I didn’t know about ELMAH and built my own system… doh!
It actually has some more features than ELMAH, like:

  • creating tickets automagically
  • allowing users to add information and get notified about a follow-up

but it is lacking the RSS feeds. I am off to add them :slight_smile:

@Daniel Straight:
There are ALWAYS (and I stress always) behaviours (I am not saying bugs here) that users will find and that you didn’t expect or anticipate. You will only find them when the product is used by people who aren’t involved in its development/design/analysis.

Isn’t WER just another way of waiting around for users to tell you about problems with your website or application?

The loop is closed much faster and it’s completely automatic. All the user has to do is … use the software. No emailing, no calling, no writing.

Isn’t shipping software as fast as possible just another way of fail[ing] to see and address the issue before they [get] around to telling me?

I get automatic Firefox (+plugins) updates all the time for issues I haven’t run into yet, personally, as a user. This method of shipping software as fast as possible should be seamless and automatic, and it is … in that case!

(although I did run into a major bug with FF 3.0.0 that drove me nuts. That one, they didn’t anticipate well.)

We collect WER dumps for our applications, and our experience is that they’re extremely helpful. Something like two thirds of the time they contain enough information to identify and correct the original fault.

And, to back up your assertion, many (perhaps most) of these reports do not tie up with anything we’ve had formally reported. The application has crashed, but the user hasn’t submitted a fault report to us (other than via WER).

I just wish there was a “Would you like to include your contact information with this report in case the vendor needs to ask for additional information?” checkbox.

Although I remain a fan of test driven development, the speculative nature of the time investment is one problem I’ve always had with it. If you fix a bug that no actual user will ever encounter, what have you actually fixed?

But that isn’t what TDD is for. It is not a pre-emptive user defect fixer. TDD as a mechanism provides two major things:

  1. Better designed/implemented software.
  2. A set of tests to help you refactor without introducing errors.
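Point 2 can be illustrated with a small sketch: a test that pins down current behaviour, so the function body can later be rewritten and checked against the same expectations. The `order_total` function and its rules are invented for this example.

```python
def order_total(prices, discount_rate=0.0):
    """Sum the prices, then apply a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount_rate <= 1.0:
        raise ValueError("discount_rate must be between 0 and 1")
    return round(sum(prices) * (1.0 - discount_rate), 2)


def test_order_total():
    # These expectations must keep passing after any refactoring.
    assert order_total([]) == 0
    assert order_total([10.0, 5.5]) == 15.5
    assert order_total([100.0], discount_rate=0.25) == 75.0
    try:
        order_total([1.0], discount_rate=1.5)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an invalid discount")


test_order_total()
```

If someone later restructures `order_total`, rerunning `test_order_total()` shows immediately whether the observable behaviour changed.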

Stephen Walther covers this and more here:
http://stephenwalther.com/blog/archive/2009/04/11/tdd-tests-are-not-unit-tests.aspx

What is the parallel to this when working with embedded consumer electronic devices that have no internet connection? Currently the only method available to us is to spend a lot of money on people who handle service calls about the errors users experience. If service considers an error important enough, they let us know, and we get a firmware upgrade out there.

I’d love to know ahead of time what problems our users are experiencing and deploy fixes (especially in markets where upgrades for some consumer electronic devices can be done over the air), but there doesn’t seem to be a good facility for this. So, for now, we are stuck with a QA department to, hopefully, find all of the bugs. Also, people are generally more forgiving of a small webpage error than they would be if their TV crashed.

Also, if this is the case, I’m sorry to be the one to have to tell you this, but you kind of suck at your job

Well, in most jobs, when you have real time limits and a lot of pressure, you sometimes have to cut corners on good practice. You cannot always do what you should do. I do not think that I or we suck because we do not have the time to fix stuff that hasn’t gotten any complaints. Maybe with Stack Overflow you get to set the priorities yourself; good for you. But in fact, in a lot of enterprises, developers do not decide what they work on.

Think about it before telling people that we suck at our jobs.

Wow, ELMAH looks awesome. I’d like a Java port please thanks!

We at Lokad are using ELMAH too. It’s still a bit surprising how successful this little piece of code is compared to the whole Health Monitoring thing that is supposed to have been built in since .NET 2.0.

I think checking for user exceptions is a useful practice, but I doubt it can be the main driver for development. Maybe you’re only really talking about web applications, where you can monitor exceptions server side and roll out fixes instantly.

Although I remain a fan of test driven development, the speculative nature of the time investment is one problem I’ve always had with it. If you fix a bug that no actual user will ever encounter, what have you actually fixed?

Your attitude to TDD still confuses me. Are you saying that you implement TDD, but ignore test failures unless you’re sure a user will see a problem, or are you a fan of TDD, but don’t use it on your own projects?

Finally, the phase of testing that’s missing here is in-house testing. Does this mean you do not use any quality control for web applications, using your users as testers?

I quite agree with the post, but I also think knowing more about your application than the user does is somewhat of an abstraction.
I mean, unless you write very close-to-the-metal applications, your application will have to get along nicely with a lot of components.
And if you have the luck of writing something quite successful, it will have to rely on / live with / work with virtually countless pieces of software, written by who-knows, maintained by who-cares and known by no-one.

ELMAH looks very interesting. Is there an equivalent for those of us in the Java world?