What's Worse Than Crashing?

I work in the embedded storage industry, and the worst kind of application is one that runs fine but writes corrupted data. You can’t trust the app any longer, and that is a fatal blow to your company’s reputation.

Customers will be RUNNING AWAY from your app and switching to other vendors. What’s worse is that data corruption bugs are downplayed so much (or not advertised at all) that a customer really doesn’t know whether a software update contains a key fix.

So data corruption avoidance is priority #1.

This is why industrial-grade filesystems like ZFS store checksums (CRCs) alongside data, because you really can’t trust the underlying storage hardware to do the right thing.

On another note,

You could have a perfect application that reads/writes data to files without guaranteeing the data has been written to disk (i.e. the data is still in the file cache). So if your OS crashes before committing your writes to disk, you are toast.

Kashif

I think 90% bug-free is setting the bar a bit low. During the good old days of text editors, compilers, and linkers, I wrote 6000-line apps without any bugs.

It has to do with methodology. Of course, we weren’t plagued then with the horrendous implementations of object-oriented programming.

There’s another category 7: System causes crash or data corruption.

In a system with neither parity nor ECC memory, soft memory errors are neither detected nor corrected. An instruction which becomes an illegal opcode will crash. Otherwise, instructions just do the wrong thing and bad bits in the data just stay that way.

Many computers and networking components have buses and memory that are not error checked.

The draft article below from 2006 is entirely my own opinion. It has not been reviewed or approved by my employer.

I am surprised how I have to constantly make the statement “If a condition that would cause an exception can be caught and dealt with, then it should be caught and dealt with before it throws the exception.”

That seems equivalent to saying, “Exceptions cannot be used for error-handling. Only return codes can be used.” Lots and lots of people disagree with that.

I find this very CS101. Exceptions should be left for the unknown, not the known.

It’s very much CS101 not to realize that “the unknown” is, at best, hard to recover from reliably, due to your program entering an undefined state.

Exceptions in languages such as C++ are for dealing with expected errors in correct programs, not dealing with bugs in incorrect programs.

Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software. The idea can be viewed as reducing or eliminating the prospect of Murphy’s Law having effect. Defensive programming techniques are used especially when a piece of software could be misused mischievously or inadvertently to catastrophic effect.

Defensive programming is an approach to improve software and source code, in terms of:

  • General quality - Reducing the number of software bugs and problems.
  • Making the source code comprehensible - the source code should be readable and understandable so it is approved in a code audit.
  • Making the software behave in a predictable manner despite unexpected input or user actions.
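The points above can be illustrated with a small sketch of defensive programming: validate inputs at the boundary instead of trusting the caller. (ParseAge and its sanity bounds are made up for illustration.)

```csharp
// Defensive programming sketch: reject bad input explicitly at the boundary,
// so the rest of the program can assume the value is sane.
public static int ParseAge(string input)
{
    if (input == null)
        throw new ArgumentNullException("input");

    int age;
    if (!int.TryParse(input, out age))
        throw new ArgumentException("Age must be a whole number.", "input");

    // Arbitrary sanity bounds, chosen for this sketch only.
    if (age < 0 || age > 150)
        throw new ArgumentOutOfRangeException("input", "Age is out of range.");

    return age;
}
```

The checks make the failure modes explicit and predictable instead of letting a bad value propagate into later code.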

Jon - If you were using:

try
{
openfile = file.Open(pathtofile)
}

What happens if an OutOfMemoryException or something occurs? It’s failed silently and problems will continue to arise later on which are difficult to track down.
What if it’s some form of permissions exception? Maybe the function is in a library, and the person using that library wants to be able to display a useful message about changing permissions. A contrived example, but similar legitimate ones exist. You’d suggest explicitly catching this and turning it into an error code? Why not just let the exception bubble up?

What’s so different from:

try {}
catch (A) {}
catch (B) {}
catch (C) {}

and:

switch(errorcode)
{
  case A:
     break;
  case B:
     break;
  case C:
     break;
}

It’s just as much code with a slightly different syntax, without the need to wrap exceptions up in something less detailed and informative.
Apologies if I’ve misunderstood what you meant.

Lastly, very often problems caused by exceptions in a finally block are caused because a using block should have been utilised instead.
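For example, a using block compiles down to a try/finally that calls Dispose, so the two forms below are equivalent, and the second is harder to get wrong (a sketch, assuming pathtofile holds a valid path):

```csharp
// Manual form: you must remember the try/finally yourself.
FileStream fs = File.Open(pathtofile, FileMode.Open);
try
{
    // ... work with fs ...
}
finally
{
    fs.Dispose();
}

// using form: the compiler generates the same try/finally for you,
// so Dispose runs even if an exception is thrown inside the block.
using (FileStream fs2 = File.Open(pathtofile, FileMode.Open))
{
    // ... work with fs2 ...
}
```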

For example:

if (file.exists(pathtofile))
{
openfile = file.Open(pathtofile)
}

This is better than just doing file.Open(), as Open can throw a FileNotFoundException. Doing these checks can avoid the exception, and maybe this is what Mr. Farley was referring to in his previous post about checking any known conditions.

In any case, one should do this:

try
{
openfile = file.Open(pathtofile)
}

A perfect example for totally wrong error checking code and wrong assumptions of the programmer.

In the first snippet: the file can disappear between the check for its existence and the actual attempt to open it, so you still have to handle the exception anyway (apart from the fact that there are more reasons why you won’t be allowed to open a file than its mere non-existence). What you did here is ensure that QA won’t ever catch the error, because QA won’t manage to delete the file right between running the two statements, but it will happen at a customer site.

The second snippet is also wrong, because of the assumption that the only possible error is a non-existent file. This leads to those funny error messages where the software keeps telling the user that the file “does not exist”, although it does. Here it may be only a minor problem, but wrong assumptions are the reason why fail-fast usually is better. It kills your dreams and actually tells you what happened, not what you thought would happen, if…
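To make the point concrete, the race-free version simply attempts the open and reports the actual reason for failure, instead of checking first (a sketch, not the original poster’s code):

```csharp
// Attempt the operation and handle failure, rather than a check-then-act
// race between File.Exists and File.Open.
try
{
    using (FileStream fs = File.Open(pathtofile, FileMode.Open))
    {
        // ... use fs ...
    }
}
catch (IOException ex)
{
    // Covers FileNotFoundException, DirectoryNotFoundException, sharing
    // violations, etc. Report what actually happened, not an assumption.
    Console.Error.WriteLine("Could not open {0}: {1}", pathtofile, ex.Message);
}
catch (UnauthorizedAccessException ex)
{
    Console.Error.WriteLine("No permission to open {0}: {1}", pathtofile, ex.Message);
}
```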

Oh well, who wanted to make error checking easier by adding complexity to a complex system[1] with “higher up code” delegating error handling to “lower level” exception handlers? What’s that good for if the average programmer’s assumption about the reason why any error could occur is wrong already?

[1] And counting the lines of code just needed to handle errors, error handling is one of the most complex tasks when writing software.


Anyway, I think one issue is the relative crappiness of most exception systems. What is really needed is a system which enables higher up code to catch exceptions and delegate to a lower level handler.

No. Adding complexity to an already complex task[1] does not make it simpler. It’s amazing how much of the code we write each day is just there to handle errors. Funny thing: this is code that almost never gets tested, because those errors never occur during the test[2]. The most common problem I have seen with error handling code is not that it isn’t there at all, but that it handles errors under wrong assumptions about why and where the error occurred. Programmers only handle those errors they can think of[3], and that doesn’t change with a more complex exception handling system.

[1] Look at the average code and check how many lines you could get rid of if no subroutine could ever fail.

[2] Besides more subtle problems like for example: How do you save the user’s data if the error in question was the equivalent of a “disk full” from the OS?

[3] Back in the ages when we only had return codes and goto instead of nested exception handling, who ever cared to check the return code of printf()?

I’ve heard the standard office worker’s experience described as “sitting down in the morning, and randomly clicking at buttons and dialogs until they all go away and they can leave for the day.”

Sadly, I have not yet found any real reason to disagree. Ask any normal user what they clicked or pressed just before a dialog appeared, even right after it shows up on the screen, and just about every single one I have ever asked replies “I don’t know.”

I can’t believe the level of non-awareness most people put into their computing experience. It’s like some bizarre whack-a-mole game for them that results in a pay cheque.

LOG EVERYTHING, it’s the only chance you’ll get. (Now you know why people have drives a million times larger than they’ll ever fill up - room for logs!)

Jon - The examples you showed are a specific case: that of a helper function in the application area that handles and recovers. Most of it would be acceptable, and I can see where you are coming from now with returning null. However, I still believe that swallowing the base exception like that and returning null reflects not “I’ve encountered an error, but don’t worry - I’ve dealt with it. You continue however is best for you now.” but rather “Something has gone wrong. I don’t really know what, but I’m not going to bother to tell you.” That can lead to problems, and it is an example of where code should fail fast. It also limits the end user of the function to handling only what you think they will want to handle. If any other oddball exceptions are thrown (which are numerous, and I will deal with this topic later), you run the risk of not letting them handle something they legitimately might want to. If you really want to shelter the user of the function, you can always catch the base exception type of the operation you are performing. In this case, you would catch and log an IOException, which would take care of any form of IO failure but still let others through, such as (my favourite example) the OutOfMemoryException.

Of course these criticisms really apply to reusable libraries (and if it’s not reusable, why is it in a library generally?) but they can be extended to functions in the programs themselves.

However, as previously said, I do now better understand what you mean by returning null and would agree with all except suppressing the base exception in the sort of cases used in your examples.

But consider if this was a library; you don’t want to, as I allude to previously (but the implications are more serious) simply hide what the actual problem is because you can’t fix it yourself. In this case, exceptions provide a great way of passing the problem up until somewhere, someone has the right knowledge of how to handle it appropriately and does so whilst also providing a lot of the information one might need to deal with it.

The problem is that, as a programmer, you have little idea what will be thrown by any given function because the code path is so complicated. This leads to people not catching things (at the higher level where it is appropriate to know what to do with them) because they don’t think those problems could arise. A lot of the .NET libraries are actually fairly well documented with the exceptions that are likely to be thrown by the operation (rather than all sorts of ones such as OutOfMemoryException), but most programmers don’t know that this information exists. It’s in the documentation, but who really wants to look at the documentation every time they use a function? IntelliSense and the like have drawn us away from that, and we really need that sort of information - what exceptions are likely to be thrown - in the editor at our fingertips.

Take File.Open for instance. There are 9 different exceptions that could be thrown that you realistically will want to be able to handle, and if you use the pre-emptive check method (using File.Exists etc.) there are none that you should actually need to catch (aside from maybe the IOException). However, it does give you a list of things, a reminder if you will, of what checks you need to do.

If we had that sort of information in the editor I firmly believe programs would be written more robustly.

(I realise that you probably know and maybe even agree with a lot of what I just said. However, I think some people are too black and white, or don’t consider that there are different practices for different areas, and would apply “return null on failure” at the lowest levels of a library, where I would argue it’s inappropriate.)

I led a team about 15 years ago that built a Sprays Control System for an Alcan aluminum mill. The team was rookie at the software dev game; I was not (I was a rookie in the aluminum biz instead). We had code running on 3 different platforms, written in 3 different languages, connected with a 10 Mb/s network; debugging it was going to be tough. After many long debates, I convinced the team that Fail Fast was the way to go, despite the fact that most real-time factory systems do the other thing and try really hard to stay running. Because we squashed sooo many bugs during dev testing, the production system was on time and almost on budget. It worked like a champ first time out too, running 36 hours before a database table filled up due to a faulty sensor (quick and easy to fix). The victory was obvious to us, and the two team members who continued on with software careers are believers in Fail Fast. And I remain one, of course.

Here’s a nice primer on troubleshooting data corruption.

http://valhenson.livejournal.com/9540.html

Gives you an idea of how hard that kind of troubleshooting really is, too…

A story

Our WinForms app was a data entry app. Generally, users would enter hundreds of records into the app, one after the other, before finishing the job. While keeping data in an in-memory dataset, the app also wrote it to the database religiously. (Every record navigation was supposed to save the data to the database.)

One update introduced a bug: in certain untested circumstances, data would not be saved to the database, and the app would not alert the user that this had happened.

Imagine the frustration when, after entering 800 records, they navigated back to find 800 empty records.

I’m a bad programmer, and I can always do better.

Certainly logging is nice, but failing fast adds a certain immediacy to the bug fix. Every bug is an application crash. I spent a few years doing QA, and any dev can explain away a bug that merely spits out an error message (oh, just move that file over there… fix this registry setting, etc.). Once the app stops working, though, you have a moral imperative to harp about it until it’s really fixed, or made extremely user friendly. There’s way too much catch-exception-and-log out there; after all, it makes that red squiggly go away.

That said, once your customer becomes someone who can’t reach out and grab a dev, you have to have some outside handler that can take any error and make the failure as friendly as possible. Assertions aren’t quite ideal, but they give you a little of both worlds when they can be turned off or replaced at runtime.

try
{
  Application.Run();
}
catch { }

Problem solved.

Seriously, though, for all the .NET programmers here, the Exception framework gives you an InnerException property for a reason. Use it, and whenever your application crashes, make sure you dump (to a log, not to the user) all of the information for every exception in that hierarchy.
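Dumping the whole hierarchy can be as simple as walking the InnerException chain (a minimal sketch; LogExceptionChain is a made-up helper name):

```csharp
// Write an exception and every nested InnerException to a log writer.
static void LogExceptionChain(Exception ex, TextWriter log)
{
    int depth = 0;
    for (Exception e = ex; e != null; e = e.InnerException)
    {
        log.WriteLine("[{0}] {1}: {2}", depth, e.GetType().FullName, e.Message);
        log.WriteLine(e.StackTrace);
        depth++;
    }
}
```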

For some odd reason (laziness?), people don’t like to create custom exception classes. They should, though, because it’s by far the easiest way to give meaningful feedback to the user while still keeping all your debug info intact. When you’re writing the UI code, good luck trying to handle an InvalidOperationException and figure out whether or not it’s actually fatal. A NotLoggedInException, on the other hand, tells you exactly what you need to do (show the login form). Throwing a generic exception is the quickest way to shoot yourself and your users in the foot.
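A custom exception class costs only a few lines; the NotLoggedInException mentioned above might look like this (a sketch):

```csharp
// A specific type tells the UI exactly what to do (show the login form),
// unlike a generic InvalidOperationException.
public class NotLoggedInException : Exception
{
    public NotLoggedInException(string message)
        : base(message)
    {
    }

    // Keep the original failure as InnerException so the debug info survives.
    public NotLoggedInException(string message, Exception inner)
        : base(message, inner)
    {
    }
}
```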

This:

try
{
Application.Run();
}
catch { }

is just a band-aid. Having an exception handler at this level of the program is merely a “catch all” for any exception that occurred deeper in the program. I am not sure how useful it is to catch at this level, since you are at the main entry point of the application (what recovery would one do?), but it would catch any exception that occurred deeper down. At this point, maybe just close out gracefully instead of showing the JIT exception dialog.

I believe that exceptions should be handled as quickly as possible in the program and not allowed to “bubble up” through the different layers or call stack they are on. “Bubbling” exceptions are the work of bad design and programming.

Suppose I call this mock method in another class in the .Net framework:

void doSomething()

Now, this should return nothing (void, right?), but depending on the implementation, it may throw an exception back at me. If it does, I’d better handle it.

Several classes inside the .NET Framework have these cases, which are often not handled by implementors, leading to crashes.

For example:

if (file.exists(pathtofile))
{ 
   openfile = file.Open(pathtofile)
}

This is better than just doing file.Open(), as Open can throw a FileNotFoundException. Doing these checks can avoid the exception, and maybe this is what Mr. Farley was referring to in his previous post about checking any known conditions.

In any case, one should do this:

try
{
   openfile = file.Open(pathtofile)
}

in case something unexpected DOES happen.

In this example, file.Open() may return a file stream or it may throw an exception, even though its signature says it returns a file stream. Perhaps this is a bit confusing, because ANY method can throw an exception back at you.

I prefer that exceptions be handled in the method they originate in, which then sends back a return value rather than throwing an exception that must be handled. The caller can decide what to do with the return value, but they don’t need to deal with any exceptions that originate from the method. You can generate your own exceptions too, but this leads to complex catch blocks:

try
{
doSomething()
}
catch (Exceptiontype1) {}
catch (Exceptiontype2) {}
catch (Exceptiontype3) {}
catch (Exceptiontype4) {}
catch (Exceptiontype5) {}
catch {}

I never cared for this style very much although I have seen it on several occasions.

Also, final (no pun intended) advice: Finally can throw exceptions too, so be careful what goes in finally!
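A quick sketch of that finally pitfall: if the finally block itself throws, the new exception replaces the one already in flight, and the original error is lost. (Cleanup here is a hypothetical method that can throw.)

```csharp
try
{
    throw new InvalidOperationException("the real problem");
}
finally
{
    // If Cleanup() throws, its exception propagates instead, and the
    // InvalidOperationException above silently disappears.
    Cleanup();
}
```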

Hi Vinzent and ICR-

The case of the file disappearing in the few nanoseconds between the existence check and the actual open is an extremely rare one. But it could happen if another process were trying to do something with that file; in that case, you would still need the try/catch, and you might have bigger design issues if multiple processes were trying to access and delete the file at the same time.

Here are a few examples of handling exceptions at the point at which they occur. With this, the button1 and button2 events will not crash the program. Also, you can see that the exception handling logic adds many more lines to the code, as you mentioned previously.

If my previous example was unclear, then hopefully these are better.

For handling different types of exceptions, you could modify the friendly message to say something more specific about the error, like “could not find file” or “directory not found”, and store that value in a member variable or maybe a static class which stores the last error message. Catching the different types of exceptions allows for this, instead of one generic catch {}.

Let’s say I have a button on a form, and when I press it I want to open a file and process it. Here is the code:

private void button1_Click(object sender, EventArgs e)
{

  FileStream fs = opensomefile("PathToFile");
  
  if (fs != null)
  {
    //Dosomething with file

    //Close it after using it.
    fs.Close();
  }
  else
  {
    //Alert User With Friendly Message
    MessageBox.Show("Unable to open some file.  Check log for details.");
    //Do anything else here
  }
}

/// <summary>
/// Opens some file and returns it back to the caller
/// </summary>
/// <returns>A FileStream; can be null if the file fails to open.</returns>
private FileStream opensomefile(string pathtofile)
{
  FileStream fs = null;
  // Open the stream
  try
  {
    fs = File.Open(pathtofile, FileMode.Open);
  }
  catch (FileNotFoundException ffe)
  {
    //Do any specific FileNotFoundLog Logic and Log Error 
  }
  catch (PathTooLongException ptle)
  {
    //Do any specific PathTooLong Logic and Log Error 
  }
  catch (DirectoryNotFoundException dnfe)
  {
    //Do any specific DirectorNotFoundException Logic and Log Error 
  }
  catch (Exception ex)
  {
    //Log any generic error
  }

  return fs;

}

Another example: let’s say I have a button that is going to do something specific to the file.

private void button2_Click(object sender, EventArgs e)
{
  if (!dosomethingwithfile("PathToFile"))
  {
    //Alert User With Friendly Message
    MessageBox.Show("Unable to dosomething to file.  Check log for details.");
    //Do anything else here
  }
  else
  {
  //Successfully processed file do any additional work here.
  }
}

private bool dosomethingwithfile(string pathtofile)
{
   bool success = false;
   // Open the stream and do something
   try
   {
     using (FileStream fs = File.Open(pathtofile, FileMode.Open))
     {
       //Do Something with FileStream
       success = true;
     }
   }
   catch (FileNotFoundException ffe)
   {
     //Do any specific FileNotFoundLog Logic and Log Error 
   }
   catch (PathTooLongException ptle)
   {
     //Do any specific PathTooLong Logic and Log Error 
   }
   catch (DirectoryNotFoundException dnfe)
   {
     //Do any specific DirectorNotFoundException Logic and Log Error 
   }
   catch (Exception ex)
   {
     //Log any generic error
   }

   return success;

}

Both of these examples show non-bubbling of exceptions. Although trivial, I think they are valid.

You could also let exceptions bubble up when doing a multistep process, but they would not escape to other parts of the program:

private bool dosomething()
{
bool success = false;
try
 {
   //dopart1, dopart2, dopart3, dopart4 may throw exceptions which 
   //will need to be caught because other callers may not handle them
   //properly.
   dopart1();
   dopart2();
   dopart3();
   dopart4();
   success = true;
 }
catch (Exception ex)
 {
   //Log Error
 }

return success;
}

This way you ensure that other parts of the program are not responsible for any exceptions that occur in dosomething. They get a return value telling of success or failure and may then act accordingly.