Timhm
Network BDE/Paradox apps corrupt data all the time - it’s our number one cause of support calls. I’ll be glad to be rid of it.
Jeff,
It seems to me the fail-fast article you linked to is not referring to all software development. Rather it is referring to the development of in-house software for corporations. In such an environment you can be informed of - and respond quickly to - “fail-fast” crashes. In this situation it is a really good tool for flushing out bugs that made it into production.
However as a general principle of software development, it may not be quite so good. Shrink-wrap software probably needs to have more effort made to catch errors and intelligently recover from them.
I work on an embedded system that handles money (no, it’s not an ATM). We design software before writing it, we test software before giving it to QA, QA tests it before giving it to the regulators, regulators test it before giving it to the client, and clients test it before rolling the software out.
Crash at any of these steps and we’re back to square one. Buggy software is not allowed to escape.
Problems with software and bugs are never technical; they are financial. If money can be made by releasing buggy software, then that is exactly what will happen. If it costs more to release buggy software than to spend the time to actually do it right, then good-quality software gets created.
At the end of the day, the choice is up to the customer. Don’t settle for second best.
I disagree with the fail-fast approach… you could isolate a copy of the data the user is working on and still let them work their way out of it by themselves.
Still waiting for your “Why Developers hate Software” post.
smallstepforman - the problem is that there are rarely first best options. It’s hard to vote with your wallet when there’s nothing worth voting for.
Anyway, I think one issue is the relative crappiness of most exception systems. What is really needed is a system which enables higher up code to catch exceptions and delegate to a lower level handler. In other words, the higher up code specifies the policy with regards to the exception, while the lower code specifies the specifics. I believe something like this exists in Common Lisp.
The way this might manifest itself gui wise would be a dialog notifying the user of the pertinent info, and with a button allowing selection of a strategy to deal with it. A checkbox with “backup project to new file” might be a good idea
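The separation described above (high-level code choosing the policy, low-level code supplying the concrete recovery actions) is essentially the Common Lisp condition/restart system. A rough, hypothetical Python sketch of the idea - all names here are invented for illustration, and unlike real Lisp restarts this version does unwind the stack before recovering:

```python
# Hypothetical sketch of Common Lisp-style restarts: low-level code offers
# named recovery strategies; high-level code picks which one to use.

class RestartableError(Exception):
    def __init__(self, message, restarts):
        super().__init__(message)
        self.restarts = restarts  # name -> zero-argument recovery callable

def load_record(raw):
    # Low level: knows *how* to recover, offers the options.
    if not raw.isdigit():
        raise RestartableError(
            "malformed record: %r" % raw,
            restarts={
                "use_default": lambda: 0,
                "skip": lambda: None,
            },
        )
    return int(raw)

def load_all(raws, policy="use_default"):
    # High level: decides *which* restart to invoke (the policy).
    records = []
    for raw in raws:
        try:
            value = load_record(raw)
        except RestartableError as err:
            value = err.restarts[policy]()
        if value is not None:
            records.append(value)
    return records

print(load_all(["1", "oops", "3"]))           # [1, 0, 3]
print(load_all(["1", "oops", "3"], "skip"))   # [1, 3]
```

A GUI could present `err.restarts.keys()` as the buttons on the dialog the comment above imagines.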
Sure - it’s hard to write a bug-free application, but you can - by design - make sure that errors don’t cause data corruption or failure. And testing, QA, and unit testing will help you get to a point where the two or three little remaining errors at least do no harm.
I am - as are most others who commented here - a software developer for embedded systems. Crashes or degraded behaviour are not an option. My life, and others’ lives, might one day depend on code I’ve written.
One trick for getting stable software is to handle errors as they occur and propagate them up to the caller. At a higher level of abstraction you then have the chance to handle hard errors such as running out of memory. Had bad luck displaying an incoming message? Maybe next time you’ll have better luck, because some background process has finished and freed some memory. If not - give up after x tries, but keep your data intact.
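The bounded-retry idea above can be sketched in a few lines (a minimal illustration; the names are hypothetical):

```python
# Propagate the error upward, retry a few times (memory may have been
# freed in the meantime), then give up cleanly with the data intact.

MAX_TRIES = 3

def display_message(render, message):
    for attempt in range(MAX_TRIES):
        try:
            return render(message)
        except MemoryError:
            continue  # maybe a background task freed memory; try again
    # Gave up after MAX_TRIES, but the message itself is untouched.
    return None

# Simulate a renderer that fails twice, then succeeds:
calls = {"n": 0}
def flaky(msg):
    calls["n"] += 1
    if calls["n"] < 3:
        raise MemoryError
    return msg.upper()

print(display_message(flaky, "hello"))  # HELLO
```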
What does the alternative look like? You got a null pointer and didn’t handle it. Some code calling you expected that everything went well and writes some data into nirvana. This could cause funny behaviour, or just as well crash the entire system.
All vital subsystems shouldn’t use dynamic memory or write to files anyway. They are more or less autonomous as long as no one passes garbage to them or corrupts their data via bad pointers. What can go wrong at this level is data corruption due to writes from the outside (use asserts here during development) and floating-point NaNs creeping in and propagating themselves into the guts of your database.
You have to check for those! errno and low-level exception handling are your friends here.
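A small sketch of the NaN check suggested above, done at the subsystem boundary before anything gets stored (the names are illustrative):

```python
# Refuse non-finite values at the boundary so a stray NaN can never
# propagate into the guts of the stored data.
import math

def store_reading(db, value):
    if not isinstance(value, float) or not math.isfinite(value):
        raise ValueError("refusing to store non-finite reading: %r" % value)
    db.append(value)

db = []
store_reading(db, 3.14)
try:
    store_reading(db, float("nan"))  # caught here, not three layers deeper
except ValueError:
    pass
print(db)  # [3.14]
```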
It looks like a difficult and time consuming process to follow these rules, but it is not. If you do it from the start it’s quite easy, and after a month or two it becomes second nature.
Uh, smallstepforman got there first, but I’ll reiterate his point anyway. A program that crashes should never make it through testing. From a testing standpoint, a program that crashes soon at an obvious place is much easier to diagnose and fix than one that tries to recover, messes up and then crashes, obscuring the real cause of the bug. You’re right that the end-user should never see a crash: a program that buggy shouldn’t even reach production!
Crashing is good. It is an essential reminder that despite our self-made ideas of supremacy over technology, it is indeed a minor miracle that most any of this crap works much of the time.
One benefit of a web app is that it makes error reporting so much easier, as you are pretty much assured they have a way of communicating with you (the Internet).
I think, in light of this, turning the previously suggested error logging into silent error reports could be a very powerful tool for the future (indeed, it’s already used in some areas of serving websites). Where you would normally fail fast in a debug build, you could now feasibly, in production builds, send these as minor error reports and then try, if it makes sense, to recover. Somewhere close to the best of both worlds, I feel.
I would agree that people probably try to recover too often when they shouldn’t. If you don’t know exactly how to recover, don’t try.
I don’t think the article ever talked about releasing programs that fail as soon as you start them; of course you wouldn’t release that.
However, bugs will eventually happen in all programs. Do you then try to hide them, or crash so that it’s easier to find the actual bug? That’s what the article is about.
Bugs will happen no matter how much QA you have - it might even be something totally unrelated to your program, such as a hardware failure on the client’s computer. But if you try to hide a problem by naively thinking you can fix everything, then you will have numerous other problems and a really hard time finding the original one.
Fail fast is the only methodology that makes sense. Compare it to any other system in the world: if a car engine somehow attempted to fix a serious issue with itself, it would surely make the whole car life-threatening. Why would you think software should attempt to fix serious issues? Fail fast, then diagnose and fix the real issue.
I read that article a few months ago and I have to agree. Catching exceptions can be problematic and cause other issues to crop up later. And a well-debugged program shouldn’t need to patch or hide its errors. They should be shown in their full glory; that way everyone knows the deal and it can be dealt with quickly.
Here’s one I’ve never been able to recover from or decipher. I have a web application that I’m developing. After a few days of running, I get random errors - and only on my machine. The error is too many elements in an array. The only way to fix it is to log out and back in. Thankfully it hasn’t been reproduced in production, but I think of the hours I lost tracking this ghost bug. A bad memory sector, perhaps?
There’s a mnemonic for “fail fast”: the code must live by the samurai code of honor. If life isn’t perfect - time for seppuku :-).
I’ve switched two relatively large projects to fail fast. In both cases there was resistance from users, QA, and developers at first - more like disbelief: “What?! Are you actually telling us that crashing is good?” But it quickly diminished when they saw the quick turnaround on bug fixes and the disappearance of tricky bugs and data corruption.
It is very easy to explain to users/QA; in my experience they understand it well. Just tell them that the alternative is corrupted data and tricky bugs that take forever to fix, versus one crash that they will never see again.
My comment is not superbly relevant, but this reminds me of the philosophy of “degrading with grace” when it comes to CSS design for websites. I stumbled upon that many moons ago when I read a tutorial on CSS by one of the guys at Webmonkey (I wonder if that’s still up).
I would just add a number 7 on the list:
7. Application crash/error causes harm or physical damage (for instance through the machinery it controls, or in medical devices, or for instance a power outage)
As a few have pointed out, failing fast and visibly is very good during testing, whereas it’s maybe not as good during production use (depending on what “production use” actually is). Anyway, assertions, available in most programming languages, are a good tool for that job: they fail fast and visibly during debugging/testing, but usually aren’t even activated in production.
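The debug/production split described above is exactly how assertions behave in Python, for example: they blow up loudly in a normal run, but are stripped entirely under the `-O` flag.

```python
# Assertions fail fast and visibly during testing, but Python removes
# them when the interpreter is run with -O (optimized mode).

def average(values):
    assert len(values) > 0, "average() called with an empty list"
    return sum(values) / len(values)

print(average([2, 4, 6]))  # 4.0
# average([])  # AssertionError in a normal run; silently skipped under `python -O`
```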
What the?
Hasn’t anyone heard of “log files”? When something goes wrong, you immediately log the error message (one meaningful to the developers) and then try your damnedest to recover from the problem. Then, when/if the application does crash, instead of a crash somewhere obscure being all you have to go on, you have a detailed trace of the problem.
I think the more sensible solution is:
In general, this fixes all errors. And, no, I would NOT want my car to “fail fast”! The timing gets jostled a bit on a country road and it just stops? No thank you! I want my car to (as it does) “overcome” any issue it can, AND let me know about it both effectively and non-disruptively (throwing it into Park while hurtling down the freeway at 70 mph is very effective communication, but slightly disruptive). That’s why it makes funky noises when something’s wrong, and lights come on, and little diagnostics get recorded to the onboard computer for a technician to decode with his horrendously over-priced code reader.
Log. Log. Log.
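The log-then-recover approach above can be sketched briefly (the `parse_port` function is a made-up example):

```python
# Record a developer-oriented message immediately, then attempt recovery,
# so any later crash leaves a detailed trace behind.
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("app")

def parse_port(text, default=8080):
    try:
        return int(text)
    except ValueError:
        log.warning("bad port %r; falling back to %d", text, default)
        return default

print(parse_port("9000"))   # 9000
print(parse_port("oops"))   # 8080, with a warning left in the log
```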
A theme I see growing more and more is using exceptions to control program flow and then sometimes failing to properly deal with the exception.
I am surprised how I have to constantly make the statement “If a condition that would cause an exception can be caught and dealt with, then it should be caught and dealt with before it throws the exception.” I find this very CS101. Exceptions should be left for the unknown, not the known.
Also, I think sometimes developers fall into the trap of thinking that they do not need to validate data at deeper levels because a higher-level method would have “found it already”. Every piece of data has to be validated at every level. This will slow the program down a bit, but it’s much better than relegating your end-users to sitting around twiddling their thumbs while you go out on bug patrol.
And one more thing…
The excuse of “we weren’t given the time to do it right” doesn’t wash.
Giving your manager a 90% bug-free app and being overdue will get you a better performance review than giving your manager a 50% bug-free app on-time.
I would rather be labeled a slow programmer than a bad (or worse, a half-a$$ed) programmer.
As always, clarity is king. Silent failure is the killer. Fail fast, and transparently.
I’m a big fan of the Pragmatic boyz and Ron Jeffries in this camp:
Patrol your borders well. Include exception handling where there are no consequences. Treat errors deep in your code like the end of the world. Unit test to back up your assumptions. (Particularly boundary conditions.)
Your app should definitely fail immediately if the database suddenly disappears, or if a packet of XML gets lost in the ether… unless you can try for it again… If the user picks a bad date range though, ask them to fix it, or correct it behind the scenes if possible.
Personally I think the push to make computers the perfect user experience is overblown. The commute to work is fraught with flaws: potholes, detours, wrecks, late trains, broken escalators, and faulty air-conditioning. There’s a price/performance scale that people have for quality. (A friend of mine calls it the “5, 500, F*** IT! Rule”.) Look at how hard NASA has to work to guarantee slim margins of error. As long as they have benign failure, the perfect world is something most people don’t care to finance.
“Giving your manager a 90% bug-free app and being overdue will get you a better performance review than giving your manager a 50% bug-free app on-time.”
Roger Farley, what planet are you from? Please say. I want to move there. I want to move there yesterday if not earlier.
One thing that you need to take into account when you talk about “failing fast” or “failing slowly” is the nature of your application – how is the state of the application coupled to the state of your data?
In most web apps, when someone gets an error, they simply know they can hit the back button most times and retry. With a windows-based app, that’s (of course) not the case – but does it have to be that way?
For business applications, my WinForms-based apps tend to use stateless middle tiers. This makes it darned tough for any front-end problem to seriously corrupt the state of my precious data, whether that be a database, queue, or something on the server’s filesystem.
That does make it a bit safer to write slower-failing clients (albeit ones which notify the user and log their error before allowing them to retry).
Some comments:
You have to decide at what level your error handling mechanisms should be handled; “at every level” is a not a good answer.
You have to decide very carefully around retry loops. http://blogs.msdn.com/oldnewthing/archive/2005/11/07/489807.aspx
MessageBox is a perfectly horrid error handling mechanism. Instead, you should try to get error data to the people who can handle it. So: if you don’t think a problem is severe enough to crash, but would still like to get it, put instrumentation in to send it to yourself. (At Microsoft this is called “Generic Watson.” http://blogs.msdn.com/chris_pratley/archive/2004/02/04/67276.aspx)
This point really can’t be overstated. Having code to talk to yourself is how you catch and fix odd customer errors.
You should understand completely how your components can fail, and should wrap the cases where there are complex failure conditions. Take the registry as an example: even when a function doesn’t fail with an error code, it can have really weird postconditions.
Test your error cases.
I would like to add a plug for logging errors if you try to fix things up. This is actually what automotive systems do (contrary to the comment above) - that is what the check-engine light is about. As I understand it, there are several processors that can pick up if another fails. “Fail fast” is the opposite of what the company wants.
“Fixup” may mean restarting or abandoning a thread or transaction (with an error returned back to the client).
Jeff, what about logging errors when they happen as a compromise to merely “failing fast”? This would be transparent to the user. After logging, the app can provide the typical “just in case” error handling we all know and love…
This would give the development staff metrics to go by as they gauge how severe an issue is, how frequently it crops up, etc., while still allowing the user’s app to hobble along in the meantime - attempting to recover from its errors until the developers can fix the bug.
In our eCommerce apps, I’ve had good success with this approach when faced with an exception condition:
1. Email an alert message to our small development team with the relevant information to identify the issue: httprequest params, stacktrace, user keys, etc.
2. Do whatever you can (e.g. use default values for the missing data that caused the exception) to return the user a valid page along the lines of what they were expecting. It may not have all the information that would have been returned had the exception not occurred, but the user continues with a sense that the application is still working.
As a developer, you are highly motivated to fix an issue that is filling your inbox. It’s much more visible than an exception in a log file.
(Note: It’s also a good idea to implement such developer-alert methods with safety valves that stop sending emails after a certain threshold. I returned to work once (once!) the morning after a new build to discover several thousand emails in my inbox. During that period, not a single user perceived an error or logged a support call.)
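The “safety valve” above is just a counter in front of the mailer. A minimal sketch - `send_email` is a hypothetical stand-in for whatever mailer you actually use:

```python
# Stop sending alert mail after a threshold, so one bad build can't
# flood the team's inboxes; later failures rely on the logs instead.

class ThrottledAlerter:
    def __init__(self, send_email, limit=50):
        self.send_email = send_email
        self.limit = limit
        self.sent = 0

    def alert(self, subject, body):
        if self.sent >= self.limit:
            return False  # valve closed
        self.sent += 1
        if self.sent == self.limit:
            body += "\n(alert limit reached; further alerts suppressed)"
        self.send_email(subject, body)
        return True

outbox = []
alerter = ThrottledAlerter(lambda subj, body: outbox.append(subj), limit=2)
for i in range(5):
    alerter.alert("exception #%d" % i, "stacktrace...")
print(len(outbox))  # 2
```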
The truth, as usual, is somewhere in between ‘fail fast’ and ‘corrupt data.’ I have personal experience of a program that goes through a list of items and exits on the first failure - leaving many items waiting that could have been completed. Another program continually fails with null-object errors, and the logic of the application precludes fixing the problem without a rewrite. It is important to distinguish between real failures and ephemeral conditions. Never lie, never ignore, never throw an exception as an alternative to control flow and validation, never believe that a failure can’t happen - and never check to see that the computer has power!
Never lost any data? You didn’t just say that, did you? Your data is going missing and soon. The software gods are listening.
I do embedded. Always fail fast, but save a trail. We added “asserts” to our product. The assert saves the state of things and stops the application. It covered the “this could never happen” and default cases. They show up all the time. Many hard-to-find and unexplainable bugs were rooted out.
Yes, marketing and the customers do not want to see an assert. You have to remind them that an assert is better than a weird problem that never gets fixed.
Now, two years later, asserts are so rare that when one happens, people don’t know what it is. The quality of the product is better.
Admitting there is a problem is the first step.
But what does “fail” mean in context of fail fast?
With a generic application exception handler, “failing” could mean absolutely nothing in terms of UI. Yes this is bad for the end user and is probably never implemented this way, but you can fail fast as well as later.
Your logging framework, or whatever you have in place to deal with the unknown, can catch the failure immediately. It can log details about the state of the machine at that point. Yet the application can continue on. The end user need not see anything more than a simple “oops” message on the screen.
If your application is structured properly, the impact to the end user will be minimal. Your “oops” message could even include a suggestion to restart the application, if you don’t trust yourself (and why should you, you just released software with this bug!)
I’ve seen this practice used in very popular commercial software.
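A minimal sketch of the generic top-level handler described above (the names are hypothetical): the failure is captured in full for the developers the instant it happens, while the user only sees the short “oops” message.

```python
# Catch the unknown at the top level: log the full traceback immediately,
# show the user only a simple "oops" message.
import logging
import traceback

log = logging.getLogger("app")

def top_level(action):
    try:
        return action()
    except Exception:
        log.error("unhandled failure:\n%s", traceback.format_exc())
        return "Oops - something went wrong. Please restart the application."

print(top_level(lambda: 1 / 0))  # the "oops" message; details go to the log
```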
Steve:
Point taken! You’re absolutely right. It’s truly scary what end users “work around” when they should simply ask for a fix.
The important thing is to realize that what’s good for the development process is not necessarily good for the users. That’s why we have debug builds. I think it was in Steve Maguire’s excellent (if dated) “Writing Solid Code” that I first came across the “fail immediately” idea, and it’s presented squarely as a development philosophy. In release builds you try to recover as best you can; in debug builds you want everything to blow up the moment one of your assumptions isn’t met.
Maguire’s argument was against defensive programming. If there’s a bug but the program doesn’t crash then there’s a temptation to keep going and fix it later. This is pretty much always a bad idea. In development you always want to know immediately if your assumptions about what your code is doing are wrong. That means you don’t understand what’s actually happening, so it’s just plain luck if anything seems to work. (And your luck is guaranteed to run out at just the wrong time…)
Agreed wholeheartedly with “fail fast”. Code which does something like
Thing* t = createThing ();
if (t == NULL) return; // grrr!
t->doSomething ();
is not wise defensive programming. Let it crash, even in production – at least you’ll get a nice diagnosable crash dump that way.
There’s also one annoying problem with crashes, slightly tangential to data loss: the loss of recent configuration changes, when the application naively believes that it’s best to write out its settings on exit.
It’s a particular pain with Firefox: proxy settings that don’t get saved, new accounts (like for S3 Organizer) that get lost if the main executable terminates via a crash or a forced kill.
Having to remember to restart the application after configuration changes just so they actually get saved is not the greatest thing.
I have to agree with the “fail fast” philosophy, especially for a lot of embedded software.
I worked at a place where we had our own in-house kernel - about 50 KLOC of kernel-space code. It was an embedded system that could not be serviced on site at all.
The guiding principle was: if the error points to memory corruption in the kernel, or a really bad hardware failure (bad parity on PCI for example): kernel panic immediately.
The system is already designed to recover well from an unexpected boot (power failures, kernel panics) and it will boot into a special, minimal kernel mode after such a failure.
This is by FAR better than continuing to run on an unstable kernel, which can cause data corruption or even destroy your hardware (yes, it can!).
The result? Once the system was deployed, there were no kernel panics at all. In fact, a kernel assert was so rare even during development that if one happened, every kernel developer would drop everything and come look at it.
On the other hand: your software really needs to be robust. Asserts should NOT happen in embedded, safety-critical, or value-critical systems. You really do not want your X-ray machine’s software to assert before turning off the radiation. Aircraft avionics software should not assert during flight if it cannot recover from doing so.
Something which I think is worse than any of those (aside from data corruption) is a program which refuses to die. Regardless of how rarely the bug occurs, it is completely unacceptable. This post is quite timely: earlier today I was burning a disc with Nero, and during the disc-reading process Nero froze. I couldn’t shut it down, not even through Task Manager. I literally had to restart Windows to close the application.
It seems that some people misunderstand the notion of fail fast.
The fail-fast idiom is no substitute for very careful, robust design, thorough testing, and disciplined coding.
If you can recover from the error, of course you should. If you can design the system so that such errors cannot occur - even better.
The idea is to catch the “impossible” errors - the ones from which you cannot possibly recover:
When something that should never fail does fail, you cannot recover safely. If the fxn table is NULL, you can bet other important system data is corrupt; no recovery is possible at that stage. But if you fail fast, and your system is designed to recover from such failures, your software is more robust.
Again, fail fast is for those extreme conditions that should NEVER happen. But they sometimes do and you better be ready.
Some systems should never, ever reboot, but such systems often have other fail-safe mechanisms not available to most (hardware redundancy, special watchdog timers, etc.), and go to extreme lengths and expense to assure reliability and availability.
@Roger Farley:
“I am surprised how I have to constantly make the statement “If a condition that would cause an exception can be caught and dealt with, then it should be caught and dealt with before it throws the exception.” I find this very CS101. Exceptions should be left for the unknown, not the known.”
That depends very much on the language, actually. In many languages, exception handling is expensive, which leads to the idiom you describe. However, in Python, exception handling is cheap, which leads to the opposite idiom, using exceptions for conditions that are special but not unexpected (like reaching the end of a file while reading). It’s preferred for functions to raise exceptions rather than return error codes, as well.
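The Python idiom described above is usually called EAFP (“easier to ask forgiveness than permission”): just try the operation and handle the expected exception, rather than pre-checking. A small illustration (the function is invented for the example):

```python
# EAFP: use the exception for a special-but-expected condition instead
# of pre-checking every item.
def first_int(items):
    for item in items:
        try:
            return int(item)          # just try it...
        except (TypeError, ValueError):
            continue                  # ...and handle the expected failure
    raise LookupError("no int-like item found")

print(first_int(["x", None, "42"]))  # 42
```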
Agree with M
Typing ‘a program should be able to recover from an(y) error’ implies that you already know what the error is, and thus how to recover from it. Not recovering from said error is therefore not an exception - just a lazy programmer. E.g. writing to a file that is set read-only without first checking.
An exception is truly that: you didn’t expect it to happen when the method was written.
Fail-fast appears to be the best we can do because it tries to catch as many errors upfront in development and testing. It’s not perfect, but it’s better than exception gobbling software that continues to run and fails to quit despite unknown levels of corruption. This isn’t however license for software to just go poof and disappear. There is absolutely the need for proper logging and tracing to nail down what happened, especially for shrink-wrapped software. Also, different software environments dictate different approaches, and that’s life.
I write a lot of code that throws exceptions, and barely any that handles them. Usually there’s just a top-level handler that does the logging.
PS Analogies are lame.
I work in the embedded storage industry, and the worst application is one that runs fine but writes corrupted data. You can’t trust the app any longer, and that is a fatal blow to your company’s reputation.
Customers will be RUNNING AWAY from your app and switching to other vendors. What’s worse, data corruption bugs are downplayed so much (or not advertised at all) that a customer really doesn’t know whether a software update contains a key fix.
So data corruption avoidance is priority #1.
This is why industrial-grade filesystems like ZFS store checksums along with the data, because you really can’t trust the underlying storage hardware to do the right thing.
On another note,
You could have a perfect application that reads/writes data to files without guaranteeing the data has actually been written to disk (i.e. it’s still in the file cache). So if your OS crashes before committing your writes to disk, you are toast.
Kashif
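The file-cache point above is worth making concrete. Assuming a POSIX-style OS, flushing the application buffer alone is not enough - you also have to ask the OS to commit its cache with fsync before trusting the write:

```python
# Flush the Python-level buffer to the OS, then ask the OS to commit
# its cache to the disk; only after fsync is the data reasonably durable.
import os
import tempfile

def write_durably(path, data):
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # push the OS cache to the disk

path = os.path.join(tempfile.mkdtemp(), "state.txt")
write_durably(path, "balance=100")
print(open(path).read())  # balance=100
```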
I think 90% bug-free is setting the bar a bit low. Back in the good old days of text editors, compilers, and linkers, I wrote 6000-line apps without any bugs.
It has to do with methodology. Of course, we weren’t plagued then by the horrendous implementations of object-oriented programming.
There’s another category 7: System causes crash or data corruption.
In a system with neither parity nor ECC memory, soft memory errors are neither detected nor corrected. An instruction which becomes an illegal opcode will crash. Otherwise, instructions just do the wrong thing and bad bits in the data just stay that way.
Many computers and networking components have buses and memory that are not error checked.
The draft article below from 2006 is entirely my own opinion. It has not been reviewed or approved by my employer.
I am surprised how I have to constantly make the statement “If a condition that would cause an exception can be caught and dealt with, then it should be caught and dealt with before it throws the exception.”
That seems equivalent to saying, “Exceptions cannot be used for error-handling. Only return codes can be used.” Lots and lots of people disagree with that.
I find this very CS101. Exceptions should be left for the unknown, not the known.
It’s very much CS101 not to realize that “the unknown” is, at best, hard to recover from reliably, due to your program entering an undefined state.
Exceptions in languages such as C++ are for dealing with expected errors in correct programs, not dealing with bugs in incorrect programs.
Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software. The idea can be viewed as reducing or eliminating the prospect of Murphy’s Law having effect. Defensive programming techniques are used especially when a piece of software could be misused mischievously or inadvertently to catastrophic effect.
Defensive programming is an approach to improve software and source code, in terms of:
Jon - If you were using:
try
{
openfile = file.Open(pathtofile)
}
catch {}
What happens if an OutOfMemoryException or something occurs? It’s failed silently and problems will continue to arise later on which are difficult to track down.
What if it’s some form of permissions exception? Maybe the function is in a library, and the person using that library wants to be able to display a useful message about changing permissions. A contrived example, but similar legitimate ones exist. You’d suggest explicitly catching this and turning it into an error code? Why not just let the exception bubble up?
What’s so different between:
try {}
catch (A) {}
catch (B) {}
catch (C) {}
and:
switch(errorcode)
{
case A:
break;
case B:
break;
case C:
break;
}
It’s just as much code with a slightly different syntax, without the need to wrap exceptions up in something less detailed and informative.
Apologies if I’ve misunderstood what you meant.
Lastly, very often problems caused by exceptions in a finally block are caused because a using block should have been utilised instead.
For example:
if (file.exists(pathtofile))
{
openfile = file.Open(pathtofile)
}
This is better than just doing file.Open(), as Open can throw a fileNotExist exception. Doing these checks can avoid the exception, and maybe this is what Mr. Farley was referring to in his previous post about checking any known conditions first.
In any case, one should do this:
try
{
openfile = file.Open(pathtofile)
}
A perfect example of totally wrong error-checking code and wrong assumptions on the programmer’s part.
In the first snippet, the file can disappear between the check for its existence and the actual attempt to open it, so you still have to handle the exception anyway (apart from the fact that there are more reasons why you won’t be allowed to open a file than its mere non-existence). What you did here is make sure QA won’t ever catch the error you made, because QA won’t manage to delete the file right between those two statements running - but it will happen at the customer site.
The second snippet is also wrong, because of the assumption that the only possible error is a non-existent file. This leads to those funny error messages where the software keeps telling the user that a file “does not exist”, although it does. Here it may be only a minor problem, but wrong assumptions are the reason why fail-fast usually is better: it kills your dreams and actually tells you what happened, not what you thought would happen.
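The race described above (the file vanishing between the existence check and the open) disappears if you just open the file and catch the one precise failure you know how to handle, letting everything else fail fast. A sketch, using Python since it has a distinct exception for this case:

```python
# Just open and catch the specific failure; no check-then-open race,
# and unexpected errors (permissions, etc.) still propagate loudly.
def read_config(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None  # the one case we know how to recover from
    # PermissionError etc. still bubble up: fail fast on the unexpected.

print(read_config("/no/such/file"))  # None
```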
Oh well - and who wanted to make error checking easier by adding complexity to a complex system[1], with “higher up” code delegating error handling to “lower level” exception handlers? What’s that good for, if the average programmer’s assumptions about why any error could occur are wrong already?
[1] Counting the lines of code needed just to handle errors, error handling is one of the most complex tasks in writing software.
No. Adding complexity to an already complex task[1] does not make it simpler. It’s amazing how much of the code we are writing each day is just there to handle errors. Funny thing: This is code that almost never gets tested, because those errors never occur during the test[2]. Most problems I have ever seen with error handling code is not that it isn’t there at all, but handles errors under wrong assumptions about why and where the error occured. Programmers only handle those errors they can think of[3] and that doesn’t change due to a more complex exception handling system.
[1] Look at the average code and check how many lines you could get rid of if no subroutine could ever fail.
[2] Besides more subtle problems like for example: How do you save the user’s data if the error in question was the equivalent of a “disk full” from the OS?
[3] Back in the days when we only had return codes and goto instead of nested exception handling, who ever cared to check the return code of printf()?
I’ve heard the standard office user’s experience described as “sitting down in the morning, and randomly clicking at buttons and dialogs until they all go away and they can leave for the day”.
Sadly, I have not yet found any real reason to disagree. Ask any normal user what they just clicked or pressed before a dialog appeared - even if you ask them just after it appears on the screen - and just about every single one I have ever asked replies “I don’t know”.
I can’t believe the level of non-awareness most people put into their computing experience. It’s like some bizarre whack-a-mole game for them that results in a pay cheque.
LOG EVERYTHING, it’s the only chance you’ll get. (Now you know why people have drives a million times larger than they’ll ever fill up - room for logs!)
Jon - The examples you showed are a specific case: that of a helper function in the application area that handles and recovers. Most of it would be acceptable, and I can see where you are coming from now with returning null. However, I still believe that swallowing the base exception like that and returning null - to mean not just “I’ve encountered an error, but don’t worry, I’ve dealt with it; you continue however is best for you now” but “something has gone wrong, I don’t really know what, and I’m not going to bother to tell you” - can lead to problems, and this is an example of where it should fail fast. It also limits the caller of the function to handling only what you think they will want to handle. If any other oddball exceptions are thrown (and they are numerous - I will deal with this topic later), you run the risk of not letting callers handle them when they legitimately might want to. If you really want to shelter the user of the function, you can always catch the base exception type of the operation you are performing. In this case you would catch and log an IOException, which would take care of any form of IO failure but still let others - such as (my favourite example) OutOfMemoryException - propagate.
Of course these criticisms really apply to reusable libraries (and if it’s not reusable, why is it in a library generally?) but they can be extended to functions in the programs themselves.
However, as previously said, I do now better understand what you mean by returning null and would agree with all except suppressing the base exception in the sort of cases used in your examples.
But consider if this was a library; you don’t want - as I alluded to previously, though the implications are more serious here - to simply hide what the actual problem is because you can’t fix it yourself. In this case, exceptions provide a great way of passing the problem up until somewhere, someone has the right knowledge of how to handle it appropriately and does so, whilst also providing a lot of the information one might need to deal with it.
The problem comes in that as a programmer you have little idea what will be thrown by any given function, because the code path is so complicated. This leads to people not catching things (at the higher level, where it is appropriate to know what to do with them) because they don’t think those problems could arise. A lot of the .NET libraries are actually fairly well documented with the exceptions that are likely to be thrown as a result of the operation (rather than all sorts of ones such as OutOfMemoryException), but most programmers don’t know that this information exists. It’s in the documentation, but who really has to/wants to look at the documentation every time they use a function? IntelliSense and the like have drawn us away from that, and we really need that sort of information - what exceptions are likely to be thrown - in the editor at our fingertips.
Take File.Open for instance. There are 9 different exceptions that could be thrown that you realistically will want to be able to handle, and if you use the pre-emptive check method (using File.Exists etc.) there are none that you should actually need to catch (aside from maybe the IOException). However, it does give you a list of things, a reminder if you will, of what checks you need to do.
If we had that sort of information in the editor I firmly believe programs would be written more robustly.
(I realise that you probably know and maybe even agree with a lot of what I just said. However, I think some people are too black and white, or don’t consider that there are different practises for different areas, and would apply a “return null on failure” at the lowest levels of a library where I would argue it’s inappropriate.)
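ICR’s suggestion - catch and log the base exception type of the operation you are performing, but let everything unrelated propagate - might look like this (the `ReadAllTextOrNull` helper name is mine, for illustration only):

```csharp
using System;
using System.IO;

static class Loader
{
    // Catch only the IO family. FileNotFoundException,
    // DirectoryNotFoundException and PathTooLongException all derive from
    // IOException, so one catch covers every IO failure mode - while
    // unrelated disasters such as OutOfMemoryException still bubble up to
    // a handler that actually knows what to do with them.
    public static string ReadAllTextOrNull(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (IOException ex)
        {
            Console.Error.WriteLine($"IO failure reading '{path}': {ex.GetType().Name}");
            return null;
        }
    }
}
```

The caller still gets the “null means it didn’t work, check the log” contract, but only for failures the helper can legitimately claim to have dealt with.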
I led a team about 15 years ago that built a Sprays Control System for an Alcan aluminum mill. The team was rookie at the software dev game; I was not (I was a rookie in the aluminum biz instead). We had code running on 3 different platforms, written in 3 different languages, connected with a 10 Mb/s network - debugging it was going to be tough. After many long debates, I convinced the team that Fail Fast was the way to go, despite the fact that most real-time factory systems do the other thing and try really hard to stay running. But because we squashed sooo many bugs during dev testing, the production system was on time and almost on budget. Worked like a champ first time out too, running 36 hours before a database table filled up due to a faulty sensor (quick and easy to fix). But the victory was obvious to us, and now the two team members who continued on with software careers are believers in Fail Fast. And I remain one, of course.
Here’s a nice primer on troubleshooting data corruption.
http://valhenson.livejournal.com/9540.html
Gives you an idea of how hard that kind of troubleshooting really is, too…
A story
Our Win Forms app was a data entry app. Generally, users would enter 100s of records into the app, one after the other, before finishing the job. While keeping data in an in-memory dataset, it was also written to the database religiously. (Every record navigation was supposed to save the data to the database.)
One update introduced a bug: in certain untested circumstances, data would not be saved to the database, and the user was never alerted that this had happened.
Imagine the frustration when, after entering 800 records, they navigated back to find 800 empty records.
I’m a bad programmer, and I can always do better.
Certainly logging is nice, but failing fast adds a certain level of immediacy to the bug fix. Every bug is an application crash. I spent a few years doing QA, and any dev can explain away a bug that merely spits out an error message (oh, just move that file over there… fix this registry setting… etc.). Once the app stops working, though, you have a moral imperative to harp about it until it’s really fixed, or made extremely user friendly. There’s way too much “catch exception and log” out there - after all, it makes that red squiggly go away.
That said, once your customer becomes someone who can’t reach out and grab a dev, you have to have some outside handler that can take any error and make the failure as friendly as possible. Assertions aren’t quite ideal, but they give you a little of both worlds when they can be turned off/replaced at runtime.
try
{
Application.Run();
}
catch { }
Problem solved.
Seriously, though, for all the .NET programmers here, the Exception framework gives you an InnerException property for a reason. Use it, and whenever your application crashes, make sure you dump (to a log, not to the user) all of the information for every exception in that hierarchy.
For some odd reason (laziness?), people don’t like to create custom exception classes. They should, though, because it’s by far the easiest way to give meaningful feedback to the user while still keeping all your debug info intact. When you’re writing the UI code, good luck trying to handle an InvalidOperationException and figure out whether or not it’s actually fatal. A NotLoggedInException, on the other hand, tells you exactly what you need to do (show the login form). Throwing a generic exception is the quickest way to shoot yourself and your users in the foot.
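Following on from the NotLoggedInException example in that comment, a minimal sketch of the pattern - a custom exception that wraps the low-level cause as InnerException, plus a helper that walks the chain for the log (class and method names are illustrative):

```csharp
using System;

// A domain-specific exception: UI code can catch NotLoggedInException and
// show the login form, while InnerException preserves the low-level cause
// for the crash log.
class NotLoggedInException : Exception
{
    public NotLoggedInException(string message, Exception inner)
        : base(message, inner) { }
}

static class Session
{
    public static void RequireLogin(bool loggedIn)
    {
        if (!loggedIn)
        {
            // Wrap the generic failure in a meaningful type instead of
            // letting the vague InvalidOperationException escape on its own.
            var cause = new InvalidOperationException("no auth token in session");
            throw new NotLoggedInException("User must log in first.", cause);
        }
    }

    // Walk the InnerException chain when dumping a crash to the log.
    public static string DumpChain(Exception ex)
    {
        string s = "";
        for (Exception e = ex; e != null; e = e.InnerException)
            s += e.GetType().Name + ": " + e.Message + "\n";
        return s;
    }
}
```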
This:
try
{
Application.Run();
}
catch { }
is just a bandaid. Having an exception handler at this level of the program is merely a “catch all” for any exception that occurred deeper down. I am not sure how useful it is to catch at this level, since you are at the main entry point of the application (what recovery would one do?). I think at this point you should probably just close out gracefully instead of showing the JIT exception dialog.
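If you do keep a handler at that level, one way to make it more than a bandaid is to log the full exception and exit with a failure code rather than swallowing it silently. A sketch under those assumptions (`GuardedRun` is a hypothetical helper standing in for wrapping `Application.Run()`):

```csharp
using System;
using System.IO;

static class TopLevel
{
    // Last-chance wrapper around the application entry point: any
    // exception that bubbles all the way up is logged in full (stack
    // trace and inner exceptions included via ToString) and turned into
    // a nonzero exit code, instead of vanishing into an empty catch {}.
    public static int GuardedRun(Action body, TextWriter log)
    {
        try
        {
            body();
            return 0;
        }
        catch (Exception ex)
        {
            log.WriteLine("Fatal: " + ex);
            return 1;
        }
    }
}
```

In a real WinForms app the `body` argument would be `Application.Run(new MainForm())` and the writer would point at a log file; the shape of the handler is the same.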
I believe that exceptions should be handled as quickly as possible in the program and not allowed to “bubble up” through the different layers or call stack they are on. “Bubbling” exceptions are the work of bad design and programming.
Suppose I call this mock method in another class in the .Net framework:
void doSomething()
Now, this should return nothing (void, right?), but depending on the implementation, it may throw an exception back at me. If it does, I’d better handle it.
Several classes inside the .NET Framework have these cases, which many times are not handled by implementors, leading to crashes.
For example:
if (File.Exists(pathtofile))
{
openfile = File.Open(pathtofile, FileMode.Open);
}
This is better than just calling File.Open(), as Open can throw a FileNotFoundException. Doing these checks can avoid the exception, and maybe this is what Mr. Farley was referring to in his previous post about checking any known conditions first.
In any case, one should also wrap the call in a try/catch:
try
{
openfile = File.Open(pathtofile, FileMode.Open);
}
catch (Exception ex)
{
//Log the error and recover
}
in case something unexpected DOES happen.
In this example, File.Open() may return a file stream or it may throw an exception, even though its signature says it will return a file stream. Perhaps this is a bit confusing, because ANY method can throw an exception back at you.
I prefer that exceptions are handled in the method they originate in, which then sends back a return value rather than throwing an exception that must be handled. The caller can decide what to do with the return value, but doesn’t need to deal with any exceptions originating from the method. You can generate your own exceptions too, but this leads to complex catch blocks:
try
{
doSomething()
}
catch (Exceptiontype1) {}
catch (Exceptiontype2) {}
catch (Exceptiontype3) {}
catch (Exceptiontype4) {}
catch (Exceptiontype5) {}
catch {}
I never cared for this style very much although I have seen it on several occasions.
Also, one final (no pun intended) piece of advice: finally blocks can throw exceptions too, so be careful what goes in finally!
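That warning deserves a concrete illustration: in C#, if the finally block throws, its exception replaces the one already in flight and the original failure is lost. A small sketch (the method name and messages are made up for the demonstration):

```csharp
using System;

static class FinallyDemo
{
    // The inner try throws "original failure", but before it can
    // propagate, the finally block throws "cleanup failure". The caller
    // only ever sees the cleanup exception; the original is gone.
    public static string WhichExceptionEscapes()
    {
        try
        {
            try
            {
                throw new InvalidOperationException("original failure");
            }
            finally
            {
                throw new ApplicationException("cleanup failure");
            }
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }
}
```

This is exactly why cleanup code in finally (closing streams, rolling back transactions) should itself be defensive: a throwing finally can mask the real bug you were trying to diagnose.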
Hi Vinzent and ICR-
The case of the file disappearing in the few nanoseconds between the exists check and the actual open is an extremely rare one. But it could happen if another process was trying to do something with that file; in that case you would still need the try/catch, and you might have bigger design issues if multiple processes were trying to access and delete the file at the same time.
Here are a few examples of handling exceptions at the point at which they occur. With this, the button1 and button2 events will not crash the program. Also, you can see the exception handling logic adds many more lines to the code, as you have mentioned previously.
If my previous example was unclear, then hopefully these are better.
For handling different types of exceptions, you could modify the friendly message to say something more specific about the error - like could not find file, directory not found, etc. - and store that value in a member variable or maybe a static class which stores the last error message. Catching the different types of exceptions allows for this, instead of one generic catch {}.
Let’s say I have a button on a form, and when I press it I want to open a file and process it. Here is the code:
private void button1_Click(object sender, EventArgs e)
{
FileStream fs = opensomefile("PathToFile");
if (fs != null)
{
//Dosomething with file
//Close it after using it.
fs.Close();
}
else
{
//Alert User With Friendly Message
MessageBox.Show("Unable to open some file. Check log for details.");
//Do anything else here
}
}
/// <summary>
/// Opens some file and returns it back to the caller
/// </summary>
/// <returns>A FileStream; can be null if the file fails to open.</returns>
private FileStream opensomefile(string pathtofile)
{
FileStream fs = null;
// Open the stream
try
{
fs = File.Open(pathtofile, FileMode.Open);
}
catch (FileNotFoundException ffe)
{
//Do any specific FileNotFoundLog Logic and Log Error
}
catch (PathTooLongException ptle)
{
//Do any specific PathTooLong Logic and Log Error
}
catch (DirectoryNotFoundException dnfe)
{
//Do any specific DirectoryNotFoundException Logic and Log Error
}
catch (Exception ex)
{
//Log any generic error
}
return fs;
}
Another example: let’s say I have a button that’s going to do something specific to the file.
private void button2_Click(object sender, EventArgs e)
{
if (!dosomethingwithfile("PathToFile"))
{
//Alert User With Friendly Message
MessageBox.Show("Unable to dosomething to file. Check log for details.");
//Do anything else here
}
else
{
//Successfully processed file do any additional work here.
}
}
private bool dosomethingwithfile(string pathtofile)
{
bool success = false;
// Open the stream and do something
try
{
using (FileStream fs = File.Open(pathtofile, FileMode.Open))
{
//Do Something with FileStream
success = true;
}
}
catch (FileNotFoundException ffe)
{
//Do any specific FileNotFoundLog Logic and Log Error
}
catch (PathTooLongException ptle)
{
//Do any specific PathTooLong Logic and Log Error
}
catch (DirectoryNotFoundException dnfe)
{
//Do any specific DirectoryNotFoundException Logic and Log Error
}
catch (Exception ex)
{
//Log any generic error
}
return success;
}
Both of these examples show non-bubbling of exceptions. Although trivial, I think they are valid.
You could also let exceptions bubble up if doing a multistep process, but they would not escape to other parts of program:
private bool dosomething()
{
bool success = false;
try
{
//dopart1, dopart2, dopart3, dopart4 may throw exceptions which
//will need to be caught because other callers may not handle them
//properly.
dopart1();
dopart2();
dopart3();
dopart4();
success = true;
}
catch (Exception ex)
{
//Log Error
}
return success;
}
This way you ensure that other parts of the program are not responsible for any exceptions that occur in dosomething. They get a return value telling of success or failure, and can then act accordingly.