Twitter: How Not To Crash Responsibly

In yesterday's post on Crashing Responsibly, I outlined a few ways to improve your application's crash behavior. In the event that your application crashes -- and oh, it will -- why not turn that crash into something that:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2008/05/twitter-how-not-to-crash-responsibly.html

Could you please not mention twitter so often? I think it’s the most annoying thing since crying children =)

You do realize downforeveryoneorjustme is … well … down, right?

It seems to me that time is ripe for a diaper change at Twitter.

What, no Frankenstein on the error page?

Users should think happy thoughts until twitter-team reboots or gets back on track with rails (sic!)

What are your thoughts on felonious silence from Google during the 7+ weeks that Google groups was down?

One website I visited had a very simple wiki page for its error page. This wiki was hosted on a different server with a different domain name. When the primary site was down people could leave notes about what was working, what wasn’t, if there were any known work-arounds and of course developers could add quick updates. I think the most important thing was that it gave the community a chance to hang out and chat rather than be completely severed by the programmers’ mistakes.

Perhaps the error page reflects the attitude of the company running the web service. You can literally imagine how that company works: In an event of a server disaster, the first thing is probably the server guys reporting to the head, while attempting to fix it “right away”. And as it usually turns out, the fix takes longer than the technicians think.

Unless the company is one cohesive team, the technician is usually afraid to give time estimates, as it will either be a wrong prediction or a conservative prediction, which leads to loss of confidence in him either way.

On the other hand, the attitude of “not claiming responsibility yet slightly blaming the user” is very Apple-ish. The whole team in the company might really think that the users are at fault. And in this case, even if the web manager is eager to update the status of the crisis, he might not be permitted to do so.

Nevertheless, I agree that they should be more transparent about the errors, especially when users are starting to lose confidence. During good times perhaps they can just be smug and brush the error aside, but now it’s a bit ironic that the Twitter guys don’t use Twitter themselves to report on their status.

By the way, Twitter loses a lot of money with SMS. They have yet to find a way to earn money. In the times of a recession, services that are yet to be monetized will probably go first.

I think the most important thing was that it gave the community a chance to hang out and chat rather than be completely severed by the programmers’ mistakes.

That’s awesome – exactly the kind of thing I’m talking about.

I don’t frequent Digg, but I happened to visit there recently and they were down for maintenance. Rather than present a generic “we’re down for maintenance, kthxbye” page, they provided a giant list of “recommended links” from each Digg employee.

I can tell you that the only thing that drives me more nuts than when a server goes down or an important site goes down, is when that site goes down with no explanation of the fallout.

It happened to me yesterday with my server provider…I actually had to call them up before I found out they were scheduled for maintenance (at 10 am no less!). Drives me nuts!

Psychologically speaking, it can be better to make a user think it’s not just their problem.

When people feel they’re just one of many being affected, it lowers their individual expectations and they just wait it out. Most people will automatically think they’re not so special, and if it’s affecting everyone, then surely the company is on the case.

If I know it’s just me that’s having issues, I want resolution and I become much more vocal and chatty, because nobody’s going to fix it if I just sit there.

However, if you’re doing it all the time, your users will start to cotton on. BT used to do this, citing ‘your entire exchange is affected and an engineer has been assigned to the issue as we speak’. It was all bollocks though. If you have problems often, and they use this excuse, it starts to backfire and you think their infrastructure is crap. In my case it turns out it was a faulty BT telephone in my flat, but because of all their BS excuses like ‘your entire neighborhood is affected’ and ‘a problem at the exchange’ I never thought to check. Aw shucks.

A downtime page isn’t an error page. It’d be nice if this was informative, but typically the site’s down. How do you make it informative? Well you write code, but if the site is down, then what? Hit the database for the info, but that’s down. In change-controlled environments, you can’t exactly just go modifying files because change control managers start to turn purple. There’s a reason 404 and 500 pages are static html with inline styles, and they’re all surprisingly generic.

The alternate status website is a good idea.
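To make the "static page, no database" idea concrete, here is a minimal sketch in Python. It assumes the downtime page can be built from a handful of values known in advance, with no template engine and no database call. The function name `build_downtime_page` and the status URL are hypothetical, made up for illustration. `Retry-After` is a standard HTTP response header that can accompany a 503.

```python
from html import escape


def build_downtime_page(reason, retry_after_seconds, status_url):
    """Return (status, headers, body) for a self-contained 503 page.

    Nothing here touches a database or an external stylesheet, so the
    page can still be served when the rest of the site is down.
    """
    minutes = max(1, round(retry_after_seconds / 60))
    safe_url = escape(status_url, quote=True)
    body = (
        "<!DOCTYPE html>"
        "<html><head><title>Temporarily unavailable</title></head>"
        '<body style="font-family: sans-serif; margin: 3em;">'
        "<h1>We are down right now</h1>"
        f"<p>{escape(reason)}</p>"
        f"<p>Please try again in about {minutes} minute(s).</p>"
        f'<p>Updates: <a href="{safe_url}">{safe_url}</a></p>'
        "</body></html>"
    )
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Tells well-behaved clients how long to wait before retrying.
        "Retry-After": str(retry_after_seconds),
    }
    return 503, headers, body


# Hypothetical values; the status site would live on a separate host.
status, headers, body = build_downtime_page(
    "Scheduled maintenance.", 3600, "http://status.example.com/"
)
```

The page stays generic in structure, but the reason, the rough wait time, and the pointer to the separate status site are the informative parts.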

A generic error page has the same issues as a downtime page. Sometimes you don’t know why you got there, and the worst thing an error page can do is error out. Again, they’re typically static html with a ‘very sorry’ message because they assume if a user’s reading it, something very unexpected has happened.

No user needs to see voluminous crappy stack traces. Security-wise you want to hide the information, as a l33t haxx0r can use that information to subvert the site. You can redirect to the generic error page in a number of circumstances, even when there’s no real error (e.g., detecting a user is manually altering query strings).
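A rough sketch of that split, again in Python: the full stack trace goes to the server log, and the user only gets a generic apology plus a short reference they can quote to support. The function name `handle_unexpected_error`, the logger name, and the message wording are all assumptions for illustration, not anything Twitter or any framework actually does.

```python
import logging
import traceback
import uuid

logger = logging.getLogger("app.errors")


def handle_unexpected_error(exc):
    """Log the full details privately and return a safe public message."""
    # Short random id that links the public message to the log entry.
    incident_id = uuid.uuid4().hex[:8]
    details = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    # The stack trace stays on the server, where developers can read it.
    logger.error("incident %s\n%s", incident_id, details)
    # The user sees no internals, only an apology and a reference.
    return (
        "Sorry, something went wrong on our side. "
        f"Reference: {incident_id}. Please try again in a few minutes."
    )


try:
    1 / 0
except ZeroDivisionError as exc:
    public_message = handle_unexpected_error(exc)
```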

Is this the Twitter Junkies Hell?

When Twitter is down we all gather here to Bash the Shit out of it! LOL

I’ve always thought they could be a little more specific in their error messages. I appreciate Digg.com’s error/down messages because they generally list a reason and I find their links to other things to do endearing.

All the talk about Twitter’s downtime is too much. I’m sure they’ll get scaling down fine and in the meantime maybe everyone can do something else. It’s like creating a new subdivision that gets super popular super fast. The plumbing and electric weren’t up to meet the demand and it’s a pain for everyone as it’s built out. People should stop complaining…especially considering it’s FREE!

The only thing I can see wrong with Twitter’s error page is that it doesn’t really admit there is an error and doesn’t apologise for it.

Their ‘technically wrong’ stance sounds more like a denial of the fact that they can’t run their site properly. And the picture doesn’t help either.

As the application is running on their server, they can log what the problem is and whatever triggered it. Why should they burden me with those dirty details when they are the only ones who can fix them anyway?

I don’t think that an error message or code would help me in any way. Perhaps it’d give the hardcore geeks some peace of mind, but for the remaining 98% of the population it’d just be gibberish.

But at least there’s a pretty birdy on it. It could be worse.

Leave them alone.

I have to ask, what is with that image on the crash page?

And do you think they might be so overwhelmed with so many reports that they just “leave that for later”?

It is a shame that many people seem to have missed Jeff’s point. Jeff isn’t saying that the user should be shown what went wrong (he assumes this is done behind the scenes, as it should be). He is arguing that the user should be given more information that is useful. Like how long to wait before coming back.

Many responses here come from an inside out development viewpoint, with people thinking about what the developer needs, rather than what the user needs. Yes it might be hard to provide the reassurance and information a user needs, but it is vital to do so.

Also consider that Jeff was using a recent Twitter crash to illustrate his point.

And yes, Twitter is free, and yes, you shouldn’t expect too much, but Twitter could never charge for a service when they have an availability issue. Just because you are offering something for free doesn’t mean that you don’t strive to be better. If Firefox or Apache didn’t “strive” to be better they wouldn’t be as widely used today.

Alasdair

The thing is, this downtime was scheduled, wasn’t it?! I did get a flash of a warning on the web saying that it would be down for an hour or so for maintenance…

Put simply, if they had a page up saying “down for maintenance” wouldn’t that be a whole lot more positive than “technically wrong”?

Even better, <rant>FINALLY FIX THE DAMN PROBLEM, IT’S BEEN DOWN ENOUGH!</rant>

As you said above

What is the error (even vaguely)?
Is it just my error (or everyone’s)?
Is it just temporary (should I just reload/try again)?
How long will it last (even vaguely)?

This error message answers none of these … so is bad
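For what it's worth, those four questions map pretty directly onto four fields. Here is a small Python sketch of an error notice that cannot be rendered without answering each one, even if the answer is only "we don't know yet". The class name `OutageNotice` and its field names are hypothetical, just one way to structure it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageNotice:
    what: str                                # What is the error (even vaguely)?
    affects_everyone: bool                   # Is it just me, or everyone?
    temporary: bool                          # Should I just reload or try again?
    estimated_minutes: Optional[int] = None  # How long will it last?

    def render(self) -> str:
        """Return one line of plain text per question."""
        if self.affects_everyone:
            scope = "This affects everyone, not just you."
        else:
            scope = "This appears to affect only your account."
        if self.temporary:
            retry = "It is temporary, so trying again later should work."
        else:
            retry = "Reloading will not help until we fix this."
        if self.estimated_minutes is None:
            eta = "We do not have an estimate yet."
        else:
            eta = f"We expect it to last about {self.estimated_minutes} minutes."
        return "\n".join([self.what, scope, retry, eta])


notice = OutageNotice("Scheduled maintenance.", True, True, 60)
message = notice.render()
```

Even the vague version ("we do not have an estimate yet") tells the user more than "technically wrong" does.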