Creating User-Friendly 404 Pages

If you’re using WordPress, you can also parse semantic URLs and create search results with your 404 page. I like this approach. Here’s how I did it:
http://www.douglaskarr.com/2006/12/23/wordpress-page-not-found-try-these-links/
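For illustration only, here's a rough sketch of that idea in Python (the real thing lives in a WordPress PHP 404 template); the site address and function name are made up, but ?s= is the query parameter WordPress's own search uses:

```python
# Rough sketch: pull keywords out of the missing URL's slug and point the
# visitor at the blog's built-in search. The site address is a placeholder.
from urllib.parse import quote_plus

def suggest_search_url(request_path, site="http://www.example.com"):
    words = []
    for part in request_path.strip("/").split("/"):
        # Skip date components like /2006/12/23/, keep the slug words.
        words.extend(w for w in part.split("-") if w and not w.isdigit())
    return site + "/?s=" + quote_plus(" ".join(words))

print(suggest_search_url("/2006/12/23/wordpress-page-not-fond/"))
# -> http://www.example.com/?s=wordpress+page+not+fond
```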

I always wanted to have a cool 404 page and finally put one up on my site.

Ok, it’s not that useful, but some people find it amusing, and it’s hit on purpose quite often. Eventually I will come up with an idea for a ‘prize’ to award the user for winning.

http://www.appsapps.info/this/page/doesnt/exist.html

  • Don’t drop the “404 Error”. It should be prominent in the page text near the top so that users understand they have reached an error page and can take appropriate action. Most users that spend any time on the web at least understand this error message.

  • Don’t redirect the user unless you know for sure what page they were trying to reach (e.g. a casing issue, .htm vs. .html, a site reorganization, etc.). The user needs to be able to fix the URL.

  • By all means offer the additional information if you want, but the page should not look like the rest of the site… it should look like an error page so that the user or linking website will fix their mistake!

  • Offer links to the Google Cache and the Wayback Machine in case the page has actually been moved or removed (see the sketch just below).
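A small sketch of building those rescue links from whatever URL the visitor asked for (Python here, but any server-side language works); the URL formats are the public Google cache and archive.org ones, and example.com is a placeholder:

```python
# Build "maybe it moved?" links to Google's cache and the Wayback Machine
# from the URL the visitor asked for. example.com is a placeholder.
from urllib.parse import quote

def rescue_links(requested_url):
    return {
        "Google Cache": "http://www.google.com/search?q=cache:" + quote(requested_url, safe=""),
        "Wayback Machine": "http://web.archive.org/web/*/" + requested_url,
    }

for label, link in rescue_links("http://www.example.com/missing/page.html").items():
    print(label + ": " + link)
```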

Well, me problem is this, lads: I don’t host my own domains (the wife doesn’t want that much computer hardware in the house), so how am I supposed to provide custom 404s?

It’s great to have a helpful error message, but I think we really need to examine the root of the problem: URLs are being requested for pages that don’t exist. There are only two reasons this can happen:

  1. Typos
    1a. User made a typo entering the URL manually
    1b. Someone made a typo when writing an <a href="…"> link in an HTML page.
  2. You used to have a page at that URL but deleted or moved it.

The only way to address 1a. is to provide some helpful suggestions to the user on fixing his typo or finding the page he really meant.

The only way to address 1b. is to fundamentally change HTML and HTTP, or for every web author to use tools that automatically do their links for them.

But, 2. is inexcusable. Once a URL exists, it should Never Ever go away except for a good reason, and even then, ideally, all references to it would also be disabled; or the URL would still resolve, just to a page explaining why the old page is gone, or it would redirect automatically (see Jeff’s previous entries about URL rewriting for the best way to do this).
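To make that concrete, here is a minimal sketch of the “URL still resolves” approach, as a toy Python WSGI app rather than the rewrite rules Jeff describes; the mapping table and both paths are invented:

```python
# Toy WSGI app: URLs for moved pages still resolve, via a permanent redirect
# from a hand-maintained table; everything else gets an honest 404.
from wsgiref.simple_server import make_server

MOVED = {
    "/old-article.html": "/2006/12/old-article/",  # invented example paths
}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [b"<h1>404 Not Found</h1><p>Sorry, this page has gone away.</p>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```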

Remember, a URL does not point to a “file” on your webserver; it is an identifier for a document that will be sent to the user’s browser. The fact that those documents happen to live in files inside directories, and might have some filename extension like “.html” or “.htm”, is an implementation detail that should be hidden.

"Is there an IIS or Apache add-on that goes through server logs and reports 404s (and/or 500s)? "

Yup. “grep” :)

And any kind of web log analysis software really ought to have it too. This includes Google Webmaster Tools and/or Google Analytics, I think.
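Half-joking aside, grep really does get you there; for a per-URL tally, here’s a small sketch in the same spirit, assuming the usual Apache common/combined log format and a log file named access.log:

```python
# Tally 404s per URL from an access log in the common/combined format,
# where the status code follows the quoted request line.
import re
from collections import Counter

LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3})')

def count_404s(log_path):
    hits = Counter()
    with open(log_path, errors="replace") as log:
        for line in log:
            m = LINE.search(line)
            if m and m.group("status") == "404":
                hits[m.group("url")] += 1
    return hits

for url, n in count_404s("access.log").most_common(20):
    print("%6d  %s" % (n, url))
```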

w00t for 404 :x

Whatever you do, don’t drop the HTTP 404 response code! Even if you don’t want your users to see it (which I also disagree with, although not as strongly), please let the poor robots notice that the page is an error, so that they can remove the invalid URL from their indices and such. The so-called “soft 404s” are the bane of the HTTP ecosystem.
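For what it’s worth, here’s one way to keep the friendly page and the real status code, sketched with Python’s standard http.server; the 404.html file name is just an assumption:

```python
# Serve a custom error page but keep the real 404 status, so crawlers don't
# treat the URL as a live page ("soft 404"). 404.html is assumed to sit
# next to this script.
import os
from http.server import HTTPServer, SimpleHTTPRequestHandler

class Friendly404Handler(SimpleHTTPRequestHandler):
    def send_error(self, code, message=None, explain=None):
        if code == 404 and os.path.exists("404.html"):
            with open("404.html", "rb") as f:
                body = f.read()
            self.send_response(404)  # the important part: not 200 OK
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            super().send_error(code, message, explain)

if __name__ == "__main__":
    HTTPServer(("", 8000), Friendly404Handler).serve_forever()
```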

I particularly enjoy the page not found customization at
http://www.campluther.org/testhis

Not sure if it’s been linked yet, but another good collection of 404s:

http://www.plinko.net/404/area404.asp

The 404 page at www.homestarrunner.com, mentioned above, is one of my favorites.

Maybe not user-friendly in the typical sense, but it’s got several things going for it: it’s obvious, it explains the problem, it’s not overly technical, and while not aesthetically pleasing in the traditional sense, the whole thing is one big piece of awesome fan service.

As for getting it indexed because it returns 200 OK… I’m sure the Brothers Chaps aren’t too broken up about it. It’s not just a “page not found” error message. It’s another piece of comedy and fan-service they’ve authored.

I hate to say it, but I love it when a website just sends you to the main .com instead of even showing the error; 99% of the time you’re one or two clicks away from what you meant to type the first time.

I may have stated my position a little too unequivocally. It’s fair to have the error code in the message somewhere, as long as it’s not the focus of the message.

The IE7 “friendly” error page is a good example of this-- see the little HTTP 404 in the upper right-hand corner of the screenshot?

Don’t forget that it’s better if your 404 error page actually returns the 404 status code in the HTTP header; that can be useful (for Google Analytics and other stats services, for instance). If you want to check this, you can use this site, among others: http://www.rexswain.com/httpview.html
Great article by the way!
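Along the same lines, here’s a quick client-side check in Python, with a placeholder URL: a proper 404 raises HTTPError, while a “soft 404” comes back as 200.

```python
# Check what status code a supposedly-missing URL actually returns.
import urllib.error
import urllib.request

def status_of(url):
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status        # a "soft 404" shows up here as 200
    except urllib.error.HTTPError as e:
        return e.code                 # a proper 404 lands here

print(status_of("http://www.example.com/this/page/doesnt/exist.html"))
```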

@Reed: You forgot the third reason for 404s. Intentional exploration to find hidden pages (URL-hacking). For these, you definitely don’t want to help the user along and perform expensive searches.

Quite funny…

http://www.mrcranky.com/movies/404.html

I can’t believe no one has mentioned the language issue yet.

404 works in any language, even if someone’s reading the page through Google Translate.

I found this 404 today and bookmarked it just because it makes me laugh.

http://www.xpresit.net/mario.zip

Tarkeel wrote:
"@Reed: You forgot the third reason for 404s. Intentional exploration to find hidden pages (URL-hacking). For these, you definitely don’t want to help the user along and perform expensive searches."

Yes, thank you, I’d forgotten about that. In effect, though, it ends up being the same as 1. I wasn’t proposing doing an automatic search for every nonexistent URL, but it could be an option on the custom 404 page.

Spidering, though, is another issue. We can detect antisocial spidering by disallowing a trap URL in robots.txt that points to a script, and then blacklisting any clients that access it regardless.
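Something along these lines, sketched as a WSGI fragment you could plug into any WSGI server; the trap path and ban-list file name are invented, and in practice you’d feed the blacklist into your firewall or server config:

```python
# robots.txt honeypot: disallow a trap path that nothing links to, then
# record any client that fetches it anyway. Paths and the ban-list file
# are invented for this sketch.
ROBOTS_TXT = b"User-agent: *\nDisallow: /trap/do-not-crawl/\n"

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/robots.txt":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [ROBOTS_TXT]
    if path.startswith("/trap/do-not-crawl/"):
        with open("banned_ips.txt", "a") as banlist:
            banlist.write(environ.get("REMOTE_ADDR", "unknown") + "\n")
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Go away."]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"404 Not Found"]
```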

Anyway, the primary problem remains that we don’t use good URLs, and it’s too easy to break URLs already in use simply by renaming or moving files on the webserver or switching to a new CMS.

Babar K. Zafar is the kind of guy who buys a new dictionary, and the first thing he does is look up the word dictionary.