HTML Validation: Does It Matter?

The web is, to put it charitably, a rather forgiving place. You can feed web browsers almost any sort of HTML markup or JavaScript code and they'll gamely try to make sense of what you've provided, and render it the best they can. In comparison, most programming languages are almost cruelly unforgiving. If there's a single character out of place, your program probably won't compile, much less run. This makes the HTML + JavaScript environment a rather unique -- and often frustrating -- software development platform.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/03/html-validation-does-it-matter.html

As a web user, it doesn’t matter to me if a site doesn’t validate. I wouldn’t know unless I went to the trouble of running it through a validator, and why would I spend time doing that?
What matters is that the site is functional and easy to use, like Stack Overflow. I don’t care if SO has 200 errors.

However, it probably matters to people with non-GUI browsers…

Jeff, what has always intrigued me is that programmers are able to create valid RSS and Atom feeds for their websites, but come up with every reason under the sun why they can’t create valid XHTML for their websites.

XHTML is just XML with a couple extra rules about what elements can go inside what other elements and what attributes are allowed - as you’ve noted above. No biggie. Coding Horror’s RSS Feed validates. StackOverflow’s RSS feed validates. CNN’s feed, etc. etc. etc. What’s the problem? My thoughts:

http://iamacamera.org/default.aspx?section=develop&id=73
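
To make those couple of extra rules concrete, here is a minimal sketch of my own (logo.png is just a placeholder):

```html
<!-- Tag soup: uppercase names, an unquoted attribute, unclosed elements.
     Browsers will render it, but it is not well-formed XML. -->
<P ALIGN=center>An image: <IMG SRC="logo.png">

<!-- Well-formed XHTML: lowercase names, quoted attributes, every element
     closed, with void elements using the " />" form. -->
<p style="text-align: center;">An image: <img src="logo.png" alt="Logo" /></p>
```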

Brilliant!

Your problem is that… you’re doing it wrong.

First - target=… is BAD! If I want it in a new window, I’ll do it myself. Don’t try to force anything on me, thankyouverymuch.

Second - You’re writing an HTML page. Why not write it correctly from the start?
Compare it to programming: do you write the whole program and then add patches until it kind of works, multiple bugs cancel each other out, and memory leaks aren’t critical? Or do you write small parts and test them so that the whole program is correct?
You don’t have to clean up the mess if you don’t make it in the first place.

Jeff, since you are the self-proclaimed ShamWow (attributed) guy of Coding Correctness, it must sting to see FAIL next to that validator report.

Now you have to re-python the whole thing or the Scrummy world will implode.

I may not be the most precocious web developer out there, but I really found the process of converting my own CMS (if you can call it that) to output strict XHTML and valid CSS2 pretty satisfying. The real pain in the neck for me is filtering the output of the RSS feeds aggregated on my site so that they, too, are XHTML-strict.

Making your site XHTML-strict is good for you. Think of it like… flossing.

It is not too much to ask that all browsers render valid HTML the same way: according to spec.
But it is too much to ask that all browsers make the same guess about how your invalid HTML should be interpreted.

Only when we write valid HTML can we expect HTML to be cross-browser compatible.
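
To see why, consider a classic bit of misnesting (my own illustration, not from the spec):

```html
<!-- Invalid: <b> and <i> are misnested, so the spec defines no meaning here.
     Browsers must guess at a recovery, and historically they built different
     DOM trees from exactly this kind of markup. -->
<p><b>bold <i>bold italic</b> italic?</i></p>

<!-- Valid: properly nested, so every conforming browser renders it the same. -->
<p><b>bold <i>bold italic</i></b> <i>italic</i></p>
```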

I am amazed that those who are so opposed to the target attribute have not found the simple solution:

Keep the attribute in the spec, but let the user agent ignore it if that is what the user wants.

That way, people could live in the dark ages if they want, but those of us who understand tabs and windows can benefit from a web author’s suggestion that those features would be useful when following a particular link.

The problem with the standards is that they keep changing.

So any attempt to standardize right now may be wasted effort when whatever is done is then undone.

I understand that this is the idea of versioning, but spending 2,000 man-hours standardizing your HTML only to find out it’s no longer the latest and greatest, when that was your selling point for the project, might cause you to lose your job.

Not a wise risk to take.

Of course, for those of you who work for a company where you can waste 2000 hours for no reason, have fun!

I would add that following standards on new content is a good idea. But for refactoring old content, you have to consider the cost/benefit ratio and make an informed decision.

That goes for all of these QA points. Seriously. The solution is not ALWAYS to refactor, nor is it NEVER to refactor (replace “refactor” with “standardize” for this specific blog post). The solution is to refactor when the benefit outweighs the cost, and to abstain when the benefit is less than the cost, or when analyzing the cost is more expensive than the benefit.

@Practicality - the latest and greatest standards are ten years old; that is plenty of time to learn them properly, so that one does not need to spend 2000 hours making something valid.
Making code validate is VERY easy. The important thing to remember is that valid tag soup is still tag soup.

if you have a ton of user-generated content like we do, you can pretty much throw any fantasies of 100% perfect validation right out the window.

I think that’s true in general, but Wikipedia validates ( http://validator.w3.org/check?uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FFixing_Broken_Windows&charset=%28detect+automatically%29&doctype=Inline&group=0 ), and it’s all user-generated content. (Not necessarily a counterpoint, but it’s interesting.)

I’ve been working on a specialized editor for http://simple.wikipedia.org/ ; and Wikipedia’s use of XHTML over HTML makes it more straightforward to pull information, like edit tokens, out of their webpages. So I think XHTML has its advantages for letting people build things that interface with your webpage in ways you wouldn’t expect.

Brilliant! (posting THIS article immediately after BikeShedding :slight_smile: )

BUT why not make things better? I’ll wager that if browsers only displayed valid (X)HTML, everyone would create valid (X)HTML. So I’m soooo glad XHTML Strict is really anal about validation!

This is technically accurate, in that obviously “everyone” would then consist only of those people who could still write web pages. So it’d be a smaller everyone, by a factor of hundreds of thousands.
However, it misses the point that the web would be limited to providing for the kind of people who obsess over XHTML. The true power of the web would still be slowly rendering Captain Janeway pr0n, and arguing over whether “bang” is a silly word to use for the ! character.

target, a perfectly harmless attribute for links that you want to open in a different browser tab/window

Ahem. As the user, I am the one who should dictate where links open. If I want one to open in a new window, then it shall. You need to ditch the target attribute. Now.
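
For reference, here is the markup at issue, as a rough sketch (the URL is a placeholder):

```html
<!-- Fails HTML 4.01 Strict validation: target is not in the Strict DTD. -->
<a href="http://example.com/" target="_blank">forced into a new window</a>

<!-- Validates under Strict: no target, so the user's browser decides. -->
<a href="http://example.com/">opens wherever the user prefers</a>
```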

If you want Strict - then stick with strict, and don’t use a target attribute on your link.

If you want target, use the Frameset DTD. HTML 4 has 3 fully legitimate DTDs you can pick and choose from, and just like your favourite fundie, whatever you pick and choose is the right one.
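
For anyone picking and choosing, these are the three HTML 4.01 doctype declarations in question:

```html
<!-- HTML 4.01 Strict: no frames, no presentational attributes, no target. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional: allows deprecated attributes, including target. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset: Transitional plus frameset documents. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
```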

Why did you pick Strict for Stack Overflow, anyway?

You should always leave 1 mistake in your HTML to retain your humility.

I have to say that HTML validation (especially against 4.01 Transitional) is not difficult to attain. Pages that don’t validate fail either because of a misunderstanding of the basics of HTML or because of pure laziness.

Let’s take for example our beloved codinghorror.com:

6x document type does not allow element LINK here: That comes from a misunderstanding of the way HTML works. Tags such as link and a handful of others (br, area, img, param, hr, input, col, base, meta) take no end tag. In HTML 4 it is actually wrong to close them explicitly: they are implicitly closed, so closing them manually would be like closing them twice.

12x+ end tag for element INPUT which is not open: Same error as above, a misunderstanding of the basics of HTML.

1x end tag for element TD which is not open: Laziness… there is no table anywhere close to that TD.

And 30 more warnings about not encoding & as &amp; in URLs, as it should be: a misunderstanding of HTML basics. & is not a regular character in HTML; it is used to reference entities declared in the DTDs that apply to the current document, in the form &entityName;. The HTML 4.01 Transitional DTD declares a crapload of entities, and entities are NOT limited to the ones declared in the HTML 4.01 DTD, since you could supposedly attach more DTDs to your document.

Hence, using ‘&’ verbatim in your HTML is similar to using an unescaped quote in your C#/Java/whatever source code; you should escape it as &amp;.
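
To make the fixes concrete, here is a rough before-and-after sketch of those error classes (the file and parameter names are invented):

```html
<!-- Wrong: void elements given end tags, and a raw & in a URL. -->
<link rel="stylesheet" href="style.css"></link>
<input type="text" name="q"></input>
<a href="/search?q=html&lang=en">search</a>

<!-- Right: no end tags on void elements, and & escaped as &amp;. -->
<link rel="stylesheet" href="style.css">
<input type="text" name="q">
<a href="/search?q=html&amp;lang=en">search</a>
```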

Seriously, how can we expect browsers to be even somewhat standards-compliant if we keep feeding them that kind of crap?