It's a Malformed World

Bill de hÓra recently highlighted a little experiment Ian Hickson ran in August:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/11/its-a-malformed-world.html

By the way, Google’s homepage is up to 79 errors now. Don’t ask me. I did notice that the URL changed to www.google.co.jp. Hmm. That seems odd to me, but it must be a DNS/load balance thing. The validator said 49, but the list counted out to 79. Why didn’t the validator see the other 30 errors? Is this an XHTML/XML/CSS conspiracy?

Oh, well.

I firmly believe in well-formedness, and get FRUSTRATED when VS 2003 reformats my markup to its ingrained Frontpage/Visual Interdev derived crap. At least you can turn off reformatting (for the most part) in VS 2005. I’ve still run into some autocomplete issues in VS 2005. One thing I cannot do without is Intellisense and autocomplete.

Again - Oh, well.

At least it doesn’t mess with my code-behind too much.

OK, I didn’t account for info and warning messages. My mistake. It’s 49.

Keep in mind that the HTML validator at w3 is itself broken. I find it reporting perfectly valid HTML as broken because it can’t tell the difference between tags in script from tags in HTML, and so on. Not sure I believe this statistic until the validator itself is valid.
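For comparison, here is a minimal sketch of how a parser can keep tags inside a script from being counted as markup. It uses Python's standard `html.parser`, which treats the contents of `script` and `style` elements as raw text. The class and function names are made up for illustration, and this says nothing about how the W3C validator itself is implemented.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records every start tag the parser reports as real markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def markup_tags(html):
    """Return the start tags found outside of script/style content."""
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    return parser.start_tags


page = '<p>Hello</p><script>document.write("<b>hi</b>");</script>'
print(markup_tags(page))  # ['p', 'script']
```

The `<b>` inside the string literal never shows up as a tag, because the parser only looks for the closing `</script>` once it is inside the script element.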

Tim B-L knows the validator is broken, too. :slight_smile:

“The reason why XHTML is so error-prone is that the tools have never been standards-based and do not do things with the standard in mind.”

This is true, but it’s not the only reason. The entire publication chain has to be XHTML valid-aware. Also, getting valid XHTML means pushing back hard on authors at all points in their editing flow, or using highly constrained interfaces. My experience is that authors will put up with neither. And we haven’t even talked about accessible markup, or encoding. This is not a simple problem.

To try to keep this brief, I’ll just rephrase the question slightly: “The compiler doesn’t care if you use spaghetti logic, the customer doesn’t care if you use spaghetti logic, so why should you?”

Or: “The compiler doesn’t care if it emits 5000 warnings during build, the customer doesn’t care about warnings emitted during build, so why should you?”

I believe these are equivalent because warnings and spaghetti logic do not preclude the app from working properly.

Writing standards-compliant HTML makes that code more predictable in layout and organization. For example, deleting a div tag will not delete an opening tag and leave the ending tag, because tags must be cleanly nested in standards-compliant HTML. The browser may not care about the orphaned tag, but when something unrelated breaks on that page, that tag is going to be a red herring that wastes my time.
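The orphaned-tag situation described above can be caught mechanically. A rough sketch, assuming XHTML-style strictness where every non-void element must be closed in order (plain HTML is more lenient about optional end tags). The names `NestingChecker`, `nesting_problems` and the short `VOID_ELEMENTS` list are illustrative assumptions, not part of any standard tool:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial, assumed list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def nesting_problems(html):
    """Return a list of nesting problems; an empty list means cleanly nested."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


print(nesting_problems("<div><p>ok</p></div>"))  # []
print(nesting_problems("<p>text</p></div>"))     # ['unexpected </div>']
```

The second call is the case from the comment: the opening `div` was deleted and the closing tag was left behind.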

In addition, properly nesting the tags allows me to use a tool such as tidy to quickly bring the layout (i.e. indenting) of the file into agreement with its logical organization. Which, as the saying goes, is a very good thing.

The major browsers all have bifurcated rendering pipelines: ‘quirks mode’ and ‘standards mode’. Invalid markup gets rendered in quirks mode, and all of the major browsers handle that slightly differently – which can lead to discrepancies in the most unexpected places. Standards mode is much more consistent across browsers. So, assuming your goal is a web app that behaves and looks consistent across the major browsers, standards-compliant HTML will move the starting line closer to the finish line.

Standards compliant HTML is a lot like a good keyboard – you can’t tell the developer used it by looking at the finished product, but it made the developer’s life a lot easier nonetheless.

For the same reason all of my C programs will “return EXIT_SUCCESS;” at least somewhere, when “return 0;” or using “void main” (instead of “int main”) will work just as well.

Using XHTML Strict is equivalent to the --pedantic-errors gcc flag, and forces people to typecast things to their typedef, such as time_t, where on most systems just using a long will work.

If the standard is specific enough, and if my code conforms to it, and the compiler conforms to it, then any future compilers advertising compliance should give me a binary that functions the same.

The reason why XHTML is so error-prone is that the tools have never been standards-based and do not do things with the standard in mind. Adobe Premiere uses font tags. FrontPage (all I need to say). Dreamweaver works OK but leaves a lot to be desired when working with scripting languages like ASP.NET and PHP.

Microsoft Expressions is the first I have ever seen that cooperates with all DOCTYPE declarations. The CSS tools are omg good.

But browsers do care about compliant markup, yes? Don’t they switch to a faster standards-compliant mode if a valid doctype and valid code is detected?

Personally, I care because I want forward compliance and cleaner markup, and because if you make your code W3C valid, then you’re already a long ways towards having the most accessible code you possibly can.

I suppose accessibility can be achieved without valid markup, but I’ll bet it’s a lot harder that way.

Your browser doesn’t care if your HTML is well-formed. Your users don’t care if your HTML is well-formed. So why should you?

Well-formed looks better with syntax coloring in the IDE, so my IDE cares! I would go ahead and say the semantic content should be well-formed, but the presentation content is always going to be hacked to the idiosyncrasies of the containers.

And as for Google- it looks like a lot of their validation errors are calculated to save on bandwidth.

That’s one of the biggest challenges for browsers, and it’s the same reason Internet Explorer got so popular: it allowed a lot of malformed HTML to show up nicely.

Maintenance, maintenance, maintenance.

If all pages were well-formed, you’d need to write a lot less code to make a web browser.
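To illustrate how little code a strict consumer needs, here is a small sketch that leans on Python's standard XML parser. It assumes the page is XHTML served as XML with no undeclared named entities (something like `&nbsp;` would be rejected without a DTD). The function name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>one <b>two</b></p>"))  # True
print(is_well_formed("<p>one <b>two</p></b>"))  # False
```

The parser simply refuses the second input. All the recovery logic a tolerant browser would need for that case is what a strict parser gets to skip.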

BTW yes my domain does end with dot info and no, I’m not a spammer.

Just checked my old site (about 9 years old).

About 7 years ago it passed validation with flying colors, no warnings, no errors (was proud about it, because I worked for it). Was also looking good in IE, Netscape and others, on Windows, Mac, Linux (still does).

Now the validator complains about doctype, frames, and what not.
One should really redesign a site every 3 years just because someone did some improvements somewhere, and had a feeling that frames are illegal, and doctype is mandatory?

A doctype should be mandatory. Just put in HTML4 with frameset, that’ll allow your crufty old frame code to pass with flying colors.
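A small sketch of that suggestion, using Python's standard `html.parser` to see whether a page declares a doctype and, if not, to prepend the HTML 4.01 Frameset one. The helper names are hypothetical, and whether an old page then actually validates depends on the rest of its markup.

```python
from html.parser import HTMLParser

# The HTML 4.01 Frameset doctype, as published by the W3C.
HTML4_FRAMESET = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" '
    '"http://www.w3.org/TR/html4/frameset.dtd">'
)


class DoctypeFinder(HTMLParser):
    """Remembers the first doctype declaration it sees, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(html):
    """Return the doctype declaration text, or None if the page has none."""
    finder = DoctypeFinder()
    finder.feed(html)
    finder.close()
    return finder.doctype


def with_frameset_doctype(html):
    """Prepend the Frameset doctype only when the page declares none."""
    if find_doctype(html) is None:
        return HTML4_FRAMESET + "\n" + html
    return html


old_page = "<html><frameset cols='20%,80%'></frameset></html>"
print(find_doctype(old_page))                         # None
print(find_doctype(with_frameset_doctype(old_page)))
```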

You’ll be proud to know that Coding Horror falls into the 93% block of websites.

3 errors for the homepage:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.codinghorror.com%2F&charset=%28detect+automatically%29&doctype=Inline

16 (as of now) for this page:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.codinghorror.com%2Fblog%2Farchives%2F000723.html&charset=%28detect+automatically%29&doctype=Inline

David

BTW yes my domain does end with dot info and no, I’m not a spammer.

You really ought to move out of that bad neighborhood. Gentrification could take decades.

http://chris.pirillo.com/2006/08/17/info-domains-are-dead/

As the tools improve, the standards compliance will follow.

I disagree. The tools are irrelevant (except to developers); it’s the renderers that matter. But if well-formedness makes your life easier as a developer, then that’s a valid reason to go that route. It reminds me of the static vs. dynamic typing argument, really. Which one is correct? Both, depending on what you’re doing.

Hmm, yeah, I’d like to be compliant, and things like the new VS2005 help, where at least it tells you what you shouldn’t do (even if it can’t always tell you what you should do instead).

However, part of my problem is that, for one reason or another, IE6 renders incorrectly when I correct old mistakes and make it XHTML compliant, sigh.

Yeah, IE6 is horrible, but we don’t all have the luxury of picking and choosing what our users use, and I don’t get paid for rewriting an entire website that works ‘just fine’ already.

“Your browser doesn’t care if your HTML is well-formed. Your users don’t care if your HTML is well-formed. So why should you?”

I don’t