Revisiting the XML Angle Bracket Tax

I once worked on a questionnaire program. The answers were stored in a comma-separated text file, which was then imported into MS Word and used in a mail merge to create a 200-page document. Over time, new questions were added, and to avoid breaking the mail merge macro, their answers were stored wherever there was room in the CSV file. In the long run this caused a lot of problems, because the questions and answers were out of order, and it was very difficult to diagnose errors when they occurred.

One day I suggested we change the data format to XML. That would let us reorganize the questions and answers without breaking anything, because the names of the nodes would stay the same forever. It would also make the data file more human-readable when diagnosing entry problems. Though I don’t know for certain because I haven’t tried, I suspect that parsing fruit=orange would be more difficult than <fruit>orange</fruit>, particularly if the word fruit was also given as a value somewhere else in the file (e.g. food=fruit).
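A minimal Python sketch of the ambiguity suspected above, assuming a hypothetical answer file with one `key=value` pair per line. The names `FLAT`, `XML`, `naive_lookup`, `keyed_lookup` and `xml_lookup` are illustrative, not from the original project. A lookup that merely searches for the substring "fruit" trips over `food=fruit`, while the XML lookup goes by element name:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical flat answer file; note that "fruit" also appears as a value.
FLAT = "food=fruit\nfruit=orange\n"

# The same answers as XML: each element name is the key.
XML = "<answers><food>fruit</food><fruit>orange</fruit></answers>"


def naive_lookup(text: str, key: str) -> Optional[str]:
    # Fragile: takes the first line that merely contains the key anywhere.
    for line in text.splitlines():
        if key in line:
            return line.split("=", 1)[1]
    return None


def keyed_lookup(text: str, key: str) -> Optional[str]:
    # Careful flat parsing: compare the part before "=" exactly.
    for line in text.splitlines():
        name, _, value = line.partition("=")
        if name == key:
            return value
    return None


def xml_lookup(text: str, key: str) -> Optional[str]:
    # Look up a direct child element by name; the parser handles boundaries.
    node = ET.fromstring(text).find(key)
    return node.text if node is not None else None


print(naive_lookup(FLAT, "fruit"))  # fruit  (matched the food= line)
print(keyed_lookup(FLAT, "fruit"))  # orange
print(xml_lookup(XML, "fruit"))     # orange
```

The keyed version shows a flat format can be parsed robustly too; what XML adds is that the boundary between name and value is explicit in the data itself rather than in every reader's parsing code.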

Hmmm,

this is also XML and might be easier on the eyes :

<doc>
  <fruit is='pear' />
  <vegetable is='carrot' />
  <topping is='wax' />
</doc>

This tends to drive XML believers nuts for some reason :wink:

T.
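For a parser, the attribute style is no harder than element text. A minimal sketch using Python's standard `xml.etree.ElementTree`, with the document written using straight quotes, since XML does not accept curly quotes as attribute delimiters; `read_attributes` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# The attribute-style document from the comment above.
DOC = """<doc>
  <fruit is='pear' />
  <vegetable is='carrot' />
  <topping is='wax' />
</doc>"""


def read_attributes(text: str) -> dict:
    # Map each child element's tag name to its "is" attribute.
    root = ET.fromstring(text)
    return {child.tag: child.get("is") for child in root}


print(read_attributes(DOC))
# {'fruit': 'pear', 'vegetable': 'carrot', 'topping': 'wax'}
```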

I think we all have to be honest with ourselves and realize that most of our time is spent fixing things when they don’t work.

The data format (XML) doesn’t matter when things work. It matters when your transmission fails and you have to go trudging through the raw data to find the error.

I find them all difficult to read in different circumstances. JSON and YAML, when shown without line breaks (say, in your AJAX debugger), are nearly impossible to read, while XML does OK. Custom formats actually do better! (Say, something crazy meant for joining text together, like pipes?)

However, a raw dump of a lot of data in YAML is easy to read, not so much in XML.

JSON excels when you have to parse it. It’s already in your array! Just use it. Beautiful. It works like that on both ends.
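In JavaScript this is `JSON.parse`; the Python analogue is `json.loads` and `json.dumps` from the standard library. A minimal sketch with a made-up payload, showing the text turning straight into native lists and dictionaries and back:

```python
import json

# Hypothetical AJAX response body.
payload = '{"fruit": "orange", "toppings": ["wax", "sprinkles"]}'

# One call turns the text into native dicts and lists, ready to use.
data = json.loads(payload)
print(data["toppings"][0])  # wax

# And back again on the other end.
print(json.dumps(data, sort_keys=True))
# {"fruit": "orange", "toppings": ["wax", "sprinkles"]}
```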

Anyway, I think what we have here is that XML is the right concept with the wrong implementation. Theoretically speaking, XML is awesome: it’s standardized and easy to learn. Practically speaking, it’s a beast. Those angle brackets are terrible. Terrible things! Use something else!

I don’t think YAML is the solution, but I think it’s a step in the right direction.

I think that to come up with a solution, we need to interface with someone who specializes in making text easy to read, and that would be somebody who is NOT a programmer. Come on, great universities of the world, THINK!

This just in, Jeff STILL hates XML, apple pie, America, and your mom.

Oh wait… that’s not what he said. Disregard. Or maybe it’s the smackdown learning model :wink:

It all comes down to convention for me. Once the first person writes some configurable project data in XML, that’s it: you’re locked in. Or you can maintain 15 types of text files on a single project. I think the same goes for most team choices. If the project was started in Java, you can either continue writing in Java to keep maintenance down, or you can buck the trend and write in VB. And you’re all-star enough to make that happen, because it’s what you carefully analyzed, considered, and decided was best for your situation. Then poor Joe, who has to add one friggin’ field to a report, is cursing you to high heaven because he has to load Java into his brain for the server side and VB for the client. Similarly, I don’t want to look at 5 kinds of markup for different aspects of the app. Don’t put it all in one monster file by any means, but please don’t roll out a different, superior solution just because of a braindead choice we made early on. Especially for text, it’s not worth it.

Jeff, you really should change the (no HTML) remark by the comment window to read (no HTML, but please remember to encode < as &lt;, > as &gt; and & as &amp; or the blog will eat most of your post)
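The three substitutions in that rule are exactly what Python's standard `xml.sax.saxutils.escape` performs. A minimal sketch with a made-up comment body, showing why `&` has to be handled before `<` and `>`:

```python
from xml.sax.saxutils import escape, unescape

# A made-up comment body that a blog would otherwise eat as markup.
comment = "if (a < b && b > c) return;"

# escape() replaces "&" first, then the angle brackets, so the "&"
# it introduces in "&lt;" is not itself re-encoded as "&amp;lt;".
encoded = escape(comment)
print(encoded)  # if (a &lt; b &amp;&amp; b &gt; c) return;

# Decoding reverses the three substitutions.
assert unescape(encoded) == comment
```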

@Konijn: yep, see what I said above about bad XML being easy.
In my experience evil XML like that is all too common.

@Graham Stewart

What? You mean you don’t read the entire page as raw source code?

I thought XML was easy to read!

Just having fun. :wink:

There is a very real mental cost to parsing even a few short lines of XML.

I would suggest that the mental cost of parsing proprietary data files with no markup at all is much worse.

How many obscure configuration files did you scratch your head at in the pre-XML world? It usually went something like this for me:

“Hmm, how is this data laid out? What the heck does a colon mean as opposed to two periods?? OK, I think I’ve got it. Now, applying this potential layout I think I understand, where might the data I’m actually looking for be? No, I think I misunderstood the two periods after all. Wait, there’s the data!”

XML gives readers a hint about the format, because at least you know the UNIVERSE of organization you’re working in.

I’m not arguing that XML solves all problems, but I don’t buy into the “XML is hard to read” camp. If anything, it makes it easier simply because more folks are familiar with it and it’s documented.

I’d personally like to see you provide a comparison of a web.config in XML and YAML side by side.

I agree that there is too much pain in dealing with that file, but I completely disagree that this is a result of it being in XML. The fact that half of the file is boilerplate I never need to touch does that all on its own, as does the fact that almost all of it is poorly structured.

@Andrew:
They AREN’T using ASP.NET WebForms, they ARE using ASP.NET MVC. So it certainly does look like alternatives were considered.

@Mike:
You’re actually going to use the summer to learn more about what XML can do for you? Might I suggest the other 59 days be spent learning O/R mapping, or Python, or Lua?

Funny, if XML’s style and visual parsing ‘expense’ is a matter of taste, then I wonder why programmers seem to be quite happy writing code like:

int j = 1;

rather than
<variable name="j">1</variable>

or
<variable name="j" value="1"></variable>

or
<variable>j<value>1</value></variable>

Tools and technologies should be used when they provide a net benefit, and not solely because they are a ‘standard’ or ‘fashionable’ (unless, of course, standards and fashion are the sole criteria by which you measure gain).

XML is useful. So are other formats. Use what makes sense for the particular need…

That’s the bottom line. Get used to it. If you’re having a problem reading XML, learn how to do it better.

I agree totally. People were created to make things easier for computers, not the other way around.

Heck, let’s bag XML and move back to using binaries! Those can be read by a human just fine with a hex editor, a calculator, and enough time. It may be a PITA at first, but you’ll get better at it. Quit being a bunch of lazy whiners.

“I try not to get emotionally involved with the tools and technologies that I use, if I can avoid it.”

Umm… go back and re-read like almost all of your blog posts that relate to .NET, Microsoft, or Windows Vista and see if you still feel the same way.

Sorry, but you are incredibly emotionally involved with the tools you use.

“As a Visual Studio ecosystem programmer, XML is pervasive, in every nook and cranny of a project.”

If you hate bracket syntax enough to post twice about it, and you want to encourage use of other tools, then maybe it’s time to try something else?

That brings up the question: What exactly is a standard? I always

With YAML, it’s… the yaml-core mailing list. The copyright for the specification is held by three individuals.

That’s roughly how all the RFCs are done, and the internet is pretty much built on those standards.

<?xml version="1.0" encoding="UTF-8"?>
<post>
  <name>Nick</name>
  <website></website>
  <captcha>orange</captcha>
  <message>
<![CDATA[
I agree.
]]>
  </message>
</post>
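Parsed with Python's standard `xml.etree.ElementTree`, a post like the one above boils down to four short fields, and the CDATA section comes back as ordinary text. A minimal sketch; `POST` and `fields` are illustrative names:

```python
import xml.etree.ElementTree as ET

POST = """<?xml version="1.0" encoding="UTF-8"?>
<post>
  <name>Nick</name>
  <website></website>
  <captcha>orange</captcha>
  <message>
<![CDATA[
I agree.
]]>
  </message>
</post>"""

# Parse from bytes so the encoding declaration is honored as written.
root = ET.fromstring(POST.encode("utf-8"))

# Empty elements have text None; CDATA content is merged into .text.
fields = {child.tag: (child.text or "").strip() for child in root}
print(fields)
# {'name': 'Nick', 'website': '', 'captcha': 'orange', 'message': 'I agree.'}
```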

How many obscure configuration files did you scratch your head at in the pre-XML world?

If the config files in /etc were all in XML, I’d go on a killing spree. Thank god most of 'em aren’t.

XML is just another example of creeping verbosity being palmed off as “better.”

In some cases, XML is demonstrably better. If you have to represent a tree structure with text, I really do prefer XML.

.NET, JavaScript, et al. all have the same problem. Consider the number of lines and characters necessary to write the code to read in a file. I’ve lost track of the number of languages in which I’ve had to rewrite that darned “ReadFileIntoString(strFilePathName)” function. Why do this? So I don’t have to remember the human-factors nightmares that most programming languages impose for consistency.
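For comparison, Python's standard library already ships this helper as `pathlib.Path.read_text`. A minimal sketch; `read_file_into_string` is a hypothetical name mirroring the commenter's function, and the temporary file exists only for the usage example:

```python
import os
import tempfile
from pathlib import Path


def read_file_into_string(path: str, encoding: str = "utf-8") -> str:
    # Hypothetical counterpart of ReadFileIntoString: pathlib opens,
    # reads, and closes the file in a single call.
    return Path(path).read_text(encoding=encoding)


# Usage: write a small temporary file, then read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("angle brackets optional\n")

try:
    contents = read_file_into_string(tmp.name)
    print(contents, end="")  # angle brackets optional
finally:
    os.remove(tmp.name)
```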

More, whether it’s XML or dot notation, isn’t always better. Simple things should be simple. Complex things should be possible.

Whoops! Jeff already talked about XML comments a few years ago:

http://www.codinghorror.com/blog/archives/000130.html

Personally, I don’t think he was nearly harsh enough. I think the angle-bracket tax makes even the clearest and most concise comments difficult to read and extremely tedious to maintain. And then there’s the fact that you can’t write things like “0 <= x < limit”, but instead have to write “0 &lt;= x &lt; limit”, or the strange-looking “limit > x >= 0”.

And what would be appropriate “romantic overtures toward their significant other?”