Revisiting the XML Angle Bracket Tax

The free Liquid XML Studio is great for working with XML. It will even generate some C# sample code for you. On the other hand, I recently wrote some code to parse Google’s gdata XML in PHP and that was ridiculously painful.

Yes, I totally agree with you. I have been facing this problem ever since the EPA decided that all data submissions would be in XML form, even point-to-point ones inside a business.

The operative words here are:
NO ADDED VALUE

Couldn’t agree more. What is so hard about picking the right tool for the job in IT? Sometimes XML is the best choice, sometimes it isn’t. You (probably) wouldn’t use a screwdriver to hammer in a nail, so why do that when you code? ‘Religion’ in IT is ridiculous…

Do programmers really like any markup languages? I don’t. The tags annoy me. The syntax looks ugly and crowded to me.

I really don’t get why you are focusing on the mental cost of processing XML. It’s not meant to be human-readable. You are assuming that this is the PRIMARY purpose for the structure of XML. It is not. Interoperability is the primary reason for the structure, human readability is further down the line.

“I think the angle-bracket tax makes even the most clear and concise comments difficult to read and extremely tedious to maintain.”

If you need comments in a file format that is mainly designed to be read and written by machines, you’re doing something wrong. You’re not one of those stupid people who decided it would be a good idea to replace good old .ini-files with some XML-counterparts?

BTW, has anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a real stupid idea.

maybe if you kick their dog they will forget about the whole xml post?

Verbose or not, I’ll take XML over comma-delimited files and fixed-length structures any day.

There may be instances where you can get by with either, but I hate having to account for encoding of special characters in my strings to serialize.

I don’t find XML that un-readable… certainly more readable than fixed length or comma delimited if you ask me. But then again, why do I even want to read XML? There should be some app consuming it and all I should care about is that that app can read it and let the app present it to me in a readable format.

These last two posts have probably been the best I’ve ever read on your blog yet. Keep it up!

@Aaron G

Some counterpoints.

  • Well-formedness. All this means is that the XML document conforms to XML’s syntax rules. If it doesn’t, the XML parser will fail, just as a parser would on an incorrectly formed JSON or YAML document.

  • Schema and metadata. You don’t get this unless you use a DTD, which is built in to XML, or XSD, which is a whole different standard. Right now I am working on a project that is using XSD and I have to say, I hate it. THAT is hard to read, and since we are still working on the document it just gets in the way.

  • There are tools out there that allow you to create schemas for YAML and JSON, Kwalify being one of them. (http://www.kuwata-lab.com/kwalify/)
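To make the well-formedness point concrete, here is a minimal sketch using only Python's standard library. The helper names `is_well_formed_xml` and `is_well_formed_json` are made up for this illustration; the point is only that both parsers reject syntactically broken input in much the same way.

```python
import json
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if the text parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_well_formed_json(text):
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_well_formed_xml("<fruit><name>pear</name></fruit>"))  # well formed
print(is_well_formed_xml("<fruit><name>pear</fruit>"))         # mismatched closing tag
print(is_well_formed_json('{"fruit": "pear"}'))                # well formed
print(is_well_formed_json('{"fruit": "pear"'))                 # missing closing brace
```

Neither check says anything about whether the data makes sense; that is the schema question raised in the second bullet.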

Personally I think that defining schemas and metadata would be better handled in some sort of Domain Specific Language that is not tied to any one technology. Human and machine readable, but focused solely on defining the entities of a system and their relationships to one another. It could be used to generate UML, XSD, SQL DDL, etc. In the design stage of a project that would help greatly.

Finally, I have seen lots of places where gigabytes of XML data are used for data exchange. Yes, JSON and YAML would be large too, but not as large, and for large datasets that can make a big difference. And I would drop Unicode if it is not required. As to 4-digit years, I wouldn’t bother: unless you have a data file of nothing but years, they don’t add nearly as much overhead as XML.

In the end I think that XML has its place. It is great for creating structured documents. XHTML is a great application of XML. I am sure there are others that run along similar lines.

But XML falls down when used for raw data exchange and config files. There are many other formats that can handle that better, and most of them have as many tools to work with as XML.

It feels like the people that want to use XML for everything are similar to the people that want to use one programming language for everything.

Don’t apologize.

Standards ARE tax. It’s the price you pay for interoperability. That doesn’t mean you have to use it. You can decide if you want to pay the tax or not.

With that said, I wasn’t very fond of XML in the beginning, and I’m still not. If people want to use it, fine, but it’s really nothing special. The most “special” thing about it is that magically everyone has agreed on something. It’s more about timing than anything.

@Vinzent Hoefler: “If you need comments in a file format that is mainly designed to be read and written by machines, you’re doing something wrong.”

Weeble was talking about using XML comments in C# source code, which is the “correct” thing to do, but is a pain to use.
Take a look at http://msdn.microsoft.com/en-us/library/b2s063f7.aspx

“anyone shot the Ant-Developer yet? Makefiles as XML. Now that was a real stupid idea”

Actually I quite like Ant.
Makefiles are full of magical archaic syntax (e.g. the automatic variables like $%, $^ or $(?D)) which can be a real pain to mentally parse if editing a makefile is something that you very rarely do.

well, here’s a smackdown for you: how is this a revisiting? This is a reiteration of your original point with very little evidence. You might as well have titled this “shitstorm part ii”. What’s next? “Another 50 reasons PHP sucks” article? Seriously, the discussions I have about coding horror these days with other programmers are now along the lines of “Hey, coding horror seriously started going bad once Jeff went pro blogger. Jumped the shark, completely”.

I think we all realize that your employment is now tied to your blog and you now need to regularly make posts that get traffic, but you’re alienating your readers with this constant linkbait. Don’t tell us what “sucks” – I think we all have opinions about what sucks or what is cool. Just tell us what is cool. You like YAML? Great, write an article about YAML. You don’t have to simultaneously make the point “Oh, XML sucks”. If you truly believe in using the right tool, XML DOESN’T suck, it’s just sometimes the wrong tool – and YAML isn’t good because XML sucks, it’s good because sometimes it’s the right tool. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations. Without the linkbait language.

I’ll agree that XML can be a bit “wordy” at times, but the given example is simplistic. For a simple standalone list, XML probably wouldn’t be my first choice either.

But consider a more complex example, perhaps the inventory of a car dealership.

<inventory>
  <car>
    <manufacturer>Chevrolet</manufacturer>
    <model>
      <name>Cavalier</name>
      <year>2006</year>
    </model>
    <color>Blue</color>
    <powertrain>
      <engine>
        <cylinders>4</cylinders>
        <horsepower>400</horsepower>
      </engine>
      <transmission>automatic</transmission>
      <four_wheel_drive>no</four_wheel_drive>
    </powertrain>
  </car>
</inventory>

Now that can absolutely be made less verbose by using XML attributes, but XML is very helpful for representing the structure in a human-readable form.
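As a rough illustration of the attribute trade-off, the sketch below parses a trimmed-down car record in both styles with Python's standard `xml.etree.ElementTree`. The attribute layout and the helper names `car_from_elements` and `car_from_attributes` are assumptions made for this example, not something from the original comment.

```python
import xml.etree.ElementTree as ET

# Element-heavy form, following the nesting used in the inventory example.
ELEMENT_FORM = """<car>
  <manufacturer>Chevrolet</manufacturer>
  <model><name>Cavalier</name><year>2006</year></model>
  <color>Blue</color>
</car>"""

# One possible attribute-based layout (an assumption for this sketch).
ATTRIBUTE_FORM = (
    '<car manufacturer="Chevrolet" model="Cavalier" year="2006" color="Blue"/>'
)


def car_from_elements(text):
    """Read the nested element form into a flat dict."""
    car = ET.fromstring(text)
    return {
        "manufacturer": car.findtext("manufacturer"),
        "model": car.findtext("model/name"),
        "year": car.findtext("model/year"),
        "color": car.findtext("color"),
    }


def car_from_attributes(text):
    """Read the attribute form; the attributes already are a flat dict."""
    return dict(ET.fromstring(text).attrib)


# Both forms carry the same data; the attribute form is much shorter.
print(car_from_elements(ELEMENT_FORM) == car_from_attributes(ATTRIBUTE_FORM))
print(len(ELEMENT_FORM), len(ATTRIBUTE_FORM))
```

The attribute form is flatter, though: the model/name/year grouping is gone, so the savings come at the cost of some of the structure the nested version makes visible.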

I wouldn’t recommend storing an actual car inventory in XML (that’s what databases are for), but it’s very definitely useful for structured data, such as a configuration file, that a human might need to read. (As opposed to some of the old binary files stored via the MFC serialization mechanism.)

Is it the solution for everything? Absolutely not! Using XML (or any other new technology) everywhere is phase one of adopting a new technology. XML isn’t intrinsically bad, but just like any other technology, it can be misused.

The three phases of technology adoption are:

  1. Refactor the entire system to heavily overuse the new technology, especially in manners where it was never intended to be used and/or is completely ill-suited.

  2. Refactor the system again in attempt to remedy the problems caused by the previous refactoring.

  3. Refactor the system with the next hot technology.

http://thatblairguy.wordpress.com/2008/03/10/technology-phases/

@anonymous coward
"You like YAML? Great, write an article about YAML. YAML is good for situation X, and XML is good for situation Y. Tell us about both of those situations."
That was a good point atrociously expressed. I would like to learn more about YAML. How about it, Jeff?

I hate XML. It just ain’t pretty.

This just seems silly. If it bothers you, do something about it.

The simplest thing I could imagine in 10 seconds would be to write your files in YAML and convert it to XML. A quick search of the nets returns this utility:

http://search.cpan.org/~ingy/YAML-0.35/bin/xyx.PL

which may or may not work, but if it doesn’t just write one–it’ll take all of 2 hours. As the first step of compiling, place this in your makefiles. Problem solved.

If this is too much, find or write an editor plugin that presents XML as YAML, and you’ll never see another angle bracket.

YAML was made to be hand typed–XML was mostly made to be machine to machine–you’re right to feel uncomfortable editing XML directly–why would anyone do that?

Hey, as long as we’re here–can I suggest a topic for thought? Those of us who are programmers have the ability to make our computers do anything (at very little cost). What is with programmers who don’t feel it’s proper to make their own tools?

Sometimes you have to make a little parser to change 30 paragraphs of repeated code to 30 lines of data. JUST DO IT! Move everything you possibly can from your code into data, then write something to input the data into your program.

And by data, I don’t necessarily mean XML–data could be something as simple as a large string defined at the top of the file that you write a parser for. In Java, array initialization happens to have a nice, short syntax–use that to get the data into your program. You can even put method names in your data and use reflection to link them at runtime–giving Java elegance similar to that of any dynamic language if you like.
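The comment is about Java, but the idea of a data table whose entries name the methods to call can be sketched in a few lines of Python, with `getattr` standing in for Java reflection. The `Handlers` class, the `COMMANDS` table and the `run` function are all hypothetical names invented for this illustration.

```python
class Handlers:
    """Hypothetical target object whose methods are named in the data table."""

    def greet(self, name):
        return f"hello {name}"

    def shout(self, name):
        return name.upper()


# The "data": each row names a method and the argument to pass to it.
COMMANDS = [
    ("greet", "world"),
    ("shout", "pear"),
]


def run(table, target):
    """Look up each method by name at runtime and call it with its argument."""
    return [getattr(target, method)(arg) for method, arg in table]


print(run(COMMANDS, Handlers()))
```

Adding a new behaviour then means adding a row to the table (and a method, if one is needed) rather than another paragraph of near-identical calling code.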

Any code (no matter how large) you have to write just once beats the hell out of any code you have to write/modify any time you have additional data.

And if you have to use XML as part of your process–write a damn tool that takes the simplest input possible and outputs XML… I doubt the initial designers of XML ever thought it would be edited by hand except in emergency situations. Before XML, it was just about always binary streams or binary files. Is that what you’d prefer to be using?
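A tool of that kind can be very small. The sketch below assumes the "simplest input possible" is a list of name=value lines and uses Python's standard `xml.etree.ElementTree` to emit the XML; the function name `pairs_to_xml` and the default root tag `item` are made up for this example.

```python
import xml.etree.ElementTree as ET


def pairs_to_xml(text, root_tag="item"):
    """Turn name=value lines into one XML element per line.

    Assumes every name is already a valid XML tag name; this sketch
    does not check that.
    """
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and lines without a name=value pair
        name, value = line.split("=", 1)
        child = ET.SubElement(root, name.strip())
        child.text = value.strip()
    # The serializer escapes special characters such as & and < for us.
    return ET.tostring(root, encoding="unicode")


print(pairs_to_xml("fruit=pear\ntopping=wax & honey"))
# prints <item><fruit>pear</fruit><topping>wax &amp; honey</topping></item>
```

Letting the library do the serialization also takes care of the special-character encoding that makes hand-rolled formats painful.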

I use XML, and don’t consider it too hard to parse. If you use a decent editor with proper formatting, it can be quite readable.

I never used to use XML, just doing name-value pairs, until I started to notice a pattern of problems I would run into, writing parsing and validation routines to ensure correctness of a file.

In Jeff’s example, how do you denote multiple fruits in the same file? You have to create rules in your parser to say ‘Start parsing a new fruit each time you see fruit as a name’.

How do you deal with a fruit having multiple toppings defined? Is the first one correct, is the last one? The default behaviour (depending on code) is to use the last value so you will survive if you don’t think of this one. But this brings up another problem…

In Jeff’s example how do you deal with having multiple ‘bugs’ on a fruit. Do you number them?
fruit=pear
vegetable=carrot
topping=wax
bug1=fly
bug2=aphid

If you number them you have a maintenance problem. If you just leave the name as bug, then you have to write a specific rule saying ‘if the name is bug, create a new bug on my fruit’, otherwise you will only store the last bug.

XML can solve these problems by using a Schema and letting your language’s XML library parse and validate the input.
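The repeated-name problem is easy to reproduce. This sketch, in Python with only the standard library, parses the same fruit once as name=value pairs and once as XML. The sample data and the names `parse_pairs` and `parse_bugs` are invented for the illustration, and note that it shows only parsing: `xml.etree.ElementTree` does not do schema validation, so that part would need a separate library.

```python
import xml.etree.ElementTree as ET

PAIRS_TEXT = "fruit=pear\ntopping=wax\nbug=fly\nbug=aphid"

XML_TEXT = (
    "<fruit><name>pear</name><topping>wax</topping>"
    "<bug>fly</bug><bug>aphid</bug></fruit>"
)


def parse_pairs(text):
    """Naive name=value parser: a later value silently replaces an earlier one."""
    return dict(line.split("=", 1) for line in text.splitlines())


def parse_bugs(text):
    """Collect every <bug> child; repeated elements need no special rule."""
    return [bug.text for bug in ET.fromstring(text).findall("bug")]


print(parse_pairs(PAIRS_TEXT)["bug"])  # only the last bug survives
print(parse_bugs(XML_TEXT))            # both bugs are kept
```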

But like Jeff said in the podcast, use each tool for the proper job.