XML: The Angle Bracket Tax

XML has been around for so long, and it’s so pervasive, that we’re probably stuck with it for a long time. A few developers using my language have created “easy XML” subroutines that do a lot of under-the-hood formatting and parsing. If we have to live with something, we might as well make the best of it: automate it and forget it.

I didn’t go through all the comments, but I didn’t see DSLs mentioned. Take a look at http://www.ayende.com/Blog/archive/7268.aspx for an example of how to simplify configuration.

One thing XML gives you is an ability to randomly access data inside the file without loading it into a database. That can be handy for populating a catalog page in InDesign or building a web page on the fly.

But for something like a config file where you typically read the entire thing in at once it’s a useless feature. And for batch-processing scenarios where the receiving system is always going to process all the data in sequence it’s a useless feature with a performance penalty.

I like XML, honestly, for small things where you don’t overuse attributes and all sorts of other junk.

Sort of like your simple examples:
<books>
<book>
<title>Coding Horror for Dummies</title>
</book>
</books>

But once you start to factor in XSL, XSD, XDSLXSLDX – I just find that it all gets horribly bloated and against the… well, let’s just say that I find simply structured XML files easy and to a degree NICE to use – but that XML quickly crosses a line from being ‘enjoyable’ to ‘painful’.

+1 Aaron G.

If you are swimming in a sea of angle brackets perhaps you are doing something wrong. For most developers, especially those in SOA land, it’s invisible under-the-hood plumbing that (mostly) Just Works™.

(Sorry, my example doesn’t show because of the inclusion of the brackets…)

XML has its place, but lazy programmers use it for everything.

It’s a new Windows registry or DLL manifest – something we never really needed, but it makes complicated stuff easier (or possible for the more ignorant coder). However, as with all such RAD tools/standards, bad programmers like to use it by default without thinking.

The .NET data controls output “horrible XML files” by default, for instance… this is where I’d blame M$ and draw a parallel to the registry… but that would be unfair. As usual, it’s the programmer’s fault for choosing the wrong method to store/retrieve his/her data.

It’s easier not to think than to think… and we are all bad programmers after all, so I can forgive it. :)

This link is broken. :(

  1. Do you know what the XML alternatives are?

I’ve been digging into YAML recently and I must say it’s a lot easier to pick up on, parse, and write than XML in my experience. It just seems more natural to say
Name: Shawn
Rather than
<name>Shawn</name>

Now if only we could get BizTalk to speak YAML. Sigh…
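To make the one-field comparison above concrete, here’s a minimal Python sketch using only the stdlib ElementTree parser (the YAML line is treated as plain text here, since parsing YAML would need a third-party library):

```python
import xml.etree.ElementTree as ET

yaml_form = "Name: Shawn"          # the YAML spelling
xml_form = "<name>Shawn</name>"    # the XML spelling

# Even a one-field record needs a real parser on the XML side.
print(ET.fromstring(xml_form).text)   # Shawn

# The markup tax shows up even at this scale: 11 vs 18 characters.
print(len(yaml_form), len(xml_form))  # 11 18
```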

I’ve been wondering about XML for a while. I only recently began to get serious about developing software, and XML was entering its halcyon days right when I started learning. For a long time, I trusted in the ostensible greater wisdom of the collective and assumed that XML really was what its ubiquity implied: The greatest thing since peanut-butter Nutella sammiches. Recently, though, I really got to wondering about what the point was.

Clearly, XML is no fun to write by hand. The main argument I’ve heard regarding its verbose plain text format is “it’s easy to debug”, which makes me want to barf. This is what I’m really wondering: XML is meant to be a data transfer format. Take RSS, for example:

High-traffic sites serve tens of thousands of RSS feeds, formatted in XML, every day. In situations like this–where every spare pound of fat on your data becomes inflated ten-thousandfold until, like the grotesque beast at the end of Akira, it is suffocating the entire known universe with its pustulent girth–shouldn’t we be using a data format that’s as thin as possible? Shouldn’t the common symbols in a data file be encoded and compressed within the file itself? Which has a smaller bandwidth footprint? This:

<SomeDocument>
<SomeParagraph>XML sucks</SomeParagraph>
<SomeParagraph>no really</SomeParagraph>
</SomeDocument>

Or this:

&1=SomeDocument;&2=SomeParagraph;&1&2XML Sucks&2no really

The second one is pretty terrifying, but it would be TRIVIALLY EASY for ANY modern editor to translate it into something that doesn’t rape your eyes (like YAML). Aren’t we actually wasting TERABYTES of bandwidth every day by transferring human-parseable cruft in files that no human should ever see in the flesh anyway? Or am I missing something?
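For what it’s worth, the “encode the common symbols in the file itself” idea is roughly what generic compression already does on the wire. A quick Python sketch with the stdlib gzip module (tag names and repeat count invented for illustration):

```python
import gzip

# A repetitive XML payload, like a feed with many identical tags.
paras = "<SomeParagraph>XML sucks</SomeParagraph>\n" * 1000
doc = ("<SomeDocument>\n" + paras + "</SomeDocument>").encode()

packed = gzip.compress(doc)
print(len(doc), len(packed))

# Generic compression builds the symbol table for you: the repeated
# tag names nearly vanish from the transferred size.
assert len(packed) < len(doc) // 10
```

So a server that gzips its feeds already pays only a fraction of the angle-bracket tax in bandwidth, even though the on-disk format stays human-inspectable.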

“One thing XML gives you is an ability to randomly access data inside the file without loading it into a database.”

Er, that’s exactly what it doesn’t do, hence terrible performance relative to binary, or simple textual data.
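Exactly – even the streaming XML APIs walk the file front to back. A small sketch with Python’s stdlib iterparse (document and ids invented for illustration):

```python
import io
import xml.etree.ElementTree as ET

data = b"<books><book id='1'/><book id='2'/><book id='3'/></books>"

# iterparse yields elements in document order; reaching book 3
# means reading past books 1 and 2 -- sequential, not random, access.
seen = [el.get("id")
        for _, el in ET.iterparse(io.BytesIO(data), events=("end",))
        if el.tag == "book"]
print(seen)   # ['1', '2', '3']
```

There is no index to seek against; true random access needs either a database or a format designed with offsets.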

Allow me to express my utter indifference: meh!

I work with XML roughly daily as a developer, and it ain’t no big thang. It’s at least 12 parsecs farther along than the obsolete flat files we’re unfortunately still dealing with.

Show somebody XML, even a total bonehead, and they’ll figure it out in a few minutes. There’s little magic to it, few assumptions made. Can it be abused and misused? Certainly, just like anything else in computer science. Is it largely redundant? Absolutely, but that can also serve to enhance readability in very large files.

Compare to what came before this: inscrutable binary files, INI files consisting only of key-value pairs, fixed-width flat files, delimited text files… Let’s not forget our past, folks.

It’s computer-readable, computer-writable, and it’s more-or-less human-readable and human-writable, even if it makes you a little crosseyed. Which makes it way better than the tarpit we just crawled out from. JSON or YAML or whatever is probably on the horizon, but let’s not say “XML sucks” when it was still a huge step forward.

Oops… looks like your comment filter clobbered my examples. I forgot that it’s never safe to assume “no HTML” means everything will be politely escaped rather than thrown in the trash. Here they are again, manually escaped like God intended:

This:

<SomeDocument>
<SomeParagraph>XML sucks</SomeParagraph>
<SomeParagraph>no really</SomeParagraph>
</SomeDocument>

Or this:

&1=SomeDocument;&2=SomeParagraph;&1&2XML Sucks&2no really

I’m just thankful developers have turned to XML instead of undocumented binary files. We don’t want to return to those years.

XML is by no means perfect, but why do XML detractors always compare inefficient instances of XML with otherwise terse competitors? For example, the memo shown in XML is a case in point.

<memo date="Thu, 14 Feb 2008 16:55:03 +0800 (PST)"
      from="The Whole World us@world.org"
      to="Dawg dawg158@aol.com">
Dear sir, you won the internet. http://is.gd/fh0
</memo>
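And that terse memo really is trivial to consume – a few lines of stdlib Python pull it apart (a sketch only, using the memo text above):

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring(
    '<memo date="Thu, 14 Feb 2008 16:55:03 +0800 (PST)" '
    'from="The Whole World us@world.org" '
    'to="Dawg dawg158@aol.com">'
    'Dear sir, you won the internet. http://is.gd/fh0</memo>')

# Attributes come out as plain strings; the body is .text.
print(memo.get("to"))   # Dawg dawg158@aol.com
print(memo.text)        # Dear sir, you won the internet. http://is.gd/fh0
```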

Just because something is marked up with XML doesn’t mean you must mark up every single possible bit of metadata for the purposes of constructing a strawman.

XML isn’t bad for many things, but space-efficient and easily read by normal computer users it is not. Before XML was used for config files, INI files were standard. They have limitations, but you can parse them VERY quickly, they serve the purpose (configuration) perfectly, and they are easily read and edited. I will never understand why XML took off for configuration.

As for tabular data, the CSV format was much better in my opinion. Once again: easy to parse, editable in many apps including Excel, quick imports, and a small footprint.

When it comes to more complex data, I believe XML is a good solution, but YAML/JSON is better in many cases for obvious reasons. The key is to use a standardized format that is supported by other major technologies. It really doesn’t make a huge difference for most things. However, Microsoft added a binary format for DataSets in .NET for a reason: sending huge XML files over web services was slow, and adding a “tighter” format was a huge improvement.
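Both of those older formats still parse in a handful of stdlib-Python lines – a sketch, with made-up section names and rows:

```python
import configparser
import csv
import io

# INI: parsed in two lines, hand-editable by anyone.
cfg = configparser.ConfigParser()
cfg.read_string("[server]\nhost = localhost\nport = 8080\n")
print(cfg["server"]["port"])   # 8080

# CSV: tabular data with almost no markup overhead.
rows = list(csv.reader(io.StringIO("id,name\n1,Shawn\n2,Dawg\n")))
print(rows[1])                 # ['1', 'Shawn']
```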

I wonder if you may have also seen JDIL at jdil.org?

they say:

However, unlike XML, JSON provides no direct support for namespaces - and thus no standard way for avoiding name collisions when mixing data from diverse sources. Something like a namespace mechanism is required to lift JSON to the level of a data integration platform, as opposed to a data exchange format only. Also lacking are standard ways of naming objects so that they can be referenced from elsewhere, and for representing properties with multiple values.

If these concerns are addressed, JSON’s reach will extend over more of the domain currently occupied by XML, while bettering XML in the cardinal virtue of simplicity.
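The collision problem they describe is easy to demonstrate, and the usual workaround is exactly the prefix convention that XML namespaces formalize. A Python sketch (keys and prefixes invented for illustration, not any JDIL mechanism):

```python
import json

# Two sources each define "title", meaning different things.
library = {"title": "Coding Horror for Dummies"}
job = {"title": "Senior Developer"}

# Merging naively would silently drop one of them; prefixing the
# keys (a convention, not part of the JSON spec) keeps both.
merged = {"lib:" + k: v for k, v in library.items()}
merged.update({"job:" + k: v for k, v in job.items()})

print(json.dumps(merged, sort_keys=True))
# {"job:title": "Senior Developer", "lib:title": "Coding Horror for Dummies"}
```

The catch, as the quote says, is that nothing standardizes those prefixes, so every pair of systems has to agree on them out of band.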

I’ve become a big fan of JSON for 100% JavaScript web applications. You can convert XML from any online data source into JSON using Yahoo! Pipes. However, JSON is not very readable. It is a nasty mess of brackets. I have to use a JSON Viewer to figure out the object structure. Recently I used ASP.NET’s JavaScriptSerializer.DeserializeObject to deserialize some JSON data into objects. This is totally undocumented and proved to be very difficult to figure out.

For added silliness…

A program I wrote was built around a custom text parser.

This ended up being used within a large data-analysis program that needed to store settings etc.

I modified it to read a ‘script’ at startup – a script that could contain variables and other settings the program used.

Perfectly human-readable, since the plain-text ‘comments’ were ignored, and it was child’s play to get the program to write the config file.

A sledgehammer to crack a nut for most programs, but since this one already included the parser, I figured why not.

Personally I don’t care what the format is as long as it’s plain text of some sort, and thus easy to back up and copy.

Essentially, sod anything in the registry or some hidden binary file. I love the idea of the Unix-style ‘dot’ hidden config files: put them in the program directory (defaults) and the user’s directory for everything else.

BTW, what was wrong with .ini files? Have a standard user dir and system dir for them and it works… problem? I never understood why they moved away from that.