Revisiting the XML Angle Bracket Tax

It’s just a standard like anything else. It’s also very expressive, easy to use, and has loads of tools support out there.

Why on Ford’s green earth would you hand-edit Xml or Html just to write a document, a post, or a comment on a website? It’s better as a structured storage format that’s coincidentally really easy to send across the wire (because it’s already text). How your users input data usually doesn’t have anything to do with the storage format.

Instead of the argument being ‘Use Xml’, perhaps the better argument should be ‘Why Are You Not Using Xml’. Why are you trying to reinvent the wheel (and by reinventing it, wasting your time, the maintainers’ time, and everyone else’s time)?

Xml is not a UI, so Xml haters, come up with a better argument.

Maybe us ‘xml fanboys’ react because the alternatives are only superficially better. Maybe we don’t want another markup language for the sake of it. Maybe we don’t want to change existing code to suit the flavor of the month.

I have no doubt that Xml will be surpassed in time, but until there’s something better out there, I’ll stick with Xml. It works, so I don’t have to.

XML took off because it was the first simple, recursive generic data format which is both machine and human parseable/editable.

That’s it - there’s no magic to it, and it’s often not the best solution, but it can be fitted to nearly any need, thus its omnipresence in VS.NET and other enterprisey systems.

XML allows you to build DSLs quickly and simply. The user does not want to edit Java or C# code to configure their application, but having them edit XML is often acceptable since its syntax is much simpler. And that’s why XML is rarely used in a language like Ruby - Ruby can be made clean enough that the user doesn’t even know they are editing Ruby. Only the quotes below are a clue that you are in a programming language.

Ruby:
hostname 'http://foo.com'
port 50

XML:
<hostname>http://foo.com</hostname>
<port>50</port>

So to a great extent it’s all about cleanliness and ease of data expression in the language you are using.
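For comparison, here is roughly what consuming the XML form looks like from a host language with a stock parser. This is a minimal Python sketch, not anything from the linked post: the `<config>` root element and the `load_config` helper are assumptions added so the two elements form a well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the comment shows only <hostname> and <port>;
# the <config> root is assumed here to make the document well-formed.
CONFIG_XML = """
<config>
  <hostname>http://foo.com</hostname>
  <port>50</port>
</config>
"""


def load_config(text):
    """Parse the XML config into a plain dict, converting port to int."""
    root = ET.fromstring(text.strip())
    return {
        "hostname": root.findtext("hostname"),
        "port": int(root.findtext("port")),
    }


config = load_config(CONFIG_XML)
print(config)
```

The typing of `port` lives in the loader rather than in the file, which is part of the trade-off being argued about here.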

More here:

http://www.mikeperham.com/2008/02/09/dsls-and-xml/

I’m definitely not trying to shill for Ruby here. All of this applies to any other language with lightweight syntax requirements so the code can be made to look very close to English.

There’s nothing wrong with XML. If you don’t find it legible, take an hour and write a program to display the data in whatever format you want. It’s not going away, so deal with it! Another lame post.

I might ask you a similar question: why learn anything beyond exactly what is required?

For low-hanging fruit such as data persistence and configuration you should absolutely not care about going beyond the minimum required, because these frivolous things don’t make your app better.

What are the benefits? A little less pain in deciphering the meaning of <foo>bar</foo> vs. foo=bar? At the cost of…

  • Additional training
  • Buggy/immature apis
  • Unknown performance
  • Mental cost of switching between XML and the flavor of the month text file format

You mentioned a lot of topics in this article, but didn’t get to discuss any of them in much depth. Paradoxically, it was a nice read.

@JoeOsborn … here’s a link you may like:

"XML is not S-Expressions"
http://www.prescod.net/xml/sexprs.html

Anyways, most of the time I’m just using Xml files as an easy way for end users to have complex configuration files without me having to come up with a novel way of representing them. It’s so trivial to write a class X, populate it the way you like, and chuck it into a generic Xml Serializer (or load it with a generic deserializer). Job done … next… I never even have to write the parser or parsing code.
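The "class X into a generic serializer" workflow can be sketched in a few lines. The comment is presumably about .NET's serializer; this is a Python stand-in under stated assumptions: `ServerSettings`, `to_xml`, and `from_xml` are hypothetical names, and only flat classes with `str`, `int`, and `bool` fields are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class ServerSettings:  # hypothetical "class X"
    hostname: str
    port: int
    verbose: bool


def to_xml(obj):
    """Serialize a flat dataclass: one child element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def from_xml(cls, text):
    """Rebuild the dataclass, converting each field back to its declared type."""
    root = ET.fromstring(text)
    kwargs = {}
    for f in fields(cls):
        raw = root.findtext(f.name)
        # bool("False") would be True, so booleans need their own rule.
        kwargs[f.name] = (raw == "True") if f.type is bool else f.type(raw)
    return cls(**kwargs)


settings = ServerSettings("foo.com", 50, True)
xml_text = to_xml(settings)
print(xml_text)
print(from_xml(ServerSettings, xml_text))
```

Neither function knows anything about `ServerSettings` in particular, which is the point being made: the parsing code is written once and reused.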

“YAML is based on a standard, too”

That brings up the question: What exactly is a standard? I always thought it was something that a large number of people have agreed upon.

With XML, it’s the XML Core Working Group, part of the W3C, a multi-national consortium with a large member base, including quite a few major tech corporations. The copyright for the specification is held by the W3C.

With YAML, it’s… the yaml-core mailing list. The copyright for the specification is held by three individuals.

Now, XML is a pain in the ass to deal with, but… would you really consider the YAML specification a standard?

I think that in the case of XML, having a standard at all is much more important than having the best standard possible. The cost of getting the entire industry to convert to a less verbose standard for document markup would be far greater than just dealing with the angle bracket tax. It’s a case of: just pick one and be done with it, so we can get on with our jobs!

Many people are complaining about the “mental cost” of (manually) reading XML.

Come on! If you really dislike reading XML, can’t you write (once) a simple XSL transform that will convert any XML document into some format that doesn’t burden your mind so much?

If you can’t write that transform, then the data probably couldn’t have been stored in your “light on the mind” format anyway, which means that XML was a good choice.
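As a rough illustration of "write the transform once", here is a sketch that turns any element tree into an indented outline. It is plain Python rather than XSL (the standard library has no XSLT engine), and the `outline` function and sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Yield indented 'name: text' lines for an element and its descendants."""
    text = (element.text or "").strip()
    attrs = " ".join(f"{k}={v}" for k, v in element.attrib.items())
    label = element.tag + (f" [{attrs}]" if attrs else "")
    yield "  " * depth + (f"{label}: {text}" if text else label)
    for child in element:
        yield from outline(child, depth + 1)


sample = "<config><server name='a'><port>50</port></server></config>"
print("\n".join(outline(ET.fromstring(sample))))
```

Note that the sketch silently drops text that follows a child element (mixed content), so it only works for data that would have fit the lighter format in the first place.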

I never really understood the fuss about XML tags. For me, opening and closing tags are simply a concept. Sure, currently we often go on and store and transmit those tags verbatim, which is wasteful. But with some simple tools we could process and operate on XML and tags at a high level without those tags having to exist physically at a lower level.

If anyone wants to see a classic example of angle bracket tax, try reading and writing XAML; it’ll make your eyes bleed.

I’m with ya Jeff. I’d go so far as to say I hate working with XML. As such, I keep my use to the bare minimum and usually will look at things like JSON, etc, before I’ll settle on XML.

And I think that’s the point you’re making. XML all the time is just bad behavior.

Do you really need XML to define your ORM mappings, or would a compilable fluent interface make more sense?

Is there a real need in your application for XML configured Dependency Injection?

Does your webservice really need to return a complex and strongly typed XML file, or would a JSON file work just as well?

I could go on.

And don’t even get me started on XSLT…

I have no problem with using XML for relatively small buckets of data - config files, individual transactions, etc. Anyone who uses XML for large buckets of data is drinking the Kool-Aid or smoking crack.

The problem with moving or storing large amounts of data via XML is that there is no easy way to locate subsections. Anything you do requires that the XML document be parsed from the very beginning. This isn’t a big deal when you are dealing with a small document. A few KB of characters can be parsed pretty quickly whenever you need it. When the document gets a little bigger, you parse it once and use the DOM. What happens, however, when you need to process a document that contains several hundred MB of data, or even several GB of data? There is no practical way to handle these volumes. And yet I see people try to do this all the time.
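To make the "parsed from the very beginning" cost concrete: a streaming parser avoids building the whole DOM, but it still has to read forward from the first byte to reach any given element. A minimal sketch using Python's `xml.etree.ElementTree.iterparse`; the `count_records` helper and the `<rows>`/`<row>` layout are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_records(stream, tag):
    """Count elements named `tag`, reading the stream strictly front to back."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's children and text once it has been handled.
            # The root still keeps the emptied shell, so memory is reduced,
            # not constant.
            elem.clear()
    return count


data = io.BytesIO(b"<rows>" + b"<row><id>1</id></row>" * 1000 + b"</rows>")
print(count_records(data, "row"))
```

There is no equivalent of "jump to record 742" here; reaching it means tokenizing everything before it.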

While record oriented techniques are clumsy in certain ways, they are far more easily scaled to large data sets. It is trivial to navigate to any particular element in a fixed field file, no matter how large. A csv file is almost as easy to handle. The data sets can be broken down into subsets quite easily. These sets can be streamed or paged easily. Arbitrary parts of the data set can be accessed without accessing every other part.
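The fixed-field case can be sketched in a few lines: because every record has the same width, the byte offset of record n is simply n times the record size. The 16-byte layout and the `read_record` helper below are hypothetical, chosen only to show the seek.

```python
import io

ID_WIDTH = 8
NAME_WIDTH = 8
RECORD_SIZE = ID_WIDTH + NAME_WIDTH  # hypothetical layout, space padded


def read_record(f, index):
    """Jump straight to record `index` without reading any earlier record."""
    f.seek(index * RECORD_SIZE)
    raw = f.read(RECORD_SIZE)
    if len(raw) < RECORD_SIZE:
        raise IndexError(index)
    record_id = raw[:ID_WIDTH].decode("ascii").strip()
    name = raw[ID_WIDTH:].decode("ascii").strip()
    return record_id, name


# Build a small in-memory "file" of 1000 fixed-width records.
blob = b"".join(f"{i:<8}{'name' + str(i):<8}".encode("ascii") for i in range(1000))
print(read_record(io.BytesIO(blob), 742))
```

The price is the clumsiness mentioned above: every value has to fit its column, and the layout lives outside the file.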

XML is perfectly fine for a lot of things, but programmers should consider scalability in the context of their projects. XML doesn’t scale nearly as well as traditional record oriented approaches to data.

I have no problem paying the “angle bracket tax”.

Integrity of data is a bigger concern to me than whether it’s the easiest possible format on the human eyes.

What happens when you start having to handle non-standard data? E.g. things need to fit on multiple lines, or contain odd characters? Then you’ll have to invent an escaping or encoding scheme.

Newsflash, XML already has this. Why reinvent the wheel? I’m sure as hell happy I don’t encounter custom CSV or other arbitrary delimited file formats that much any more, since 99% of programmers don’t think about the exceptional conditions, and their crappy invented file formats can’t handle them.
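A small sketch of the point about exceptional conditions, in Python with made-up sample values: the XML library escapes and restores awkward text without any custom scheme, while a naive hand-rolled comma join corrupts a field that happens to contain a comma.

```python
import xml.etree.ElementTree as ET

# Multi-line text with markup characters, quotes, and an ampersand.
awkward = 'line one\nline two with <tags>, "quotes", & an ampersand'

elem = ET.Element("note")
elem.text = awkward
serialized = ET.tostring(elem, encoding="unicode")  # escaping happens here
restored = ET.fromstring(serialized).text           # and is undone here

print(serialized)
print(restored == awkward)

# The kind of ad hoc delimited format the comment complains about:
naive_fields = ["a,b", "c"]
naive_line = ",".join(naive_fields)    # no quoting, no escaping
print(naive_line.split(","))           # three fields come back, not two
```

A proper CSV library handles quoting too; the failure shown is specific to formats whose authors never thought about it.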

“Integrity of data is a bigger concern to me than whether its the easiest possible format on the human eyes.”

From the spec (http://www.w3.org/TR/REC-xml/#sec-guessing):

“The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use—which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways…”

So, your choice for data integrity is a spec where determining what encoding it’s in is, in the words of the authors, “not entirely hopeless.”
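For readers who have not seen it, the guessing that section describes amounts to looking at the first few bytes for a byte order mark or for the characters `<?` in a recognizable encoding, and only then reading the declaration. A simplified Python sketch; `sniff_xml_encoding` is a hypothetical helper, and it deliberately ignores the UCS-4 and EBCDIC cases the spec also lists.

```python
import re

# Matches the encoding name inside an ASCII-compatible XML declaration.
DECLARATION = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._\-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML entity's encoding from its leading bytes (simplified)."""
    # 1. A byte order mark settles the question.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # 2. No BOM: '<?' stored as two-byte code units gives the byte order away.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. ASCII-compatible bytes: now the declaration itself can be read.
    match = DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. No BOM and no declaration: the spec's default is UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_xml_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))
```

It works because the declaration must come first and uses only ASCII characters, but it is still a heuristic layered on top of the format.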

At my place (a research lab), people all but throw rocks at me because I use S-Expressions in place of XML for my various number crunchers. Basically, my S-Expressions are used to set up factories, which in turn build up objects that are then tinkered with by my application. A real life example:

embryo(
  material(
    name = 'steel'
    density = 7860.0
    max.strain = 0.1
    young.modulus = 210000000000.0 # newton/meter square
  )

  control.model(
    control.model.A(
      nb.chemicals = 1
      damping = 0.1
      nb.cell.neurons = 1
      nb.edge.neurons = 4
    )
  )

  template(
    beam.template(
      load = 6000.0 # newton
      radius = 0.001 # meters
      width = 2.0 # meters
      height = 1.0 # meters
      nb.hrz.patches = 8
      nb.vrt.patches = 3
    )
  )
)

Imagine the same in XML: it would be less readable. Writing a parser for this? Hey, mine fits in barely 300 lines of C++ and does syntax error checking, with gentle exceptions to give error messages. It’s an LL(1) grammar, so a finite state automaton and a stack and you’re done… The XML grammar is a lot more demanding.
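To give a feel for how small such a parser can be, here is a recursive-descent sketch in Python. It is not the commenter's C++ code: the grammar (`block := NAME '(' (NAME '=' value | NAME block-body)* ')'`), the token rules, and the `parse` function are all inferred from the example above and should be read as assumptions. One token of lookahead after each name decides between an assignment and a nested block.

```python
import re

TOKEN = re.compile(r"""
    \s+ | \#[^\n]*                      # whitespace and comments: skipped
  | (?P<punct>[()=])
  | (?P<string>'[^']*')
  | (?P<number>-?\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_][\w.]*)
""", re.VERBOSE)

EOF = ("eof", "")


def tokenize(text):
    """Split the input into (kind, text) pairs, dropping blanks and comments."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup:  # None for whitespace and comments
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(text):
    """Parse one top-level block into nested dicts (duplicate keys overwrite)."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else EOF

    def expect(kind, value=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1] or 'end of input'!r}")
        pos += 1
        return tok[1]

    def value():
        kind, _ = peek()
        if kind == "string":
            return expect("string")[1:-1]   # strip the quotes
        return float(expect("number"))      # raises SyntaxError otherwise

    def body():
        expect("punct", "(")
        items = {}
        while peek() != ("punct", ")"):
            key = expect("name")
            if peek() == ("punct", "="):    # the single lookahead decision
                expect("punct", "=")
                items[key] = value()
            else:
                items[key] = body()
        expect("punct", ")")
        return items

    name = expect("name")
    result = {name: body()}
    expect("eof")
    return result


SAMPLE = """
embryo(
  material(
    name = 'steel'
    density = 7860.0  # kg per cubic meter (assumed unit)
  )
)
"""
print(parse(SAMPLE))
```

Errors surface as `SyntaxError` with a short message, in the spirit of the "gentle exceptions" described above.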

S-Expressions

  • readable
  • lightweight

XML

  • not very readable outside of tiny files with fewer than 3 levels of nesting
  • heavyweight on resources

You say that the example you pick has a large difference, but when you have hundreds of those to send over the wire it’s gigantic!

I’m writing a mapping application and guess what? I have to load hundreds of points of interest to overlay on a map at once and, God forbid, the data comes in XML… imagine parsing all that using the browser’s JavaScript.

There’s more to it than verbosity and parsing headaches: there’s also the space, network bandwidth, and CPU cycles tax and oh… I can think of a lot more when everything is in XML.

Totally agree with you.
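The size part of that tax is easy to measure for a given payload. A rough Python sketch with invented data (300 hypothetical points of interest with `name`, `lat`, and `lon` fields); it compares raw byte counts only and says nothing about parse time or compressed size.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payload: 300 points of interest.
points = [
    {"name": f"poi{i}", "lat": round(40.0 + i / 1000, 3), "lon": round(-74.0 - i / 1000, 3)}
    for i in range(300)
]

# Compact JSON, no extra whitespace.
as_json = json.dumps(points, separators=(",", ":")).encode("utf-8")

# Element-per-field XML, also with no extra whitespace.
root = ET.Element("points")
for p in points:
    poi = ET.SubElement(root, "point")
    for key, val in p.items():
        ET.SubElement(poi, key).text = str(val)
as_xml = ET.tostring(root, encoding="unicode").encode("utf-8")

print(len(as_json), len(as_xml))
```

Putting the fields in attributes instead of child elements would shrink the XML somewhat, so the gap depends on how the document is designed.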

Just like Lucas said - why not try JSON instead? I also liked Douglas Crockford’s assessment of it being pretty much XML without all the crap in it.

After many years spent dealing with XML and RSS in particular, I’m going for JSON in my future projects. It’s either that, or I have to come up with an even better way.

I’ve been doing some .NET and SharePoint development the last year, and it’s the best way to start hating XML: Handling it is very clunky (e.g., having to define a namespace manager even if there’s no namespace defined), it’s used in the most ridiculous places (CAML is just verbose SQL), it’s bastardized (ASP.NET should be put to sleep) and it’s used even for name=value content like web.config. If that’s your exposure to XML, no wonder you’re thinking twice about using it.

“Every time I look at my web.config XML file, there’s a mental cost of me having to parse all these tags in the file.”

That’s the bottom line. Get used to it. If you’re having a problem reading xml, learn how to do it better.

If the intellectual overhead of using XML is so low as to be insignificant for ANY task, as some fanboys claim, then why are there still non-XML formats out there? Why do we not write Java or C++ or C# or Python code in XML format?

Simply put, Jeff is right: XML is not a panacea. Even if it is still your “go to” choice as a data format, if it is your one and only choice in any and all circumstances then you must accept that you are limiting yourself. Think, use your judgement, weigh the pros and cons of using XML and of alternatives, then decide.

There’s a marvelous book called “Conceptual Blockbusting” which talks about many aspects of creativity and problem solving, and one of the most useful things I got out of that book was the term “satisficing”. When problem solving there are two general categories for the methodologies used to arrive at a solution. On the one hand there are all of the methodologies which find a solution which is workable and then stop and move on to implementing that one solution. On the other hand there are all of the methodologies which continue to look for other, perhaps better, solutions even after one has been found which may be workable.

The first strategy is called “satisficing”, and many people do it without thinking. It’s the reason why refactoring and redesigning and rebuilding things is often necessary. People working on a solution didn’t stop to evaluate their design or consider alternate designs and instead just jumped ahead to implementation. This is the difference between buying a car by putting it on your credit card and buying a car by financing it using a low interest automotive loan. Assuming your credit card limit is high enough, both solutions might be “workable”, but they are not of the same character and I think it’s clear that one of these solutions is in almost all situations vastly preferable to the other.

So the next time you’ve come up with a solution to a problem, ask yourself whether you’re cutting yourself short by satisficing. Maybe you should spend some time (but not too much time) trying to come up with other solutions so that you can compare them against each other and determine which one is the best. It may be your original solution, or it may not. A few minutes of forethought now can often save hours or eons of pain later.

The same applies to XML; this should be common sense.