Revisiting the XML Angle Bracket Tax

Sorry, but there is no sacred cow that has as much fervent devotion as SQL. Try to convince a group of DBAs that SQL is a bad language (which it is) and watch for unidentified flying objects.

Which should make tonight interesting for me. I’m going to a SQL Server usergroup meeting. :slight_smile:

@Chris Brandsma

Great comparison! And similarly to XML, hundreds of programmers have decided that SQL is teh suck and have decided to wrap it behind a construct they like. In the end it’s the same thing, some people want to work with tool x, so they do, others want to avoid x at all costs so they avoid it or wrap it or find a wrapper to make it “safer/easier/better” to deal with. Those that have embraced tool x look at those who use the wrappers as lazy (or terrorists), while the anti-x crowd sees the other group as stodgy troglodytes.

I like XML for documents. You know, those things with lots of words and a little markup. Periodically I see someone make the groundbreaking rediscovery that XML can be just as well represented as S-expressions, and then instantly try to apply that to HTML. Which is silly, because if there’s one place I want those big ugly redundant tags, it’s in the middle of a document where it’s otherwise easy for them to get lost in the noise. If I opened a tag five hundred words ago I don’t want to have to flip back to the beginning just to see what tag this paren or close-bracket or whatever is closing.

On the other hand, using XML for key-value pairs is equally silly. Especially when it’s a file that doesn’t need any kind of well-defined i18n story.

Do you honestly think that this issue (naming of tags) wasn’t discussed when XML was being standardized?

Rather than guessing what the XML standard bodies think, or simply assuming that they’re just not thinking because they did something you find odd (how could they be so stupid as to not agree with you, after all), it might do to go figure out what the rationale behind the design decision actually was.

I do know this much: without named closing tags, it makes it much harder to verify the structure of a document until you reach the end of the document, and the parsing error is almost always going to show up later than the actual error.
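
To make the point about named closing tags concrete, here is a minimal sketch using Python's standard-library parser. The sample document and the helper name `first_error` are made up for illustration. Because the closing tag carries a name, the parser can complain at the mismatch itself instead of at the end of the input.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <b> is closed by </i>, so the nesting is wrong.
broken = "<doc><b>bold text</i> and a lot more text after it</doc>"

def first_error(text):
    """Return (message, (line, column)) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err), err.position
    return None

result = first_error(broken)
print(result)
```

With anonymous closers such as a bare paren, the same mistake would only surface once the nesting count fails to balance, which is usually much later in the file.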

Let me toss my two cents in.

First, XML wasn’t made for humans or computers. XML is derived from SGML, which was designed to be a “Standard Generalized Markup Language”. I.e., it was supposed to “mark up” (add metadata to data), and to be sufficiently generic to be able to handle anything whatsoever.

XML was derived the following way: ok, we have this very flexible thingy. It’s too flexible and complex to use. Let’s get a subset which will be able to handle hierarchical data, which will solve a specific subset of problems we have.

For some reason, no one is allowed to do that anymore. I.e., say it’s too flexible and complex.

Frankly, if your answer is “XML is here to stay, deal with it”, then go do anatomically impossible things to yourself. Not everyone is happy with XML and XML will not pervade everything. Deal with it.

If your answer is “if you think this is too complex, add more complexity (eg, specialized tools, XSLT) to hide the complexity”, then… ah, hell, I’ll wait for understanding to dawn on you. Or not. Basic concepts can’t be explained, and KISS is a basic concept.

If I have a table with many, many rows and a limited number of columns, I’ll use CSV. If I have hierarchical data I’ll use XML; I might consider JSON or YAML too, but I’ll probably settle on XML. XML can’t handle non-hierarchical data, though, so for that I’d have to go for YAML or something else.

And if the data has to be accessed many times in many different ways, or be constantly updated, then I’ll use a database.

I’ll bet people who advocate “XML or death” probably won’t see the possibility of using a language other than any of the mainstream languages (for their own definition of mainstream). It’s a mindset, emphasis on set.

Hi Jeff,

I’m not sure, but why are you trying to look at XML code? Why not open it in a web browser and have all the markup magically disappear, leaving only values? And if you supply some CSS, you may get decent formatting for your data at little cost, e.g. output it in different colors.

XML may be overkill for a simple task like storing a list of twenty keys and values, because it can do more, much more. It’s really a powerful tool for keeping documents that are both human- and computer-readable. E.g. one can have an invoice that can be (with some help from CSS) printed and read by a human and at the same time be precisely understood by a database application. And then it may be modified by some workflow app that augments the invoice with its own markers without breaking the invoice: it will remain printable and the database app will be able to read it just fine.

And then it may be transformed with some formula, and the formula happens to be an XML document too. This gives us a nice closure that is not possible in, say, SQL: in SQL you can write a formula to take any tables and make any derivative table from them, but the formula itself won’t be a table. So you cannot generate SQL code with SQL. With XML, that is, with XSLT, you can. That is an order of magnitude more powerful than SQL, and it works with XML documents, which are also much more complex structures than mere tables.
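
A small sketch of the "the formula is itself an XML document" point, assuming nothing beyond Python's standard library. The stylesheet below is a made-up minimal example, and the standard library cannot execute XSLT; the sketch only shows that a stylesheet can be loaded and inspected like any other XML data.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A minimal, hypothetical stylesheet that would copy every <name> value.
stylesheet = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <names>
      <xsl:for-each select="//name">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </names>
  </xsl:template>
</xsl:stylesheet>"""

# Parse the transformation itself as ordinary XML data.
root = ET.fromstring(stylesheet)

# Collect the local names of all elements in the XSLT namespace, in document order.
prefix = "{" + XSL_NS + "}"
xsl_instructions = [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]
print(xsl_instructions)
```

Since the stylesheet is just a tree, another stylesheet (or any XML tool) could generate or rewrite it, which is the closure property described above.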

All other standards you mention may do a great job about keys and values, but they don’t even come close to the full power of XML. And since XML can do all key-and-value stuff at little cost, there’s no reason not to use it for this too :slight_smile: In most cases this is simpler and more compatible because there are ready-to-use tools on nearly every platform.

Man there’s a lot of morons in these comments. I guess that’s what the smackdown model is good for: people lack basic reading comprehension so there’s no point in trying to bring across any nuance. Just where exactly did Jeff say XML sucks? Oh humanity, how I weep for you…

Ack! The never-ending argument! End the war! Peace. Love. and non-standard conforming markup!

Who knew that XML could arouse so much… sentiment…

@Graham Stewart:
XML is wicked easy to pick up at a medium level. Spend a day on w3schools (yes, that level is enough to start on) and with some books, and you will be at medium level. Try creating some files, try thinking about it and quickly you won’t be making massive beginner mistakes. Because there aren’t that many beginner mistakes you can make in xml - the language is too simple for that.
But of course you’re right: people can and will make horrible abominations named .xml because they don’t think about what they do. That is no different from any other language. And I’d say it happens no more often than in any other language - so I definitely wouldn’t hold it against xml.

As for all the following technologies: no, you do not need to know xsd, xslt, xpath, xquery or any other language to make use of xml. Sure, they can help a lot and really add a lot of power to xml. But it’s not necessary to know any of them before getting to know xml.

Regards
Fake

“There is a very real mental cost to parsing even a few short lines of XML”

For you maybe, definitely not for me. MSBuild files, they really make my head asplode…

“the mental cost of that insignificant effort times the number of developers in the world, times the number of projects in the world?”

So, what are you going to do with that time once you saved it? How is this metric useful?

@Fanboy?:

That author lists a bunch of stuff that XML has, eg XPath, XSL.

That’s great, but if he had ever learnt to use Lisp/Scheme, he would know that those extras are already part of Lisp/Scheme (syntax, macros, etc).

IMO XML is simply sexpr’s and the rest of the XML technologies are simply a ripoff of what exists at the core of sexpr-based languages.

I read somewhere: <> + marketing = ()

I think that should be: () + marketing bs = <>

Cheers

leppie

I recently had a case where someone in my office needed to store a list of customer IDs to disk. Their instant thought was to just serialize the collection in XML!

If we think about it, it turns what could be a simple CSV into a giant file containing hundreds of <String>12324</String> elements (not to mention all the data at the top of the XML). But the fact is most people don’t think about it and jump for the quickest tool.
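
A rough sketch of the size difference, with invented data: the IDs, the `ArrayOfString` root name, and the helper names are assumptions for illustration, not what the office code actually produced.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical customer IDs; the real list is not shown in the thread.
customer_ids = [str(12000 + n) for n in range(500)]

def as_csv(ids):
    """Write all IDs as a single CSV row."""
    buf = io.StringIO()
    csv.writer(buf).writerow(ids)
    return buf.getvalue()

def as_xml(ids):
    """Wrap every ID in its own <String> element, serializer-style."""
    root = ET.Element("ArrayOfString")
    for value in ids:
        ET.SubElement(root, "String").text = value
    return ET.tostring(root, encoding="unicode")

csv_text = as_csv(customer_ids)
xml_text = as_xml(customer_ids)
print(len(csv_text), len(xml_text))
```

Both forms round-trip the same list; the XML one simply pays for an opening and a closing tag around each five-character value.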

I believe most technologies are there for a reason, and each case should be considered on its own to choose the correct technology. I think XML has its place but it should not be the default choice.

What I am hearing from Jeff here is not: “Replace XML with this YNM which is always better”. It’s more like: “When you decide to use XML, make sure you know WHY you are using it, and please be aware that there are alternatives”.
All the criticism about XML having feature X and standard Y and handling everything - that is valid and is a reason why XML sometimes is a good solution to a problem. It is also the reason why it is sometimes a BAD solution. Know the difference. Think, then decide.

I’m amazed at the fuss some people make over readability. If you think XML is unreadable, try using something other than Notepad to read it.

I know XML is not ideal, but at least it means you don’t have to worry about (a) parsing, (b) encoding all possible characters, (c) representing strongly-typed values. Pick any other format and you have to implement some of those yourself.
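
Points (a) and (b) can be sketched with Python's standard library; the sample string and element name are made up. Typed values, point (c), would need a schema layer and are not covered here.

```python
import xml.etree.ElementTree as ET

# A value full of characters that would break a naive hand-rolled format.
tricky = 'if a < b && name == "Zoë" then <done/>'

note = ET.Element("note")
note.text = tricky

# The library escapes markup characters on the way out...
serialized = ET.tostring(note, encoding="unicode")

# ...and the parser undoes the escaping on the way back in.
restored = ET.fromstring(serialized).text

print(serialized)
print(restored == tricky)
```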

For god’s sake. Jeff, fix the comment form before making another article about XML. I keep seeing people comparing bar with foo=bar; my guess is that they intended to compare <foo>bar</foo> with foo=bar.

And I’m not even sure if this comment will show up correctly.

And don’t even get me started on XSLT…

I personally can’t stand editing/reading XML, but I dearly love XSLT. It’s a brilliantly designed language. I once heard it described as “the wonderful language with the horrible syntax”.

I have actually written a DSL embedded in Python so that I may write XSL transformations without having to write XML, and I love it. I actually prefer it over nearly any template language, now that the XML pain is removed. Well-formedness guarantees are a wonderful thing!

bleh – fixing that now. It is really annoying, annoying enough to make me edit Perl code. That’s how bad it is.

@Aaron G: Want to see a 2GB XML file?

http://setiathome.berkeley.edu/stats/

There are dozens of stats websites downloading the files at the above URL daily. I’ve been told by the developer of one of them that, during the daily updates, the XML parsing uses more CPU than inserting the parsed data into the SQL database (although querying the database is the bottleneck the rest of the time, after the daily updates are done).
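
For files of that size, a common approach is to stream the document instead of loading it whole. The sketch below uses `iterparse` from Python's standard library on an invented in-memory sample; the element names (`users`, `user`, `credit`) are assumptions and not the actual SETI@home stats schema. Streaming does not make the parsing itself cheaper in CPU terms; it mainly keeps memory use from growing with every record.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large stats export; element names are invented.
sample = io.BytesIO(
    b"<users>"
    + b"".join(
        b"<user><id>%d</id><credit>%d</credit></user>" % (n, n * 10)
        for n in range(1000)
    )
    + b"</users>"
)

def total_credit(stream):
    """Sum the <credit> of every <user> while reading the stream once."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "user":
            total += int(elem.findtext("credit"))
            elem.clear()  # discard the finished record's children and text
    return total

result = total_credit(sample)
print(result)
```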

[?xml version="-1.0" encoding=“UNICEF”?]
[procondocument name=“my take on the good and the bad stuff with xml”]
.[list type=“pro”]
…[arg]everybody can do it[/arg]
…[arg]global standard[/arg]
…[arg]it can be used for almost everything[/arg]
.[/list]
.[list type=“con”]
…[arg]just because it can do everything does not mean it should[/arg]
…[arg]DRY, with XML you repeat yourself over and over[/arg]
…[arg]terrible to look at[/arg]
.[/list]
[/procondocument]

<message mood="Yay!">I have finally escaped HTML entities in comments</message>

Let the XML-ization begin.

I’m sorry I didn’t do this years ago. My bad.