Revisiting the XML Angle Bracket Tax

goatslayer · June 25, 2008, 12:00am

Intriguing comments, but surely, however human-readable XML is supposed to be, no one in their right mind is going to try to parse catalogues of stuff in XML by hand?

Use the right editor, something that reduces XML to its DOM, and expand it as you wish.

You can produce better, structured tools for dealing with XML, and that’s how you should use it.

Sure, parse it in your head if you like, but bear in mind that human readability is an advantage of XML, not its purpose.

Breakfast · June 25, 2008, 12:00am

XML is great for hierarchical data and as a non-binary storage format for information that you may wish to use in lots of different ways. XSLT is weird to use but very powerful.

It’s a nightmare for storing name/value type information, and also for things like the configuration files I’m working with at the moment, which, for reasons best known to Beelzebub, store regular expressions as XML attribute values. As you can imagine, this creates an escaping nightmare: you have to do your regular string escaping, your regular expression escaping and then your XML escaping before the value can be added to the document, giving you many layers of possible (and indeed probable) failure when you are editing it by hand. I’m working on a simple editor that gets around the whole problem (I only just started here), but I’m amazed that people have done this for years and apparently never questioned it.
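To make the layering concrete, here is a minimal Python sketch, assuming a hypothetical `<rule pattern="…">` element (not taken from the configuration files described above). A regex that must match a literal Windows path is escaped once for the regex engine and once more for the XML attribute; the source-code string layer is sidestepped with a raw string. An XML parser only undoes the outer layer, so a person editing the file by hand has to undo both in their head.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical literal text the rule should match exactly.
# A raw string sidesteps the source-code escaping layer.
literal = r'C:\Temp\<"data">.log'

# Layer 1: regular-expression escaping, so metacharacters match literally.
pattern = re.escape(literal)

# Layer 2: XML attribute escaping; quoteattr escapes &, < and > and
# picks a quote character that does not clash with the value.
element_text = "<rule pattern=" + quoteattr(pattern) + "/>"
print(element_text)

# Reading it back: the XML parser undoes layer 2 only...
recovered = ET.fromstring(element_text).get("pattern")

# ...and the regex engine interprets layer 1 when matching.
print(bool(re.fullmatch(recovered, literal)))
```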

Douglas · June 25, 2008, 12:00am

Jeff,
I am an administrator (doing Linux right now, but I have also been a Windows admin in the recent past). I have been reading your blog for about 2.5 years now.
I am amazed at how many people seem to miss the point of your post. To reiterate (again): THINK!
Please do not make me parse XML if a simple key=value list will work. People keep commenting “just use a parser” or “syntax highlighting will make it readable”. Drop the arrogance, people: the data is not just for you to read. We poor admins need to parse many of these files too.

btw - Keep up the excellent work (I may not always agree, but I always find it interesting and thought provoking).

Robin_Day · June 25, 2008, 12:00am

I use one of three methods of storing data. CSV, XML, SQL Server.

I consider this a fairly extensive toolkit for doing almost everything I need to do from the simplest to the most complex.

I do look at other alternatives, and seriously consider using them. However, I try to limit the tools that I use in order to be more efficient with them. The expression “Jack of all trades, master of none” springs to mind were I to use many more methods.

Therefore, until I come across something that I truly consider worth adding to my toolkit, or worth replacing an existing item with, it will remain as it is.

As for XML itself, if the data I need to store is fairly simple then I may well hand-crank it. I personally find XML extremely clear and easy to read if I’ve written it myself.

Alternatively, I will quite happily produce fairly complex XML where a system didn’t quite justify or require the scale or performance of a full SQL database. These files will never be hand-cranked; they will be written and read entirely in code. Since I mainly write in C#, it couldn’t be simpler!

The XML format may cloud the actual data, or, as you say, “only half of it actually matters”. The main thing is that, while human-readable might be wrong, coder-readable is certainly true. The main point of using XML as my data source is that, should push come to shove and I really need to get at or change some small item of data, I can spend 15 minutes with Notepad and do just that! This is not something I do on a regular basis; it’s just nice to know that I can do it if I really need to.

JakubN · June 25, 2008, 12:00am

IMHO a good example of XML abuse is requiring XML (and an XML parser) if you want to push to a server via HTTP(S), e.g. for WebDAV. WTF is XML doing here? HTTP already has a nice configuration mini-language (headers), so why not use it?
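As a minimal sketch of the point about headers, Python’s standard library already parses HTTP’s `Name: value` mini-language. `Depth` and `Overwrite` are real WebDAV request headers, but the values here are only illustrative, and this says nothing about whether WebDAV could actually have dropped its XML request bodies.

```python
import io
from http.client import parse_headers

# A raw header block as it appears on the wire, terminated by a blank line.
raw = b"Depth: 1\r\nOverwrite: F\r\nContent-Type: text/plain\r\n\r\n"

# parse_headers reads from a binary file object and returns an HTTPMessage,
# which gives case-insensitive, dict-like access to the header values.
headers = parse_headers(io.BytesIO(raw))

print(headers["depth"], headers["Overwrite"], headers.get("Content-Type"))
```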

Jaster · June 25, 2008, 12:00am

Many of the comments here seem to amount to “There’s nothing wrong with XML. Learn to read it.”

They are missing the point. XML is designed to be read by computers, not by humans. If you are forced to read or write it, your program is broken; if you often have to write it, your interface is broken.

If you want people to modify settings/parameters etc., then use a human-readable format or provide an interface. You should never force people to read or edit raw XML.

MattH · June 25, 2008, 12:00am

Yeah, XML has a standard so we should just stick with it.

“That’s how we’ve always done it.”

BahriG · June 25, 2008, 12:00am

I am currently developing an application that has a project file. The requirements for the project file are:

User may want to edit project files directly, i.e. it should be editable by a general purpose editor.
Edited project files should be consistent; no meaningless content is welcome.
Project files content should be easily mapped into objects.
If the content of the project file changes in a new version, older files should be migrated.

And the choice for the project files’ format is XML. For the first three requirements I only write an XSD schema and use JAXB to generate classes that map to the XML files. For the migration I use either XSLT or DOM.

I don’t think any other technology (such as YAML) or in-house built code would be as mature as XML for my requirements.

Dan · June 25, 2008, 12:00am

Thinking is key… You should always be thinking about using the right tool for the job. XML is pretty cool stuff, but it does bear a tax. For you PHP folks out there, think of what the php.ini file would look like, and how much it would bloat, if it were XML. Think of how much additional annoyance and frustration you would have while walking through the file to adjust your settings. Even the simplest of tasks, commenting out a line, for example, suddenly becomes an ordeal of angles and dashes.

GrahamS · June 25, 2008, 12:00am

@Weeble:

Yep, the XML comments in C# are a pain to use. I think they were a very poor choice. They feel incredibly bloated, particularly if you want to add anything useful like referencing other methods/classes etc.

For those that are unfamiliar, consider this simple comment block:

    /// <summary>
    /// Closes a Thing instance, previously opened by <see cref="OpenThing"/>
    /// </summary>
    /// <remarks>
    /// Will fail if <paramref name="toClose"/> is already closed.
    /// </remarks>
    /// <param name="toClose">The instance of Thing to be closed.</param>
    /// <returns>True if <paramref name="toClose"/> was successfully closed, False otherwise.</returns>
    public bool CloseThing(Thing toClose)…

Compared to a slightly saner (but still parsable) human-readable syntax:

    /// Summary:
    ///   Closes a Thing instance, previously opened by [OpenThing]
    /// Remarks:
    ///   Will fail if {toClose} is already closed.
    /// Params:
    ///   toClose - The instance of Thing to be closed.
    /// Returns:
    ///   True if {toClose} was successfully closed, False otherwise.
    public bool CloseThing(Thing toClose)…

In the XML example, there were 161 characters for markup and 194 characters of “real” content - so around 45% of the comment was ‘noise’.
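The 45% figure follows directly from the two counts above, taking markup as a share of all characters in the comment:

```latex
\frac{161}{161 + 194} = \frac{161}{355} \approx 0.454 \approx 45\%
```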

bobby11 · June 25, 2008, 12:00am

I get the feeling this article, however many times it’s repeated, is doomed to spark a holy war within one comment. However, some comments of my own:

“Integrity of data is a bigger concern to me than whether it’s the easiest possible format on the human eyes.”

XML does absolutely nothing to ensure integrity of data. It ensures integrity of syntax. In some cases, the greater simplicity of, say, “name=value” is worth it, especially given that the syntax check is trivial.
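The claim that a `name=value` syntax check is trivial can be shown with a minimal Python sketch. The rules it enforces (one `name=value` pair per line, blank lines allowed, lines starting with `;` or `#` treated as comments) are an assumed convention rather than any particular standard, and `check_name_value` is a hypothetical helper name.

```python
def check_name_value(text):
    """Return (line_number, line) pairs that are not blank, not comments,
    and not of the form name=value with a non-empty name."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and ';' or '#' comments are allowed (assumed convention).
        if not stripped or stripped[0] in ";#":
            continue
        name, sep, _value = stripped.partition("=")
        # A valid line needs an '=' and something before it.
        if not sep or not name.strip():
            errors.append((number, line))
    return errors


sample = """\
# server settings
host=example.org
port = 8080
timeout
=orphan
"""
print(check_name_value(sample))
```

Like XML well-formedness, this only checks syntax: it says nothing about whether `port` holds a sensible number.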

“… if everyone wrote their own programs to reach the same result, that the XML version would be more consistent than the non-XML version because it’s based on a standard?”

CSV is a ‘standard’, in that it’s a well-recognised format with defined rules, shared by countless systems around the world. HTML and CSS are ‘standards’, yet almost every web browser out there implements them slightly differently. Saying something is a ‘standard’ is really saying very little; the important thing is that the data format is open, rather than proprietary.

“Once again, the issue is that XML isn’t meant to be parsed by a human.”

Really?? In that case, why not optimise the hell out of all of your DTDs: no more “Student” element when you can make do with “S”. Providing a GUI layer between the human and the data is all well and good, except when that layer goes wrong, or you don’t have access to it. I guess, by your argument, XHTML isn’t meant to be parsed by a human, and we should all be authoring our web sites in FrontPage. The benefits of readable textual data are massive; try reading ‘The Art of Unix Programming’ if you’re not sure why.

Fake51 · June 25, 2008, 12:00am

Should I ever get to work on any of your code, Jeff, you’d better remember this: http://www.codinghorror.com/blog/archives/001137.html

XML is wicked easy to pick up, and as soon as you have, you’re reading, writing and editing all sorts of files written in XML. It doesn’t mean that you SHOULD, but you CAN. So drop the nonsense about learning a million new formats and standards just because “you should learn the minimally required”: well, if I know XML, isn’t THAT the minimally required? Any OTHER language/format/standard just requires more.

Apart from that, it’s a truism that you should use the right tool for the given job. XML is bad for some things and great for others. Leave it at that.

Regards
Fake

Jay · June 25, 2008, 12:00am

I’m just going to go back to reading INI files.

     Private Declare Function GetPrivateProfileString Lib "KERNEL32" Alias "GetPrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpDefault As String, ByVal lpReturnedString As String, ByVal nSize As Long, ByVal lpFileName As String) As Long
     Private Declare Function WritePrivateProfileString Lib "KERNEL32" Alias "WritePrivateProfileStringA" (ByVal lpApplicationName As String, ByVal lpKeyName As Any, ByVal lpString As Any, ByVal lpFileName As String) As Long
     Public Sub WriteINI(wiSection As String, wiKey As String, wiValue As String, wiFile As String)
         WritePrivateProfileString wiSection, wiKey, wiValue, App.Path & "\" & wiFile
     End Sub
     Public Function ReadINI(riSection As String, riKey As String, riFile As String, riDefault As String) As String
         Dim sRiBuffer As String
         Dim sRiValue As String
         Dim sRiLong As Long ' number of characters the API copied into the buffer
         Dim INIFile As String
         INIFile = App.Path & "\" & riFile
         If Dir(INIFile) <> "" Then
             sRiBuffer = String$(255, vbNullChar)
             ' Chr(1) as the API default lets us tell "key missing" apart from a real value
             sRiLong = GetPrivateProfileString(riSection, riKey, Chr(1), sRiBuffer, 255, INIFile)
             If Left$(sRiBuffer, 1) <> Chr(1) Then
                 sRiValue = Left$(sRiBuffer, sRiLong)
                 If sRiValue <> "" Then
                     ReadINI = sRiValue
                 Else
                     ReadINI = riDefault
                 End If
             Else
                 ReadINI = riDefault
             End If
         Else
             ReadINI = riDefault
         End If
     End Function

How do I call an API in .NET?
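Not as an answer to the .NET question, but for comparison: a minimal sketch of the same read-with-default behaviour as the ReadINI routine above, using Python’s standard `configparser` module. The section and key names are made up, the INI text is inlined with `read_string` so no file is needed, and `read_ini` is a hypothetical helper name.

```python
import configparser

# Inline INI text standing in for a file next to the application (hypothetical contents).
ini_text = """
[Window]
Width = 800
; comments start with ';' or '#'
Title =
"""

config = configparser.ConfigParser()
config.read_string(ini_text)


def read_ini(section, key, default):
    """Return the value for section/key, or default if the section or key
    is missing or the value is empty, mirroring ReadINI's fallbacks."""
    value = config.get(section, key, fallback="")
    return value if value else default


print(read_ini("Window", "Width", "640"))       # key present
print(read_ini("Window", "Title", "Untitled"))  # key present but empty
print(read_ini("Missing", "Height", "480"))     # section absent
```

The parser handles comments, whitespace around `=`, and case-insensitive key names, so no buffer sizing or sentinel characters are needed.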

GrahamS · June 25, 2008, 12:00am

@Fake51: “XML is wicked easy to pickup”

Actually I think that is a pretty common misconception.

Most XML-abuse situations I’ve seen have come from developers who found XML “easy to pickup” and converted their existing file format by randomly adding a few angle brackets, producing a structureless blob that is impossible to validate.

In reality there are quite a few aspects of XML to master before you have really picked it up (XML Schema (XSD) or DTD, XSLT, validation, XPath, XQuery, XPointer, UTF encodings), not to mention the more philosophical issues involved in designing a good schema.

I’d say “Bad XML practices are wicked easy to pick up. Good XML takes time and practice.”

Erik17 · June 25, 2008, 12:00am

“not XML the religion”

That’s the point, exactly!
Seems some people need religion but fail to empty their religious cache in church (“What church?”). So religion pops up in places where it doesn’t belong: XML, LDAP, Agile Development, …

I’m really p*ssed when things like “Agile Development” suddenly gain this religious momentum and people start to use new toys because they believe in them instead of relying on scientific data.

Nice reads:
Terry Pratchett’s “Small Gods” was an eye-opener for me when it comes to church and religion.
[ The “Science of Discworld” books are nice too ].

I also recommend Alfie Kohn’s “Punished by Rewards”. It will upend everything you believe about the school system, performance payments, and stick-and-carrot incentives by providing scientific data.

I have a strong mathematical background, so I will never understand why people replace a properly working (mathematically beautiful) relational database system with “crap” like XML or LDAP. The objects in a database are tables. When you operate on a database you get back a table, so you can use this output as input again. Try that with XML or LDAP. This isn’t religion, this is fundamental mathematics (-Algebra)!
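The closure point (a query over tables returns a table, which can itself be queried) can be illustrated with a minimal sketch using Python’s built-in `sqlite3` module. The `employee` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "ops", 50), ("Bob", "ops", 70), ("Cy", "dev", 90), ("Di", "dev", 60)],
)

# The inner query produces a table: one row per department with its average salary.
inner = "SELECT dept, AVG(salary) AS avg_salary FROM employee GROUP BY dept"

# That result is used, unchanged, as the input table of the outer query.
rows = conn.execute(
    f"SELECT dept, avg_salary FROM ({inner}) WHERE avg_salary > 65 ORDER BY dept"
).fetchall()
print(rows)
conn.close()
```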

Dinah · June 25, 2008, 12:00am

In praise of sloth (my never-ending rant, but no one reads this far down the comments, so it doesn’t matter):

Other than the fascination factor, the reason I began coding was that it saved me time in math classes. I could solve the same equation over and over or I could write a program once and never have to solve the equation again. Seemed like a no-brainer to my lazier proclivities.

A primary reason I continue to be interested in most technologies is convenience. To that end, I typically opt for the one with the most support and best features. I used IE until Firefox had a really huge base, even though Firefox had almost always been better. All the MP3 players in my house are iPods because accessories and support are abundant. And I use XML over all other alternatives because there are libraries everywhere for it. My programming languages of choice have never included the sleek new ones that are so hot at the time, whose whole support base consists of a few thousand fickle fanboys, with little to no real documentation and no professional experience behind them.

Thank you for raising awareness and getting us to think about WHY we use what we do. In my case, the above is why I still choose XML. If I woke up tomorrow and I was tripping over YAML blogs, how-to articles, support, plug-ins, libraries, and billions of man-hours of experience – I’d switch.

Raymond · June 25, 2008, 12:00am

It’s an ongoing battle and a good topic for conversation. My only qualm is with the readers/commenters who are writing this off as a trivial matter or saying ‘It’s here to stay. Deal with it.’ You are the reason we’re having the problems. XML isn’t here to stay any more than Fortran was here to stay (read: stay = in the mainstream).

All technology is subject to change, and as programmers, designers, and salesmen of the technology we love, we have to ask ourselves every once in a while if it is the right thing. Refusing to reflect on our toolsets and their future kills innovation and discovery, not to mention bordering on the “stay the course” attitude that we’ve all come to love in our political representatives.

There are serious problems with XML, but it’s among the most empowering tools available to the modern programmer. Readable? Efficient? Effective? Enabling? Perhaps! Realize two things: 1) XML is approaching ubiquity in mainstream applications; 2) we disagree on some properties of XML. Together, these two statements make for a non-trivial argument.

Rich · June 25, 2008, 12:00am

I’ve only recently started reading your missives, but on the subject of XML, I completely agree with you.

I fail to see how (say) a simple text configuration file is improved by wrapping it in all the baggage of XML tags, whose only purpose seems to be to make life harder when it comes to parsing the file.

Yes, there are “standard” XML parsers for pretty much all mainstream languages these days, but that doesn’t make it right. Why use a big parsing library for something that could be done in a few lines of [insert your language here] if only the text file were simpler? It doesn’t make sense.

stonemetal · June 25, 2008, 12:00am

Yeah, let’s stick with punch cards; they are standard and everyone can read them.

The main point of the article is: THINK, don’t do stupid stuff. So it is rather funny that most of the comments amount to “don’t make me think, let me have my crummy old XML”.

Jheriko · June 25, 2008, 12:00am

xml, eh? don’t get me started.

if only the old win 3.1 developers would have just kept their .ini files in the right places… now we have windows registries and xml files everywhere…

that’s the thing with stuff like xml, once it gets used to solve a problem which stemmed from bad design/implementation to begin with… it is automatically being used for the wrong reasons.

my opinion anyway…

personally i’m all for raw binary data in a specified format… it’s the most efficient to store and read, and unless the developer makes a mistake you won’t have any problems. in practice though using an xml library is faster, even if it produces an inferior quality result for the end user…
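A minimal sketch of the “raw binary data in a specified format” approach, using Python’s `struct` module. The record layout (a little-endian version number, width, height and scale factor) is hypothetical. Reading and writing are one call each, but the layout lives only in the format string, so writer and reader must agree on it exactly; `write_record` and `read_record` are hypothetical helper names.

```python
import struct

# Hypothetical fixed layout: little-endian ('<', so no padding),
# uint16 version, uint32 width, uint32 height, float64 scale.
RECORD = struct.Struct("<HIId")


def write_record(version, width, height, scale):
    """Pack one record into exactly RECORD.size bytes."""
    return RECORD.pack(version, width, height, scale)


def read_record(blob):
    """Unpack one record; raises struct.error if the length is wrong."""
    return RECORD.unpack(blob)


blob = write_record(1, 1024, 768, 1.5)
print(len(blob), read_record(blob))
```

The trade-off the comment hints at shows up immediately: the 18-byte record is compact and fast to read, but nothing in the bytes themselves says what they mean.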