XML: The Angle Bracket Tax

I think the one thing that’s missing from that argument, however, is that XML is much easier to validate. If you didn’t have XSD, I’d agree with you, but without it there’s no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it in code (as far as I can tell).

And then if your validation criteria change, you’re back to unpicking your predecessor’s hokey undocumented parser and validator and then trying to spot-weld in your extra logic. And then recompile. And then (depending on what kind of change-controlled environment you work in) jump through the hoops to get it deployed.

I could well be wrong, though.

XML is like violence: if it doesn’t solve your problem, you’re not using enough of it. :wink:

XML is also, just like violence, something the world could do without :wink:

Well said.

I think that XML isn’t really appropriate for many of the applications for which it is being applied. One of the problems is that XML is flexible enough to be turned into about anything, whether that makes any sense or not. There are many good uses for XML out there, but I’m afraid that the many poor uses will prejudice people against it.

"I have to work with a 10000rows wsdl file.

And sometimes I have to look at SOAP messages that contain very simple info, but the XML makes my eyes burn and head explode…

This YAML seems to be quite interesting and human friendly"

If you actually have to look at a WSDL file, you are doing something wrong. If you are manually parsing SOAP messages, you are doing something VERY wrong.

I think the problem is that Jeff and a lot of the people on this board have never programmed enterprise applications. They are used to programming simple Web 2.0 websites that are self-contained. You have your little stock ticker program that needs data asynchronously from a stock web service. Sure, for these simple problems JSON is a better solution. But what if your web service also needs to be consumed by a data processing application? Still going to use JSON? HaHaHa.

We are designing an entire reporting system around XML control files. One engine to “rule them all” and small XML files containing everything we need in the report. So far, it has been a nice solution, and we wrote a generator to create the XML. No more digging in code looking for where to change the header width or column order… just load the XML into the generator, make your change, and BAM! Instant report update.

Of course it isn’t released or even in alpha yet, but “Works on My Machine”

There’s also JSON notation, which some call the new, fat-free alternative to XML, though this is still hotly debated.

There’s also another cool thing: JSON is mostly a subset of YAML (there are a few small differences, see http://redhanded.hobix.com/inspect/jsonCloserToYamlButNoCigarThanksAlotWhitespace.html, but it’s overall compatible). This means that it’s fairly easy to start with JSON and jump to YAML if the structure is too complicated for JSON.

At least we’re not stuck with ASN.1.

ASN.1 is ok, as long as you don’t have to create or parse it by hand. But then again, ASN.1 is not supposed to be hand-parsed. And you’ll note that XML is the same, it’s just that XML is (supposedly) human-readable, and every language has XML serializers and deserializers while Erlang is one of the few languages with an ASN.1 encoder/decoder smack right in its stdlib.

XML became the default because of its flexibility in data formatting. And, because it has become so ubiquitous, almost all programming languages have built-in ways of easily parsing XML. In fact, I do almost all of my web output using XML and then use XSL style sheets to transform it into HTML. I remember some blogger, can’t remember his name, blathering on about MVC and how you should make your output “skinnable”. Well, if you produce XML output, your webpages are extremely skinnable.

The problem is that XML may be very computer friendly, but it is not too human friendly. Most people will easily agree with that. However, there are dozens of GUI-oriented XML editors that make reading and writing XML much easier. I’ve even written a 10 or so line Perl script that converts files from YAML to XML and back. (Yes, Perl. What do you expect from someone who uses VI as their main program editor?)

XML is not really the problem. It is an excellent and extremely flexible data format. The problem is our attempt to read and write directly from XML when there are many excellent tools that can help us with the task. After all, you don’t expect to read and write Microsoft Word documents using a standard text editor. Why should XML be all that different?

I’m not a fan of using YAML as a data formatting tool because it doesn’t go far enough toward solving the problem. YAML becomes unreadable when your data becomes more complex, and there are very few development tools that can parse YAML files. It’s silly to come up with another data format, inferior to XML, that doesn’t really tackle the main issue of human data readability when there are few programming tools to read and write it. You’re better off using one of the wide variety of GUI XML editors that can make your task much easier.

After all, how many developers use IDEs to help them program even though almost all programming code is in text and could be done (in theory) using Notepad?

I’m afraid I couldn’t disagree more. No, XML isn’t the easiest to read (by humans) of all the infinite number of alternatives out there. No, XML isn’t the most efficient in terms of space. And yes, perhaps it has been forced into places it was never intended to go. But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information. I would argue the value of having a standard far outweighs the inefficiencies in most cases.

Take a simple example of a configuration file that some application will need for saving user information. We’ve all been there, making up an ad hoc scheme for saving whatever needs to be saved. Then building a little parser to read and write the data in that form. And over time our little config file grows and changes. Someday, a new programmer joins the team and has to deal with this file. What are the construction rules again? Where can a new item be added that won’t break the little parser? How much time has been expended over the life of the application in building, modifying, and fixing that bit of parser code as things needed to change?

There are numerous XML parsers available that are robust and free. They all work pretty much the same way (with a few exceptions that I’d call bugs in implementation). I don’t want to write little parsers anymore. I want to use something that is already written and works.

The same argument can be made with respect to the other tools that are widely available to deal with XML-encoded data:
– XSD can be used to ensure the integrity of the XML file before your program starts to slurp in the data in the file. This can be critical in B2B situations like banking or ordering from a supplier.
– XSLT can be used to do arbitrary transformations on the data (in a standard way) to produce files of any format that is convenient on the data consumer’s end of the exchange. I do a lot of this sort of transformation work–none of it for web pages–and I can vouch for the power and convenience of having a standard transformation language.
– XML/XSL authoring and editing tools abound. There are tools that will produce an editable visual representation of a schema (a real boon if you need to capture complex data in a text file). Most of these tools will do much of the work of editing XML files and will help you to construct correct XML with prompts and IntelliSense-like completion.

I’m a big fan of XML. No, it certainly isn’t the very best that we could do, but it is a quantum leap better than what we had before: custom representations for everything. If there is one single improvement we could make to advance the art of programming today, I’d vote that it’s STANDARDS. We don’t have to wait for perfect standards to emerge (they won’t) but we do have to get to the point where we can agree. XML is a step in the right direction.

Thankfully, someone will be implementing YAML into Boost.Serialization this summer.

http://code.google.com/soc/2008/boost/appinfo.html?csaid=BE3EEB904A90B03A

but without it there’s no way of validating data in a file (or stream, or anywhere else you get plain text data) without manually parsing and validating it

Actually,

  1. Even in XML there are other (far better, especially on the readability front) schema languages/systems than XSD (RelaxNG, Schematron)
  2. Schema languages/specs are starting to appear for e.g. JSON (Cerny, json-schema)
  3. JSON documents are very often orders of magnitude simpler than their XML counterparts, thus validation becomes almost trivial and often doesn’t require a full-blown schema language.
  4. Manually parsing and validating a JSON document isn’t really hard with a dynamic language.
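
To make point 4 a bit more concrete, here is a minimal sketch of hand validation in Python using only the standard `json` module. The document shape (a `customer` string and an `items` list with positive integer `qty` values) and the name `validate_order` are made up for illustration; they are not taken from any real schema or from the schema languages mentioned above.

```python
import json


def validate_order(text):
    """Parse a JSON order and return a list of problems (empty means valid)."""
    try:
        order = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(order, dict):
        return ["top-level value must be an object"]

    problems = []
    if not isinstance(order.get("customer"), str):
        problems.append("customer must be a string")

    items = order.get("items")
    if not isinstance(items, list) or not items:
        problems.append("items must be a non-empty list")
    else:
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                problems.append(f"items[{i}] must be an object")
                continue
            qty = item.get("qty")
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(qty, int) or isinstance(qty, bool) or qty < 1:
                problems.append(f"items[{i}].qty must be a positive integer")
    return problems


# Hypothetical sample documents.
good = '{"customer": "ACME", "items": [{"sku": "A1", "qty": 2}]}'
bad = '{"customer": 42, "items": [{"sku": "A1", "qty": 0}]}'
```

Calling `validate_order(good)` gives an empty list, while `validate_order(bad)` reports both the wrong `customer` type and the zero quantity. Whether this stays manageable as the rules grow is exactly what the schema-language points above are about.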

Thank you for addressing some of my concerns regarding this sacred cow!

“it is rapidly becoming a standard way of representing information”

It is hardly more a ‘standard way’ of representing information than ASCII (or UTF8, UTF16, etc.). Yes, anyone can write a file with lots of angle brackets, and parsers can easily turn that back into tokens, but the semantics of the file remain application-dependent in almost every example of (bad) XML usage I’ve ever seen.

“Take a simple example of a configuration file that some application will need for saving user information. We’ve all been there, making up an ad hoc scheme for saving whatever needs to be saved.”

Er, YOU might have been, but the rest of us are familiar with a small number of pretty common configuration formats that are trivial (i.e. easier than XML) to parse.

“XSD can be used to ensure the integrity of the XML file”

Yes, for a very limited meaning of the word “integrity”.

“XML/XSL authoring and editing tools abound”

And text editors are ‘abounder’.

“We don’t have to wait for perfect standards to emerge (they won’t) but we do have to get to the point where we can agree. XML is a step in the right direction.”

OK, if we can get the billion different languages floating around reduced to maybe less than a hundred or so, I agree with you :slight_smile:

I’ve also been critical of XML ever since I had to start working with it. I’m coming from Lua, where a configuration file is simply a Lua script. If you have an error in the script, you’ll get an error message from the Lua interpreter.

Now, if you have the same configuration file in XML format, and NO validation, as is usually the case, you can get any of a list of problems when reading it in your code:

  • program crashes
  • program says: “error reading config file”
  • program starts but uses default settings for all configurable features
  • program starts but uses default settings for a subset/single feature
  • something else entirely …

Yes, this is only in the narrow “configuration file” scenario, but that’s just one where I think XML is totally overused and/or under-validated.
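
For comparison, here is a rough sketch of what "validated" could look like for a tiny XML config, using only Python's standard `xml.etree.ElementTree`. The `<config>`, `<port>` and `<title>` elements, the `ConfigError` class and `load_config` are all hypothetical names chosen for the example. The idea is simply that every failure becomes one specific error message instead of a crash or a silent default.

```python
import xml.etree.ElementTree as ET


class ConfigError(Exception):
    """Raised with a message that names the offending setting."""


def load_config(xml_text):
    """Parse a small XML config and fail loudly on anything unexpected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ConfigError(f"config is not well-formed XML: {exc}") from exc

    if root.tag != "config":
        raise ConfigError(f"expected <config> root, found <{root.tag}>")

    port_text = root.findtext("port")
    if port_text is None:
        raise ConfigError("missing required element <port>")
    try:
        port = int(port_text)
    except ValueError:
        raise ConfigError(f"<port> must be an integer, got {port_text!r}") from None
    if not 1 <= port <= 65535:
        raise ConfigError(f"<port> out of range: {port}")

    # Optional setting: the default is stated here, not applied silently on error.
    title = root.findtext("title", default="untitled")
    return {"port": port, "title": title}
```

With this, `load_config("<config><port>http</port></config>")` raises a `ConfigError` that mentions `<port>` by name, which is roughly the kind of feedback the Lua interpreter gives for free.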

Btw, what ever happened to INI files? :wink:

For scripting languages it’s handy to have the config files written in the language itself. For example, here is a Python config file for a program I wrote:

http://www.pixelbeat.org/programs/Tira-2/toppy.tira2

which can be parsed trivially with: config = eval(open(config_filename).read())
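
One caveat worth spelling out: `eval` will run any code that happens to be in the file. If the config is nothing but literals (dicts, lists, strings, numbers, booleans), the standard library's `ast.literal_eval` parses the same syntax without executing anything. A minimal sketch follows; the sample content is invented for illustration and is not the linked file, which may use expressions that `literal_eval` rejects.

```python
import ast


def load_literal_config(text):
    """Parse config text written as a Python literal, without executing code."""
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("config must be a dict literal")
    return value


# Hypothetical config content, for illustration only.
sample = """{
    "name": "toppy",          # comments are allowed, as in any Python source
    "channels": [1, 2, 3],
    "record": True,
}"""
config = load_literal_config(sample)
```

Anything that is not a plain literal, such as a function call, makes `ast.literal_eval` raise `ValueError` instead of running it.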

I’m not a big fan of XML, but think it’s OK in some scenarios. Unlike Jeff, though, I’m going to single out SOAP. We already have many perfectly good syntaxes for procedure call. SOAP is a product of the “insane complexity” one of the Google founders talked about. With a million simple, concise syntaxes for procedure call out there, why do we end up with this complex unreadable monster? How about “Currency GetLastTradePrice(“DIS”)”?

But you miss what I think is the most important point: it is rapidly becoming a standard way of representing information.

The problem is that XML is NOT a way of representing information. It’s at best a way of building an information representation structure; XML itself doesn’t represent anything.

I would argue the value of having a standard far outweighs the inefficiencies in most cases.

XML is not a standard for anybody but marketroids. One of Erik Naggum’s numerous quotes about XML comes to mind here:

Structure is nothing if it is all you’ve got. Skeletons spook people if they try to walk around on their own; I really wonder why XML does not.

Take a simple example of a configuration file that some application will need for saving user information.

Wow, a non-sequitur already? The problem here is not “hey they’re not using XML” but the reinvention of the wheel. There are, and were before XML, numerous formats that could be used for representing a conf file. XML is barely an answer here, and one that is usually misused to insert one more buzzword in a press release.

I don’t want to write little parsers anymore. I want to use something that is already written and works.

Guess what? There are numerous JSON and YAML parsers available for most popular languages. You don’t have to write little parsers if you don’t want to, and you haven’t needed to since long before XML.

XSD can be used to ensure the integrity of the XML file before your program starts to slurp in the data in the file.

As I said above, there are schema languages for JSON. And I really don’t understand why every person who talks about XML schema languages just has to pick the most verbose, unreadable and annoying one of the bunch.

XSLT can be used to do arbitrary transformations on the data

So can any regular language; the only advantage this crippled, dumbed-down, annoying language called XSLT has over others is that it’s written in XML.

Wow, paint me impressed.

And yes, I have used XSLT; I spent the better part of my days in it for a whole year. I know and understand the thing, and I still hate it. I’d take HaXml or HXT over it any day of the week if I were the one to choose.

XML/XSL authoring and editing tools abound.

And mostly show how misguided XML is in the first place.

As for XML editors … I’d like to know which ones are considered “good”?

I have tried several, and either they are complex beasts of applications that try to satisfy every possible XML need you might have (Altova XMLSpy comes to mind), or they are very simple editors that let you edit the XML as a tree and in other forms but not much else (forgot the name).

The former simply have too much of a learning curve to be useful for all the people working with XML in our company (and are too expensive, too). The latter are simply not powerful enough, or their usability just feels “odd” enough not to encourage people to use them over plain text editors (with syntax highlighting).

I agree in part. There are plenty of situations where XML should never go, and some people use it in incredibly wrong and stupid ways, but it’s not all bad.

Then again, it seems software developers are like this. Case in point: GOTO.

It’s perfectly acceptable as long as it is done right, but developers used it inappropriately and then demonized it as never being the right answer.

For one precious moment it looked as if the world had actually standardized on a data and metadata interchange format

XML is not a format; it’s a format representation. It has no meaning in and of itself, and thus nothing was “standardized” for any value of “standardized” worth talking about.

Not to mention, long before the XML marketing blitz by the likes of IBM and Sun, there were ASN.1 and INI files, standards if there ever were any.

I realize it’s not the most ideal tool for your social shopping cart 2.0 AJAX app. You’d rather use REST.

Thanks for showing your incompetence and lack of comprehension of the topic, it’s appreciated.

Just so you know, REST is orthogonal to the document representation used: you can use REST with JSON, with YAML, with plain text, with HTML (guess what, you do every time you access a web page) or with XML. Nice try, no sugar.

Ooh look, I have an XML parser with a read and a write method. I can dump all sorts of objects in it, save them and retrieve them again. Hmm, ideal for config files. And high scores. UI definitions. Actually, ideal for pretty much everything I’d like to store that doesn’t have to go in a database. Uhm yes, my ints come out as int and my lists come out as list; it’s pretty amazing really.
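
A read/write pair like that can be sketched in a few lines of Python with the standard `xml.etree.ElementTree`. This is only an illustration of the idea, not the parser described above: the `type` attribute, the `<value>`/`<item>` tag names and the functions `read` and `write` are all assumptions, and only ints, strings and (nested) lists are handled.

```python
import xml.etree.ElementTree as ET


def to_element(value, tag="value"):
    """Encode an int, str, or list (possibly nested) as an XML element."""
    elem = ET.Element(tag)
    # bool is deliberately unsupported; it would otherwise pass as an int.
    if isinstance(value, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        elem.set("type", "int")
        elem.text = str(value)
    elif isinstance(value, str):
        elem.set("type", "str")
        elem.text = value
    elif isinstance(value, list):
        elem.set("type", "list")
        for item in value:
            elem.append(to_element(item, "item"))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return elem


def from_element(elem):
    """Decode an element produced by to_element back into a Python value."""
    kind = elem.get("type")
    if kind == "int":
        return int(elem.text)
    if kind == "str":
        return elem.text or ""
    if kind == "list":
        return [from_element(child) for child in elem]
    raise ValueError(f"unknown type attribute: {kind!r}")


def write(value):
    """Serialize a supported value to an XML string."""
    return ET.tostring(to_element(value), encoding="unicode")


def read(xml_text):
    """Parse an XML string produced by write back into a Python value."""
    return from_element(ET.fromstring(xml_text))


# Hypothetical high-score table: ints stay ints, lists stay lists.
high_scores = [["alice", 9001], ["bob", 42]]
assert read(write(high_scores)) == high_scores
```

Note that the types survive the round trip only because the sketch stores them in an attribute; plain XML text on its own would hand everything back as strings.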

Sure, if it’s a plain text file then I save it as plain text. And an image, for example, can sit neatly in an images directory. For everything else there are databases and XML.

Code comments as XML? That must be a joke and there are plenty of other jokes around. But in general: KISS and don’t re-invent the wheel.