Web Development as Tag Soup

Jeff_Tucker · July 21, 2008, 12:00am

Here’s the solution: learn HTML and quit relying on frameworks to do something that you can’t do yourself. http://agilology.blogspot.com/2008/07/tag-soup-sucks-hey-jeff-heres-better.html

MathStuf · July 21, 2008, 12:00am

I’ve found another scripting language, Falcon (http://www.falconpl.org/), to be interesting. I haven’t gotten to use it yet (other things are getting in the way), but the site uses Falcon and you can see the source of any page. I think it’s very clean. Of course, anything can be made ugly, so it still has to be done right.

Anup · July 21, 2008, 12:00am

A number of people mentioned XSLT. We have used it quite well for a number of years with some large clients. I try to make a small case for it in Views here:

http://www.onenaught.com/posts/8/xslt-in-server-side-web-frameworks

In short, it is not for everyone, but it can be very useful and encourages good application design.

A couple of comments also talked about how verbose XSLT is. Some of the examples above were quite verbose themselves and could be a lot shorter, making the XSLT far easier to read (e.g. instead of xsl:attribute, just write the attribute itself and use the curly-brace shorthand, as in <div class="{$someVariableName}">).

What is nice about the XSLT approach is that you can create a master-template/nested master template kind of thing as well, letting you reuse templates even more.

EugeneK · July 21, 2008, 12:00am

You’re right in that the main example presented is pretty horrible. The syntax coloring makes it somewhat more readable, but you obviously don’t want to rely on that.

I mostly do Django when it comes to web development, which several people have already mentioned as generally having a clean template language. A nice feature of it is the ability to extend templates and override some or all of the various sections.

For example, a sub-template might have:

{% extends "base_generic.html" %}
{% block title %}{{ section.title }}{% endblock %}
…

Whereas the base_generic.html template would have:

…
<head>
<title>{% block title %}Default title text{% endblock %}</title>
…

This allows you to at least break up and abstract the tag soup slightly.
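To make the override mechanism concrete, here is a toy Python sketch of block inheritance. It is not how Django implements it: the regex parser and the `render_with_overrides` name are assumptions made up for illustration, and it only handles flat, non-nested blocks.

```python
import re

# Matches {% block name %}...{% endblock %}. The syntax imitates Django's,
# but this is a toy parser, not Django's template engine.
BLOCK_RE = re.compile(r"\{% block (\w+) %\}(.*?)\{% endblock %\}", re.DOTALL)


def render_with_overrides(base: str, child: str) -> str:
    """Fill each block in `base` with the child's version, else keep the default."""
    overrides = dict(BLOCK_RE.findall(child))
    return BLOCK_RE.sub(
        lambda match: overrides.get(match.group(1), match.group(2)), base
    )


base = (
    "<head><title>{% block title %}Default title text{% endblock %}</title></head>"
    "<body>{% block content %}{% endblock %}</body>"
)
child = "{% block title %}News{% endblock %}"

page = render_with_overrides(base, child)
```

The child only names the sections it wants to change; everything else, including the surrounding markup, comes from the base template.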

JustinC · July 21, 2008, 12:00am

This isn’t just an HTML problem… this is also a code generation problem in general. I’ve been working on an open source project called NBusiness (http://www.codeplex.com/NBusiness) and it is a DSL that uses templates to generate code for you. The templates I am making by default will use NVelocity (http://nvelocity.sourceforge.net/) to generate code for you and it suffers from the exact same problem (perhaps worse because you can’t really make controls).

The other obvious option is to generate all of your code using the CodeDom which sucks so bad it makes tag soup seem wonderful.

Honestly, I think this only becomes a preferable solution when you NEED to control your output precisely. I mean, yes tag soup in ASP MVC is painful but so is designing CSS that is compatible with the crap HTML output of most of the default ASP controls… so it becomes preferable only when you need to control your HTML output.

I think the statelessness of the web really plays a part in the difficulty of generating HTML too.

Christian8 · July 21, 2008, 12:00am

I’m surprised you’re reporting this issue. I honestly am. I actually thought this is one of the problems reasonably solved pretty much everywhere by now. At least my daily work perception is that way.

I’ve searched through the responses on your blog entry for Zope and found a couple of references (some of them wrong, some of them enthusiastic but not directly on the point).

I’m a core developer and ‘user’ of the Zope framework and ‘we’ usually have the perception issue that we’re solving problems that no-one has or providing tools for complexity levels nobody goes to.

We’ve had a technology called (Zope) Page Templates since about 2003, which lets you create pull templates by:

Annotating the HTML/XML with attributes and tags from a separate namespace
Having a parser/interpreter for those attributes that modifies the DOM of the HTML/XML
Serializing it back.

Here’s an example:

<p tal:content="string:foo" />

<ul>
<li tal:repeat="x python:[1,2,3]"
    tal:content="x" />
</ul>

The language is probably Turing complete (although that’s not its purpose) and allows for more complexity than is usually wanted.

The expressions are written in TALES, the Template Attribute Language Expression Syntax, which allows simple statements of various (registerable) types.

Within the expressions, a few top-level variables can be defined. Usually those correspond to your application’s current model and/or view class.

Using view classes you typically avoid putting complex Python code into the template but reduce the attribute expressions to shorter attribute/function lookups:

<dl>
<dt>User name</dt>
<dd tal:content="view/current_user" />
</dl>

This has been the standard of my work environment for about 5 years now.
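For readers who have not seen the approach, the annotate, interpret, serialize cycle described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not Zope's implementation: `lookup`, `fill` and `process` are hypothetical names, and expressions are reduced to plain `a/b` path lookups in a dict rather than real TALES.

```python
import copy
import xml.etree.ElementTree as ET

TAL_NS = "http://xml.zope.org/namespaces/tal"
CONTENT = f"{{{TAL_NS}}}content"
REPEAT = f"{{{TAL_NS}}}repeat"


def lookup(path, context):
    """Resolve an 'a/b' path against nested dicts (a stand-in for real TALES)."""
    value = context
    for part in path.split("/"):
        value = value[part]
    return value


def fill(element, context):
    """Apply tal:content to this element, or descend into its children."""
    content = element.attrib.pop(CONTENT, None)
    if content is None:
        process(element, context)
        return
    for child in list(element):
        element.remove(child)
    element.text = str(lookup(content, context))


def process(element, context):
    """Expand tal:repeat on each child, then fill the resulting elements."""
    for child in list(element):
        repeat = child.attrib.pop(REPEAT, None)
        if repeat is None:
            fill(child, context)
            continue
        name, path = repeat.split(None, 1)
        index = list(element).index(child)
        element.remove(child)
        for offset, item in enumerate(lookup(path, context)):
            clone = copy.deepcopy(child)
            element.insert(index + offset, clone)
            fill(clone, {**context, name: item})


source = (
    f'<ul xmlns:tal="{TAL_NS}">'
    '<li tal:repeat="x numbers" tal:content="x" />'
    "</ul>"
)
root = ET.fromstring(source)                     # parse the annotated markup
fill(root, {"numbers": [1, 2, 3]})               # interpret the attributes
result = ET.tostring(root, encoding="unicode")   # serialize it back
```

Because the annotations live in attributes, the template stays well-formed markup before and after processing.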

Kev · July 21, 2008, 12:00am

Shouldn’t tag soup in MVC be less of a problem, or at least excusable, because we have server-side controls?

In ASP.NET Forms there is still the possibility of making up a batch of tag soup but we don’t because we use controls.

Or am I missing something important about MVC here?

James_Avery · July 21, 2008, 12:00am

I think there are two answers here:

Write better markup. In the Typo example you could move lots of that code to a model or helper, especially the URL concats, date format, etc. If you move all that to the model it is much more readable.
Get rid of the HTML part. I love a simple markup technology called HAML. It started on Ruby but there is NHaml as well. Here is an example from my MVC app using HAML:

#roomdetails
-foreach(Model.Vip vip in room.Vips)
#roomdetail
%a{href=vip.Url,class=borderit}
%img{src=Network.Current.StaticPublisherAddress + vip.Image, class=left}
.viptitle
%a{href=vip.Url}
= vip.Title
.vipdesc
= vip.Bio
.clearleft

HAML has to be indented correctly, by the way. The spaces/tabs were removed from my comment, but you can see it properly here:

http://infozerk.com/averyblog/new-lounge-front-end-now-with-asp-net-mvc-and-nhaml/

jheriko20 · July 21, 2008, 12:00am

Personally, I’m a PHP developer when it comes to web stuff, and my (terrible) solution to this problem is to write a pure block of PHP code which outputs the page without using the (convenient) functionality of writing HTML with PHP interspersed. It makes the HTML harder to read in the PHP code file, but all I need to do is view the page source with a browser or other tool once the page is up if I need to see the HTML. I generally insert tabs and new lines to help with this… although I will admit I am a bit lazy, as doing a view source on the pages on my website will reveal.

So from my perspective, tag soup is a problem the developer makes for himself… at least for PHP. I’d actually need to know something about ASP to comment on it though…

StaceyR · July 21, 2008, 12:00am

I’ve been working on an internal web application for a client for some time now in php.

I’ve found the easiest way to remove tag soup is write each page entirely in php, using functions like:

property('one', 'Email', element_input_text('email', $inspector->email))

This function returns all the html tags required to display a nice input for entering an inspector’s email address.

I nest all these function calls together inside the function template(), which creates all the html for the basic framework of each page (title, breadcrumbs, menus, header, footer, and buttons).

I indent these functions in much the same way as I used to indent html so it’s easy to see how deep in the html I am.

No tag soup and parenthesis matching in my IDE ensures I don’t leave out close tags.
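The same functions-returning-markup idea, sketched in Python rather than PHP. The function names echo the ones above, but their bodies and the markup they emit are guesses for illustration (`property_row` stands in for `property`, which would shadow a Python built-in), and the escaping is an addition of this sketch.

```python
from html import escape


def element_input_text(name, value):
    """Return a text input; name and value are escaped for attribute use."""
    return f'<input type="text" name="{escape(name)}" value="{escape(value)}">'


def property_row(css_class, label, control):
    """Wrap an already-rendered control in a labelled row."""
    return (
        f'<div class="{escape(css_class)}">'
        f"<label>{escape(label)}</label>{control}</div>"
    )


def template(title, *parts):
    """Basic page frame; the nested calls supply the body."""
    body = "".join(parts)
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )


page = template(
    "Inspector",
    property_row(
        "one",
        "Email",
        element_input_text("email", 'jo"e@example.com'),
    ),
)
```

Each open tag is paired with its close tag inside a single function, so an unbalanced page shows up as an unbalanced parenthesis.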

John_Meyer · July 21, 2008, 12:00am

This is one place I think shows a great strength of using an MVC pattern. If you put your data in a presentation model, you can put most of your view logic in there, then (using ASP.NET MVC syntax here) the only thing you have in your HTML is <%= ViewData.Model.SomeProperty %> and perhaps an if using a boolean property of the model or a loop using an IEnumerable<T> property of the model.
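A rough Python sketch of the presentation-model idea, with Python standing in for ASP.NET MVC. `Order`, `OrderView` and `render` are hypothetical names, not part of any framework: the model does the formatting and decision making, so the view is left with plain property reads and a single boolean test.

```python
from dataclasses import dataclass
from datetime import date
from html import escape


@dataclass
class Order:
    """Hypothetical domain object, as it might come from the data layer."""
    customer: str
    placed: date
    total_cents: int
    shipped: bool


@dataclass
class OrderView:
    """Presentation model: every field is already display-ready."""
    customer: str
    placed: str
    total: str
    show_tracking: bool

    @classmethod
    def from_order(cls, order: Order) -> "OrderView":
        return cls(
            customer=order.customer,
            placed=order.placed.isoformat(),
            total=f"${order.total_cents / 100:,.2f}",
            show_tracking=order.shipped,
        )


def render(view: OrderView) -> str:
    """The 'view': only property reads and one boolean check."""
    lines = [
        f"<h1>{escape(view.customer)}</h1>",
        f"<p>Placed {view.placed}, total {view.total}</p>",
    ]
    if view.show_tracking:
        lines.append("<p>Your order has shipped.</p>")
    return "\n".join(lines)


order = Order("Ada & Co", date(2008, 7, 21), 123456, True)
page = render(OrderView.from_order(order))
```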

Vincent · July 21, 2008, 12:00am

I don’t think mixing HTML and server code is a problem in general; in fact, if your site isn’t trivial you end up with some complex logic about loops, attributes, content, conditionals, and so on. Using even more abstraction is just plain wrong, because:

You generally want to control what the output HTML will look like, so forget about a ‘builder’ tool, like a series of doc.addParagraph(…), or doc.addTag(P,content,attributes) … it’s OK for passing data through some XML markup, but HTML documents are generally too complex to keep a clear idea of what the output will be this way. Customizing the layout is going to be a real mess.
You want to use a ‘template’ library to avoid using too much logic inside your markup. This ends up being a joke, because you still want some logic, and you end up re-inventing a worse language that has to do loops, conditionals, and formatting, while your base language certainly has a better, more elegant syntax (that’s why I think Smarty is generally a waste of time).
A simpler intermediary language, like some XML markup then transformed to HTML through XSLT, leads you to the same pitfalls. You think you can avoid having logic in your markup, but you end up reinventing another language, having to learn that language and some other technologies, only to find out you get even more complexity.

The problem with tag soup is the very same as in any language, in any situation where you lazily prefer writing spaghetti code to creating meaningful functions, explicit variable names, clear indentation, well-divided problem solving, and so on. Any programming language embedded in HTML is OK for me; anything else is most probably a waste of time. What you really need is to do all the complex stuff outside that embedded HTML file: do the business stuff and read all the records you need in the controller, and when you need complex formatting or logic rules, do it in helper functions that you’ll define in a library accessible from your embedded HTML. In that embedded HTML, use your primary language, except everything must be very concise and trivial. Anything non-trivial has to be in outside helper functions with meaningful names. How do we program in general? When we write a method we make sure all the code is about the microproblem we’re trying to solve through that method, and anything too complex has to be in other methods, right? That’s just the exact same thing happening here. Being in the middle of HTML markup doesn’t mean you can forget those programming rules. It’s lame to blame your programming language when you put yourself in a mess, and that applies here too.

So, for your Typo example, it’s not that bad, really. The first row and even/odd rows are recurrent problems that can be replaced by simpler helper functions, where counting and choosing the right string happen. Using Ruby blocks it can even be rather more elegant than any half-broken template engine with poor syntax. Those temp variables could have been moved to helper functions. Generating links and labels could have been done more concisely, too.

That Typo example is quite messy, fair enough, but that doesn’t mean you can’t clean it up. Don’t throw out the baby with the bath water.
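As a small illustration of the first-row and even/odd helper idea, here is a sketch in Python (standing in for Ruby). The helper names and class names are invented for this example and are not taken from the Typo code.

```python
from html import escape


def row_class(index):
    """CSS class for a zero-based row index: parity, plus 'first' for row 0."""
    parity = "odd" if (index + 1) % 2 else "even"
    return f"first {parity}" if index == 0 else parity


def render_rows(titles):
    """The template side stays trivial: one helper call per row."""
    return "\n".join(
        f'<tr class="{row_class(i)}"><td>{escape(title)}</td></tr>'
        for i, title in enumerate(titles)
    )


rows = render_rows(["Tag soup", "Helpers & blocks", "Shelves"])
```

The counting and string choosing live in `row_class`, so the markup-producing line no longer needs temp variables or inline conditionals.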

Your room is messy? Hey, just buy some shelves. Burning everything isn’t a solution; if you don’t get the lesson you’ll just get in the same mess again and again, burying your laziness in layers and layers of dirty stuff you never wanted but that at least you can blame.

BrianW · July 21, 2008, 12:00am

If we have a ton of logic mixed in with an html (or rhtml or phtml or whatever) file, it might be time to start thinking about giving some of that logic back to the controller and splitting some of your potential view options into separate files, instead of trying to make one template file the do-all for a particular part of your site. If you have a bunch of conditionals in your template file, maybe you need multiple template files, and the code that they have in common can be put into one reusable file. Many times, it seems like we do the opposite.

Chui · July 21, 2008, 12:00am

There’s only so much form factor in text-based source code, so you end up with either tag soup (as in ASP/JSP) or attribute soup (as in TAL). Lisp-like languages offer the quasiquote, which comes pretty close to ASP/JSP while maintaining some semblance of syntax sanity.

If only HTML5 were more like WPF, we’d have a clean separation of machine-readable data and templates being processed on the client side.

Ted · July 21, 2008, 12:00am

Ditto on HAML. It’s HTMLegant!

BrianW · July 21, 2008, 12:00am

To the PHP guys suggesting that you let your PHP spit out your markup:

This is fine if you’re the only person who’s ever going to adjust the view, but it makes the web developer the bottleneck, and it turns something that should be a fairly simple, mostly-static operation into a task for your application instead of a simple import.

Mauro · July 21, 2008, 12:00am

Normally, to get around tag soup, I use literals and add the markup to them programmatically. You can then easily add a literal to a panel and, presto, you have properly structured code without having to inject HTML into your code-behind or inject code into your HTML.

Matt · July 21, 2008, 12:00am

A lot of tag soup can be eliminated by using functions instead of coding every last thing onto every last HTML element/attribute.

VictorE · July 21, 2008, 12:00am

Ever since I read it, I’ve sworn by 4 Layers of Separation (http://particletree.com/features/4-layers-of-separation/) - Now I put all my content in XHTML files, headers, footers and the like in XSLT files, transformation code in PHP, and JavaScript / CSS in separate files. It’s extremely flexible.

PaulA · July 21, 2008, 12:00am

When it comes to messy mixes of markup and template code, HTML forms can be about the worst case.

In our codebase at work, we write forms in pure XHTML - with little or no template code in them. Then in the controller code, we extract the form from the view output and parse it into a form-centric DOM-like structure. From there, the request parameters can be loaded into the form, the DOM can be freely altered to change input values, add select options etc, and validation rules can be attached to fields. The resulting DOM is then serialized back into HTML and injected into the view - with validation errors when relevant.
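The parse, populate, serialize flow described above might look roughly like this in Python. It is only a sketch under assumptions: the form markup, field names and the `fill_form` helper are all invented here, the real codebase is not shown in the comment, and validation is reduced to a precomputed dict of error messages.

```python
import xml.etree.ElementTree as ET

# A plain XHTML form with no template code in it (hypothetical markup).
FORM_MARKUP = """<form action="/signup" method="post">
<label>Email <input type="text" name="email" /></label>
<label>Plan <select name="plan" /></label>
</form>"""


def fill_form(markup, params, options, errors):
    """Load request params, select options and error messages into the form."""
    form = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in form.iter() for child in parent}

    for field in list(form.iter("input")):
        name = field.get("name")
        if name in params:
            field.set("value", params[name])
        if name in errors:
            note = ET.Element("span", {"class": "error"})
            note.text = errors[name]
            parent = parents[field]
            parent.insert(list(parent).index(field) + 1, note)

    for select in list(form.iter("select")):
        name = select.get("name")
        for value in options.get(name, []):
            option = ET.SubElement(select, "option", {"value": value})
            option.text = value
            if params.get(name) == value:
                option.set("selected", "selected")

    # Serialize the altered tree back to HTML for injection into the view.
    return ET.tostring(form, encoding="unicode", method="html")


page = fill_form(
    FORM_MARKUP,
    params={"email": "not-an-address", "plan": "pro"},
    options={"plan": ["free", "pro"]},
    errors={"email": "Enter a valid email address."},
)
```

The form file itself never changes; everything dynamic is applied to the parsed tree in controller code.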

I find this is an excellent way to deal with some of the worst culprits of the kind of tag/code soup you’re talking about. It’s pretty much the reverse of the more common form builder approach, which I’ve never found fun or productive to work with.