XML Bad Practices
Not Using A Namespace
XML namespaces are one of the most hated aspects of the XML family. Not even XML Schema has received as much contempt, and it needed a lot more work and far longer specifications in order to get there. Maybe there will be a second version of the XML stack some day, and when that day comes we can hopefully address namespaces in a way that will cause less acrimony. In the meantime, whether you like them or dislike them, they are what we have. This article is part of a series based on the paper "Designing XML/Web Languages: A Review of Common Mistakes", which I presented at the XML Prague 2009 conference.
One of the biggest mistakes one can make when dealing with namespaces is to not use them. Namespaces are the tool one uses to identify an element (and in some rarer cases other things) as being part of a language. Not using namespaces means that documents in a given vocabulary cannot easily be composed into another, because it then becomes impossible to distinguish between the inclusion of another language, an error in the current language, and a future version of that same language. If you think of designing an XML language as being similar to creating a library, not using namespaces is very much like the pollution produced by global variables.
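To make the distinction concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The recipe vocabulary, its namespace URI, its element names and the classify helper are all hypothetical, made up purely for illustration. The point is only that a namespace-aware consumer can tell a foreign language apart from its own vocabulary, whereas an unqualified element leaves it guessing.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the namespace URI and element names are invented.
RECIPE_NS = "http://example.org/ns/recipe"
KNOWN_ELEMENTS = {"recipe", "ingredient", "step"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def classify(element):
    """Say what an element is from the point of view of the recipe language."""
    namespace, local = split_tag(element.tag)
    if namespace is None:
        # No namespace: could be ours, someone else's, or a mistake.
        return "ambiguous"
    if namespace != RECIPE_NS:
        return "foreign language"
    if local in KNOWN_ELEMENTS:
        return "known"
    # Our namespace, but a name we do not know.
    return "error or newer version"


doc = ET.fromstring(
    '<recipe xmlns="http://example.org/ns/recipe">'
    "<ingredient>flour</ingredient>"
    "<garnish>mint</garnish>"
    '<svg xmlns="http://www.w3.org/2000/svg"/>'
    "</recipe>"
)
labels = [classify(child) for child in doc]
unqualified = classify(ET.fromstring("<title/>"))
```

With namespaces the embedded svg element is recognisably foreign, and garnish is at least known to be a claim on the recipe language itself. The unqualified title element gives the consumer nothing to go on.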
The absence of namespaces also makes querying a mixed document difficult. For instance, if XHTML and SVG were to not have namespaces, they could still be rendered: SVG is always inside an svg element when it appears inside XHTML, and XHTML inside SVG is always inside a foreignObject element. But since the composition of the two languages can be done to any depth, if you have SVG inside XHTML inside SVG inside XHTML and so on, it is going to be difficult to find all the title elements or all the font elements. And since they have different meanings in each vocabulary, getting one for the other is very likely to cause bugs.
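With namespaces, the same query becomes trivial however deep the nesting goes. The following is a small sketch in Python with the standard library's xml.etree.ElementTree. The sample document and the titles_in helper are made up for illustration; the two namespace URIs are the real ones for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Illustrative document: SVG inside XHTML inside SVG inside XHTML.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Page title</title></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg">
      <title>Outer drawing</title>
      <foreignObject>
        <div xmlns="http://www.w3.org/1999/xhtml">
          <svg xmlns="http://www.w3.org/2000/svg">
            <title>Inner drawing</title>
          </svg>
        </div>
      </foreignObject>
    </svg>
  </body>
</html>"""

root = ET.fromstring(DOC)


def titles_in(namespace):
    """Return the text of every title element in the given namespace."""
    # ElementTree spells qualified names as '{namespace}local'.
    return [el.text for el in root.iter("{%s}title" % namespace)]


xhtml_titles = titles_in(XHTML_NS)
svg_titles = titles_in(SVG_NS)
```

Here xhtml_titles holds only the page title and svg_titles holds the two drawing titles, without the code ever having to track whether it is currently inside an svg or a foreignObject element.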
Admittedly, there are cases in which you can forget about namespaces. The parallel here is the throwaway script that one writes to perform a single, simple task now and then and never plans to reuse. And if that script does become an important part of a system, starts getting some serious usage, and needs maintenance, it's usually not overly difficult to emulate the old, poorly designed interfaces while nicer ones are being phased in. Remember, however, that data which is in use is a lot harder to refactor than code. The cost of adding a namespace declaration and writing namespace-aware code is tiny compared to the pain of using a poorly designed data format. So unless it really is for a throwaway document, not using namespaces is a mistake.
What of HTML5 and SVG, MathML, etc.?
The latest plan is for HTML5 to integrate SVG and MathML directly, without use of namespaces. So why is that the right choice?
First and foremost, HTML5 isn't an application of XML. It's its own thing. By that token it doesn't need to play by XML rules, except in the XHTML serialisation.
But perhaps more importantly, its scope can be considered to include SVG and MathML, which were only developed separately because the browser vendors became lethargic after the first Browser War. In other words, HTML, SVG, MathML and possibly a few others should be considered to be just a single language.
Does this entail that all future extensions to HTML should be done in the same manner? If done by the W3C, I would contend that the answer is yes. If done outside the W3C, then I would recommend that people choose whichever extension mechanism they are most comfortable with, so long as they use one, and so long as within the XML serialisation it is namespaces. Any serious extensibility to HTML ought to rely on XBL2 anyway, which in turn can provide the same extension for multiple syntaxes. But that's a topic for another day.