XML Bad Practices

Naïve Versioning

Robin Berjon

2009-12-21

Versioning, as explained previously, is important enough that it deserves to be done right. Yet some languages have taken a rather naïve approach to it, typically consisting of a version attribute on the root element or some similarly simplistic scheme built on the presence of a version indicator. That is fine if the purpose is to die immediately when a given version is not supported (in which case simply changing the namespace would be less verbose and just as effective), but it achieves nothing useful if the intent is to allow processors to work across versions.

Indeed, what is such a processor to do if it sees a version attribute with a value greater than the version it supports? Nothing useful comes to mind, short of warning the user that there may be rendering issues: a message that the user will either ignore or panic over, and which will not help in either case. Conversely, if the version attribute indicates an earlier version, should features from later versions be ignored? That would make implementations unduly complex.

Furthermore, if a language is extended in a modular fashion rather than through linear versions, this approach breaks down under the complexity of having to specify which modules are in use and their respective versions.

When producing content, it is widely accepted that using the lowest version that includes all of the needed functionality is good practice, since it makes the content usable by the largest number of older implementations. But doing so properly requires authors to know, for a given set of language constructs, the lowest version number that comprises them all. That is asking a lot, and in practice authors will likely fall back to using the highest version number they can get away with. Either way, version information will often be out of sync with the actual content, whether through error, through organic growth, or because the content is composed from multiple sources. This tendency is strong enough that relying on version metadata is largely useless, unless it is applied to content produced exclusively by programmatic means under tight control: a situation that is exceedingly rare for web formats.
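To make concrete what "knowing the lowest version" demands of authors, here is a minimal sketch assuming a purely linear, cumulative versioning model and a hypothetical table recording the version in which each construct first appeared (the element names and version numbers are invented for illustration). Even in this idealised case, the answer has to be recomputed every time the content changes, which is exactly the step that gets skipped in practice.

```python
# Hypothetical table: the version in which each construct was introduced.
# These element names and version numbers are invented for illustration
# and do not describe any real XML vocabulary.
INTRODUCED_IN = {
    "para": (1, 0),
    "list": (1, 0),
    "table": (1, 1),
    "figure": (1, 2),
    "video": (2, 0),
}


def lowest_sufficient_version(constructs):
    """Return the lowest version that includes every construct used.

    Versions are assumed cumulative, so this is the maximum of the versions
    in which each construct was introduced. An empty document falls back to
    the (assumed) first version, 1.0.
    """
    return max((INTRODUCED_IN[name] for name in constructs), default=(1, 0))


def version_is_accurate(declared, constructs):
    """Check whether a declared version actually covers the content."""
    # Tuples compare lexicographically, so (1, 2) < (2, 0).
    return declared >= lowest_sufficient_version(constructs)


if __name__ == "__main__":
    doc = ["para", "table", "figure"]
    print(lowest_sufficient_version(doc))    # (1, 2)
    doc.append("video")                      # the content grows organically...
    print(version_is_accurate((1, 2), doc))  # False: the metadata is now stale
```

Note that this only works for linear versions; with modular extensions, the "version" becomes a set of modules and their own versions, and the bookkeeping gets worse.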

But even setting aside arguments that rely on hard-to-demonstrate behaviour from putative users, the following very short decision tree can be followed when considering whether a version indicator would be useful:

Are processors expected to process content across version boundaries?

No: Then each version is actually a different language (i.e. the versions are not mutually intelligible). Just change the namespace (or, if there is no namespace, some other global indicator such as the root element, the media type, or the file extension). You don't need a version indicator.
Yes: Then processors have to be defined so that they can apply lacunae values, language-level error handling, and other similar rules intended to render unknown constructs sufficiently intelligible. A version indicator could add nothing on top of what they already do. You don't need a version indicator.
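The "Yes" branch can be illustrated with a minimal sketch of a processor built on must-ignore and lacuna rules, using Python's standard `xml.etree.ElementTree`. The vocabulary (`doc`, `para`, `list`, `item`), the `type` attribute and its `bullet` default are all hypothetical, and namespaces are left out for brevity. The point is that nothing in the processing logic ever branches on a version attribute: unknown constructs are skipped and missing attributes get default values, whatever version the content claims to be.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the only elements this processor understands.
KNOWN_ELEMENTS = {"doc", "para", "list", "item"}

# Hypothetical lacuna (default) values, applied when an attribute is absent.
LACUNAE = {"list": {"type": "bullet"}}


def process(elem, out):
    """Append (tag, attributes) for every known element, depth first.

    Unknown elements are skipped together with their content (must-ignore),
    so content written for a later version degrades instead of failing.
    Attributes are copied through, but none of them (including any version
    attribute) influences what gets processed.
    """
    if elem.tag not in KNOWN_ELEMENTS:
        return  # unknown construct: ignore it rather than reject the document
    # Explicit attributes override the lacuna defaults.
    attrs = {**LACUNAE.get(elem.tag, {}), **elem.attrib}
    out.append((elem.tag, attrs))
    for child in elem:
        process(child, out)


def render(xml_text):
    """Parse a document and return the list of constructs that were understood."""
    out = []
    process(ET.fromstring(xml_text), out)
    return out


if __name__ == "__main__":
    text = '<doc version="7.0"><para/><hologram/><list><item/></list></doc>'
    for entry in render(text):
        print(entry)
```

Whether an unknown element's content is dropped or only its tags are (as some languages prefer, so that its text still shows through) is a per-language design decision; this sketch drops both. Either way, the rule is fixed in the processor and a version indicator adds nothing to it.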

As simple as this seems, the debate rises again like a zombie almost every time a new group convenes, and it invariably ends, after much acrimony, with no version indicator. If anyone knows of a silver bullet that could skip the acrimony, I'm all ears.

This article is part of a series on XML Bad Practices.

I always welcome feedback: @robin.berjon.com, @robin@mastodon.social, robin@berjon.com.