Robin Berjon

XML Bad Practices

Language Error Handling

In some cases lacuna values and ignoring unknown namespaces will not be enough to recover from errors, for instance because an element appears in the language's namespace that the processor does not know about, or because an element that it knows about appears in the wrong place in the tree.

Again, this is a case in which, if the rules for processing such errors are not defined, implementations are guaranteed to differ. There are multiple approaches here: flag an error and give up, ignore the element and process what is inside it as if the element hadn't been there at all (or process some of it), or ignore the entire subtree contained inside that element.
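To make the three approaches concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The namespace, the vocabulary, and the names Policy and known_elements are all hypothetical, invented for illustration; the point is only how differently the same document is treated under each rule.

```python
import xml.etree.ElementTree as ET
from enum import Enum

NS = "http://example.org/ns/demo"          # hypothetical language namespace
KNOWN = {f"{{{NS}}}doc", f"{{{NS}}}para"}  # hypothetical known vocabulary


class Policy(Enum):
    FAIL = "fail"          # flag an error and give up
    UNWRAP = "unwrap"      # ignore the element, still process its children
    SKIP_SUBTREE = "skip"  # ignore the element and everything inside it


def known_elements(elem, policy):
    """Return local names of the elements that get processed, in document order."""
    if elem.tag in KNOWN:
        out = [elem.tag.split("}")[1]]
    elif policy is Policy.FAIL:
        raise ValueError(f"unknown element {elem.tag}")
    elif policy is Policy.SKIP_SUBTREE:
        return []
    else:
        # UNWRAP: the unknown element itself contributes nothing.
        out = []
    for child in elem:
        out.extend(known_elements(child, policy))
    return out


# An unknown "unicorn" element in the language's own namespace wraps a known one.
SAMPLE = f'<doc xmlns="{NS}"><para/><unicorn><para/></unicorn></doc>'
root = ET.fromstring(SAMPLE)
```

Under Policy.UNWRAP both para elements are processed, under Policy.SKIP_SUBTREE only the first one is, and under Policy.FAIL the call raises. Three implementations each picking a different rule would all appear reasonable on their own, which is why the specification has to pick one.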

Deciding which rule to specify can be difficult. For a system to be extensible and amenable to versioning, it must be designed to ignore at least some unknowns. But consider a system in which there is an element that is meant to effect a payment, and another one, added later or by a third party, that is expected to cause a service to be rendered. If the payment were effected but the service request ignored with no error being flagged, the processing model would clearly be broken. Conversely, if the same message sported an element whose only purpose was to provide some statistics about purchases, it would be a shame to balk at a perfectly fine transaction over such a triviality.

In a rare concession to elegance, SOAP provides this level of granularity with a mustUnderstand attribute that is used to tell such cases apart.
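As a rough sketch of how such a flag separates the two cases, the following Python checks the header blocks of a SOAP 1.2 style envelope. The envelope namespace and the mustUnderstand attribute come from SOAP 1.2; the shop namespace, the payment, service and stats blocks, and the function names are hypothetical, and a real SOAP node would generate a fault rather than raise a Python exception.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SHOP = "http://example.org/ns/shop"              # hypothetical application namespace
UNDERSTOOD = {f"{{{SHOP}}}payment"}              # blocks this hypothetical processor implements


def check_headers(envelope):
    """Return the header blocks to process; raise if a mandatory one is unknown."""
    header = envelope.find(f"{{{ENV}}}Header")
    to_process = []
    for block in (header if header is not None else []):
        if block.tag in UNDERSTOOD:
            to_process.append(block.tag)
        elif block.get(f"{{{ENV}}}mustUnderstand") in ("1", "true"):
            # Unknown and flagged as mandatory: refuse the whole message.
            raise ValueError(f"mandatory block not understood: {block.tag}")
        # Otherwise the unknown block is optional and is silently skipped.
    return to_process


def envelope_with(extra_block):
    """Build a test envelope holding a payment block plus one extra header block."""
    return ET.fromstring(
        f'<env:Envelope xmlns:env="{ENV}" xmlns:s="{SHOP}">'
        f"<env:Header><s:payment/>{extra_block}</env:Header>"
        "<env:Body/></env:Envelope>"
    )
```

With this sketch, envelope_with("<s:stats/>") is accepted and only the payment block is processed, while envelope_with('<s:service env:mustUnderstand="true"/>') is rejected before any payment can be effected.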

Yet such a solution won't necessarily apply elsewhere, especially if implementation strategies are somewhat anarchic or if conformance is not readily enforceable. HTML grew to use a system in which unknown elements are ignored but their content is processed. This provides very useful fallback strategies, but it did lead to some bumps in the road when script and style were introduced, as it caused their content to be displayed in user agents that didn't understand them.

If in doubt, a processing model (such as SVG's) in which an unknown element is ignored alongside its content (even if that content is understandable) is probably the safer bet. An important aspect for the language designer to consider here will be the exact definition of what "ignoring" means. For instance in SVG a rect element found inside an unknown unicorn element will be ignored for rendering purposes (i.e. it will behave as if its display property had been set to none) but it will still not only appear in the DOM but also expose the SVGRectElement interface.
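The distinction between "ignored for rendering" and "absent from the tree" can be illustrated with a small Python sketch. It uses xml.etree.ElementTree rather than a real DOM, so it says nothing about interfaces such as SVGRectElement, and the KNOWN set and the rendered function are illustrative assumptions rather than anything SVG defines.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
# A tiny illustrative subset of the vocabulary, not the real SVG element list.
KNOWN = {f"{{{SVG}}}svg", f"{{{SVG}}}g", f"{{{SVG}}}rect"}


def rendered(elem):
    """Yield the elements a renderer would draw, pruning unknown elements and their subtrees."""
    if elem.tag not in KNOWN:
        return
    yield elem
    for child in elem:
        yield from rendered(child)


doc = ET.fromstring(
    f'<svg xmlns="{SVG}"><rect id="a"/><unicorn><rect id="b"/></unicorn></svg>'
)

# What gets drawn versus what is still reachable in the tree.
drawn_ids = [e.get("id") for e in rendered(doc) if e.get("id")]
tree_ids = [e.get("id") for e in doc.iter() if e.get("id")]
```

Here drawn_ids contains only "a", while tree_ids contains both "a" and "b": the rect inside unicorn is never rendered, yet any script walking the tree still finds it.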

Smarter rules, such as SMIL's switch element, always seem appealing at first, but they tend to prove clumsy to use, and the granularity and reliability of the testing they offer are often too low to make them widely useful. I would tend to believe that a switch-like ability would be more useful at the end of the spectrum where compliance is more readily enforceable than at the messier end where most of the Web sits.

Which rules you choose is up to you, but if you plan to design a language you have to choose some rules lest interoperability dragons bite your head off, followed closely by hordes of angry users and developers who were stuck reverse-engineering error handling behaviours while their friends were having beer.

This article is part of a series on XML Bad Practices.