XML Bad Practices
Language Error Handling
In some cases lacunae values and ignoring unknown namespaces will not be enough to recover from errors, for instance because an element appears in the language's namespace that the processor does not know about, or because an element that it knows about appears at the wrong place in the tree.
Again, this is a case in which, if the rules for processing such errors are not defined, implementations are guaranteed to differ. There are multiple approaches here: flag an error and give up, ignore the element and process what is inside it as if the element hadn't been there at all (or process some of it), or ignore the entire subtree contained inside that element.
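The three approaches can be put side by side in a small sketch. Everything in it is an assumption made for illustration: the vocabulary (`doc`, `para`, `note`), the policy names and the `collect_text` helper are hypothetical, and a real processor would do more than gather text.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the only elements this toy processor knows about.
KNOWN = {"doc", "para", "note"}


class UnknownElementError(Exception):
    """Raised under the "fail" policy when an unknown element is met."""


def collect_text(elem, policy):
    """Collect text from the tree, handling unknown elements per policy.

    policy is one of:
      "fail"         - flag an error and give up
      "unwrap"       - ignore the element itself but process its content
      "skip-subtree" - ignore the element and everything inside it
    """
    out = []
    if elem.tag not in KNOWN:
        if policy == "fail":
            raise UnknownElementError(elem.tag)
        if policy == "skip-subtree":
            return out
        # "unwrap": fall through and treat the content as if unwrapped
    if elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        out.extend(collect_text(child, policy))
        # Text following a child belongs to the parent, so it is kept
        # even when the child itself was skipped.
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out


doc = ET.fromstring(
    "<doc><para>one</para><blink><para>two</para></blink><para>three</para></doc>"
)
print(collect_text(doc, "unwrap"))        # ['one', 'two', 'three']
print(collect_text(doc, "skip-subtree"))  # ['one', 'three']
```

The same input yields three different outcomes depending on the policy, which is exactly how implementations end up diverging when the specification stays silent.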
Deciding which rule to specify can be difficult. In order for a system to be extensible and amenable to versioning, one needs to design it so that it ignores at least some unknowns. But consider a system in which there is an element that is meant to effect a payment, and another one, added later or by a third party, that is expected to cause a service to be rendered. If the payment were effected but the service request ignored with no error being flagged, the processing model would clearly be broken. Conversely, if the same message sported an element whose only purpose was to provide some statistics about purchases, it would be a shame to balk at a perfectly fine transaction for such a triviality.
In a rare concession to elegance, SOAP provides this level of granularity with a mustUnderstand attribute that is used to tell such cases apart.
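A minimal sketch of how a receiver might act on that attribute, assuming SOAP 1.2 (where the attribute lives in the envelope namespace and is true when set to "true" or "1"). The `urn:example:shop` namespace, the header block names and the `check_headers` helper are hypothetical, and a real SOAP node would answer with a proper fault message rather than a Python exception.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace

# Hypothetical: the only header block this receiver understands.
UNDERSTOOD = {"{urn:example:shop}payment"}


class MustUnderstandFault(Exception):
    """Stands in for a SOAP MustUnderstand fault."""


def check_headers(envelope_xml):
    """Return (processed, ignored) header block names, or raise a fault."""
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    processed, ignored = [], []
    if header is None:
        return processed, ignored
    for block in header:
        if block.tag in UNDERSTOOD:
            processed.append(block.tag)
            continue
        flag = block.get(f"{{{SOAP_ENV}}}mustUnderstand", "false")
        if flag in ("true", "1"):
            # Unknown and mandatory: refuse the whole message.
            raise MustUnderstandFault(block.tag)
        # Unknown but optional: safe to ignore.
        ignored.append(block.tag)
    return processed, ignored


MESSAGE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
                           xmlns:shop="urn:example:shop">
  <env:Header>
    <shop:payment env:mustUnderstand="true"/>
    <shop:stats/>
  </env:Header>
  <env:Body/>
</env:Envelope>"""

print(check_headers(MESSAGE))
```

Here the unknown `shop:stats` block is silently skipped, whereas an unknown block carrying `env:mustUnderstand="true"` would abort processing before any payment is effected.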
Yet such a solution won't necessarily apply elsewhere, especially if implementation strategies are somewhat anarchic or if conformance is not readily enforceable. HTML grew to use a system in which unknown elements are ignored but their content is processed. This provides for very useful fallback strategies, but did lead to some bumps in the road when script and style were introduced, as it caused their content to be displayed in user agents that didn't understand them.
If in doubt, a processing model (such as SVG's) in which an unknown element is ignored alongside its content (even if that content is understandable) is probably the safer bet. An important aspect for the language designer to consider here will be the exact definition of what "ignoring" means. For instance in SVG a rect element found inside an unknown unicorn element will be ignored for rendering purposes (i.e. it will behave as if its display property had been set to none) but it will still not only appear in the DOM but also expose the SVGRectElement interface.
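The distinction between "ignored for rendering" and "absent from the tree" can be illustrated with a toy model. This is not how an SVG user agent is implemented: the `KNOWN` subset and the `rendered` helper are assumptions for the sake of the example, and exposure of DOM interfaces such as SVGRectElement is not modelled at all.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

# Tiny illustrative subset of the elements a renderer might know.
KNOWN = {f"{{{SVG}}}{name}" for name in ("svg", "g", "rect")}


def rendered(elem):
    """Return the elements that take part in rendering.

    An unknown element prunes its whole subtree from rendering,
    but nothing is removed from the parsed tree itself.
    """
    if elem.tag not in KNOWN:
        return []
    out = [elem]
    for child in elem:
        out.extend(rendered(child))
    return out


DOC = (
    f'<svg xmlns="{SVG}">'
    '<rect id="a"/>'
    '<unicorn><rect id="b"/></unicorn>'
    "</svg>"
)

root = ET.fromstring(DOC)
in_tree = [e.get("id") for e in root.iter(f"{{{SVG}}}rect")]
drawn = [e.get("id") for e in rendered(root) if e.tag == f"{{{SVG}}}rect"]
print(in_tree)  # ['a', 'b']
print(drawn)    # ['a']
```

Both rect elements remain reachable in the tree, yet only the one outside unicorn is drawn. Scripts that walk the tree will therefore still see the "ignored" element, which is why the definition of ignoring needs to be spelled out.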
Smarter rules, such as SMIL's switch element, always seem appealing at first, but they tend to prove clumsy to use and the granularity and reliability of the testing they offer are often too low to make them widely useful. I would tend to believe that a switch-like ability would be more useful at the end of the spectrum where compliance is more readily enforceable than at the messier end where most of the Web sits.
Whichever rules you choose is up to you, but if you plan to design a language you have to choose some rules, lest interoperability dragons bite your head off, followed closely by hordes of angry users and developers who were stuck reverse-engineering error handling behaviours while their friends were having beer.