XML's growing pains

XML 1.0 was developed by a roundtable of SGML veterans, the worthy founding elders of the phenomenon that XML has become. The result of their initial burst of energy was the first generation of XML technologies. The elders always said that XML 1.0 was but one part of a foundational trio comprising XML Core (syntax), XML Stylesheets and XML Linking, but it didn't take long for things to spiral out of control.

Some influential early voices wanted vocabulary namespace capabilities, so XML Namespaces were born. The effects of namespaces combined with other considerations to drive the development of an XML schema language to supersede DTDs, so W3C XML Schemas (WXS) were born. The programming folk wanted a standardized API for XML, so DOM was born. The Stylesheet activity split into two groups: one to create an XML transformation language and another to develop a presentation semantics language, so XPath, XSLT and XSL-FO were born. The needs for linking diversified, so XML Base, XInclude, XPointer and XLink were born. People argued that a basic data model was needed to try to make some sense of it all, so the XML Infoset was born.

Despite this unexpected, banyan-like growth in W3C specifications, the first XML generation was a sure success. For this, we can thank that generation's stars: XML 1.0, XPath 1.0 and XSLT 1.0. Much of the rest of this generation was deeply flawed, whether by fundamental architectural problems (DOM, XLink), subtleties that ended up propagating complexity (Namespaces), or just innate complexity (WXS).

The seeds of trouble sown by the initial growth of XML specs have sprouted into a very troubled second generation. Burgeoning complexity was already a worry, and now it's quite out of control. Even the concise versions of XQuery 1.0, XPath 2.0 and XSLT 2.0 are dictionary-sized tomes. DOM, currently up to "Level 3," has enough facets and nuances to entertain a legal clerk.

XML 1.1 is not really much more complex than its forebear, but it has run into problems reconciling its benefits (improved markup support for some language communities) with the inevitably huge costs of any version bump in XML core syntax. A much-disparaged change in the definition of white space, which accommodated IBM mainframe users, didn't help matters.

The second generation of specs really has no stars, and I don't foresee any emerging anytime soon. WXS 1.1, for instance, promises to be a minor tweak of the original, addressing none of the problems of 1.0's rigid complexity. But maybe the lack of stars is a good thing. Many have argued that the second generation has followed too eagerly on the heels of the first, and that we all could have used some breathing room to allow industry practice and experience to chart out subsequent specifications with better clarity and perspective.

Many highlight the fact that the stars of the first generation were the work of relatively small groups, such as the elders of XML 1.0 itself, and that the second generation is characterized by enormous committees. XML's success means high stakes for software vendors in trying to write their strategies into the most prominent XML specifications. This swells the committees and leads to the compounding of ugly compromises, sometimes clearing the way for the very vendors who caused the problem to distance themselves from the flawed result.

It's possible that the next successful generation of XML technology will not even come from the W3C. One group serving (modest) notice of a challenge is the working group for ISO Document Schema Definition Languages (DSDL). This is a small group, featuring some of the great elders of XML's first generation, and developing what is ostensibly a composite standard for expressing XML schemas. But the standard really serves more as an overall alternative strategy for XML's development. It takes individual specifications from expert individuals or small groups, each of which addresses a very well-defined and well-bounded problem domain. In this way, it avoids rigid complexity within each specification while providing the power to address complex problems. Some parts of DSDL, including RELAX NG and Schematron, already have traction on their own, but most parts are still works in progress. DSDL certainly adds up to a very promising framework that could easily steal the XML show if W3C work continues on its present course.

I don't think XML's growing pains are a cause for alarm for users and developers just yet. There is always the brilliant first generation to fall back on. So, stick with XML 1.0, XPath 1.0 and XSLT 1.0 until you need the benefits of the various 1.1s and 2.0s so badly that you can tolerate their great cost. Don't just assume it's better to gravitate to the higher version number. With practice and experience, the XML community will eventually clean house, and probably bring about the next great generation of XML technology.

About the Author

Uche Ogbuji is a consultant and co-founder at Fourthought Inc. in Boulder, Colo.