
XML class warfare

It seems there is no escaping class warfare. XML is a young society, but it is already succumbing to age-old divisions. The most direct roots of XML are in the processing of text documents, and XML is fundamentally a text-processing format. Many XML users, however, come from backgrounds in relational databases and object-oriented development. To some, XML is the perfect technology for managing problems of interoperability and data rot that have plagued programmers for years.

Consider this: If you want to export a floating-point value from C++, Java or a SQL database to XML, you can simply place "1.0" into the body of an element. But how do you then indicate that this value is actually a floating-point value and not a mere string that happens to look like one?

You might wish you had a way to mark such a value as being of the particular floating-point type defined in your original system. The W3C XML Schema (WXS) working group decided to address such requirements by building into schemas the ability to attach data types to XML data. Rather than cobble together one set of data types that matches Java's native types, another set that matches SQL native types and so on, the WXS Data Types (WXSDT) are defined as a single type library that approximates native types from various languages and platforms. The standardization of WXSDT established the gentry in XML -- those users who prefer XML data to be neatly organized according to fixed classes and types.

But many XML users still come from the document and text-processing world. Some of the most successful XML vocabularies, such as XHTML, SVG and RSS, do not have much to do with data types -- they just deal in plain text.

Even among those who wish to use XML in conjunction with traditional programming systems, some prefer to minimize the coupling between the data in XML and the associated programming-language values. This means that when they write "1.0" to the XML document, it's just a string as far as they're concerned, and only a few very specialized portions of the processing need to care that the string can be interpreted as a floating-point number.
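To make that loosely coupled stance concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (reading, label, value) and the average() helper are invented for illustration; the point is only that general processing keeps the content as plain strings, and one specialized function converts it to a number at the point of use.

```python
import xml.etree.ElementTree as ET

# The document as a bohemian sees it: "1.0" is just element text.
DOC = "<reading><label>sensor A</label><value>1.0</value></reading>"

root = ET.fromstring(DOC)

# General-purpose processing treats every child's content as a plain string.
texts = {child.tag: child.text for child in root}


def average(values):
    """The one specialized corner that cares about numbers: it interprets
    the strings as floating-point values only at the point of use."""
    nums = [float(v) for v in values]
    return sum(nums) / len(nums)
```

Here texts["value"] is the string "1.0", and average([texts["value"], "3.0"]) returns 2.0. Nothing outside average() needs to know that the text happens to look like a number.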

These users make up the faction of XML bohemians. They are more concerned with the text content of XML data than they are with any class or type that might be associated. I count myself firmly among the bohemians.

For some time, these two groups have rubbed elbows in the XML community amid a great deal of tension but little outright conflict. But this uneasy peace has come to an end. The main battleground is that dutiful engine of so much XML processing, XPath.

XSLT and XPath are both designed from the bohemian point of view. They deal with text and only worry about the type of the data in a few specialized corners, such as the sum() function in XPath and the element in XSLT.
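As an illustration of one of those specialized corners, the following Python sketch approximates how XPath 1.0's sum() treats its arguments: it takes each node's string-value, converts it using the rules of number(), and adds the results, so any non-numeric text turns the whole result into NaN. This is a hand-written emulation for illustration, not an XPath engine; the helper names xpath_number() and xpath_sum() are hypothetical, and only the string-to-number case is covered.

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath 1.0 Number lexical form: optional XML whitespace, optional minus sign,
# digits with an optional fractional part (or a leading '.'), optional whitespace.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(s):
    """Approximate XPath 1.0 number() applied to a string; NaN if it doesn't match."""
    if _XPATH_NUMBER.fullmatch(s):
        return float(s.strip(" \t\r\n"))
    return math.nan


def xpath_sum(elements):
    """Approximate XPath 1.0 sum(): convert each element's string-value
    (all descendant text, concatenated) to a number and add them up."""
    return sum(xpath_number("".join(e.itertext())) for e in elements)
```

For example, xpath_sum(ET.fromstring("<prices><p>1.0</p><p> 2.5 </p></prices>").findall("p")) returns 3.5. XPath 1.0 numbers have no exponent syntax, so "1e3" converts to NaN under these rules even though Python's float() would accept it.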

The gentry, however, would like more class consciousness from these workhorse technologies. They reason that if they have gone to the trouble of specifying in the schema that "1.0" represents a floating-point number, the XPath and XSLT processors should make this information available and should use it far more broadly. The gentry take the view that such capabilities should be built into the foundations of XPath and XSLT.
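By contrast, a rough Python sketch of the gentry's preference might look like the following, where every value access is routed through declared type information. The DECLARED_TYPES table is a hypothetical stand-in for what a schema-aware processor would derive from a WXS schema; this sketch does no schema validation at all, and Python's float() and int() only approximate the lexical rules of xs:double and xs:integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema type information: element name -> WXS
# built-in type name. A real processor would obtain this by validating the
# document against a W3C XML Schema; this sketch simply hard-codes it.
DECLARED_TYPES = {"value": "xs:double", "count": "xs:integer"}

# Approximate converters for the few built-in types the sketch knows about.
CONVERTERS = {"xs:double": float, "xs:integer": int, "xs:string": str}


def typed_value(elem):
    """Return the element's content converted according to its declared type,
    falling back to xs:string for undeclared elements."""
    type_name = DECLARED_TYPES.get(elem.tag, "xs:string")
    return CONVERTERS[type_name](elem.text or "")
```

Parsing `<m><value>1.0</value><count>3</count><note>hi</note></m>` and calling typed_value() on each child yields 1.0 (a float), 3 (an int) and "hi" (a string). The convenience comes from the type table, and so does the coupling: the data's meaning now depends on it.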

The XPath 2.0 and XSLT 2.0 drafts are manifestations of this gentrification. They build in extensive facilities for handling WXSDT. Not by coincidence, these specifications are several times larger and more complex than the 1.0 generation of specifications. This added complexity has spurred the bohemians to arms. They are dismayed to see some of the most useful and successful XML technologies compromised by a desire for high-class amenities that not everyone wants, or wants to pay for.

The bohemians argue that the XPath and XSLT committees are out of control, and that, at the very least, added facilities for WXSDT should be separated into optional modules. They also complain that WXSDT should not be privileged, pointing to areas that WXSDT does not cover -- such as geospatial data or color codes -- or covers rather sloppily, such as dates and times.

The bohemians have rallied (for the most part) behind a powerful champion -- RELAX NG -- which is an XML schema standard that competes with WXS. It is designed more for document-style XML than for XML born in programming data. It supports type annotations, but only as separate optional modules (which can include WXSDT). The bohemians insist that the next-generation XML technologies should not only learn from RELAX NG's isolation of class consciousness, but should avoid bias toward WXS, supporting RELAX NG and other alternatives as well. The battle rages on at present.

Certainly, if you want your data to outlast your code, and to be more portable to unforeseen, future uses, you would do well to lower your own level of class consciousness. Strong data typing in XML tends to pigeonhole data to specific tools, environments and situations. This often raises the total cost of managing that data.

Not everyone will decide to join me in the ranks of the bohemians, but it seems clear that XML is most likely to prosper if its tools continue to be tolerant of varying attitudes toward class.



About the Author

Uche Ogbuji is a consultant and co-founder at Fourthought Inc. in Boulder, Colo. He may be contacted at [email protected].