XML class warfare
It seems there is no escaping class warfare. XML is a young society, but it
is already succumbing to age-old divisions. The most direct roots of XML are in
the processing of text documents, and XML is fundamentally a text-processing
format. Many XML users, however, come from backgrounds in relational databases
and object-oriented development. To some, XML is the perfect technology for
managing problems of interoperability and data rot that have plagued programmers.
Consider this: If you want to export a floating point value from C++, Java or
a SQL database to XML, you can simply place "1.0" into the body of an element.
But how do you then indicate that this value is actually a floating point value
and not a mere string that happens to look like one?
You might wish you had a way to mark such a value as being of the particular
floating-point type defined in your original system. The W3C XML Schema (WXS)
working group decided to address such requirements by building into schemas the
ability to attach data types to XML data. Rather than cobble together one set of
data types that matches Java's native types, another set that matches SQL native
types and so on, the WXS Data Types (WXSDT) are defined as a single type library
that approximates native types from various languages and platforms. The
standardization of WXSDT established the gentry in XML -- those users who prefer
XML data to be neatly organized according to fixed classes and types.
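As a rough illustration of what it means to attach a type to a value, the sketch below reads an instance-level xsi:type hint using Python's standard xml.etree.ElementTree. It is only a sketch under stated assumptions: the `readings` and `reading` element names, the `typed_value` helper and the small converter table are hypothetical, ElementTree performs no schema validation, and a real WXS processor would normally take the type from the schema rather than from an attribute in the document.

```python
import xml.etree.ElementTree as ET

# Attribute name for xsi:type in ElementTree's {namespace}local notation.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical sample document: one annotated value, one plain value.
DOC = """\
<readings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <reading xsi:type="xsd:double">1.0</reading>
  <reading>1.0</reading>
</readings>"""

# Assumption: a tiny, incomplete mapping from WXSDT names to Python types.
CONVERTERS = {"double": float, "float": float, "integer": int}


def typed_value(element):
    """Return the element's text, converted if it carries an xsi:type hint."""
    declared = element.get(XSI_TYPE)
    if declared is None:
        return element.text  # no annotation: it stays a string
    # Simplification: use the part after the prefix and do not resolve
    # the prefix against the in-scope namespace declarations.
    local_name = declared.split(":")[-1]
    convert = CONVERTERS.get(local_name, str)
    return convert(element.text)


values = [typed_value(e) for e in ET.fromstring(DOC).findall("reading")]
```

Here the first reading comes back as a Python float and the second as the string "1.0", which is the distinction the gentry want the tools to carry for them.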
But many XML users still come from the document and text-processing world.
Some of the most successful XML vocabularies, such as XHTML, SVG and RSS, do not
have much to do with data types -- they just deal in plain text.
Even among those who wish to use XML in conjunction with traditional
programming systems, some prefer to minimize the coupling between the data in
XML and the associated programming language values. This means that when they
write "1.0" to the XML document, it's just a string as far as they're concerned,
and only a few very specialized portions of the processing need to be concerned
that the string can be interpreted as a floating point number.
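A minimal sketch of this loosely coupled style, again in Python with the standard xml.etree.ElementTree, might look like the following. The `order`, `item` and `price` names and the `total` helper are hypothetical. The point is only that the content stays text everywhere except the one routine that needs arithmetic, loosely comparable to the way XPath's sum() converts values at the point of use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the prices are plain text content.
DOC = (
    "<order>"
    "<item><price>1.0</price></item>"
    "<item><price>2.5</price></item>"
    "</order>"
)

root = ET.fromstring(DOC)

# Most processing just passes the content along as text.
prices_as_text = [p.text for p in root.iter("price")]


def total(order):
    """The one specialized spot that interprets the strings as numbers."""
    return sum(float(p.text) for p in order.iter("price"))


order_total = total(root)
```

Nothing in the document or the rest of the code commits to a particular numeric type; the conversion to float is a local decision made by `total` alone.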
These users make up the faction of XML bohemians. They are more concerned
with the text content of XML data than they are with any class or type that
might be associated. I count myself firmly among the bohemians.
For a while, these two groups have rubbed elbows in the XML community with a
great deal of tension, but with little outright conflict. But this
uneasy peace has come to an end. The main battleground is that dutiful engine of
so much XML processing, XPath.
XSLT and XPath are both designed from the bohemian point of view. They deal with
text and only worry about the type of the data in a few specialized corners,
such as the sum() function in XPath and the element
The gentry, however, would like more class consciousness from these workhorse
technologies. They reason that if they have gone to the trouble of specifying in
the schema that "1.0" represents a floating point number, the XPath and XSLT
processors should make this information available, and the processor should use
such type information far more broadly. The gentry take the view that such
capabilities should be built into the foundations of XPath and XSLT.
The XPath 2.0 and XSLT 2.0 drafts are manifestations of this gentrification.
They build in extensive facilities for handling WXSDT. Not by coincidence, these
specifications are several times larger and more complex than the 1.0 generation
of specifications. This added complexity has spurred the bohemians to arms. They
are dismayed to see some of the most useful and successful XML technologies
compromised by a desire for high-class amenities that not everyone wants, or
wants to pay for.
The bohemians argue that the XPath and XSLT committees are out of control,
and that, at the very least, added facilities for WXSDT should be separated into
optional modules. They also complain that WXSDT should not be privileged,
pointing to areas that WXSDT does not cover -- such as geospatial data or color
codes -- or covers rather sloppily, such as dates and times.
The bohemians have rallied (for the most part) behind a powerful champion
-- RELAX NG -- which is an XML schema standard that competes with WXS. It
is designed more for document-style XML than for XML that originates as program data.
It supports type annotations, but only as separate optional modules (which can
include WXSDT). The bohemians insist that the next-generation XML technologies
should not only learn from RELAX NG's isolation of class consciousness, but
should avoid bias toward WXS, supporting RELAX NG and other alternatives as
well. The battle rages on at present.
Certainly, if you want your data to outlast your code, and to be more
portable to unforeseen, future uses, you would do well to lower your own level
of class consciousness. Strong data typing in XML tends to pigeonhole data to
specific tools, environments and situations. This often raises the total cost of
managing that data.
Not everyone will decide to join me in the ranks of the bohemians, but it
seems clear that XML is most likely to prosper if its tools continue to be
tolerant of varying attitudes toward class.
Uche Ogbuji is a consultant and co-founder
at Fourthought Inc. in Boulder, Colo.
He may be contacted at email@example.com.