Columns
Is XML the answer? Depends on the question.
- By Michael Goulde
- June 4, 2001
A common joke around my office is that any problem in the world can be fixed
using some combination of duct tape and the eXtensible Markup Language (XML).
Case in point: A recent television program ran a profile of an inventor who
created a bear-proof suit of armor held together with duct tape. Although XML
wasn't used in this particular example, developers are using the technology
as a component in solutions to some of the more vexing and persistent challenges
facing IT today.
Many of the most difficult tasks lie in organizing and delivering
information across the Web, as well as in exchanging information among incompatible
systems and applications.
XML provides a syntax that can be used to construct messages that can be understood
by any application. This makes data movement much simpler and requires far
less effort than coding a unique interface between each pair of applications.
This is possible because XML is text-based, self-describing and gaining acceptance
as a global standard.
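As a rough illustration of what "text-based and self-describing" means in practice, here is a minimal Python sketch. The message, its element names (order, customer, item) and its attributes are all hypothetical and invented for this example; the parser is the one in Python's standard library. The receiving code needs no custom binary interface, because the field names travel with the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: the element and attribute names are
# illustrative assumptions, not part of any real vocabulary.
MESSAGE = """<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-42" quantity="3">Duct tape</item>
</order>"""


def read_order(xml_text):
    """Parse the message and pull out fields by name."""
    root = ET.fromstring(xml_text)
    item = root.find("item")
    return {
        "order_id": root.get("id"),
        "customer": root.findtext("customer"),
        "sku": item.get("sku"),
        "quantity": int(item.get("quantity")),
        "description": item.text,
    }


order = read_order(MESSAGE)
print(order)
```

Any application with an XML parser could read the same text, which is the point the paragraph above makes.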
One of the reasons XML has captured so much interest so quickly (Version 1.0
of the XML specification was released in February 1998) is that it represents
a parsimonious solution to a wide variety of problems. There are three sets
of users who have a very high level of interest in XML. The first group includes
Webmasters and other designers of Web-based information systems who use HTML
to mark up information for presentation, but have no way to structure the information
they send to browsers. By providing structure to the unstructured Web data in
a standard way, a Web query can deliver a much more useful set of results, increasing
the value of the information.
The second group of users have toiled for years with the Standard Generalized
Markup Language (SGML) to create structured documents such as training manuals
and technical documentation. Although HTML is derived from SGML, SGML in general
is not well suited to the Web environment because it is extremely complex --
something that has also limited its adoption. XML is an SGML derivative
that is not only easier to use on the Web but has also garnered wider adoption. These
users have also become very active in World Wide Web Consortium (W3C) working
groups that are hammering out additional specifications and standards to ensure
that XML meets many of the same application requirements as SGML.
The third set of users fascinated by XML was not originally
targeted by the W3C's XML efforts. But very early on, application developers
building distributed applications -- and faced with difficult challenges around
application integration and interoperability -- saw XML as a way to free their
applications from the tyranny of over-the-wire binary formats that made it impossible
to link applications together in real time. These developers, many from the
Java community, but just as many using Microsoft tools, quickly realized
that they could use XML syntax in their messages and, because of the self-describing
nature of XML documents, applications could exchange data without having to
be explicitly written or compiled to do so. Freedom! This group has now expanded
to include developers who want to extend EDI, link networks of suppliers and
customers, create dynamic marketplaces, and perform other heretofore impossible
tasks over the Web.
Open standard/closed vocabularies
The XML specification provides an open, standard syntax that can also be used
to create vocabularies -- collections of tags used to structure documents.
XML itself does not specify the semantics of these vocabularies. Because of
this, incompatibilities can arise, as well as proprietary vocabularies that
can result in incompatible solutions to the same problems. For example, XML
tells us how to use angle brackets to create elements and structure attributes
of elements. But it doesn't tell us what the specific elements in a document
should be. There are two issues here. The first is when the same tag has different
meanings. I may create a vocabulary that uses <speed> as a tag for how
fast one should travel, while someone else might use <speed> to refer
to an attribute of photographic film. (I've omitted the end tags that XML requires,
such as </speed>, for clarity, but don't you omit them!) The other issue
is when two different vocabularies use different words for the same thing. For
example, one vocabulary might assign one tag to each product while another
might assign a differently named tag.
Fortunately, XML's features make these incompatibilities less of a problem
than when these attributes are expressed in compiled form. Because XML documents
are human-readable and use standard character sets instead of binary encodings,
it is simpler to create mapping layers to convert a document using one markup
language (vocabulary) into a document marked up with a different, but related,
set of tags. So, if my markup language uses one pair of tag names and your
markup language uses a different pair for the same things,
it is a fairly simple matter to map one to the other. In fact, the eXtensible
Stylesheet Language (XSL), a W3C specification related to XML, provides a mechanism
for doing this mapping. Developers are even creating XSL-based transformation
engines to actually carry out the conversions on the fly. All you need is a
template that shows the mappings. The beauty of this is that you can map an
unlimited number of other vocabularies into your own simply by creating a new
XSL style sheet.
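The mapping idea can be sketched in a few lines of Python. This is not XSL: a plain dictionary stands in for the style-sheet template, and the tag names on both sides (catalog, article, label, cost and products, product, name, price) are hypothetical. It only shows how little is needed to rename one vocabulary into another when the documents are plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a partner's tag names to our own.
# In an XSL-based setup this role would be played by the style sheet.
TAG_MAP = {
    "catalog": "products",
    "article": "product",
    "label": "name",
    "cost": "price",
}


def translate(xml_text, tag_map):
    """Rename every element that appears in tag_map; leave others alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


PARTNER_DOC = (
    "<catalog><article><label>Duct tape</label>"
    "<cost>4.99</cost></article></catalog>"
)

translated = translate(PARTNER_DOC, TAG_MAP)
print(translated)
```

Supporting another partner's vocabulary would mean adding another dictionary, much as the text describes adding another style sheet. A real transformation engine also handles restructuring, not just renaming, which this sketch leaves out.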
But is this really a good thing? Wouldn't it be better if everyone agreed
on the same vocabulary? Universal uniformity would be ideal, but history teaches
us that universal agreement on things like this is just a utopian dream. XML
isn't Esperanto. It is not a common language meant to replace all others and
be spoken by all. It is more like ASCII, whose characters can be used to
create text in many different languages. I can recognize all the characters,
but I cannot speak all the languages. (Note that XML doesn't actually use ASCII,
but Unicode, of which the common ASCII character set is a subset.)
So, what approach should you take?
In spite of the efforts of CommerceNet, Commerce One, Ariba, Microsoft and
others, XML will never be used to create a universal business language. Accept
the fact that everyone will take the approach that makes the most sense for
them and use the tools that XML provides to cancel out the differences.
This isn't to say that efforts at standardization are fruitless. There is
a common good to having agreement on a core set of tags whenever possible. Industry
groups in dozens of vertical sectors are working to come to agreement on industry
vocabularies. To the extent they are successful, exchanging data and information
will become far simpler than it has ever been. Efforts underway to map EDI messages
to XML take advantage of pre-existing work in this direction. In the long run,
we will likely see agreement on a common set of tags in applications where data
must be frequently exchanged across enterprise boundaries, and less agreement
where data is less frequently exchanged. But at least we will have a mechanism
for easily handling those differences so that we can invest our time and resources
in solving more difficult challenges -- like how to get duct tape off a wall.
Customers often ask whether they should just wait until all the efforts at
defining vocabularies have settled down before adopting XML. My recommendation
is generally not to wait, but to go ahead and do the work necessary to define
your company's data schema. This effort is not unlike past data modeling efforts
that were often shelved before they were completed -- but now there is one key
difference. Defining an XML schema for business objects that are relevant to
your business processes has a direct application. There are XML servers that
can load those schemas and serve data that is structured using those schemas.
There are also tools for building applications and exchanging data. Previously,
the developed data model sat in binders on the shelf. Today, my XML schema is
something I can use immediately.
As things evolve, as industry vocabularies develop or as my business partners
introduce vocabularies that differ from my own, I won't have a problem. Using
XSL, I can easily map the work I have done to the work others have done. In
fact, I can often simply replace my vocabulary with another. XML's simplicity
approaches elegance in the sense that it is a simple solution to a complex problem.
The beauty of XML is that it is almost completely invisible. Tools will hide
the XML syntax from the application developer, the database programmer, the
Web designer and the end user. Unlike with HTML, being able to write XML syntax by
hand will not be the sign of a "real" Webmaster. Understanding how to apply
XML and how to structure data hierarchically to best take advantage of XML will
now be the sign of a clever application developer.
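To make "tools will hide the XML syntax" and "structure data hierarchically" a little more concrete, here is a small Python sketch using the standard library's ElementTree. The business object (an order with a customer and line items) and every element and attribute name are assumptions made up for the example. The developer works with a tree of nested objects and never types an angle bracket.

```python
import xml.etree.ElementTree as ET


def build_order(order_id, customer, lines):
    """Build a hypothetical order document as a tree of nested elements."""
    order = ET.Element("order", id=str(order_id))
    ET.SubElement(order, "customer").text = customer
    items = ET.SubElement(order, "items")
    for sku, quantity in lines:
        # Each line item is a child of <items>, which is a child of <order>.
        ET.SubElement(items, "item", sku=sku, quantity=str(quantity))
    return order


order = build_order(1001, "Acme Corp", [("A-42", 3), ("B-7", 1)])
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The design choice worth noticing is the hierarchy itself: line items belong to the order rather than sitting in a flat list of fields, so the structure of the document mirrors the structure of the business object.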