Columns

Is XML the answer? Depends on the question.

A common joke around my office is that any problem in the world can be fixed using some combination of duct tape and the eXtensible Markup Language (XML). Case in point: A recent television program ran a profile of an inventor who created a bear-proof suit of armor held together with duct tape. Although XML wasn't used in this particular example, developers are using the technology as a component in solutions to some of the more vexing and persistent challenges facing IT today.

Many of the most difficult tasks lie in organizing and delivering information across the Web, as well as in exchanging information among incompatible systems and applications.

XML provides a syntax for constructing messages that any application can read. This makes data movement much simpler, and requires far less effort than coding a unique interface between each pair of applications. This is possible because XML is text-based, self-describing and gaining acceptance as a global standard.
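As a rough illustration of what "self-describing" means in practice, the Python sketch below walks a message using only the standard library's xml.etree.ElementTree, with no code written for this particular message format. The order message and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; the element names are illustrative only.
MESSAGE = """<order id="1001">
  <customer>Acme Corp</customer>
  <item>
    <name>Duct tape</name>
    <quantity>12</quantity>
  </item>
</order>"""


def describe(element, depth=0):
    """Return one line per element: indented tag, attributes, and text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(MESSAGE)
for line in describe(root):
    print(line)
```

The receiving code learns the element names and nesting from the message itself. What the names mean to the business is a separate question, which the rest of this column takes up.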

One of the reasons XML has captured so much interest so quickly (Version 1.0 of the XML specification was released in February 1998) is that it represents a parsimonious solution to a wide variety of problems. Three sets of users have a very high level of interest in XML. The first group includes Webmasters and other designers of Web-based information systems who use HTML to mark up information for presentation, but have no way to structure the information they send to browsers. When XML gives unstructured Web data a standard structure, a Web query can deliver a much more useful set of results, increasing the value of the information.

The second group of users has toiled for years with the Standard Generalized Markup Language (SGML) to create structured documents such as training manuals and technical documentation. Although HTML is derived from SGML, SGML in general is not well suited to the Web environment because it is extremely complex -- something that has also limited its adoption. XML is an SGML derivative that is not only easier to use on the Web, but has also garnered wider adoption. These users have also become very active in World Wide Web Consortium (W3C) working groups that are hammering out additional specifications and standards to ensure that XML meets many of the same application requirements as SGML.

The third set of users fascinated by XML is one that was not originally targeted by the W3C's XML efforts. But very early on, application developers building distributed applications -- and faced with difficult challenges around application integration and interoperability -- saw XML as a way to free their applications from the tyranny of over-the-wire binary formats that made it impossible to link applications together in real time. These developers, many from the Java community, but just as many using Microsoft tools, quickly realized that they could use XML syntax in their messages and, because of the self-describing nature of XML documents, applications could exchange data without having to be explicitly written or compiled to do so. Freedom! This group has now expanded to include developers who want to extend EDI, link networks of suppliers and customers, create dynamic marketplaces, and perform other heretofore impossible tasks over the Web.

Open standard/closed vocabularies

The XML specification provides an open, standard syntax that can also be used to create vocabularies -- collections of tags used to structure documents. XML itself does not specify the semantics of these vocabularies. Because of this, incompatibilities can arise, as can proprietary vocabularies that result in incompatible solutions to the same problems. For example, XML tells us how to use angle brackets to create elements and structure attributes of elements. But it doesn't tell us what the specific elements in a document should be. There are two issues here. The first is when the same tag has different meanings. I may create a vocabulary that uses <speed> as a tag for how fast one should travel, while someone else might use <speed> to refer to an attribute of photographic film. (I've omitted the end tags that XML requires, such as </speed>, for clarity, but don't you omit them!) The other issue is when two different vocabularies use different words for the same thing. For example, one vocabulary might assign one tag name to each product while another assigns a differently named tag to the very same thing.
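To make both points concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The tag name speed and the three small documents are assumptions chosen for illustration. The parser accepts the same tag name in two unrelated vocabularies, since XML attaches no meaning to it, and it rejects a document whose end tag has been left out.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The same tag name is legal in both documents; XML assigns it no meaning.
driving = "<road><speed>55</speed></road>"
film = "<film><speed>400</speed></film>"

# Omitting the end tag leaves the document not well formed.
broken = "<road><speed>55</road>"

print(is_well_formed(driving), is_well_formed(film), is_well_formed(broken))
```

Nothing in the parser can say whether 55 is miles per hour or a film rating. That knowledge lives in the vocabulary, not in XML.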

Fortunately, XML's features make these incompatibilities less of a problem than when these attributes are expressed in compiled form. Because XML documents are human-readable and use standard character sets instead of binary encodings, it is simpler to create mapping layers to convert a document using one markup language (vocabulary) into a document marked up with a different, but related, set of tags. So, if my markup language uses one pair of tag names and your markup language uses a different pair for the same data, it is a fairly simple matter to map one to the other. In fact, the eXtensible Stylesheet Language (XSL), a W3C specification related to XML, provides a mechanism for doing this mapping. Developers are even creating XSL-based transformation engines to carry out the conversions on the fly. All you need is a template that shows the mappings. The beauty of this is that you can map an unlimited number of other vocabularies into your own simply by creating a new XSL style sheet for each one.
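The idea of a mapping template can be sketched in a few lines of Python. This is not XSL: it is a plain dictionary standing in for the style sheet, and the tag names on both sides (part, partnumber, cost, product, sku, price) are hypothetical. It only shows how little machinery a one-to-one tag mapping needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a partner's tag names to my own.
TAG_MAP = {
    "part": "product",
    "partnumber": "sku",
    "cost": "price",
}


def translate(element, tag_map):
    """Return a copy of the tree with mapped tag names; others pass through."""
    renamed = ET.Element(tag_map.get(element.tag, element.tag), element.attrib)
    renamed.text = element.text
    renamed.tail = element.tail
    for child in element:
        renamed.append(translate(child, tag_map))
    return renamed


theirs = ET.fromstring(
    "<part><partnumber>DT-100</partnumber><cost>3.99</cost></part>"
)
mine = translate(theirs, TAG_MAP)
print(ET.tostring(mine, encoding="unicode"))
```

Supporting another partner's vocabulary would mean adding another dictionary, much as the XSL approach means adding another style sheet. A real transformation engine also handles restructuring and reordering, which this sketch does not attempt.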

But is this really a good thing? Wouldn't it be better if everyone agreed on the same vocabulary? Universal uniformity would be ideal, but history teaches us that universal agreement on things like this is just a utopian dream. XML isn't Esperanto. It is not a common language meant to replace all others and be spoken by all. It is more like ASCII, whose character sets can be used to create text in many different languages. I can recognize all the characters, but I cannot speak all the languages. (Note that XML doesn't actually use ASCII, but Unicode, of which the common ASCII character set is a subset.)

So, what approach should you take?

In spite of the efforts of CommerceNet, Commerce One, Ariba, Microsoft and others, XML will never be used to create a universal business language. Accept the fact that everyone will take the approach that makes the most sense for them and use the tools that XML provides to cancel out the differences.

This isn't to say that efforts at standardization are fruitless. There is a common good to having agreement on a core set of tags whenever possible. Industry groups in dozens of vertical sectors are working to come to agreement on industry vocabularies. To the extent they are successful, exchanging data and information will become far simpler than it has ever been. Efforts underway to map EDI messages to XML take advantage of pre-existing work in this direction. In the long run, we will likely see agreement on a common set of tags in applications where data must be frequently exchanged across enterprise boundaries, and less agreement where data is less frequently exchanged. But at least we will have a mechanism for easily handling those differences so that we can invest our time and resources in solving more difficult challenges -- like how to get duct tape off a wall.

Customers often ask whether they should just wait until all the efforts at defining vocabularies have settled down before adopting XML. My recommendation is generally not to wait, but to go ahead and do the work necessary to define your company's data schema. This effort is not unlike past data modeling efforts that were often shelved before they were completed -- but now there is one key difference. Defining an XML schema for business objects that are relevant to your business processes has a direct application. There are XML servers that can load those schemas and serve data that is structured using those schemas. There are also tools for building applications and exchanging data. Previously, the developed data model sat in binders on the shelf. Today, my XML schema is something I can use immediately.
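As a loose sketch of what putting a schema to work immediately can look like, the Python below checks a document against a tiny description of an order. The description is a hypothetical stand-in written as a Python dictionary, not a DTD or any real schema language, and the element names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema": each element name maps to its required children.
ORDER_SCHEMA = {
    "order": ["customer", "item"],
    "item": ["sku", "quantity"],
}


def missing_children(element, schema):
    """List 'parent/child' paths for required children that are absent."""
    problems = []
    for required in schema.get(element.tag, []):
        if element.find(required) is None:
            problems.append(f"{element.tag}/{required}")
    for child in element:
        problems.extend(missing_children(child, schema))
    return problems


document = ET.fromstring(
    "<order><customer>Acme Corp</customer>"
    "<item><sku>DT-100</sku></item></order>"
)
print(missing_children(document, ORDER_SCHEMA))
```

The point is only that the data model is now something a program can load and act on, rather than a diagram in a binder.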

As things evolve, as industry vocabularies develop, or as my business partners introduce vocabularies that differ from my own, I won't have a problem. Using XSL, I can easily map the work I have done to the work others have done. In fact, I can often simply replace my vocabulary with another. XML's simplicity approaches elegance in the sense that it is a simple solution to a complex problem.

The beauty of XML is that it is almost completely invisible. Tools will hide the XML syntax from the application developer, the database programmer, the Web designer and the end user. Unlike with HTML, being able to write XML syntax by hand will not be the sign of a "real" Webmaster. Instead, understanding how to apply XML and how to structure data hierarchically to best take advantage of it will be the sign of a clever application developer.