XML as a Distributed Application Protocol

It's the latest buzzword in the Internet application arena: XML, the open-standards child of SGML that promises to provide platform- and language-neutral data encapsulation and to separate application logic from application data. It's hot. It's powerful. Everyone loves it.

But isn't this exactly what CORBA already provides?

We will explore why XML is so hot, show how XML can be used as a distributed computing protocol, and look at its advantages, disadvantages, and appropriate uses. We will also speculate on some of the non-technical forces driving the XML phenomenon.

XML is being touted as the perfect partner for Java. Java supports the development of Web-aware, platform-neutral applications, and XML is a platform-neutral document description meta-language. Like a Web startup, both technologies have great promise, and maybe greater hype. Small wonder that most of the XML tools becoming available are written in Java.

Brief Background
XML, for those just starting to pay attention, is a descendant of structured document markup languages such as TeX, troff, and, quite directly, SGML. It consists of two basic parts: an optional structure definition (the Document Type Definition, or DTD) and the documents that use this definition to organize their content. Think of HTML where you can add your own tags for a set of related documents to use. This allows the content and structure to be separated. A third facility, the Extensible Stylesheet Language (XSL), allows presentation styles to be applied to any document of a particular flavor.

What are the advantages? Well, let's take a particular example. Suppose that, at present, corporate time sheets are created using an HTML editor. This is great, in that the creator is free to pick the most appropriate tool on the most appropriate platform for him or her. Also, anyone can view the time sheets with ubiquitous technology. But how does the corporation collect the information? Humans infer the day/project information by their placement within tables, but this kind of inference is very hard for a computer application to make accurately. However powerful they may be at crunching numbers, computers are not really very smart.

XML offers the computer help. We can define XML tags that state that we are now in a day with a day number attribute, and have embedded tags in that day for each project and the number of hours spent on that project. We can now define multiple style sheets for the XML document that allow payroll and managers to view the same document with different views. The advantages are two-fold: explicit structure for data manipulation and decoupling of the presentation from the data to allow for different views.
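To make the time sheet idea concrete, here is a minimal Java sketch of how an application might total hours per project once the structure is explicit. The element and attribute names (timesheet, day, project, number, name, hours) and the sample data are assumptions invented for illustration; the article does not define an actual DTD. The sketch uses the standard JAXP DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TimesheetTotals {
    // Hypothetical time sheet markup; tag and attribute names are assumptions.
    static final String SAMPLE =
        "<timesheet employee=\"jdoe\">"
      + "<day number=\"1\">"
      + "<project name=\"Apollo\" hours=\"6\"/>"
      + "<project name=\"Zeus\" hours=\"2\"/>"
      + "</day>"
      + "<day number=\"2\">"
      + "<project name=\"Apollo\" hours=\"8\"/>"
      + "</day>"
      + "</timesheet>";

    // Sums the hours attribute of every project element, keyed by project name.
    static Map<String, Double> hoursByProject(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Double> totals = new TreeMap<>();
            NodeList projects = doc.getElementsByTagName("project");
            for (int i = 0; i < projects.getLength(); i++) {
                Element project = (Element) projects.item(i);
                double hours = Double.parseDouble(project.getAttribute("hours"));
                totals.merge(project.getAttribute("name"), hours, Double::sum);
            }
            return totals;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse time sheet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hoursByProject(SAMPLE)); // prints {Apollo=14.0, Zeus=2.0}
    }
}
```

No guessing from table positions is needed: the application asks for project elements directly, which is exactly the inference an HTML table forces it to make by position.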

Wonderful, you say. Better structured documents than HTML can give us.

Of course, despite many predictions, XML will not cause HTML to disappear. There are just too many instances where the data being captured for presentation does not have enough "structural value," or enough need for management, to justify the added cost of developing the DTD and XSL. Think of your average home page on the world of gum collecting. For these cases, HTML, with its predefined generic structure and presentation, is the logical choice. Indeed, HTML may become one of a list of predefined XML instances as it is modified slightly to fit the XML syntax.

The Next Step
Now that we have a platform-neutral document markup language, why not use it as the glue between applications? And because we are in Web land here, why not use it to describe data that must be shared between distributed applications?

Microsoft has taken this approach in the definition of CDF, an XML-based channel definition format that it released in March 1997, when PointCast and channels were all the rage. CDF defines channels and software distribution meta-information such as logos, abstracts, and modification dates. The last modification date is, for instance, an attribute string of format "yyyymmmddhhmmss" that indicates in Greenwich Mean Time when the channel was last changed. It is repeated in several elements because XML does not support element inheritance.

Hold Your Horses!
But wait a minute. Let's not pretend that XML invented platform-neutral data serialization. Since the days of DCE, frameworks have been developed for applications to exchange messages and data. The two most common examples of this are CORBA and RMI. They both seamlessly offer ways of moving an object tree to an application on another machine—and in many ways much better than johnny-come-lately XML. Although RMI allows for self-describing object serialization and is the simplest to use, I will concentrate on CORBA, because RMI, while platform-neutral, requires Java at both ends of the pipe.

It would be a simple matter to define the CDF as a CORBA IDL that supports a reusable date type. In fact, DTD to IDL mapping is much simpler than IDL to DTD mapping, because DTDs only support strings.

Here's the briefest background information on CORBA you're ever likely to read. Basically, CORBA is a distributed object framework. It creates local proxy objects for remote objects that your code wishes to talk to. When you send the local proxy object a message, including parameters, the proxy marshals the data to the remote system, calls the same method on the remote object, and returns the result back to you. In-out parameters, remote exceptions, one-way messages, contexts, security frameworks, naming, trader, transaction, and other services also exist, of course.
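The proxy, marshal, and dispatch flow just described can be sketched in plain Java. This is not a real ORB: it runs in a single process, and it swaps in java.lang.reflect.Proxy and Java object serialization for the IDL-generated stubs, skeletons, and wire protocol a CORBA product would supply. The Account interface, AccountImpl, and ProxySketch are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ProxySketch {
    // Client side: build a local proxy that marshals each call into bytes.
    static <T> T stubFor(Class<T> iface, T remoteObject) {
        InvocationHandler handler = (proxy, method, args) -> {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(method.getName());
                out.writeObject(args == null ? new Object[0] : args);
            }
            // A real ORB would send these bytes over the network here.
            return dispatch(iface, remoteObject, wire.toByteArray());
        };
        return iface.cast(
            Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    // Server side: unmarshal the request and call the same method on the real object.
    static Object dispatch(Class<?> iface, Object target, byte[] request) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request))) {
            String name = (String) in.readObject();
            Object[] args = (Object[]) in.readObject();
            for (Method m : iface.getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                    return m.invoke(target, args);
                }
            }
            throw new NoSuchMethodException(name);
        }
    }

    public static void main(String[] args) {
        Account account = stubFor(Account.class, new AccountImpl());
        System.out.println(account.deposit(100)); // prints 100
        System.out.println(account.deposit(50));  // prints 150
    }
}

// Hypothetical remote interface, standing in for what an IDL compiler would generate.
interface Account {
    int deposit(int amount);
}

// The "remote" object; in a real deployment it would live in another process.
class AccountImpl implements Account {
    private int balance;

    public int deposit(int amount) {
        balance += amount;
        return balance;
    }
}
```

The calling code only ever sees the Account interface, which is the transparency the paragraph above describes. The sketch matches methods by name and argument count only, so overloaded methods would need a fuller signature on the wire.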

The magic of CORBA is that your client can be a Java applet running on a Linux PC, sending messages to objects in a Smalltalk application on a Solaris server. And the parameters can be basic types (int, String, float, etc.) or any object whose class has an IDL structure mapping. What is IDL? Just as XML allows for a DTD to share the structure of the tag sets, the Interface Definition Language is the language-neutral shared definition of the object types (and remote messages they support). It is what it says it is, a language-neutral interface definition language. Tools are supplied by CORBA vendors (and JavaSoft) that create stubs and skeletons for IDL specifications.

CORBA is not perfect for all applications. Although free ORBs are available, getting the services or support required often involves licensing for both development and deployment, with all the license key management overhead and costs that entails. And unlike Java's RMI, automatic behavior marshaling is not supported. That is, if the IDL says an object of type User is being sent across the wire, and you send an instance of Student (a subclass of User) into the remote method, the object will morph into a User at the remote end, probably with incorrect behavior for some methods. Not that XML handles this any better. What XML does handle better, however, is:

  • Large volumes of data. Most XML parsers support the SAX interface that allows the document to be streamed in and processed without a full object model being built in memory. Obviously, there are times when this is absolutely necessary.

  • Persistent management of text trees. An XML document is simply a stream of bytes, and so can be stored in a file for later retrieval using the correct version of the DTD. CORBA only exposes object trees, and while these can also be serialized to data files, they are brittle in that changes to class definitions can cause deserialization errors. Java does not allow for class versioning in the way that DTDs may be versioned.

  • Manual intervention. Let's not forget the different routes that have converged here. XML is a child of the human-readable text markup evolution. CORBA emerged from DCE as a way to extend computer processing between machines. If all else fails, an XML stream can be visually inspected and parsed by a human.

XML can be thought of as the text processor's answer to data management. It is designed to encode the human-oriented documents that still drive many business processes. CORBA was designed by programmers to handle distributed system interactions and, as such, suffers from the complexities that go along with lower-level programming issues. In a way, XML can be seen as a revenge on programmers for developing a data distribution system that was too abstract and difficult to program.
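The large-volume point in the list above is easiest to see in code. The following is a minimal sketch, using the standard JAXP SAX parser, that totals hours from a time sheet stream without ever building a document tree in memory. The markup (project elements with an hours attribute) and the class name StreamingHours are assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingHours {
    // Hypothetical time sheet markup; tag and attribute names are assumptions.
    static final String SAMPLE =
        "<timesheet>"
      + "<day number=\"1\"><project name=\"Apollo\" hours=\"6\"/>"
      + "<project name=\"Zeus\" hours=\"2\"/></day>"
      + "<day number=\"2\"><project name=\"Apollo\" hours=\"8\"/></day>"
      + "</timesheet>";

    // Reads the stream once; only the running total is kept in memory.
    static double totalHours(InputStream xml) {
        final double[] total = {0.0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so match on qName.
                if ("project".equals(qName)) {
                    total[0] += Double.parseDouble(attrs.getValue("hours"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse time sheet", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        System.out.println(totalHours(in)); // prints 16.0
    }
}
```

Because the handler sees one element at a time, the same code would work on a stream far larger than available memory, which a tree-based parser could not load.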

Feature Checklist
Table 1 is a checklist of capabilities of both XML and CORBA. I've also included RMI for completeness. By determining your needs, you can use this checklist to help determine the most appropriate technology for distributing data among your applications. This is not intended to be a complete checklist of all distributed system needs or capabilities. Caveat emptor.

Table 1. Capabilities of XML and CORBA.
| Capability | Description | XML | CORBA | RMI | Winner (excluding RMI) |
|---|---|---|---|---|---|
| Platform/language independent | Can the client and server be on almost any OS, hardware, and application language? | Yes | Yes | No (only supports Java) | Tie |
| Handle human-readable text | Can the document be viewed and edited with low-tech tools? | Yes | No | No | XML |
| Handle non-string data types | Is there native support for integers, floating point numbers, booleans, etc.? | No (each DTD may define its own mapping to string-based representation, but there is no validation or parsing support or reuse. Note: Currently, work is underway to add this to XML through the Document Content Definition proposal) | string, short, int, float, double, boolean, byte, char, enums, structs (limited O-O support), union | all serializable objects | CORBA |
| Huge data set management | Can the data be larger than the allowable application memory? | Yes | No (although with CORBA, data is usually only retrieved on an as-needed basis as part of a remote conversation) | No (see CORBA note) | XML |
| Schema versioning | Can an application handle two data streams that were created using different versions of the schema definition? | Yes | No | No | XML |
| Distribution support | Transparency, remote messaging, lookup facilities, security, transactions, etc. | none (simple messaging can be achieved by parsing transmitted XML documents) | full | partial (naming, remote method invocation, mobile code security); EJB, JTS, JNDI, etc., offer most CORBA services | CORBA |
| Object-oriented | Can the data structure have behavior associated with it? Is inheritance and reuse supported? | No | partial OO mapping; IDL to Java generator creates mobile classes without behavior. Only remote objects can inherit | Yes | CORBA |
| References | If node A refers to node B, can node B refer to node A? Can nodes B and C both refer to node A? Lazy pointers | no cyclic support; support for shared and lazy references | No | support for cyclic and shared references; no support for lazy pointers | XML |
| Integration with application object model | Does the technology allow the application object model to be used as is? | No. XML parsers generally support DOM- or SAX-based object models. Use of application object model requires mapping functions or adapter layers | somewhat (the IDL to Java generator creates structure classes that must then be used) | full (any application object which is serializable can be passed as a remote parameter or returned as a return value) | CORBA |

Politics
So, if XML has certain benefits but also certain drawbacks for building distributed systems, why is there so much pressure currently to apply it to all problem domains? Part of this phenomenon stems from our natural inclination to try to see how far we can push any new "cool" technology. But this is only part of the explanation.

Take Microsoft, for example. It is always dangerous to speculate too much in the political arena. Note, however, that Microsoft has made XML one of its linchpins in its drive toward Web-based open systems support. Of course there are rumblings that Microsoft is trying to put its own proprietary spin on the technology, but let's give Bill and company the benefit of the doubt.

CORBA has never been endorsed by Microsoft. Microsoft was an early member of the OMG long before the Web was hot. But it always sat on the fence about supporting CORBA and finally decided that CORBA competed too directly with its DCOM (now called simply COM) object model plans. DCOM was the distributed extension of COM, which itself was an evolution of OLE, a technology for allowing Windows applications to interact with each other. Some say that Microsoft wanted to control the desktop environment and saw DCOM as a way to extend its desktop presence to a whole network, thereby mandating its operating systems as the required de facto standard. Microsoft couldn't support CORBA too directly without sabotaging its DCOM message.

Assuming that this analysis is true, what happened next? Well, the Web came out of nowhere and threw Microsoft's plans for Blackbird, its proprietary Information Superhighway technology, into disarray. For several months, Microsoft appeared to be reacting like IBM at the beginning of the PC revolution. Finally, they ate some crow, or maybe blackbird, and embraced the Web.

Some say the Web is antithetical to Microsoft's philosophy of controlling the technology behind its products. In the Web world, open is good, vendor-neutral is good, OS- and language-independence is good. So, while ActiveX morphed and limped out of OCXs, Java became ubiquitous. Open source and GNU public licenses became the flavor du jour. Finally, Microsoft needed to get in the open game when it came to Web-based data standards. The W3C was not going to accept submissions for a channel definition format based on DCOM. But publishing it as a CORBA IDL would be treasonous to its DCOM technology.

Enter XML. It's an open, vendor-neutral, OS-independent standard spearheaded by the W3C. And it doesn't give the image of betraying DCOM. In fact, paradoxically, it is precisely XML's inability to specify a remote messaging interface (the core of both CORBA and DCOM) that makes it an acceptable remote data format to Microsoft.

Or so the theory goes. I have no knowledge of Microsoft's decision-making process and so all of this is, obviously, speculation. Flame shields up.

More Politics
This is not the whole picture. Sun seems to add support for every new buzzword to Java as soon as it appears. It has taken a couple of years for Sun to resolve its apparent splintering from the CORBA camp with the release of RMI. Now it is pushing a Java XML standard extension and talking of using XML to make Enterprise JavaBeans (EJBs) more portable. Of course, as we saw earlier, there are valid domains for XML, and XML and Java have hit it off from the beginning. For class-version-robust specification of persistent data, such as EJB configuration, XML is a good choice. So Sun's work may not be as contradictory as it may first appear.

IBM, as well, has come out with a lot of XML support and components. This appears, however, to be more of a support for OS-neutral standards than a push to extend XML into the distributed messaging market. With so many of its own operating systems to support, as well as a stodgy image to update, IBM has a lot at stake in supporting both Java and XML.

Even the OMG has had to support XML, with some obvious reservations. But with the OMG, that is probably more of an effort to appear open to new technologies and directions than any belief that it may be a good replacement for the core CORBA technology. Better to say "XML has its uses, and here's how it can fit with CORBA," than to put your hands over your ears and chant "I can't hear you."

Wrap Up
XML is cool and, as Marshall McLuhan noted, cool is hot. It offers a simplified, Web-friendly data markup language that provides domain-specific tagging and promises to make the maze of documents on the Internet easier to manage, search, and navigate. As a distributed data format, it has its advantages but also many disadvantages when compared to the other Web-centric distributed data standard: CORBA. It handles huge data sets of human-readable text data with lazily followed pointers exceptionally well. It does not provide a distributed messaging framework, nor does it support machine-readable non-text data. The push into the distributed computing arena comes both from a desire among geeks to push new technologies as far as they will go and from a political desire by some organizations to support open standards without threatening their own technologies.

There are many application arenas where XML provides huge advantages, especially in document management and stream-based data manipulation. Other applications may not benefit as much from XML as they would from other open Web technologies better suited to their needs. Hopefully, I've helped architects and designers get a better understanding of when it is and is not appropriate to use XML as part of the system architecture.