IT'S THE LATEST buzzword in the Internet application arena: XML, the open-standards child of
SGML that promises to provide platform- and language-neutral data encapsulation and separate
application logic from application data. It's hot. It's powerful. Everyone loves it.
But isn't this exactly what CORBA already provides?
We will explore why XML is so hot, show how XML can be used as a distributed computing protocol,
and look at its advantages, disadvantages, and appropriate uses. We will also speculate on
some of the non-technical forces driving the XML phenomenon.
XML is being touted as the perfect partner for Java. Java supports the development of Web-aware,
platform-neutral applications, and XML is a platform-neutral document description meta-language.
Like a Web startup, both technologies have great promise, and maybe greater hype. Small
wonder that most of the XML tools becoming available are written in Java.
Brief Background
XML, for those just starting to pay attention, is a descendant of structured document markup
languages such as TeX, troff, and, most directly, SGML. It consists of two basic parts: an optional
structure definition (Document Type Definition or DTD) and the documents that use this definition
to organize their content. Think of it as HTML to which you can add your own tags for a set of related
documents. This allows the content and structure to be separated. A third facility, the Extensible
Stylesheet Language (XSL), allows presentation styles to be applied to any document of a particular flavor.
What are the advantages? Well, let's take a particular example. Suppose that, at present, corporate
time sheets are created using an HTML editor. This is great, in that the creator is free to pick
the most appropriate tool on the most appropriate platform for him or her. Also, anyone can view
the time sheets with ubiquitous technology. But how does the corporation collect the information?
Humans infer the day/project information from its placement within tables, but this kind of inference
is very hard for a computer application to make accurately. However powerful they may be at crunching
numbers, computers are not really very smart.
XML offers the computer help. We can define an XML tag that marks a day, with a day-number attribute,
and nest inside each day a tag for every project worked on, recording the number of hours spent on it.
We can then define multiple style sheets for the XML document so that payroll and managers see
different views of the same document. The advantages are twofold: explicit structure for data
manipulation, and decoupling of the presentation from the data to allow for different views.
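To make the time sheet idea concrete, here is a minimal sketch in Java using the DOM parser that ships with the JDK (javax.xml.parsers). The element names (timesheet, day, project) and attributes (number, name, hours) are hypothetical, invented for illustration. The internal DTD plays the role of the optional structure definition; the parser is asked to validate against it before the hours are totaled per project.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class TimeSheetDemo {
    // Hypothetical time sheet with an internal DTD; all tag names are illustrative.
    static final String SHEET =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE timesheet [\n"
        + "  <!ELEMENT timesheet (day+)>\n"
        + "  <!ELEMENT day (project*)>\n"
        + "  <!ATTLIST day number CDATA #REQUIRED>\n"
        + "  <!ELEMENT project EMPTY>\n"
        + "  <!ATTLIST project name CDATA #REQUIRED hours CDATA #REQUIRED>\n"
        + "]>\n"
        + "<timesheet>\n"
        + "  <day number=\"1\"><project name=\"billing\" hours=\"5\"/><project name=\"intranet\" hours=\"3\"/></day>\n"
        + "  <day number=\"2\"><project name=\"billing\" hours=\"8\"/></day>\n"
        + "</timesheet>\n";

    // Validates the document against its DTD, then totals hours per project name.
    static Map<String, Double> hoursByProject(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enforce the DTD, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Double> totals = new TreeMap<>();
            NodeList projects = doc.getElementsByTagName("project");
            for (int i = 0; i < projects.getLength(); i++) {
                Element project = (Element) projects.item(i);
                // The DTD only knows the attribute as a string; converting it is up to us.
                double hours = Double.parseDouble(project.getAttribute("hours"));
                totals.merge(project.getAttribute("name"), hours, Double::sum);
            }
            return totals;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid time sheet", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hoursByProject(SHEET)); // {billing=13.0, intranet=3.0}
    }
}
```

Note that the hours arrive as strings: the DTD can require the attribute to be present but cannot say it is a number, so the conversion (and any error checking) is left to the application.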
Wonderful, you say. Better structured documents than HTML can give us.
Of course, despite many predictions, XML will not cause HTML to disappear. There are just too many
instances where the data being captured for presentation does not have enough "structural value" to
justify the added cost of developing the DTD and XSL. Think of your average home page on the world
of gum collecting. There are many cases where the information value or need for management is not
sufficient to justify the cost of creating a structure and presentation layer. For these cases, HTML,
with its predefined generic structure and presentation, is the logical choice. Indeed, HTML may become
one of a list of predefined XML instances as it is modified slightly to fit the XML syntax.
The Next Step
Now that we have a platform-neutral document markup language, why not use it as the glue between
applications? And because we are in Web land here, why not use it to describe data that must be
shared between distributed applications?
Microsoft has taken this approach in the definition of CDF, an XML-based channel definition format
that it released in March 1997, when PointCast and channels were all the rage. CDF defines channels
and software distribution meta-information such as logos, abstracts, and modification dates. The
last modification date is, for instance, an attribute string of format "yyyymmmddhhmmss" that
indicates in Greenwich Mean Time when the channel was last changed. It is repeated in several elements
because XML does not support element inheritance.
Hold Your Horses!
But wait a minute. Let's not pretend that XML invented platform-neutral data serialization. Since the
days of DCE, frameworks have been developed for applications to exchange messages and data. The two most
common examples of this are CORBA and RMI. They both seamlessly offer ways of moving an object tree to
an application on another machine—and in many ways much better than johnny-come-lately XML. Although
RMI allows for self-describing object serialization and is the simplest to use, I will concentrate on
CORBA, because RMI, while platform-neutral, requires Java at both ends of the pipe.
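It would be a simple matter to define CDF in CORBA IDL with a reusable date type. In fact, mapping a DTD
to IDL is much simpler than mapping IDL to a DTD: because DTDs only support strings, every DTD attribute
maps directly to an IDL string, whereas IDL's typed values must be flattened into strings to go the other way.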
Here's the briefest background information on CORBA you're ever likely to read. Basically, CORBA is a
distributed object framework. It creates local proxy objects for remote objects that your code wishes
to talk to. When you send the local proxy object a message, including parameters, the proxy marshals
the data to the remote system, calls the same method on the remote object, and returns the result back
to you. In-out parameters, remote exceptions, one-way messages, contexts, security frameworks, naming,
trader, transaction, and other services also exist, of course.
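The stub-and-skeleton round trip described above can be imitated in a few lines of plain Java. This is not CORBA: it is a single-process sketch that uses java.lang.reflect.Proxy as the local proxy and Java serialization as a stand-in for CORBA's on-the-wire encoding (real ORBs use CDR over IIOP). The Account interface and its implementation are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class ProxyDemo {
    // Hypothetical remote interface; with CORBA it would be generated from IDL.
    interface Account {
        double deposit(double amount);
    }

    // The "remote" object (the servant, in CORBA terms).
    static class AccountImpl implements Account {
        private double balance;
        public double deposit(double amount) {
            balance += amount;
            return balance;
        }
    }

    // Flattens values into bytes, as a stub does before sending them over the wire.
    static byte[] marshal(Object[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        }
        return bytes.toByteArray();
    }

    // Rebuilds values from bytes, as the receiving side does.
    static Object[] unmarshal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Object[]) in.readObject();
        }
    }

    // Returns a local proxy: each call's arguments are marshaled, dispatched to the
    // target, and the result is marshaled back. No network is involved in this sketch.
    @SuppressWarnings("unchecked")
    static <T> T stubFor(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, callArgs) -> {
            Object[] received = unmarshal(marshal(callArgs == null ? new Object[0] : callArgs));
            try {
                Object result = method.invoke(target, received);     // skeleton dispatch
                return unmarshal(marshal(new Object[] {result}))[0]; // reply message
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the servant's exception to the caller
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Account stub = stubFor(Account.class, new AccountImpl());
        stub.deposit(10.0);
        System.out.println(stub.deposit(5.0)); // 15.0
    }
}
```

A real ORB adds what this sketch leaves out: a network transport, object references that locate the servant, and stubs generated ahead of time from IDL rather than built reflectively at run time.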
The magic of CORBA is that your client can be a Java applet running on a Linux PC, sending messages to
objects in a Smalltalk application on a Solaris server. And the parameters can be basic types
(int, String, float, etc.) or any object whose class has an IDL
structure mapping. What is IDL? Just as XML allows for a DTD to share the structure of the tag sets,
the Interface Definition Language is the language-neutral shared definition of the object types
(and remote messages they support). It is what it says it is, a language-neutral interface definition
language. CORBA vendors (and JavaSoft) supply tools that generate stubs and skeletons from IDL
specifications.
CORBA is not perfect for all applications. Although free ORBs are available, getting the services or
support you need often means buying both developer and deployment licenses, with all the license-key
management overhead and cost that entails. And unlike Java's RMI, CORBA does not marshal behavior
automatically. That is, if the IDL says an object of type User is being sent across the wire, and you pass an instance of Student (a subclass of User) to the remote method, the object will morph into a User at the remote end, probably with incorrect behavior for some methods. Not that XML handles this any better. What XML does handle better, however, is:
- Large volumes of data. Most XML parsers support the SAX interface that allows the document
to be streamed in and processed without a full object model being built in memory. Obviously, there are
times when this is absolutely necessary.
- Persistent management of text trees. An XML document is simply a stream of bytes, and
so can be stored in a file for later retrieval using the correct version of the DTD. CORBA only exposes
object trees, and while these can also be serialized to data files, they are brittle in that changes to
class definitions can cause deserialization errors. Java does not allow for class versioning in the way
that DTDs may be versioned.
- Manual intervention. Let's not forget the different routes that have converged here. XML
is a child of the human-readable text markup evolution. CORBA emerged from DCE as a way to extend
computer processing between machines. If all else fails, an XML stream can be visually inspected
and parsed by a human.
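The first point, streaming with SAX, can be sketched with the SAX parser bundled with the JDK (javax.xml.parsers.SAXParserFactory). The handler below sees each element as the parser reaches it and keeps only a running total, so no document tree is ever built. The project element and its hours attribute are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingTotals {
    // Sums the hours attribute of every <project> element as the parser streams past it.
    // Only the current element's attributes are held in memory.
    static double totalHours(InputStream in) {
        double[] total = {0.0}; // mutable cell the anonymous handler can update
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("project".equals(qName)) {
                    total[0] += Double.parseDouble(attrs.getValue("hours"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse time sheet", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        String xml = "<timesheet>"
            + "<day number=\"1\"><project name=\"billing\" hours=\"5\"/></day>"
            + "<day number=\"2\"><project name=\"billing\" hours=\"2.5\"/></day>"
            + "</timesheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(totalHours(in)); // 7.5
    }
}
```

Because the handler only accumulates a total, the same method could be handed a FileInputStream over a very large file without holding the document in memory; a DOM parser, by contrast, builds the whole tree before the application sees any of it.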
XML can be thought of as the text processor's answer to data management. It is designed to encode the
human-oriented documents that still drive many business processes. CORBA was designed by programmers
to handle distributed system interactions and, as such, suffers from the complexities that go along
with lower-level programming issues. In a way, XML can be seen as revenge on programmers for
developing a data distribution system that was too abstract and difficult to program.
Feature Checklist
Table 1 is a checklist of capabilities of both XML and CORBA. I've also included RMI for completeness.
By determining your needs, you can use this checklist to help determine the most appropriate technology
for distributing data among your applications. This is not intended to be a complete checklist of all
distributed system needs or capabilities. Caveat emptor.
Table 1. Capabilities of XML and CORBA.
Capability | Description | XML | CORBA | RMI | Winner (excluding RMI) |
Platform/language independent | Can the client and server be on almost any OS, hardware, and application language? | Yes | Yes | No (only supports Java) | Tie |
Handle human-readable text | Can the document be viewed and edited with low-tech tools? | Yes | No | No | XML |
Handle non-string data types | Is there native support for integers, floating-point numbers, booleans, etc.? | No (each DTD may define its own mapping to a string-based representation, but there is no validation, parsing support, or reuse. Note: work is currently underway to add this to XML through the Document Content Description proposal) | string, short, int, float, double, boolean, byte, char, enums, structs (limited O-O support), union | all serializable objects | CORBA |
Huge data set management | Can the data be larger than the allowable application memory? | Yes | No (although with CORBA, data is usually only retrieved on an as-needed basis as part of a remote conversation) | No (see CORBA note) | XML |
Schema versioning | Can an application handle two data streams that were created using different versions of the schema definition? | Yes | No | No | XML |
Distribution support | Transparency, remote messaging, lookup facilities, security, transactions, etc. | none (simple messaging can be achieved by parsing transmitted XML documents) | full | partial (naming, remote method invocation, mobile code security); EJB, JTS, JNDI, etc., offer most CORBA services | CORBA |
Object-oriented | Can the data structure have behavior associated with it? Are inheritance and reuse supported? | No | partial OO mapping; the IDL-to-Java generator creates mobile classes without behavior, and only remote objects can inherit | Yes | CORBA |
References | If node A refers to node B, can node B refer to node A? Can nodes B and C both refer to node A? Are lazy pointers supported? | No cyclic support; support for shared and lazy references | No | support for cyclic and shared references; no support for lazy pointers | XML |
Integration with application object model | Does the technology allow the application object model to be used as is? | No. XML parsers generally support DOM- or SAX-based object models; using the application object model requires mapping functions or adapter layers | somewhat (the IDL-to-Java generator creates structure classes that must then be used) | full (any serializable application object can be passed as a remote parameter or returned as a return value) | CORBA |
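The RMI column of Table 1 rests on Java serialization, which RMI uses for parameters passed by value. The sketch below round-trips objects through java.io serialization within one JVM to show three properties the table and the earlier User/Student example refer to: shared references stay shared, cycles survive, and a subclass instance comes back as the subclass. Node, User, and Student are hypothetical classes; across a real RMI connection the receiving JVM would also need the Student class, which RMI can load from a codebase.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    // Hypothetical graph node; 'next' may point back to form a cycle.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Hypothetical User/Student pair: same call, different behavior.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String describe() { return "user"; }
    }

    static class Student extends User {
        private static final long serialVersionUID = 1L;
        @Override
        String describe() { return "student"; }
    }

    // Writes an object graph to bytes and reads it back, much as RMI does for by-value parameters.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.next = b;
        b.next = a;                                       // cycle: A -> B -> A
        Node[] copy = roundTrip(new Node[] {a, a});       // two slots share one node
        System.out.println(copy[0] == copy[1]);           // true: sharing preserved
        System.out.println(copy[0].next.next == copy[0]); // true: cycle preserved

        User u = roundTrip((User) new Student());         // declared as User, sent as Student
        System.out.println(u.describe());                 // student: subclass behavior kept
    }
}
```

What the sketch does not show is the lazy-pointer case from the References row: serialization copies the whole reachable graph eagerly, which is why the table gives RMI no credit there.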
Politics
So, if XML has certain benefits but also certain drawbacks for building distributed systems, why is there
so much pressure currently to apply it to all problem domains? Part of this phenomenon stems from our
natural inclination to try to see how far we can push any new "cool" technology. But this is only part
of the explanation.
Take Microsoft, for example. It is always dangerous to speculate too much in the political arena. Note,
however, that Microsoft has made XML one of the linchpins of its drive toward Web-based open systems
support. Of course there are rumblings that Microsoft is trying to put its own proprietary spin on the
technology, but let's give Bill and company the benefit of the doubt.
CORBA has never been endorsed by Microsoft. Microsoft was an early member of the OMG long before the
Web was hot. But they always sat on the fence about supporting CORBA and finally decided that CORBA
competed too directly with their DCOM (now called simply COM) object model plans. DCOM was the
distributed extension of COM, which itself was an evolution of OLE, a technology for allowing Windows
applications to interact with each other. Some say that Microsoft wanted to control the desktop
environment and saw DCOM as a way to extend its desktop presence to a whole network, thereby mandating
its operating systems as the required de facto standard. Microsoft couldn't support CORBA too directly
without sabotaging their DCOM message.
Assuming that this analysis is true, what happened next? Well, the Web came out of nowhere and threw
Microsoft's plans for Blackbird, its proprietary Information Superhighway technology, into disarray.
For several months, Microsoft appeared to be reacting like IBM at the beginning of the PC revolution.
Finally, they ate some crow, or maybe blackbird, and embraced the Web.
Some say the Web is antithetical to Microsoft's philosophy of controlling the technology behind its
products. In the Web world, open is good, vendor-neutral is good, OS- and language-independence is
good. So, while ActiveX morphed and limped out of OCXs, Java became ubiquitous. Open source and
GNU public licenses became the flavor du jour. Finally, Microsoft needed to get into the open game
when it came to Web-based data standards. The W3C was not going to accept submissions for a channel
definition format based on DCOM. But publishing it as a CORBA IDL would be treasonous to its DCOM
technology.
Enter XML. It's an open, vendor-neutral, OS-independent standard spearheaded by the W3C. And it
doesn't give the image of betraying DCOM. In fact, paradoxically, it is precisely XML's inability
to specify a remote messaging interface (the core of both CORBA and DCOM) that makes it an acceptable
remote data format to Microsoft.
Or so the theory goes. I have no knowledge of Microsoft's decision-making process and so all of this
is, obviously, speculation. Flame shields up.
More Politics
This is not the whole picture. Sun seems to add support for every new buzzword to Java as soon as
it appears. It has taken a couple of years for Sun to resolve its apparent splintering from the
CORBA camp with the release of RMI. Now they are pushing a Java XML standard extension and talking
of using XML to make Enterprise JavaBeans (EJBs) more portable. Of course, as we saw earlier, there
are valid domains for XML, and XML and Java have hit it off from the beginning. For specifying persistent
data that must survive class-version changes, such as EJB configuration, XML is a good choice. So Sun's
work may not be as contradictory as it may first appear.
IBM, as well, has come out with a lot of XML support and components. This appears, however, to be more
a show of support for OS-neutral standards than a push to extend XML into the distributed messaging market.
With so many of its own operating systems to support, as well as a stodgy image to update, IBM has a
lot at stake in supporting both Java and XML.
Even the OMG has had to support XML, with some obvious reservations. But with the OMG, that is probably
more of an effort to appear open to new technologies and directions than any belief that it may be a
good replacement for the core CORBA technology. Better to say "XML has its uses, and here's how it
can fit with CORBA," than to put your hands over your ears and chant "I can't hear you."
Wrap Up
XML is cool and, as Marshall McLuhan noted, cool is hot. It offers a simplified, Web-friendly data
markup language that provides domain-specific tagging and promises to make the maze of documents
on the Internet easier to manage, search, and navigate. As a distributed data format, it has its
advantages but also many disadvantages when compared to the other Web-centric distributed data
standard: CORBA. It handles huge data sets of human-readable text with lazily followed pointers
exceptionally well. It does not provide a distributed messaging framework, nor does it support
machine-readable non-text data. Its push into the distributed computing arena comes both from a desire
among geeks to push new technologies as far as they will go and from a political desire on the part of some
organizations to support open standards without threatening their own technologies.
There are many application arenas where XML provides huge advantages, especially in document management
and stream-based data manipulation. Other applications may not benefit as much from XML as they would
from other open Web technologies better suited to their needs. Hopefully, I've helped architects and
designers get a better understanding of when it is and is not appropriate to use XML as part of the
system architecture.