In-Depth
Programmers Report -- Q&A with Brett McLaughlin
- By Jack Vaughan
- March 1, 2002
Java and XML: How to improve the marriage
Editor at Large Jack Vaughan recently spoke with Brett McLaughlin, an Enhydra
strategist at
Lutris Technologies and the author of
"Java & XML, Second Edition"
from O'Reilly. In his latest book, Brett focuses on using XML from Java applications,
as well as advanced SAX, advanced DOM, SOAP and data binding. Brett offers his
views on how well Java and XML work together.
Q: Java and XML are from two different worlds. Where do they meet?
A: There is a Java culture and an XML culture. But, I wouldn't say the
meeting of them has been that each group went halfway. It's more a case where
more Java developers want to use and learn XML, while there are very few people
in the XML community who have moved toward Java. Most of the people writing Java
using XML are definitely writing their code with a bias toward the Java point
of view.
Q: Do developers embrace a markup language like XML slowly?
A: For the typical Java developer, it's very easy to get up and running
using XML, but there are a handful of very common mistakes. One is overusing XML.
Instead of treating it like any other Java API, where you weigh the pros and cons
of using JDBC, servlets or JSPs, they [use] XML like a magic bullet and it appears
everywhere. In many cases, it makes their application more of a mess than it was
[originally]. The second mistake is using XML without learning data-driven applications
and the data world of XML. In those cases, you usually see programming that meets
[a programmer's] needs, but is terribly inefficient and impossible to maintain.
It is also a very ill-written program simply because they haven't taken any time
to learn what XML is about. What you see are incredibly verbose documents. The
more Java people come to it, the sloppier the average XML document tends to get.
Q: What are the differences between RMI and RPC?
A: The traditional method of communicating with remote objects in Java
is RMI. The real introduction to RMI was when enterprise Java showed up. If you
are using any kind of EJB, you are using RMI. When people found out about that,
they started learning about it. RMI is a good protocol in terms of interactivity.
You can operate on a remote method just like it is local. But because of that
interactivity, there's a high cost associated with it in terms of processing and
network traffic. The more interactive something is, the more you will pay for
it. The less control you want, the less interactivity you want. RPC is a bit different;
it is essentially half the RMI process. In RMI, you make a method invocation:
you call some method and you get a response from that method, which is the result
of that method's processing. If it takes 10 seconds to execute, you wait 10 seconds
to get the response back. In RPC, that is not the case or it doesn't have to be
the case. In RPC, you can make a remote procedure call and that remote procedure
has the option to say "OK" and do something else, or it can mimic RMI.
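
The contrast between waiting for a result and receiving an immediate acknowledgment can be sketched in a few lines of Java. This is only an illustration, not RMI or any RPC library: the class name CallStyles, its method names and the 200 ms delay are assumptions, and the "remote" procedure is simulated locally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class CallStyles {
    // Hypothetical stand-in for a slow remote procedure.
    static String slowProcedure(String input) {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "processed:" + input;
    }

    // RMI-style: the caller blocks until the method has finished its work.
    static String blockingCall(String input) {
        return slowProcedure(input);
    }

    // RPC-style option: the call returns at once; the work continues in the
    // background and the result can be collected later, or ignored.
    static CompletableFuture<String> acknowledgedCall(String input) {
        return CompletableFuture.supplyAsync(() -> slowProcedure(input));
    }

    public static void main(String[] args) {
        System.out.println(blockingCall("a"));       // waits for the full result
        CompletableFuture<String> pending = acknowledgedCall("b");
        System.out.println("caller is free to continue");
        System.out.println(pending.join());          // collect the result later, if wanted
    }
}
```

If the caller always calls join() immediately, the second style behaves just like the first, which is the "mimic RMI" case.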
Q: What is the status of the marriage of XML and Java today?
A: It is definitely better than it was a couple of years ago. Two years
ago, if you wanted to use XML, you pretty much had to create a home-brewed solution,
but you lost many of the advantages of XML. You still maintained interoperability,
but in terms of writing code, you basically had to write your own proprietary
solutions. About a year ago that changed, and we had what I call several low-level
APIs -- APIs that allowed you to interact directly with the document and were
very data-driven. If you used an API like SAX or DOM, you were working with the
document in a document-centric structure. You worked with elements, attributes
and text, and were confined to an XML-like structure in Java. Even though you
are not operating at a file-access level, you have to know something about XML
to use it. Because these Java APIs simply provide you with a document in Java,
they make you understand what an element is, what an attribute is, and what an
entity is; those are the data structures you have. Now, almost three years since
XML's conception, we have a lot more of what I call high-level APIs that are built
on these other APIs. One well-known one is XML data binding, which allows you to
take an XML document and map its data onto a Java class. Instead of working with
elements (like a people element) and attributes, you use data binding to mask
that XML document onto a Java class. Instead of getting an element and its textual
value, you can say, "get me a person" and find out that person's first name. This
is more of a business-driven approach instead of a data-driven approach, which
started a secondary explosion toward XML.
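
The difference between the two levels can be sketched with the JDK's built-in DOM parser. This is a hand-rolled illustration, not the output of any data binding framework: the Person class, the sample document and the helper names are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BindingSketch {
    // Hypothetical sample document.
    static final String XML =
        "<people><person><firstName>Ada</firstName>"
        + "<lastName>Lovelace</lastName></person></people>";

    // Business-level class of the kind a data binding tool would populate.
    static class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Low-level access: the caller thinks in elements and text.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Hand-written binding: map the first <person> element onto a Person.
    static Person bindPerson(Document doc) {
        Element person = (Element) doc.getElementsByTagName("person").item(0);
        return new Person(text(person, "firstName"), text(person, "lastName"));
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // Document-centric: navigate elements to reach a text value.
        System.out.println(text(doc.getDocumentElement(), "firstName"));
        // Business-centric: "get me a person", then ask for the first name.
        System.out.println(bindPerson(doc).getFirstName());
    }
}
```

A real data binding tool would generate or populate a class like Person automatically; the point here is only that application code after bindPerson no longer mentions elements or attributes.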
Q: It seems as if there's a difference of opinion between what people have
been doing with XML and what was being done with Java.
A: Most people look at RPC and wait on it like an RMI call, and say, "What
is the big advantage?" It's not immediately obvious that by using RPC you can
have this asynchronous-type messaging going on. If you have an RPC call, XML is
a wonderful protocol to allow machines speaking different languages -- Java, C,
Perl, Python -- to communicate with each other in an RPC-type way using XML as
the means of communication.
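
As an illustration of what such a language-neutral call can look like on the wire, here is a minimal sketch that builds a request in the XML-RPC format. The interview does not name XML-RPC, so treating it as the convention is an assumption, as are the class name and the method name inventory.getCount; the sketch handles only integer parameters and sends nothing over the network.

```java
public class XmlRpcSketch {
    // Builds an XML-RPC <methodCall> document with integer parameters only.
    // Any client that can parse XML, whatever its language, can read the result.
    static String methodCall(String methodName, int... params) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>");
        sb.append("<methodCall><methodName>")
          .append(methodName)
          .append("</methodName><params>");
        for (int p : params) {
            sb.append("<param><value><int>").append(p).append("</int></value></param>");
        }
        sb.append("</params></methodCall>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical procedure name and argument.
        System.out.println(methodCall("inventory.getCount", 42));
    }
}
```

A Perl or Python server needs nothing Java-specific to handle this request; it only has to parse the XML and dispatch on the method name.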
Q: What are some XML performance issues?
A: XML is almost always going to be slower than traditional binary data
formats. Because it is a textual format, it takes up more space and it will take
longer to move a larger piece of data across the network. You [therefore] want
to be judicious in your use of XML. One common pitfall is that a lot of people
use XML for communication between two Java components; 80% of the time, there's
no reason to use XML. If you are speaking the same language -- Java to Java -- and
do not have to go through a firewall or a restricted network, there's no reason
to encode your data in XML. The point of XML is interoperability and if you have
Java to Java, you already have that. It is much easier to use a binary format;
just use normal Java serialization or RMI to communicate. The power of XML comes
in [to play] when [you are] speaking different languages. A Java sender can't
encode information into a binary format that the C++ receiver can understand.
In that case, even though you are paying a higher price for using XML, that price
is recouped because the two different languages can speak to each other without
a tremendous amount of infrastructure and translation going on. There is always
a tradeoff using XML; you will always lose over using traditional binary formats.
The question is "Are the pros greater than the cons?" If you don't make that comparison,
you will end up losing out.
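
The size cost is easy to see in a small experiment. The sketch below encodes the same three integers once as raw binary and once as XML text and compares the byte counts. The element names and sample values are assumptions, and a plain DataOutputStream stands in for a binary format here rather than full Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // Binary encoding: each int is written as exactly four bytes.
    static byte[] asBinary(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Text encoding: each int is wrapped in a (hypothetical) <reading> element.
    static byte[] asXml(int[] values) {
        StringBuilder sb = new StringBuilder("<readings>");
        for (int v : values) {
            sb.append("<reading>").append(v).append("</reading>");
        }
        sb.append("</readings>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] values = {100000, 200000, 300000};
        System.out.println("binary bytes: " + asBinary(values).length);
        System.out.println("XML bytes:    " + asXml(values).length);
    }
}
```

The binary form is 12 bytes for three ints, while the XML form is several times larger because every value carries its markup. Whether that overhead matters depends on the volume of data and on whether the receiver can read the binary form at all.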
Q: Do XML projects take longer than other types of projects?
A: The initial question is, do you have XML experience on hand? If you have
developers who don't have to learn XML from scratch, it tends to be a very quick
process because you have existing APIs, tools, standards and best practices for
using XML as opposed to writing your own data format. If you don't have any XML
experience on hand, you run the risk of adding up-front cost to get developers
up to speed, and [you have to] weigh that against long-term costs.
About the Author
Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.