In-Depth

Programmers Report -- Q&A with Brett McLaughlin


Java and XML: How to improve the marriage

Editor at Large Jack Vaughan recently spoke with Brett McLaughlin, an Enhydra strategist at Lutris Technologies and the author of "Java & XML, Second Edition" from O'Reilly. In his latest book, Brett focuses on using XML from Java applications, as well as advanced SAX, advanced DOM, SOAP and data binding. Here, Brett offers his views on how well Java and XML work together.

Q: Java and XML are from two different worlds. Where do they meet?
A: There is a Java culture and an XML culture. But I wouldn't say the two groups have met halfway. It's more a case where more and more Java developers want to learn and use XML, while very few people in the XML community have moved toward Java. Most of the people writing Java that uses XML are definitely writing their code with a bias toward the Java point of view.

Q: Do developers embrace a markup language like XML slowly?
A: For the typical Java developer, it's very easy to get up and running using XML, but there are a handful of very common mistakes. One is overusing XML. Instead of treating it like any other Java API, where you weigh the pros and cons of using JDBC, servlets or JSPs, they [use] XML like a magic bullet and it appears everywhere. In many cases, it makes their application more of a mess than it was [originally]. The second mistake is using XML without learning data-driven applications and the data world of XML. In those cases, you usually see programming that meets [a programmer's] needs, but is terribly inefficient and impossible to maintain. It is also a very ill-written program simply because they haven't taken any time to learn what XML is about. What you see are incredibly verbose documents. The more Java people come to it, the sloppier the average XML document tends to get.

Q: What are the differences between RMI and RPC?
A: The traditional method of communicating with remote objects in Java is RMI. The real introduction to RMI was when enterprise Java showed up. If you are using any kind of EJB, you are using RMI. When people found out about that, they started learning about it. RMI is a good protocol in terms of interactivity. You can operate on a remote method just as if it were local. But because of that interactivity, there's a high cost associated with it in terms of processing and network traffic. The more interactive something is, the more you will pay for it. The less control you want, the less interactivity you want. RPC is a bit different; it is essentially half the RMI process. In RMI, you make a method invocation: you call some method and you get a response from that method, which is the result of that method's processing. If it takes 10 seconds to execute, you wait 10 seconds to get the response back. In RPC, that is not the case, or it doesn't have to be the case. In RPC, you can make a remote procedure call and that remote procedure has the option to say "OK" and do something else, or it can mimic RMI.
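The difference between waiting on a call and getting an immediate "OK" can be sketched in plain Java. This is only a local model of the two calling styles: it does not use java.rmi or any real RPC library, and the class and method names (CallStyles, slowReport, blockingCall, acknowledgedCall) are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this models the two calling styles locally
// and is not an RMI or RPC implementation.
class CallStyles {

    // Stand-in for slow work done on the remote side.
    static String slowReport(String name) {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "report for " + name;
    }

    // RMI-style: the caller blocks until the method's result comes back.
    static String blockingCall(String name) {
        return slowReport(name);
    }

    // RPC-style option: the request is accepted at once and the result
    // is delivered later through the returned future.
    static CompletableFuture<String> acknowledgedCall(String name, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> slowReport(name), pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The caller waits for the whole call to finish.
            System.out.println(blockingCall("sync"));

            // The caller is free to continue while the work runs.
            CompletableFuture<String> pending = acknowledgedCall("async", pool);
            System.out.println("OK, request accepted");

            // Collect the result only when it is actually needed.
            System.out.println(pending.join());
        } finally {
            pool.shutdown();
        }
    }
}
```

In the second style the caller pays for the wait only if and when it asks for the result, which is the looser coupling described above.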

Q: What is the status of the marriage of XML and Java today?
A: It is definitely better than it was a couple of years ago. Two years ago, if you wanted to use XML, you pretty much had to create a home-brewed solution, and you lost many of the advantages of XML. You still maintained interoperability, but in terms of writing code, you basically had to write your own proprietary solutions. About a year ago that changed, and we had what I call several low-level APIs -- APIs that allowed you to interact directly with the document and were very data-driven. If you used an API like SAX or DOM, you were working with the document in a document-centric structure. You worked with elements, attributes and text, and were confined to an XML-like structure in Java. Even though you are not operating at a file-access level, you have to know something about XML to use it. These Java APIs simply provide you with a document in Java, so you have to understand what an element is, what an attribute is and what an entity is -- those are the data structures you have. Now, almost three years after XML's conception, we have a lot more of what I call high-level APIs that are built on these other APIs. One well-known one is XML data binding, which allows you to take an XML document and map its data onto a Java class. Instead of working with elements (like a people element) and attributes, you use data binding to map that XML document onto a Java class. Instead of getting an element and its textual value, you can say, "get me a person" and find out that person's first name. This is more of a business-driven approach instead of a data-driven approach, which started a secondary explosion toward XML.
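The contrast between the low-level and high-level styles can be illustrated with a small sketch. It uses the standard DOM parser for the element-level work and a hand-written mapping in place of a real data binding framework, so the Person and PersonBinder classes, the sample document and its element names are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical business class that the XML data is mapped onto.
class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// Hand-written stand-in for a data binding layer (not a real framework).
class PersonBinder {

    // Sample document; the element names are assumptions.
    static final String SAMPLE =
        "<person><firstName>Brett</firstName><lastName>McLaughlin</lastName></person>";

    // Low-level step: parse the text into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Low-level step: find a child element and read its text.
    static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("Missing element: " + tag);
        }
        return node.getTextContent();
    }

    // High-level step: callers ask for a Person and never see elements.
    static Person bind(String xml) {
        Element root = parse(xml).getDocumentElement();
        return new Person(childText(root, "firstName"), childText(root, "lastName"));
    }

    public static void main(String[] args) {
        Person person = bind(SAMPLE);
        System.out.println(person.getFirstName());
    }
}
```

Code that calls bind works with a person and a first name; only the binder itself needs to know what an element is.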

Q: It seems as if there's a difference of opinion between what people have been doing with XML and what was being done with Java.
A: Most people look at RPC and wait on it like an RMI call, and say, "What is the big advantage?" It's not immediately obvious that by using RPC you can have this asynchronous-type messaging going on. If you have an RPC call, XML is a wonderful protocol to allow machines speaking different languages -- Java, C, Perl, Python -- to communicate with each other in an RPC-type way using XML as the means of communication.
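As a rough illustration of XML as the means of communication, the sketch below encodes a procedure call as text, assuming the XML-RPC style of request envelope (methodCall, methodName, params). The class name, the method name inventory.lookup and the restriction to string parameters are assumptions for illustration; no XML-RPC library is used.

```java
// Hypothetical helper that writes a call as an XML-RPC style request.
// Only string parameters are handled in this sketch.
class XmlRpcRequest {

    // Escape the characters that would otherwise break element content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Build the request text for a method name and its string parameters.
    static String build(String methodName, String... stringParams) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>");
        xml.append("<methodCall><methodName>")
           .append(escape(methodName))
           .append("</methodName><params>");
        for (String param : stringParams) {
            xml.append("<param><value><string>")
               .append(escape(param))
               .append("</string></value></param>");
        }
        xml.append("</params></methodCall>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // The receiver can be written in any language that can parse XML.
        System.out.println(build("inventory.lookup", "widget"));
    }
}
```

Because the request is plain text, nothing in it depends on the sender being written in Java.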

Q: What are some XML performance issues?
A: XML is almost always going to be slower than traditional binary data formats. Because it is a textual format, it takes up more space and it will take longer to move a larger piece of data across the network. You [therefore] want to be judicious in your use of XML. One common pitfall is that a lot of people use XML for communication between two Java components; 80% of the time, there's no reason to use XML. If you are speaking the same language -- Java to Java -- and do not have to go through a firewall or a restricted network, there's no reason to encode your data in XML. The point of XML is interoperability and if you have Java to Java, you already have that. It is much easier to use a binary format; just use normal Java serialization or RMI to communicate. The power of XML comes in [to play] when [you are] speaking different languages. A Java sender can't encode information into a binary format that the C++ receiver can understand. In that case, even though you are paying a higher price for using XML, that price is recouped because the two different languages can speak to each other without a tremendous amount of infrastructure and translation going on. There is always a tradeoff using XML; you will always lose over using traditional binary formats. The question is "Are the pros greater than the cons?" If you don't make that comparison, you will end up losing out.
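The space cost of a textual format can be seen in a small comparison. This sketch encodes the same two sample values once as XML text and once with java.io.DataOutputStream; the compact binary layout is a swapped-in stand-in chosen for brevity, not the Java serialization or RMI mentioned above, and the class name, element names and sample values are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of a textual and a compact binary encoding.
class EncodingSize {

    // Textual encoding: the tags travel along with every value.
    // The sample values are assumed to need no XML escaping.
    static byte[] asXml(String firstName, int age) {
        String xml = "<person><firstName>" + firstName + "</firstName>"
                   + "<age>" + age + "</age></person>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: a length-prefixed string followed by a 4-byte int.
    // Both sides must already agree on this layout.
    static byte[] asBinary(String firstName, int age) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(firstName);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:    " + asXml("Brett", 30).length);
        System.out.println("Binary bytes: " + asBinary("Brett", 30).length);
    }
}
```

The binary form is smaller only because sender and receiver share the layout in advance; the XML form carries its structure with it, which is the price paid for interoperability.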

Q: Do XML projects take longer than other types of projects?
A: The initial question is: Do you have XML experience on hand? If you have developers who don't have to learn XML from scratch, it tends to be a very quick process because you have existing APIs, tools, standards and best practices for using XML, as opposed to writing your own data format. If you don't have any XML experience on hand, you run the risk of adding up-front cost to get developers up to speed, and [you have to] weigh that against long-term costs.

About the Author

Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.