XML Parsing: The “What”s and the “Why”s

An article on IBM’s DeveloperWorks site describes the three main options available when choosing how to parse an XML document. It’s a neat summing-up.

The first choice, using an object model API such as the portable and W3C-standard DOM, will be intuitive to most programmers: your code queries the document and gets one or more XML nodes back.
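To make the "query and get nodes back" style concrete, here is a minimal sketch using Python's standard-library `xml.dom.minidom`. The `<orders>` document and the `ORDERS_XML` name are made up for illustration. Note that the whole tree is built in memory before any querying happens.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
ORDERS_XML = """<orders>
  <order id="1"><item>Widget</item></order>
  <order id="2"><item>Gadget</item></order>
</orders>"""

# Parse the entire document into an in-memory DOM tree.
doc = minidom.parseString(ORDERS_XML)

# Query the tree: ask for every <order> element, then read its attribute
# and the text inside its <item> child.
orders = []
for order in doc.getElementsByTagName("order"):
    item = order.getElementsByTagName("item")[0]
    # firstChild is the Text node holding the item's name.
    orders.append((order.getAttribute("id"), item.firstChild.data))

print(orders)  # [('1', 'Widget'), ('2', 'Gadget')]
```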

The second choice, using an event API such as SAX, is less intuitive because it turns the tables: instead of your code querying the document, the parser calls your code back via an event/callback mechanism as it traverses the document. Event APIs are harder to understand and take more code to get working initially, but they prove useful in certain circumstances, e.g. when streaming a large document that you can't (or don't want to) hold in memory all at once in order to query it.
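For contrast, here is a sketch of the same extraction using Python's standard-library `xml.sax`. The handler class and sample document are hypothetical. The parser pushes start-element, text and end-element events at the handler, so the handler has to track its own state (which `<order>` it is inside, whether it is reading `<item>` text). No tree is ever built.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
ORDERS_XML = """<orders>
  <order id="1"><item>Widget</item></order>
  <order id="2"><item>Gadget</item></order>
</orders>"""


class OrderHandler(xml.sax.ContentHandler):
    """Collects (id, item) pairs from callbacks fired by the parser."""

    def __init__(self):
        super().__init__()
        self.orders = []
        self._current_id = None
        self._in_item = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "order":
            self._current_id = attrs.getValue("id")
        elif name == "item":
            self._in_item = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self._in_item = False
            self.orders.append((self._current_id, "".join(self._text)))


handler = OrderHandler()
# The parser drives the traversal and calls back into the handler.
xml.sax.parseString(ORDERS_XML.encode("utf-8"), handler)
print(handler.orders)  # [('1', 'Widget'), ('2', 'Gadget')]
```

This is noticeably more code than the DOM version for the same result. That is the trade-off described above: more bookkeeping in exchange for not needing the whole document in memory.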

The third alternative is to generate an object model from the XML schema, resulting in a custom parser. This should leave you with less hand-written code to maintain, as all the boilerplate parsing is done in the generated parser.
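The payoff of schema-generated bindings is that application code works with typed domain objects rather than generic nodes. The sketch below is hand-written and hypothetical. It only approximates the kind of classes a schema-driven generator might emit for an `<orders>` schema (the `Order`/`Orders` classes, the `from_xml` method, and the integer type for `id` are all assumptions), using `xml.etree.ElementTree` for the underlying parsing.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical sample document, used only for illustration.
ORDERS_XML = """<orders>
  <order id="1"><item>Widget</item></order>
  <order id="2"><item>Gadget</item></order>
</orders>"""


# Hypothetical: classes of the sort a generator might derive from a schema
# that declares <order id="xs:int"><item>xs:string</item></order>.
@dataclass
class Order:
    id: int
    item: str


@dataclass
class Orders:
    orders: List[Order]

    @classmethod
    def from_xml(cls, text: str) -> "Orders":
        # In a real generated binding this parsing boilerplate would be
        # produced by the tool rather than written by hand.
        root = ET.fromstring(text)
        return cls([
            Order(int(o.get("id")), o.findtext("item"))
            for o in root.findall("order")
        ])


bound = Orders.from_xml(ORDERS_XML)
# Application code now deals with typed objects, not XML nodes.
print(bound.orders[0].item)  # Widget
```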

The decision of which parser type to use depends, of course, on the nature of the problem. Go check out the DeveloperWorks article to work out which one is best suited to your project!

About the Author

Matt Stephens is a senior architect, programmer and project leader based in Central London. He co-wrote Agile Development with ICONIX Process, Extreme Programming Refactored, and Use Case Driven Object Modeling with UML - Theory and Practice.