In-Depth

ADT's Programmers Report: What's in store for XML storage?

On the face of it, the notion of native XML data storage is similar to the notion of object database storage promoted in the 1980s and 1990s. At that time, some fervent object advocates said object stores would replace the dominant relational databases.

If the new game were to play out along the lines of the older one, you would predict that XML data stores would find a small niche, while established relational stores would add just enough XML-oriented bells and whistles to stay ahead. Plenty of industry watchers put forward this idea today.

But XML is still in its infancy, and it may be worthwhile to look at the experiences of non-relational advocates rather than reject XML alternatives out of hand. Every project has its pain points, and the day may come when something completely new is in order.

Where did object databases succeed? For the most part, in technical settings; the further away from financial ledgers and spreadsheets, the better. Has the business world changed enough to tip the balance away from transaction-oriented RDBs? No, but document-centric computing is as strong as ever. In some settings, such as the insurance industry, the document still holds sway over the transaction database, much as it did before digital computing came about.

But does XML really change things? It is too early to tell. As an integration mechanism, it may. And in middle-tier computing, where RDBs are not necessarily popular, other means of storage may be considered, especially when volume and performance are at stake.

Players in the XML data space are quite varied and do not yet exhibit the basic homogeneity of SQL RDBMSs. Sometimes the best way to describe them is as 'non-relational.' Alternatives to RDBs are offered by companies including Cincom, Excelon, InterSystems, IxiaSoft, Lazy Software, NeoCore, Software AG, TimesTen, X-Hive, XML Global Technologies and XYZFind, among others. Traditional object database players such as Fresher, Poet Software and Versant are among those that can play a role in mid-tier storage of both objects and XML. Because XML and Java often go hand in hand, Java databases may also be competitive; count PointBase and Ozone in this class.

But as in the earlier object-oriented database 'war,' relational database makers have not been slow to adjust. Back then, they added binary large objects (BLOBs) and other tools for storing and querying unstructured data in the relational playpen. Now they are at it again.

Especially noteworthy: the participation of relational vendors in the effort to forge XQuery, the World Wide Web Consortium's emerging XML query language, as a better means of querying XML data. Unquestionably, architects and developers should take a long look at their applications before straying from familiar RDB turf.

Looking for insurance
The relational path was not the optimal route for a large commercial insurance application, said Andre Alguero at GRX Technologies, Providence, R.I. Why?

Documents in Alguero's world are increasingly XML-based. His content is highly variable, and he has large volumes of data to work with. Those factors led him to select the XIS XML data server from Excelon.

Alguero and his team are working on a system known as 'Risk Exchange.' Its main thrust is to bring structured data into insurers' collaborative environments, which include brokers, insurers and reinsurers. Obviously, the data comes from a number of sources.

'When you talk about variability in content, structured RDBs are not really well suited,' remarked Alguero. It was a conscious decision to use XML. 'There's a wealth of [insurance] data to collect, and it is not collected in the same way by each individual. Nor do insurers want to present it in the same way,' he added.

'Each broker has a different [application form] question for assessing risk. And risk is handled differently by oil companies than by airlines,' he explained.

Volume was also a deciding factor in Alguero's choice of system. 'We are talking about hundreds of thousands of documents that have to be managed,' he said. 'At some point, RDBs don't keep up. They are taking XML and storing it in an RDB. The overhead impacts the scalability when it is not handled natively.'

Over the life of the application, as estimates of risk change, updates are easier if the data is left in the XML domain rather than mapped to a relational store, Alguero concludes.
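The mapping step Alguero wants to avoid can be sketched in a few lines. This is a minimal illustration, not GRX's or Excelon's code: it assumes a hypothetical broker submission format (all element, attribute and table names are invented) and uses Python's standard xml.etree.ElementTree and sqlite3 modules to 'shred' one XML document into two fixed relational tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical broker submission; element names are illustrative only.
SUBMISSION = """
<submission broker="ACME">
  <insured>Gulf Drilling Co.</insured>
  <industry>oil</industry>
  <answer question="offshore_rigs">12</answer>
  <answer question="spill_history">none</answer>
</submission>
"""

# A fixed relational schema: every field needs a column decided up front.
SCHEMA = """
CREATE TABLE submission (id INTEGER PRIMARY KEY, broker TEXT, insured TEXT, industry TEXT);
CREATE TABLE answer (submission_id INTEGER, question TEXT, value TEXT);
"""

def shred(conn, xml_text):
    """Map one XML document onto the two tables and return the new row id."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO submission (broker, insured, industry) VALUES (?, ?, ?)",
        (root.get("broker"), root.findtext("insured"), root.findtext("industry")),
    )
    sub_id = cur.lastrowid
    # Each <answer> element becomes one row in a child table.
    for ans in root.findall("answer"):
        conn.execute(
            "INSERT INTO answer (submission_id, question, value) VALUES (?, ?, ?)",
            (sub_id, ans.get("question"), ans.text),
        )
    return sub_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sub_id = shred(conn, SUBMISSION)
rows = conn.execute(
    "SELECT question, value FROM answer WHERE submission_id = ? ORDER BY question",
    (sub_id,),
).fetchall()
print(rows)  # [('offshore_rigs', '12'), ('spill_history', 'none')]
```

The generic question/value table is one common workaround for variable questions, but any new top-level element a broker adds is silently ignored by this code until both the schema and the mapping are changed, and getting the original document back means reassembling it from rows. That round trip is the overhead a native XML store is meant to avoid.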

To extend on the fly
Relational databases have proved resilient over decades of use, but new 'data types' can present challenges. When such data types come from new fields like genetic engineering, the very newness of the discoveries makes it hard to fully plan querying strategies, and more novel databases may serve better. One biotechnology player takes that tack.

David Shin, CTO at Philadelphia-based Varro Technologies, said XML and the XML storage technology of NeoCore Inc. are useful in handling data about the human genome. Varro has built a knowledge management system for researchers studying the intricacies of DNA. 'We're a peer-to-peer distributed network for sharing knowledge in genomics,' said Shin in an interview late in 2001.

'If you look at the information coming out of life-science research related to the human genome project, you find it is new information and comprises new types of information,' said Shin. 'For that reason, XML is particularly well suited for managing it.'

He noted that it has historically been hard to manage XML data with high performance, but that the NeoCore software met his company's requirements.

Shin has worked with relational databases, and said they are not well suited to Varro's application. 'They operate on very strict data models that are difficult to extend compared to XML. With RDBs, you have to know a lot about your data up front. But in gene research, there are many new discoveries, many unknowns,' he said. Genomic databases have to 'extend on the fly,' explained Shin.
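Shin's point about extending 'on the fly' can be illustrated with a minimal sketch. It assumes hypothetical record layouts (the element names are invented for illustration) and uses Python's standard ElementTree with its limited XPath subset, not NeoCore's software. Newer records carry elements that older ones lack, and queries keep working without any schema change.

```python
import xml.etree.ElementTree as ET

# Hypothetical research records; later ones add elements the first one lacks.
RECORDS = [
    "<finding><gene>BRCA1</gene><chromosome>17</chromosome></finding>",
    "<finding><gene>TP53</gene><chromosome>17</chromosome>"
    "<variant type='SNP'>R72P</variant></finding>",
    "<finding><gene>CFTR</gene><chromosome>7</chromosome>"
    "<variant type='deletion'>F508del</variant>"
    "<expression tissue='lung'>high</expression></finding>",
]

docs = [ET.fromstring(r) for r in RECORDS]

def genes_on_chromosome(docs, chrom):
    """Query a field that every record shares."""
    return [d.findtext("gene") for d in docs if d.findtext("chromosome") == chrom]

def genes_with_variant_type(docs, vtype):
    """Query an element only newer records carry; older records simply don't match."""
    return [d.findtext("gene") for d in docs
            if d.find(f"variant[@type='{vtype}']") is not None]

print(genes_on_chromosome(docs, "17"))            # ['BRCA1', 'TP53']
print(genes_with_variant_type(docs, "deletion"))  # ['CFTR']
```

In a typical relational design, the variant and expression fields would need an ALTER TABLE or new tables before the first such record could be stored.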

Forgoing object-to-relational mapping
Relational technology is key to IT today, but sometimes more innovative solutions are called for. Skipping the object-to-relational mapping step and gaining the ability to make quick updates were major factors in an online currency exchange's decision to use an object database, in this case Matisse from Fresher Information Corp. This is not a story of XML data storage, but it reflects a direction some XML-oriented solutions may take if, as expected, XML plays a part in next-generation, low-latency middleware.

The integration of the Matisse database with Interactive Software Engineering's Eiffel language allowed Standard Transactions, an organization that facilitates the transfer of funds around the world, to develop a real-time cash payment network.

Loryn Jenkins, CTO at Standard Transactions, St. Thomas, U.S. Virgin Islands, said his company's system relies on an infrastructure based on Microsoft Windows 2000 and uses a COM runtime developed in Eiffel. 'Matisse is the database underpinning the entire system,' he explained. Alternatives the company considered were relational databases and in-memory databases.

'We decided the key competitive advantage we were trying to seek was to shorten our time-to-modification of an application. Our advantage was not the ultimate speed of the computer hardware. We wanted to make our software development very slick and our changes very rapid,' said Jenkins.

'Also, we wanted to cut down on object-relational mapping tasks,' he noted.

Fresher and a few other firms with roots in the object-oriented database field are riding a wave of object software commercialization that is often driven by Java's success but sometimes relies heavily on XML or other technologies as well.

Revenge of the relationals
The coming months will see significant updates to relational databases that will continue to narrow many of the gaps between what they and native XML databases offer.

Sybase's Adaptive Server Enterprise now generates XML data directly from the database. Sybase provides an XML parser that converts XML data into a form Adaptive Server can readily work with, and data stored in the database can be retrieved as XML.
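As a generic illustration of what retrieving relational data as XML involves, here is a minimal sketch that wraps each row of a query result in an element with one child per column. It uses Python's standard sqlite3 and ElementTree modules and a hypothetical policy table; it is not Sybase's implementation or API.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, sql, root_tag="rows", row_tag="row"):
    """Run a query and wrap each result row as an XML element, one child per column."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # NULLs become empty elements; everything else is stringified.
            ET.SubElement(row_el, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (id INTEGER, holder TEXT, premium REAL)")
conn.executemany("INSERT INTO policy VALUES (?, ?, ?)",
                 [(1, "Acme Air", 1200.5), (2, "Gulf Oil", None)])
xml_text = rows_to_xml(conn, "SELECT id, holder, premium FROM policy ORDER BY id",
                       root_tag="policies", row_tag="policy")
print(xml_text)
```

The sketch assumes column names are valid XML element names. ElementTree escapes the text content, but a production mapping would also need rules for naming, data types and nesting.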

Importantly, Sybase's database implements a general XML query facility, XQL, that allows developers to construct queries against XML data whether it is stored in the database, in a flat file or even at a URL.

Meanwhile, Oracle Corp., Redwood Shores, Calif., is at work on Oracle9i Database Release 2, which was scheduled to be made available to its Early Release Program participants in mid-May. Oracle9i Release 2 will integrate native XML support within the Oracle environment. It will handle SQL data and XML content using both traditional SQL operations and emerging XML query standards. Oracle XML DB provides native support for XML standards such as XML Schema, XPath and XSLT.

As early as last year, Microsoft Corp. had begun to tout XML storage capabilities that would rely on a new version of SQL Server. That project, parts of which were demonstrated at April's Microsoft TechEd event in New Orleans, is code-named 'Yukon.' Full-fledged XML storage is expected in Yukon, which is due in 2003.

In the meantime, Microsoft is pushing ahead with efforts to boost XML development. In March, it forged a deal with XML Spy tool maker Altova Inc. to help bridge the gap between XML and relational data by creating XML views of relational data.

Also on the XML data prowl: IBM Corp. At the end of March, the company announced a new online demo that illustrates the evolution of its DB2 software in support of XML and Web services. The demo, code-named 'Xperanto,' presents the company's work to integrate the emerging XQuery standard with DB2.

Portions of this story previously appeared in ADT and e-ADT, Application Development Trends' e-mail newsletter.

For more information, read the related article 'DataPower sets eyes on era of XML acceleration.'

 
