In-Depth
ADT's Programmers Report: What's in store for XML storage?
By Jack Vaughan
June 3, 2002
On the face of it, the notion of native XML data storage is similar to the
notion of object database storage promoted in the 1980s and 1990s. At that time,
some fervent object advocates said object stores would replace the dominant
relational databases.
If the new game were to play out along the lines of the older one, you would
predict XML data stores would find a small niche, while established relational
stores would add just enough XML-oriented bells and whistles to stay ahead.
There are plenty of industry observers who put forward this idea today.
But XML is still in its infancy, and it might be worthwhile to take a look
at the stories of non-relational advocates and not reject XML alternatives out
of hand. There are plenty of pain points in any project, and the day may come
when something completely new might be in order.
Where did object databases succeed? In the main, in technical settings; the
further away from financial ledgers and spreadsheets, the better. Has the business
world changed enough to push the balance away from transaction-oriented RDBs?
No, but document-centric computing is as strong as ever. In some settings, for
example, the insurance industry, the document still holds sway over transaction
databases, somewhat as it did before digital computing came about.
But does XML really change things? It is too early to tell. As an integration
mechanism it may. And in computing middle tiers where RDBs are not necessarily
popular, other means of storage may be considered, especially when volume and
performance are at stake.
Players in the XML data space are quite varied, not yet exhibiting the basic
homogeneity of SQL RDBMSs. Sometimes the best way of describing them is as 'non-relational.'
Competitive approaches to RDBs are offered by companies that include Cincom,
Excelon, Intersystems, IxiaSoft, Lazy Software, NeoCore, Software AG, Times
Ten, X-Hive, XML Global Technologies and XYZFind among others. Traditional object
database players such as Fresher, Poet Software and Versant are among those
that can play a role in mid-tier storage of both objects and XML. As XML and
Java often go hand-in-hand, Java DBs may be competitive. Count PointBase and
Ozone among this class.
But as in the earlier object-oriented database 'war,' relational
database makers are not slow to adjust. They added binary large objects (BLOBs)
and other tools to store and query unstructured data in the relational playpen.
They are at it again.
Especially noteworthy: SQL standard efforts to forge XQuery as a better means
of querying XML data. Unquestionably, architects and developers should take
a long look at their application before straying from familiar RDB turf.
Looking for insurance
The relational path was not the optimal route for a large commercial insurance
application, said Andre Alguero at GRX Technologies, Providence, R.I. Why?
Documents in Alguero's world increasingly support XML. There is high variability
in his content, and he has high volumes of data to work with. This led him to
select an XIS XML data server from Excelon.
Alguero and his team are working on a system known as 'Risk Exchange.'
Its main thrust is to bring structured data into insurers' collaborative environments.
Included are brokers, insurers and re-insurers. Obviously, there are a number
of sources for the data.
'When you talk about variability in content, structured RDBs are not really
well suited,' remarked Alguero. It was a conscious decision to use XML.
'There's a wealth of [insurance] data to collect, and it is not collected
in the same way by each individual. Nor do insurers want to present it in the
same way,' he added.
'Each broker has a different [application form] question for assessing
risk. And risk is handled differently by oil companies than by airlines,'
he explained.
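The variability Alguero describes can be pictured with a small Python sketch. It is only an illustration: the two application forms, their element names and their answers are invented here, and the code uses Python's standard library rather than any Excelon product. The point is that one routine can read both forms even though each broker asks different questions.

```python
import xml.etree.ElementTree as ET

# Two hypothetical risk submissions; every element name and value is invented.
OIL_FORM = """
<submission broker="A">
  <insured>Example Oil Co.</insured>
  <risk type="offshore-platform">
    <question id="water-depth-m">1200</question>
    <question id="blowout-preventer">yes</question>
  </risk>
</submission>
"""

AIRLINE_FORM = """
<submission broker="B">
  <insured>Example Air</insured>
  <risk type="fleet">
    <question id="aircraft-count">42</question>
  </risk>
  <notes>Renewal; prior claims attached separately.</notes>
</submission>
"""

def answers(xml_text):
    """Return {question id: answer} for whatever questions a form contains."""
    root = ET.fromstring(xml_text)
    return {q.get("id"): q.text for q in root.iter("question")}

oil = answers(OIL_FORM)      # two questions, specific to oil risks
air = answers(AIRLINE_FORM)  # one question, plus a notes element the oil form lacks
```

Neither form had to be declared in advance; a fixed relational table would need a column, or a separate table, for each distinct question.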
Volume was also a decider in Alguero's systems choice. 'We are talking
about hundreds of thousands of documents that have to be managed,' he said.
'At some point, RDBs don't keep up. They are taking XML and storing it
in an RDB. The overhead impacts the scalability when it is not handled natively.'
Over the life of the application, as estimates of risk change, Alguero concludes,
updates are easier if left in the XML domain, unmapped to a relational store.
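To make the mapping overhead concrete, here is a rough Python sketch of one simple way to map XML onto a relational table, often called 'shredding.' The document, the `node` table and the path-based key are all assumptions made for illustration; commercial products use more elaborate schemes, and this one does not distinguish repeated sibling elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; element names are invented for this sketch.
DOC = ("<submission><insured>Example Oil Co.</insured>"
       "<risk><estimate>high</estimate></risk></submission>")

def shred(db, elem, doc_id, path=""):
    """Store each element as one row keyed by its path from the root."""
    here = f"{path}/{elem.tag}"
    db.execute("INSERT INTO node VALUES (?, ?, ?)",
               (doc_id, here, (elem.text or "").strip()))
    for child in elem:
        shred(db, child, doc_id, here)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (doc_id INTEGER, path TEXT, text TEXT)")

root = ET.fromstring(DOC)
shred(db, root, 1)
row_count = db.execute("SELECT COUNT(*) FROM node").fetchone()[0]

# Update in the XML domain: change the element where it sits.
root.find("./risk/estimate").text = "moderate"

# Update in the mapped store: find the row that stands in for that element.
db.execute("UPDATE node SET text = ? WHERE doc_id = ? AND path = ?",
           ("moderate", 1, "/submission/risk/estimate"))
stored = db.execute(
    "SELECT text FROM node WHERE doc_id = 1 AND path = '/submission/risk/estimate'"
).fetchone()[0]
```

Even this four-element document becomes four rows, and rebuilding the document means reassembling them, which is the kind of overhead the quote refers to.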
To extend on the fly
Relational databases have proved resilient over decades of use, but new 'data
types' can present challenges. When such data types come from new fields
like genetic engineering, the very newness of the discoveries makes it hard to
fully plan querying strategies, and more novel databases may serve. A biotechnology
player takes that tack.
David Shin, CTO at Philadelphia-based Varro Technologies, said XML and the
XML storage technology of NeoCore Inc. are useful in handling data about the
human genome. Varro has built a knowledge management system for researchers
studying the intricacies of DNA. 'We're a peer-to-peer distributed network
for sharing knowledge in genomics,' said Shin in an interview late in 2001.
'If you look at the information coming out of life-science research related
to the human genome project, you find it is new information and comprises new
types of information,' said Shin. 'For that reason, XML is particularly
well suited for managing it.'
He noted that it has been historically hard to manage XML data in a high-performance
manner, but that the NeoCore software met his company's requirements.
Shin has worked with relational databases, and said they are not well suited
to Varro's application. 'They operate on very strict data models that are
difficult to extend compared to XML. With RDBs, you have to know a lot about
your data up front. But in gene research, there are many new discoveries, many
unknowns,' he said. Genomic databases have to 'extend on the fly,'
explained Shin.
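A minimal Python sketch shows the contrast Shin draws. It is not NeoCore's interface: the `gene` table, the `spliceVariants` element, the gene name and the values are all placeholders, and SQLite stands in for a relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational side: the table must be altered before a new kind of fact fits.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
db.execute("INSERT INTO gene VALUES ('GENE-X', '1')")            # placeholder data
db.execute("ALTER TABLE gene ADD COLUMN splice_variants INTEGER")  # schema change
db.execute("UPDATE gene SET splice_variants = 3 WHERE symbol = 'GENE-X'")
columns = [row[1] for row in db.execute("PRAGMA table_info(gene)")]

# XML side: a new element is attached only to the record that needs it.
record = ET.fromstring("<gene symbol='GENE-X'><chromosome>1</chromosome></gene>")
ET.SubElement(record, "spliceVariants").text = "3"               # placeholder value
tags = [child.tag for child in record]
```

In the relational case every row gains the new column, whether or not the fact applies to it; in the XML case other records are left untouched.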
Forgoing object-to-relational mapping
Relational technology is key for IT today, but sometimes more innovative solutions
are called for. Skipping an object-to-relational mapping step and adding the
ability to react with quick updates were major factors in an online currency
exchange's decision to use an object database, in this case Matisse from Fresher
Information Corp. This is not a story of XML data storage, but it reflects a
direction some XML-oriented solutions may take, if, as expected, XML plays a
part in next-generation, low-latency middleware solutions.
Matisse DB integration with Interactive Software Engineering's Eiffel allowed
Standard Transactions, an organization that facilitates transportation of funds
around the world, to develop a real-time cash payment network.
Loryn Jenkins, CTO at Standard Transactions, St. Thomas, U.S. Virgin Islands,
said his company's system relies on an infrastructure based on Microsoft Windows
2000, utilizing a COM runtime developed using Eiffel. 'Matisse is the database
underpinning the entire system,' he explained. Alternatives considered
were relational databases and in-memory databases.
'We decided the key competitive advantage we were trying to seek was to
shorten our time-to-modification of an application. Our advantage was not the
ultimate speed of the computer hardware. We wanted to make our software development
very slick and our changes very rapid,' said Jenkins.
'Also, we wanted to cut down on object-relational mapping tasks,'
he noted.
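For readers who have not written one, the object-relational mapping step looks roughly like the Python sketch below. The `Payment` and `Leg` classes and the two tables are hypothetical, not taken from Standard Transactions' system, which was built in Eiffel on Matisse; SQLite is used only as a stand-in relational store.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Leg:
    currency: str
    amount: float

@dataclass
class Payment:
    payment_id: int
    payee: str
    legs: list = field(default_factory=list)

def save(db, p):
    """Flatten one object graph into a parent row and its child rows."""
    db.execute("INSERT INTO payment VALUES (?, ?)", (p.payment_id, p.payee))
    db.executemany("INSERT INTO leg VALUES (?, ?, ?)",
                   [(p.payment_id, leg.currency, leg.amount) for leg in p.legs])

def load(db, payment_id):
    """Rebuild the object graph from its rows."""
    pid, payee = db.execute(
        "SELECT payment_id, payee FROM payment WHERE payment_id = ?",
        (payment_id,)).fetchone()
    rows = db.execute(
        "SELECT currency, amount FROM leg WHERE payment_id = ? ORDER BY rowid",
        (payment_id,))
    return Payment(pid, payee, [Leg(c, a) for c, a in rows])

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, payee TEXT)")
db.execute("CREATE TABLE leg (payment_id INTEGER, currency TEXT, amount REAL)")

original = Payment(1, "Example Payee", [Leg("USD", 100.0), Leg("EUR", 90.5)])
save(db, original)
restored = load(db, 1)
```

Every change to the classes means a matching change to `save`, `load` and the table definitions. An object database aims to remove that layer by persisting the objects directly.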
Fresher, and a few other firms with roots in the object-oriented database field,
are riding the wave of object software commercialization that is often driven
by Java success, but which is sometimes highly reliant on XML or other technologies
as well.
Revenge of the relationals
The coming months will see significant updates to relational databases that
will continue to narrow many of the gaps between what they and native XML databases
offer.
Sybase's Adaptive Server Enterprise now generates XML data directly from the
database. Sybase provides an XML parser that converts XML data into a form that
can be easily understood by the Adaptive Server database. Data stored in the
database can be retrieved as XML data.
Importantly, Sybase's DB implements a general XML-Query facility, XQL, that
allows developers to construct queries from XML data whether it is stored in
the database, a flat file or even a URL.
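The general pattern, rows out as XML and path-style queries over the result, can be sketched in a few lines of Python. This is not Sybase's API or XQL syntax: the `policy` table and its data are invented, SQLite stands in for the database, and the queries use the limited XPath subset in Python's standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE policy (id INTEGER, holder TEXT, premium REAL)")
db.executemany("INSERT INTO policy VALUES (?, ?, ?)",
               [(1, "Example Oil Co.", 5000.0), (2, "Example Air", 7500.0)])

def rows_to_xml(cursor, root_tag, row_tag):
    """Wrap each result row in an element, with one child element per column."""
    names = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(names, row):
            ET.SubElement(item, name).text = str(value)
    return root

doc = rows_to_xml(
    db.execute("SELECT id, holder, premium FROM policy ORDER BY id"),
    "policies", "policy")
xml_text = ET.tostring(doc, encoding="unicode")

# Path-style queries over the generated XML.
holders = [e.text for e in doc.findall("./policy/holder")]
match = doc.find("./policy[id='2']/holder").text
```

The same path expressions would work on XML parsed from a file or fetched from a URL, which is the appeal of a query facility that does not care where the XML lives.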
Meanwhile, Oracle Corp., Redwood Shores, Calif., is at work on Oracle9i Database
Release 2, which was scheduled to be made available to its Early Release Program
participants in mid-May. Oracle9i Release 2 will integrate native XML support
within the Oracle environment. It will handle SQL data and XML content using
both traditional SQL operations and emerging XQL standards. The Oracle XML DB
provides native support for XML standards like XML Schema, XPath and XSL-T.
And even last year, Microsoft Corp. had begun to tout Microsoft XML storage
capabilities that would rely on a new version of SQL Server. That project, parts
of which were demonstrated at April's Microsoft TechEd event in New Orleans,
is presently code-named 'Yukon.' Full-fledged XML storage is expected
in Yukon, which is due in 2003.
In the meantime, Microsoft is pushing its efforts to boost XML development.
In March, it forged a deal with XML Spy tool maker Altova Inc. that helps bridge
the gap between XML and relational data by creating XML views of relational
data.
Also on the XML data prowl: IBM Corp. The company announced a new online demo
at the end of March that illustrates the evolution of its DB2 software in support
of XML and Web services. Code-named 'Xperanto,' it presents the company's
work to integrate XQuery standards with DB2.
Portions of this story previously appeared in ADT and e-ADT, Application
Development Trends' e-mail newsletter.
For more information, read the related article 'DataPower sets eyes on era of
XML acceleration.'