XML Cutting Down Unstructured Data
By John K. Waters
We tend to think of data as either structured (the roughly 20 percent that fits neatly into the cells of a relational database) or unstructured (the audio, video, e-mail, and Word files that are usually referred to as content, and which constitute the remaining 80 percent). But thanks to the Extensible Markup Language (XML), there's a third category emerging: semi-structured data.
More and more documents are being rendered in XML every day. (Microsoft Word, the world's most widely used word processor, is now XML-aware.) XML provides a widely accepted standard for labeling the structure and content of data, transforming what were traditionally unstructured, text-based documents into much more manageable--and therefore useful--semi-structured content.
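To make the idea concrete, here is a minimal sketch in Python of what "semi-structured" buys you. The memo markup and its element names (memo, author, date, subject, body, para) are invented for illustration and do not come from any real schema or from Word's XML format. The point is only that once a text document carries labels, a few lines of standard-library code can pull out its parts.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo markup: the element names are illustrative assumptions,
# not taken from any real document standard.
MEMO_XML = """
<memo>
  <author>J. Smith</author>
  <date>2005-03-14</date>
  <subject>Quarterly supply request</subject>
  <body>
    <para>Requesting 40 units of item A-113.</para>
    <para>Delivery is needed by the end of the month.</para>
  </body>
</memo>
"""


def summarize(xml_text):
    """Return the labeled parts of the memo as a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        # findtext() returns the text of the first matching child element.
        "author": root.findtext("author"),
        "date": root.findtext("date"),
        "subject": root.findtext("subject"),
        # iter() walks the whole tree, so nested <para> elements are found.
        "paragraphs": [p.text for p in root.iter("para")],
    }


if __name__ == "__main__":
    for field, value in summarize(MEMO_XML).items():
        print(field, "=", value)
```

The same memo saved as plain text would require guesswork to find the author or the date. With the labels in place, those fields can be queried, indexed or fed into a reporting tool much like database columns.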
"There's a lot of potential in XML and service-oriented architectures (SOAs) in the realm of unstructured data," says Gartner analyst Toby Bell. "As we tip toward XML in terms of transforming existing content and creating new content, our ability to describe content, reuse it, repurpose it, section it, segment it, slice it and analyze it improves dramatically, and so does that content's business value."
Because of the widespread use of XML, organizations are simply generating less unstructured data. As enterprises move toward SOAs with XML Web services to complement business processes, and as more business-process and content standards become available, organizations will be able to make the interdependency between people, process and content much more fluid, Bell says.
"There's an awful lot of conversion technology out there right now," he says. "There will be a lot more that creates content in XML natively in the future. What we presently think of as unstructured content will eventually become much more manageable, part of a larger business intelligence architecture."
An interesting example of this trend can be seen in the U.S. Army's Forms Content Management Program (FCMP). The project's mission is to convert the Army's legendarily lethargic paper-based systems to an e-forms-based model.
The FCMP will replace many paper-based administrative processes with a solution based on technology provided by IBM, PureEdge and Silanis. The new solution combines XML-based e-forms with digital signature and content management software. By using a single solution department-wide, the Army will be able to retire redundant stovepipe technologies across different divisions (for example, logistics, medical, and personnel), simplifying the workflow for soldiers.
When fully implemented over the next decade, the new system is expected to cut out a great deal of paperwork and many administrative procedures, as well as save money, according to an Army Audit Agency report.
John K. Waters is a freelance writer based in Silicon Valley.