A New Approach to the Data Usability Problem

Moving data from one system to another is easy; making it usable outside the application that created it is anything but. This is not a new problem, and yet a surprising number of enterprise integration projects underestimate the difficulty of making data fully compatible with the requirements of other applications or systems.

"Data integration is a difficult process, even if you have the best tools of the trade," says business intelligence expert Claudia Imhoff, Ph.D. "In most organizations, the data is all over the place: it's in the order entry system, the billing system, the ERP system, the HR system, the POS system. Integrating the data from all those disparate systems can feel like stitching together a Frankenstein. And you've got to make sure you don't sew an arm where a leg is supposed to be."

Dr. Imhoff is the president and founder of Intelligent Solutions, a Boulder, CO-based consultancy specializing in business intelligence and CRM technologies and strategies. The co-author of five books, she writes regularly on these subjects for a range of technology and trade publications.

In some ways, "usability" is in the eye of the beholder, Imhoff says. Data that is usable in one situation may not be usable in another. But generally, usable data is "data that is consistent and in a form that is appropriate for the purpose at hand."

Much of the data generated in enterprises today is unusable, Imhoff says, because most applications are designed and built without universal corporate standards. Data in different systems is nearly always inconsistent in content, format, units of measure, language and other factors. These data incompatibilities become roadblocks to attempts to leverage the data among systems and users, limiting collaboration across the extended enterprise.

A number of manual and script-based solutions to the data usability problem exist today, Imhoff points out. In a script-based approach, a subject-matter expert works with a programmer, and together they devise rules that are coded into a script or some other executable process. Typically, the script relies on if-then-else conditions to identify meaning and calls a series of string functions to substitute text. For example: "if description contains 'hp', substitute 'hp' with 'horsepower'."
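To make the "hp" rule concrete, here is a minimal sketch of what such a script might look like. The rule table, function names, and sample descriptions are hypothetical illustrations, not taken from any particular product. The first function is a literal rendering of "if description contains 'hp', substitute 'horsepower'"; the second is the kind of patch a programmer might add after the first version misfires.

```python
import re

# Hypothetical rule table of the kind a subject-matter expert and a programmer
# might agree on: (text to look for, text to substitute).
RULES = [("hp", "horsepower")]


def apply_rules_naive(description: str) -> str:
    """Literal 'if description contains X, substitute X with Y' rule."""
    for old, new in RULES:
        if old in description:  # the if-then-else condition
            description = description.replace(old, new)  # the string function
    return description


def apply_rules_whole_word(description: str) -> str:
    """Patched rule: match 'hp' only as a whole word, ignoring case."""
    for old, new in RULES:
        pattern = rf"\b{re.escape(old)}\b"
        description = re.sub(pattern, new, description, flags=re.IGNORECASE)
    return description


if __name__ == "__main__":
    samples = [
        "Motor, 5 hp, 230V",
        "Motor, 5hp, 230V",
        "Pump, 2 HP",
        "HP LaserJet toner",
    ]
    for s in samples:
        print(f"{s!r:24} naive={apply_rules_naive(s)!r:30} "
              f"whole_word={apply_rules_whole_word(s)!r}")
```

Each patch trades one failure for another: the naive rule produces "5horsepower" and misses the uppercase "HP", while the whole-word, case-insensitive version skips "5hp" entirely and rewrites the printer brand as "horsepower LaserJet toner". Neither rule can tell a unit of measure from a brand name without understanding the record's context.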

This semantic translation of data (the expression of the same information in alternate, more usable forms) is a critical requirement for almost any data integration project, Imhoff says. But the scripted and manual solutions are time-consuming, and the resulting translation tables tend to be brittle.

"Translation tables are very difficult to build and require a lot of manual effort," she says. "And once you leave the domain of customer data, you get into some other data that's not so easily matched with a Social Security or telephone number. Product data, for example, doesn't have that same kind of universal code to match on."

In her recently published white paper, "Data Usability in Enterprise Data Integration: The Need for Semantic Translation Technologies in Creating Usable Data," Imhoff identifies a promising new approach to solving data incompatibility that utilizes "technology that can work with the insight and knowledge of a human expert, but also leverage the speed and scale of software."

"It's the whole idea of capturing the subject-matter expert's expertise," Imhoff says.

One vendor she cites, Boulder-based Silver Creek Systems, has eschewed traditional match-and-replace schemes based on pattern matching and instead employs what the company calls a comprehend-and-use approach. The company's DataLens system uses its proprietary Data Refraction technology to "automate the identification of record-level context and comprehension." Rather than searching for repeating patterns, the system identifies data context and meaning at the phrase and record level.

"It's a new way of doing all this that shows a lot of promise," Imhoff says. "Companies with large volumes of data, especially complex and variable data, will want to take the time to understand the new approaches and solutions that may make a huge difference in how they manage and integrate their data."

About the Author

John K. Waters is a freelance writer based in Silicon Valley.