The Search for Extra-Relational Business Intelligence

Don’t look now, but the next big frontier in enterprise data management is one most IT organizations probably feel they’ve already licked: search.

And to the extent that relational databases and data warehouses are able to return consistent, accurate and more or less complete information in response to a given set of query parameters, the enterprise search problem is already licked.

But once you get outside the relational or multidimensional database spaces, that is, once you take into account the gigabytes or terabytes of unstructured, non-relational information stored in HTML documents, PDF files, Microsoft Word documents, Excel spreadsheets and individual e-mail messages, the enormity of the enterprise search problem becomes apparent. Call it the search for extra-relational business intelligence.

Microsoft Corp. and search specialist Google Inc. have both developed search tools for desktop systems, but much of the real legwork in enterprise search is taking place in the thriving enterprise information integration (EII) market. And while EII pure-plays like Composite Software market federated data access solutions that pretty much work as advertised (i.e., they facilitate access to multiple, heterogeneous structured and unstructured data sources), some of the most interesting work in enterprise search is being done by IBM Corp., whose WebSphere Information Integrator offering more or less took EII mainstream when it debuted, as DB2 Information Integrator, two years ago.

WebSphere Information Integrator, like almost all other EII tools, lets users query heterogeneous data sources. IBM claims, however, that the next version of its EII solution, currently code-named “Serrano” and slated for a late-2005 release, will feature drastically improved enterprise search capabilities.

In fact, says Nelson Mattos, distinguished engineer and vice president of IBM’s Information Integration product group, Big Blue’s goal with Serrano is to make it about as easy for users to find corporate information as it is to search the Web. The upshot for developers, of course, is programmatic access, via ODBC, JDBC, XML Query Language (XQuery) or even Web services standards, to what IBM promises will be best-in-breed enterprise search capabilities.

Sounds like a job for the Googles and Yahoo!s of the world, though, doesn’t it? Not surprisingly, Mattos disagrees.

“IBM is different from the Internet search vendors. From a business perspective, we are not really focusing on the consumer market. Our focus is on the enterprise; this is where the real problem is,” he observes.

What’s so different about it? For starters, Mattos says, Internet search engines are largely non-discriminating. They’ll return all relevant information that turns up in response to a given query. That approach won’t wash in the enterprise, for a variety of reasons. “On the Internet, every piece of data is public domain. I don’t give every employee access to the payroll system; I don’t give every employee information about how I’m doing from a payroll perspective. I can’t allow that, and I can’t have my search tool allowing that,” he says.
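Mattos's point about payroll data can be illustrated with a minimal sketch of permission-aware search, in which a result is returned only if it both matches the query and is readable by the person asking. This is not IBM's implementation; the `Document` model, the group names and the sample corpus are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """A searchable item with an access-control list (hypothetical model)."""
    title: str
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def search(documents, query, user_groups):
    """Return titles of documents that match the query AND that the user may read."""
    terms = query.lower().split()
    user_groups = set(user_groups)
    results = []
    for doc in documents:
        # A document matches when every query term appears in its text.
        matches = all(term in doc.text.lower() for term in terms)
        # A document is permitted when the user shares at least one group with it.
        permitted = bool(doc.allowed_groups & user_groups)
        if matches and permitted:
            results.append(doc.title)
    return results


# Illustrative data only: one document open to all staff, one restricted to HR.
CORPUS = [
    Document("Holiday schedule", "Office holiday schedule for 2005",
             frozenset({"all-staff"})),
    Document("Payroll summary", "Payroll schedule and salary figures",
             frozenset({"hr"})),
]
```

An ordinary employee searching for "schedule" would see only the holiday document, while a member of the hypothetical `hr` group would see both. A Web-style engine with no permission check would return both to everyone.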

More to the point, Mattos claims, enterprise search is a more richly heterogeneous proposition than its Internet counterpart.

“The Web is dealing with Web pages, HTML documents. It’s a very homogeneous environment in terms of the forms of data I’m dealing with,” he says. “In the enterprise, I have heterogeneous data in a lot of different systems, so you need a search engine that can talk with these different systems in their different interfaces. You need to be able to talk to content management systems, from IBM, from FileNet, which are different.”
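The idea of a search engine that talks to different systems through their different interfaces can be sketched, in simplified form, as a set of adapters behind one common query method. The sketch below is an assumption-laden illustration, not a description of WebSphere Information Integrator: the class names, the in-memory "database" and "content-manager" sources and the substring matching are all invented for the example.

```python
from abc import ABC, abstractmethod


class SourceAdapter(ABC):
    """Common interface that hides each system's native API (hypothetical)."""

    name = "unnamed"

    @abstractmethod
    def search(self, query):
        """Return a list of matching item labels from this source."""


class RelationalAdapter(SourceAdapter):
    name = "database"

    def __init__(self, rows):
        self.rows = rows  # list of dicts standing in for table rows

    def search(self, query):
        q = query.lower()
        # A row matches when any of its column values contains the query.
        return [row["name"] for row in self.rows
                if any(q in str(value).lower() for value in row.values())]


class DocumentAdapter(SourceAdapter):
    name = "content-manager"

    def __init__(self, files):
        self.files = files  # mapping of file name -> extracted text

    def search(self, query):
        q = query.lower()
        return [fname for fname, text in self.files.items() if q in text.lower()]


def federated_search(adapters, query):
    """Fan the query out to every adapter and tag each hit with its source."""
    hits = []
    for adapter in adapters:
        for item in adapter.search(query):
            hits.append((adapter.name, item))
    return hits
```

In this arrangement the caller issues one query and receives hits labeled by origin. Supporting another kind of repository means writing one more adapter rather than changing the search logic itself.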

When it ships, Serrano will extend the WebSphere Information Integrator portfolio to include many different kinds of structured, semi-structured or unstructured data.

About the Author

Stephen Swoyer is a contributing editor for Enterprise Systems.