The Search for Extra-Relational Business Intelligence
- By Stephen Swoyer
- April 25, 2005
Don’t look now, but the next big frontier in enterprise data management
is one most IT organizations probably feel they’ve already licked: search.
And to the extent that relational databases and data warehouses are able to
return consistent, accurate and more or less complete information in response
to a given set of query parameters, the enterprise search problem is already
licked.
But once you get outside the relational and multidimensional database spaces,
and take into account the gigabytes or terabytes of unstructured, non-relational
information stored in HTML documents, PDF files, Microsoft Word documents, Excel
spreadsheets and individual e-mail messages, the enormity of the enterprise
search problem becomes apparent. Call it the search for extra-relational
business intelligence.
Microsoft Corp. and search specialist Google Inc. have both developed search
tools for desktop systems, but much of the real legwork in enterprise search
is taking place in the thriving enterprise information integration (EII) market.
And while EII pure-plays like Composite Software market federated data access
solutions that pretty much work as advertised (i.e., they facilitate access
to multiple, heterogeneous structured and unstructured data sources), some of
the most interesting work in enterprise search is being done by IBM Corp., whose
WebSphere Information Integrator offering more or less took EII mainstream when
it debuted, as DB2 Information Integrator, two years ago.
WebSphere Information Integrator, like almost every other EII tool, lets users
query heterogeneous data sources, but IBM claims that the next version of its
EII solution, code-named “Serrano” and slated for a late-2005 release, will
feature drastically improved enterprise search capabilities.
In fact, says Nelson Mattos, distinguished engineer and vice president of IBM’s
Information Integration product group, Big Blue’s goal with Serrano is
to make it about as easy for users to find corporate information as it is to
search the Web. The upshot for developers, of course, is programmatic access,
via ODBC, JDBC, XML Query Language (XQuery) or even Web services standards,
to what IBM promises will be best-in-breed enterprise search capabilities.
Sounds like a job for the Googles and Yahoo!s of the world, though, doesn’t
it? Not surprisingly, Mattos disagrees.
“IBM is different from the Internet search vendors. From a business perspective,
we are not really focusing on the consumer market. Our focus is on the enterprise;
this is where the real problem is,” he observes.
What’s so different about the enterprise? For starters, Mattos says, Internet
search engines are largely indiscriminate: they return every relevant item that
turns up in response to a given query. That approach won’t wash in the
enterprise, for a variety of reasons. “On the Internet, every piece
of data is public domain. I don’t give every employee access to the payroll
system; I don’t give every employee information about how I’m doing
from a payroll perspective. I can’t allow that, and I can’t have
my search tool allowing that,” he says.
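To make the distinction concrete, here is a minimal Python sketch of what is often called security-trimmed search: documents are matched on relevance first, then filtered against the permissions of the person asking. The Document type, the search function and the group names are hypothetical illustrations, not IBM’s API or Serrano’s actual design, and the keyword matching is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    # Hypothetical record: an ID, the text, and the groups allowed to read it.
    doc_id: str
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def search(index, query, user_groups):
    """Return IDs of documents that match every query term AND that the user may read."""
    terms = query.lower().split()
    hits = []
    for doc in index:
        words = set(doc.text.lower().split())
        if not all(term in words for term in terms):
            continue  # relevance check: every term must appear in the text
        if doc.allowed_groups.isdisjoint(user_groups):
            continue  # security trimming: drop results this user may not see
        hits.append(doc.doc_id)
    return hits


index = [
    Document("memo-1", "Q3 payroll summary for all staff", frozenset({"hr", "finance"})),
    Document("faq-7", "How to submit a payroll correction", frozenset({"all-employees"})),
]

print(search(index, "payroll", {"all-employees"}))        # ['faq-7']
print(search(index, "payroll", {"hr", "all-employees"}))  # ['memo-1', 'faq-7']
```

An Internet-style engine would stop after the relevance check; the second test is the part Mattos says an enterprise tool cannot skip.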
More to the point, Mattos claims, enterprise search is a more richly heterogeneous
proposition than its Internet counterpart.
“The Web is dealing with Web pages, HTML documents. It’s a very
homogeneous environment in terms of the forms of data I’m dealing with,”
he says. “In the enterprise, I have heterogeneous data in a lot of different
systems, so you need a search engine that can talk with these different systems
in their different interfaces. You need to be able to talk to content management
systems, from IBM, from FileNet, which are different.”
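One common way to build a search engine that “talks” to different systems is to put an adapter in front of each back end, so a single query is translated into each system’s native interface and the results come back in a common shape. The Python sketch below assumes two hypothetical sources: an in-memory SQLite table standing in for a relational system, and a dictionary standing in for a document store. The class and function names are illustrative only and say nothing about how WebSphere Information Integrator or Serrano is actually built.

```python
import sqlite3
from typing import Iterable, NamedTuple, Protocol


class Hit(NamedTuple):
    source: str   # which back-end system produced the hit
    ref: str      # source-specific identifier (row key, file name, ...)
    snippet: str  # text shown to the user


class SourceAdapter(Protocol):
    # Hypothetical common interface: each back end answers the same keyword query.
    def search(self, keyword: str) -> Iterable[Hit]: ...


class SqlAdapter:
    """Wraps a relational table and translates the keyword into a SQL LIKE query."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def search(self, keyword: str) -> Iterable[Hit]:
        rows = self.conn.execute(
            "SELECT id, description FROM orders WHERE description LIKE ?",
            (f"%{keyword}%",),
        )
        for row_id, description in rows:
            yield Hit("orders-db", str(row_id), description)


class DocumentAdapter:
    """Wraps a set of unstructured documents and does a case-insensitive substring scan."""

    def __init__(self, docs: dict):
        self.docs = docs

    def search(self, keyword: str) -> Iterable[Hit]:
        for name, text in self.docs.items():
            if keyword.lower() in text.lower():
                yield Hit("file-share", name, text[:60])


def federated_search(adapters, keyword):
    """Send one query to every source and merge the normalized results."""
    results = []
    for adapter in adapters:
        results.extend(adapter.search(keyword))
    return results


# Usage: one relational source and one document source, queried together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO orders (description) VALUES (?)",
    [("Widget order for Acme",), ("Gadget order for Initech",)],
)
docs = {"acme-contract.txt": "Master services agreement with Acme Corp."}

for hit in federated_search([SqlAdapter(conn), DocumentAdapter(docs)], "acme"):
    print(hit)
```

Adding a new kind of system, such as a content management repository, then means writing one more adapter rather than changing the search front end.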
When it ships, Serrano will extend the reach of WebSphere Information Integrator
to many different kinds of structured, semi-structured and unstructured data.
About the Author
Stephen Swoyer is a contributing editor for Enterprise Systems.