In-Depth

Enterprise Search: IT Faces the Google Phenomenon

Oracle enters the already-crowded enterprise search market, as IT moves to supply fast but secure search results.

Earlier this month, Oracle Corp. threw its hat into the enterprise search ring, belatedly joining IBM Corp. and Microsoft Corp. in a market with estimated annual revenues as high as $400 million.

That figure—or the $370 million talked up by Gartner Inc. and others—helps explain the interest of major relational database vendors. To what extent, however, are customers responding in kind? Are IT organizations legitimately interested in enterprise search, or—as is sometimes the case—are IBM, Microsoft, Oracle (and others) trying to take hold of an emerging market by its nose? Call it a little bit of both, analysts say.

The revenues are, to a large extent, real, if you’re inclined to believe market watchers. Gartner says its estimate reflects 10 percent growth over last year, and predicts encouraging growth through 2009, at which point the information access and search market could amount to nearly half a billion dollars ($482.1 million) in annual licensing revenues.

That’s still a mere fraction of overall relational database market revenues, which amounted to nearly $8 billion according to Gartner’s figures for 2004, the last full year for which data are available.

But industry watchers say the enterprise search wave is real, if not spectacular.

Wayne Eckerson, research and services director for TDWI, calls it “the Google phenomenon”: a desire to replicate the Web-searching experience in the enterprise. For a variety of reasons, of course, this isn’t feasible or desirable. “People who use Google at home to query millions of Web servers all around the world and get responses back in less than a second wonder why they can’t do the same thing at work when they want to query data housed in the local data center and still have to wait minutes or hours for results,” Eckerson notes. “Obviously, these are two different types of queries and expectations must be set, but the Google effect is real.”

Revising Expectations

So why not unleash Google, Yahoo, or AltaVista on the problem? Both Google and Yahoo, along with Microsoft, have done just that, introducing desktop search-oriented tools. But the relational database powerhouses say search in the enterprise is too important a problem to be entrusted to the Googles of the world.

Internet search doesn’t discriminate, they argue: It returns any information that turns up in response to a query. That approach won’t wash in the enterprise, for a variety of reasons, says Greg Crider, senior director of product marketing with Oracle.

“People are browsing the Web on their own, they just go to one of these Internet search sites and they get results. And they go back to their office and say, why can’t I have the same sort of experience?” he explains. “On the other hand, if you look at the point of view of what big organizations are going through today, they have all of these concerns about securing their information, about meeting compliance requirements, about dealing with privacy laws, about dealing with intellectual property.”

In essence, Crider says, organizations want to have their search and secure it, too, and that’s the market opportunity for the Oracles of the world—along with (Crider allows) possibly the IBMs and Microsofts, too. “[IT organizations are] kind of in the middle here, where users are asking for superior ease of use and these new results, but you have the business people involved on the other end demanding ways to secure this information.”
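The tension Crider describes, easy search on one side and access control on the other, can be pictured with a small sketch. This is an illustration only, not a description of how Oracle's product works: the `Document` class, the `allowed_groups` field, and the sample records are all hypothetical, and the keyword matching is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record: each document lists the groups allowed to see it."""
    title: str
    text: str
    allowed_groups: frozenset


# Hypothetical sample content, invented for this sketch.
DOCUMENTS = [
    Document("Holiday schedule", "Office holiday schedule for all staff",
             frozenset({"staff"})),
    Document("Salary review", "Salary review schedule for managers",
             frozenset({"hr"})),
]


def secure_search(documents, query, user_groups):
    """Return titles that match every query term AND that the user may see."""
    terms = query.lower().split()
    if not terms:
        return []
    user_groups = set(user_groups)
    hits = []
    for doc in documents:
        # Drop anything the user has no group in common with.
        if not (doc.allowed_groups & user_groups):
            continue
        haystack = doc.text.lower()
        if all(term in haystack for term in terms):
            hits.append(doc.title)
    return hits


print(secure_search(DOCUMENTS, "schedule", {"staff"}))
print(secure_search(DOCUMENTS, "schedule", {"staff", "hr"}))
```

The same query returns different results depending on who is asking, which is the behavior an Internet search engine does not need to provide.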

Nelson Mattos, IBM distinguished engineer and senior vice-president of Big Blue’s Information Integration product group, has outlined ambitious plans for his company’s own search strategy, which officially launched with its new version of WebSphere Information Integrator OmniFind Edition late last year. In an interview last summer, Mattos used the example of a pharmacy that deploys IBM’s enterprise search tool to search across patient records and exams and discover potentially lethal combinations of drugs.

That use case describes an altogether more sophisticated problem than traditional search. Mattos was just warming up. “The expansion of the kinds of applications that we can deal with, such as analyzing patient records to discover drugs that may have problems, give the end user a quite different experience when dealing with these enterprise search engines,” he argued. Another example, said Mattos, might involve a knowledge worker searching for the name of the president of Brazil: Even if there isn’t a single document anywhere in the organization which explicitly states that Lula da Silva is the president of Brazil, IBM’s search tool would be able to connect the dots by discovering otherwise isolated fragments of information.

In a certain sense, IBM is the 800-pound gorilla of enterprise search: it took the enterprise information integration (EII) market mainstream three years ago when it launched its DB2 Information Integrator product, and added enterprise search to its information management portfolio last December. What’s more, IBM last summer said it would make its Unstructured Information Management Architecture (UIMA)—which effectively powers the search capabilities of its WebSphere Information Integrator OmniFind Edition—available as open-source software. Big Blue positions UIMA as a powerful search technology that can parse text within documents and other content sources to discover latent meanings, buried relationships, and relevant facts.

In this respect, IBM’s acquisition two weeks ago of Language Analysis Systems (LAS) further bolsters its search credentials. LAS provides multi-cultural name identification, profiling, and cleansing software. Its products include NameInspector, a tool that helps identify parsing issues and erroneous names, highlights gender and culture distribution by name, and can pinpoint other name-based anomalies or insights; NameParser (its name is self-explanatory); NameClassifier (which can identify names on the basis of ethnicity or nationality); NameHunter (a name-search optimization tool); and MetaMatch. The LAS acquisition complements Big Blue’s purchase last year of SRD, a provider of identity-resolution software.

The idea, analysts say, is that IBM is cobbling together the technology pieces that will enable it to deliver on the vision of search Mattos outlined last year.

Oracle has also assembled its search technology through acquisitions: some of the assets it first acquired to flesh out its Project Fusion middleware initiative will soon be put to work in its upcoming enterprise search tool. Last March, Oracle acquired identity management specialist Oblix, which, along with a pair of content-management-related acquisitions (TripleHop Technologies Inc. and Context Media), helps complement its traditional data management and query expertise with identity-based access and content management capabilities.

Neither Big Blue nor Oracle is the last word in enterprise search, of course. Microsoft Corp. and Google Inc. have both developed search tools for desktop systems (Microsoft has a SharePoint-based enterprise search strategy, too), and much of the “legwork” of enterprise search has also been tackled by enterprise information integration (EII) pure plays such as Composite Software, which markets federated data access solutions.

A Two-Pronged Problem

Eckerson says the search market basically has two dimensions, which in turn describe two very different technology problems.

The first is largely business intelligence-related: users are demanding (or organizations are attempting to provide them with) easier ways to submit ad hoc queries. It is essentially a classic BI problem. The second has an information management impetus: this is the Google phenomenon, in which organizations are embracing search as a way to query both structured and unstructured data.

To date, Eckerson says, there isn’t yet a consensus on how best to address either of these issues. “There seem to be two approaches to querying unstructured and structured data today,” he explains. “[The first is to] index all the data using search technology, [and the second is to] parse the unstructured data using text mining tools—[effectively] ETL for text—and natural language processing tools [that understand semantics] and put the data in a relational schema where it can be queried.”
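The two approaches Eckerson describes can be contrasted in a short sketch. It is a toy under stated assumptions, not any vendor's implementation: the sample documents are invented, the inverted index is the simplest possible one, and a single regular expression (`PATTERN`) stands in for the text mining and natural language processing tools he mentions.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical sample documents, invented for this sketch.
DOCS = {
    1: "Patient Ann Lee was prescribed warfarin.",
    2: "Patient Ann Lee was prescribed aspirin.",
    3: "Quarterly sales report for the pharmacy division.",
}


# Approach 1: index all the data using search technology.
def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Approach 2: parse the text ("ETL for text") into a relational schema.
# The regex is a stand-in for real text mining; it only knows one sentence shape.
PATTERN = re.compile(
    r"Patient (?P<patient>[A-Z][a-z]+ [A-Z][a-z]+) was prescribed (?P<drug>[a-z]+)"
)


def load_prescriptions(docs):
    """Extract (patient, drug) facts and load them into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescription (doc_id INTEGER, patient TEXT, drug TEXT)"
    )
    for doc_id, text in docs.items():
        match = PATTERN.search(text)
        if match:
            conn.execute(
                "INSERT INTO prescription VALUES (?, ?, ?)",
                (doc_id, match["patient"], match["drug"]),
            )
    return conn


def drugs_for(conn, patient):
    """Query the extracted facts with ordinary SQL."""
    rows = conn.execute(
        "SELECT drug FROM prescription WHERE patient = ? ORDER BY drug",
        (patient,),
    )
    return [row[0] for row in rows]


index = build_index(DOCS)
print(search(index, "ann lee"))
conn = load_prescriptions(DOCS)
print(drugs_for(conn, "Ann Lee"))
```

The index answers "which documents mention these words?" with no up-front modeling, while the relational route costs more to build but lets a user ask structured questions, such as which drugs a given patient has been prescribed.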

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.