In-Depth

Knowledge portals unify data streams

Over the years, different types of software tools have tried, with limited success, to address this issue, ranging from groupware and collaborative tools to document management, information search and retrieval, and knowledge management. The rise of the Internet/intranet/extranet has only exacerbated the problem, bringing even more disparate and far-flung data to the corporate network. Turning both structured and unstructured data into information, or knowledge, to better accomplish business objectives is still a quixotic quest in many ways.

But the trend toward Internet portals such as Yahoo!, which organize, categorize and even personalize information and provide a single point of access to things like online shopping, news feeds and stock trading, is beginning to be reflected in the corporate world with the emergence of corporate portals.

According to the Delphi Group, a corporate portal sits on top of the intranet architecture, providing a single point of access to internal as well as external data. "Corporate portals go far beyond integrating information spread across a company's intranet only," said Larry Hawes, senior analyst at Delphi Group, Boston. "Portals also provide a unified view of information from the Internet and from internal legacy systems, including ERP and mainframe systems."

Although the market is still very young, corporate portals promise to help turn data into information by organizing and categorizing it, putting some meaning around it and then getting it to the people who need it to do their jobs. "Information is data put together in revealing ways," said Ian Black, head of communications and public affairs for British Aerospace, Farnborough, Hampshire, England.

Corporate portals represent the coming together of various "knowledge" technologies. "I see the corporate portal as an evolution from document management, encompassing document management and knowledge management," said Jim Wessely, senior consultant, Central Research & Development at DuPont, Wilmington, Del. "Document management never did hit the mainstream, and knowledge management has been vague at best. The concepts embodied in those discrete market areas are encompassed in a portal. To make centralized information available, and to receive the information you need when you need it, you've got to have document management, data mining, text mining, knowledge sharing and acquisition, and access to knowledge capital. It means the portal really is the encompassing body."

A corporate portal's primary function is quite straightforward, according to Derek Binney, marketing director of the corporate knowledge program, Computer Sciences Corp., El Segundo, Calif. "A corporate portal is aimed at integrating knowledge management with doing real work; you don't have to go somewhere else to do the 'knowledge' thing," he said.

Searchers on target

The Delphi Group projects that the corporate portal software market will reach $740 million in 2001. Today, there are about 40 ISVs with applications or tools that provide one or more elements of portal functionality, according to Delphi. These ISVs come from a variety of established product categories: collaboration, search, knowledge management, information aggregation and publishing, Internet, document management, business intelligence and ERP. Each brings something different to the table. In addition, a new category of software, the Enterprise Information Portal (EIP), is emerging, with new players such as Columbia, Md.-based Sequoia Software, which offers an XML-based framework.

Three companies with a heritage in information search and retrieval that are targeting the corporate portal market are Verity Inc., Semio Corp. and Autonomy Corp. Where general purpose Internet search engines go broad, often turning up an overwhelming number of "hits," these information-retrieval products go deep, turning up more targeted and meaningful "hits." Delphi also puts Dataware, Cambridge, Mass., Excalibur Technologies Corp., Vienna, Va., and Toronto-based Hummingbird Communications Ltd. in this category.

As companies develop more sophisticated intranets, they are demanding more from their retrieval products. "Two years ago, customers were poorly educated about the power of retrieval technology," said Ronald Weissman, vice president strategy, corporate marketing at Verity Inc., Sunnyvale, Calif. "The Web has done a great job of educating people in the basics.

"Everybody knows how to do a search in Lycos," he continued. "But in the corporate setting, where you're dealing with legacy data and problems of classification and security, that's a much more sophisticated [search] than popping up Web pages. Most customers don't understand linguistics or semantics, but they have a clear sense of what they want in a business perspective. The corporate portal has redefined that territory."

What businesses want is a way to bring order and usability to chaos. "We have tons of data all over the company in discrete islands, and in different formats and technologies, from DB2 to Notes to document management systems to individual file systems to paper archives," said DuPont's Wessely. "A lot of this stuff isn't doing a heck of a lot of good; the trick is to get it out where it can be used."

In response to this demand for more sophistication, search-and-retrieval technology vendors like Verity, Semio and Autonomy have "moved well beyond simple keyword indexing and searching to offer advanced full-text capabilities, including automatic information categorization and concept-based analysis of documents," said Delphi's Hawes. "They can classify single pieces of information and then relate them to other information based on concepts articulated in their content. The value that these products bring to the corporate portal environment is their ability to automatically and transparently analyze, organize and reveal structured and unstructured information from multiple internal and external sources."

Each of these vendors takes a very different approach to search and retrieval. Verity's heritage is in keyword searching and Boolean logic. Semio uses a pattern recognition approach to match semantically related terms. Autonomy, which has its roots in neural networking technology and was spun out of Neurodynamics in 1996, uses adaptive pattern recognition to find related ideas. And while Verity and Autonomy sell their software solutions, Semio's model is one of a service provider.
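To make the keyword-and-Boolean end of that spectrum concrete, here is a minimal sketch of an inverted index with AND and OR queries. It illustrates the general technique only; it is not Verity's engine, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by an arbitrary document id.
SAMPLE_DOCS = {
    1: "Loan rates for small business",
    2: "Personal loan application",
    3: "Stock trading news",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word:
                index[word].add(doc_id)
    return index

def search_and(index, *terms):
    """Return ids of documents containing every term (Boolean AND)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Return ids of documents containing at least one term (Boolean OR)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))

index = build_index(SAMPLE_DOCS)
print(search_and(index, "loan", "business"))
print(search_or(index, "loan", "news"))
```

A query of this kind matches only the literal words supplied, which is the limitation that the pattern-recognition and concept-based approaches described above aim to address.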

"One of the key differentiators between Autonomy, Semio and Verity is the level of user involvement needed to construct the initial taxonomy that drives categorization and concept clustering," explained Hawes. "Semio is zero-based; it automatically generates a taxonomy without user input. Verity requires an administrator to develop an extensive taxonomy before it can group information. Autonomy occupies a middle ground in that it can start with a simple pre-built taxonomy and create a more elaborate one as it analyzes document collections."

A critical differentiator, noted Hawes, "is the completeness of portal functionality that each vendor supplies in its product. Autonomy's Portal-in-a-Box and Verity's Knowledge Organizer provide much of the functionality portal implementers need to begin to support a portal project. Semio's products form a less complete toolkit and are not integrated into an out-of-the box portal solution."

Verity is the most established of these three players, while Autonomy and Semio are relative newcomers. Verity's Weissman said the company's philosophy behind its "topics" technology is a combination of human and computer skills. "Our customers find that some human-authored business rules give you a more precise determination of what you're looking for," he noted. Said Delphi's Hawes, "Verity's 'topics' approach allows for highly accurate control of the categorization process and the application of human library science."
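The idea of human-authored rules driving categorization can be sketched in a few lines. The example below is an assumption-laden illustration, not Verity's "topics" syntax or API: the rule table, topic names and the categorize function are all hypothetical, and a topic is assigned whenever a document contains any of the terms an administrator listed for it.

```python
import re

# Hypothetical, administrator-authored rules: topic name -> trigger terms.
TOPIC_RULES = {
    "lending": {"loan", "mortgage", "credit"},
    "investing": {"stock", "bond", "fund"},
}

def categorize(text, rules=TOPIC_RULES):
    """Return the sorted list of topics whose trigger terms appear in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, terms in rules.items() if words & terms)

print(categorize("Apply for a mortgage or a personal loan"))
print(categorize("Company picnic schedule"))
```

The precision comes from the person writing the rules; the cost is that someone has to write and maintain them, which is the trade-off Hawes describes.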

Verity's knowledge retrieval suite of products includes the following pieces: Information Server, Intranet Spider, Agent Server, Knowledge Organizer, CD Web Publisher, Verity HTML Expert and KeyView Pro.

At Computer Sciences Corp. (CSC), Verity technology plays a key role in the knowledge environment, which consists of the knowledge base, collaborative technologies and communities of interest, and a corporate portal. CSC supports 55,000 users on IBM Netfinity equipment, hosted in four major centers around the world. CSC's Binney said the company wanted an engine that provided a universal search of CSC's knowledge base without needing to know where the data was. At CSC, that knowledge-base data resides in Lotus Notes databases and Domino-enabled Web content, relational databases, the company intranet and outside news feeds. In addition, Verity "drives one of the major taps off the portal." Binney said CSC used Cold Fusion to develop the portal, and has plugged in best-of-breed technology.

CSC is also in the process of negotiating an alliance agreement with Verity, which would give CSC access to Verity development and research.

While CSC uses Verity to enhance internal users' experience, New York City-based Chase Manhattan uses Verity technology to enhance its external customers' experience. Chase recently relaunched its Web site to be more customer-centric. Chase liked Verity because of its functionality out of the box, said Jennifer Baliotti, product manager for the search engine at Chase.

"We only had four months to gather the bank-wide requirements, select the product and implement. We didn't have a lot of time to customize, and we had specific needs," she said. "For example, we've broken the site into avenues that speak to different customer segments." These segments are personal, small business, and corporate and institutional. "When customers use the search engine, we want to provide them with results specific to their avenue of interest."

A feature unique to Verity among the products Chase evaluated was the ability to highlight the words in the search results. "So every time a customer searched for 'loan,' the word 'loan' would be highlighted each time it appeared on a result page; you would have had to customize other apps to get that feature," said Baliotti.
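The highlighting behavior Baliotti describes is simple to picture in code. The following is a minimal sketch, assuming HTML bold tags as the highlight markup; it is not how Verity or Chase implemented the feature, and the function name and tag choice are hypothetical.

```python
import re

def highlight(text, term, open_tag="<b>", close_tag="</b>"):
    """Wrap each whole-word, case-insensitive occurrence of term in tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

print(highlight("Loan rates: apply for a loan today.", "loan"))
```

The sketch matches whole words only, so a search for "loan" leaves "loans" untouched; a production engine would more likely apply stemming before highlighting.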

DuPont's experience

For users who know what they are looking for, such as loan information, text-retrieval technology such as Verity's enables "you to retrieve as much as you can about what you know is there," said DuPont's Wessely. For those who wish to discover which information is relevant, Redwood Shores, Calif.-based Semio fits the bill, said Wessely.

For the past five years, Wessely has been researching ways to mine textual information, and to show primary, secondary and even tertiary relationships among that information, as DuPont seeks to become a knowledge company internally. Wessely said he has researched neural network-type technology, AI rules-based natural language technology, semantic analysis and more, filling a file cabinet with enough technical esoterica "to choke a horse." For DuPont's purposes, Wessely "found two approaches that can make sense: some of the neural [network technology] and what Semio is doing.

"Semio takes into account how the language is used," he said, "as opposed to most neurals, which rely upon the proximity relationships of terms to determine association. Both are valid approaches."

Wessely's first use of Semio was as an add-on to a departmental virtual library, which he developed several years ago for the business development group. His challenge at the time was to corral all the different sources feeding into the company's library for this discipline, from commercial reports and databases, to FoxPro and Excel files to books on shelves, and to "put it into one coherent repository that was easily approached" so people could gain insight without physically going to the library.

The virtual library provided numerous ways to access information, from a table of contents, to a search to various links. Wessely chose the Verity search engine, he said, because of its robustness, scalability and flexibility.

Then he discovered SemioMap, which creates visual, navigable representations of text-based content, and added it to the virtual library so people could "get info they didn't know about.

"When you have a search engine, you know what you're looking for and what you want extracted," said Wessely. "On the other side of the coin, you have a body of information you want to turn into knowledge; that's where mining and a discovery engine come in." SemioMap shows the relationships between ideas or phrases so "I'm able to 'connect the dots,'" he added.

Semio has since rolled out a taxonomy-generation tool, which it is providing as part of a monthly service. According to President and CEO Roger Ferguson, Semio consultants will go on-site and work with project leaders to define where the data is coming from and what the company wants the taxonomy to do. Semio will then build the taxonomy and update and maintain it over time.

"What we produce looks like a Yahoo!-style directory," explained Ferguson. "You would look for things by browsing the directory, which goes from general to specific. The difference between us and a search engine is that the user generates his own notion of what he's looking at. We present you with a list of concepts found in the documents, and you select the concepts meaningful to you. You can start with not knowing very much and quickly hone in on what's meaningful."

He added, "A unique feature is that you can define and build multiple taxonomies for the same set of documents. For example, the HR department might want to look at a document in a different way than engineering."
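Ferguson's point about multiple taxonomies over one document set can be illustrated with a small sketch. Note the assumptions: Semio generates its taxonomies from the documents, whereas the two taxonomies below are hand-written stand-ins, and the document texts, category names and browse function are hypothetical.

```python
# Hypothetical document collection shared by both departments.
DOCUMENTS = {
    "doc1": "Safety training schedule for new plant engineers",
    "doc2": "Benefits enrollment guide",
    "doc3": "Reactor maintenance procedure and safety checklist",
}

# Two hand-written taxonomies (category -> trigger terms) over the same set.
HR_TAXONOMY = {"Training": {"training"}, "Benefits": {"benefits", "enrollment"}}
ENG_TAXONOMY = {"Safety": {"safety"}, "Maintenance": {"maintenance", "procedure"}}

def browse(documents, taxonomy):
    """Group document ids under each category whose terms they contain."""
    view = {category: [] for category in taxonomy}
    for doc_id, text in sorted(documents.items()):
        words = set(text.lower().split())
        for category, terms in taxonomy.items():
            if words & terms:
                view[category].append(doc_id)
    return view

print(browse(DOCUMENTS, HR_TAXONOMY))
print(browse(DOCUMENTS, ENG_TAXONOMY))
```

The same safety-training document appears under "Training" for HR and under "Safety" for engineering, which is the kind of department-specific view the quote describes.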

While Semio will rent its taxonomy-generation software, Ferguson said they do not encourage it because "there's still art involved" with building and maintaining a taxonomy.

DuPont's Wessely said Semio's service model is a good one, because "Semio understands what it takes to generate a taxonomy. In general, most people will be stumped. However, there will be an occasion [for some companies] where nobody external to the company is allowed to see the data, whether because it's private or a matter of national security, for example. At that point, Semio will have to give out the tool. If I were them, I would not target those types of application areas right off the bat."

As a Fortune 10 company, though, DuPont is working closely with Semio and learning how to generate a taxonomy, said Wessely. He expects this feature to play a role in the company's planned corporate portal, for which he is the principal architect. While the portal is under development and subject to change, he said it will be aimed at internal employees and appropriate contractors. DuPont is developing a portal prototype because "we need to empower the knowledge worker. Without the enabling technology to do that, it will never happen."

Even with the technology, moving users to a knowledge mode and getting them to take advantage of things like corporate portals takes some education and training, as well as a cultural shift, suggests British Aerospace's Black. "The single biggest issue with all this technology is behavioral, not IT. It doesn't matter how good the technology is built to understand the user if the user refuses to take advantage." Getting users to understand how to interact with all this information coming back through such a small window takes time, he said. "The technology is scalable, but the users are not."

Portal-in-a-Box

British Aerospace is using San Francisco-based Autonomy's Portal-in-a-Box as the customer-facing portal in its intranet, as well as a portal to the company's virtual university. The company moved toward portals because "we were looking at an intelligent way to make sense of the huge amount of unstructured data that sat on our own networks, and that we could also [use to] make sense of structured data as well," said Black. "Last year, we started as a way to find a more intelligent process for searching, but now it's more about the management of information and data; it's much more than searching."

The goal is simple, he said -- to get the right information to the right person, at the right time. A unique feature of Autonomy's technology is that registered users can have their own Web page, with information tailored to them. Users create agents that go out and bring back information; the agents are then automatically retrained as the users' searches become more refined. The tool "brings information to someone about something they didn't even know they wanted to know," said Black.

Black selected Autonomy's product, which he considers second-generation technology, after looking at first-generation search-and-text tools. He said the cost of operation for Portal-in-a-Box is low, and it is easy to set up and use. "It has been written well, very tight, and it hasn't suffered years of bits and bolts being added to it. It literally comes out of the box and runs." However, he added, "it would add a lot of configuration [time] if you really want to get it bolted into your organization" by integrating it with applications.

That is what the Department of Defense is doing, said First Lieutenant George Hellstern with the Chief Technical Task Force at the DoD in Crystal City, Va. The DoD is integrating Autonomy's product with the Joint Reserve Intelligence Support System. "For our application, we're doing a lot of customized things. For general use, it's not something you need to customize at all."

The goal is twofold: to develop a list of all IT skills throughout the DoD, and to have a knowledge management tool for the department. Hellstern said when a user fills out a resume on the Web site, it will produce a portal tailored to the individual's needs or what they do. Then each user will have agents, as well as the ability to add new agents, which will be constantly looking for the latest news on a particular topic, say aircraft maintenance.

Portal-in-a-Box gives users control over what gets returned to them, which reduces the chance of getting irrelevant information back. However, Hellstern noted one downside: the user has to point at what he or she wants to look at. "You have to have an idea of where you want to look. Say if I was searching for colleges and didn't put in .edu, then nothing would get searched at .edu."
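Hellstern's .edu example amounts to scoping a search by domain. The sketch below shows that idea under stated assumptions: it is not Autonomy's configuration or API, and the function name, URL list and suffix format are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical candidate sources an agent could visit.
CANDIDATE_URLS = [
    "http://www.example.edu/admissions",
    "http://www.example.com/colleges",
    "http://www.example.org/rankings",
]

def scoped_sources(urls, allowed_suffixes):
    """Keep only URLs whose hostname ends with one of the allowed suffixes."""
    suffixes = tuple(allowed_suffixes)
    return [u for u in urls if (urlparse(u).hostname or "").endswith(suffixes)]

print(scoped_sources(CANDIDATE_URLS, [".com"]))
print(scoped_sources(CANDIDATE_URLS, [".com", ".edu"]))
```

With only ".com" in scope, the .edu page is never considered, however relevant it might be; the user gains control at the price of having to know where to look.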

Personal and critical

Hellstern, however, praises Autonomy's personalization features. Verity officials said the company is working on similar features, and Semio said it is considering them.

"Personalization is a critical element of both knowledge management and corporate portal applications," said Delphi's Hawes. "End users need to be exposed to relevant information only, rather than wading through the overwhelming amount of information available to them. The rise of e-business demands that knowledge workers be able to control their computing interface so they can quickly access relevant information and participate in the business processes germane to their role in the organization. An interface is not a portal unless it can be personalized manually by the user and/or automatically by leveraging attributes contained in a user profile."
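Hawes' two routes to personalization, manual choices by the user and automatic selection from profile attributes, can be sketched together. This is an illustration only: the item structure, the "interests" profile attribute and the personalize function are hypothetical and do not reflect any vendor's product.

```python
# Hypothetical portal content, each item tagged with subject keywords.
PORTAL_ITEMS = [
    {"title": "Aircraft maintenance bulletin", "tags": ["maintenance", "aircraft"]},
    {"title": "Cafeteria menu", "tags": ["facilities"]},
    {"title": "Quarterly earnings summary", "tags": ["finance"]},
]

def personalize(items, profile, pinned=()):
    """Return titles matched by profile interests or pinned by the user."""
    interests = set(profile.get("interests", ()))
    return [
        item["title"]
        for item in items
        if item["title"] in pinned or interests & set(item["tags"])
    ]

engineer = {"role": "engineer", "interests": ["maintenance"]}
print(personalize(PORTAL_ITEMS, engineer))
print(personalize(PORTAL_ITEMS, engineer, pinned=["Cafeteria menu"]))
```

The profile match stands in for the automatic route, and the pinned list for the manual one; either alone would satisfy the "and/or" in Hawes' definition.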

Added Michael Lynch, Autonomy CEO, managing director and founder, "With older technology, you can only put in a query. With new technology, it can read whole pages. While you're working, the system reads what's on your [personal] page and pops up with information that might help you with what you're working on. It brings that work to your work."

And getting work done more efficiently and effectively is what corporate portals and knowledge management are all about. But different companies have different needs, and information-retrieval tools like Autonomy, Verity and Semio all bring different features to the table. Some companies, like DuPont, may end up using two or more of these products together as part of their knowledge architecture.

Delphi's Hawes adds that "the search-and-retrieval tool is only one of many necessary to provide a comprehensive portal application. Specifically, in addition to categorization and search capabilities, a portal will need integration with other enterprise applications, collaborative facilities, publishing and distribution features, and business process support."

He also warns that few portal products at this early stage are complete. "A company must choose a specific product as a base for a corporate portal development project and integrate the functionality of other software offerings to complete the picture."

DuPont's Wessely said his company will use open technology and take a best-of-breed approach, believing that no one product can be everything to everybody, especially a Fortune 10 company like DuPont. "If someone has a complete portal to plug in, I know it can't do everything I have to do," he said. It will come down to a business decision for many companies, he believes: "Doing it right vs. just getting by."

In the meantime, as companies begin building their corporate portals, they are hoping to achieve the promise that "knowledge is power."