In-Depth
Knowledge portals unify data streams
- By Colleen Frye
- June 12, 2001
Over the years, different types of software tools have tried, with limited
success, to address this issue, from groupware and collaborative tools
to document management, information search and retrieval, and knowledge
management. The rise of the Internet/intranet/extranet has only exacerbated
the problem, bringing even more disparate and far-flung data to the corporate
network. Turning both structured and unstructured data into information,
or knowledge, to better accomplish business objectives is still a quixotic
quest in many ways.
But the trend toward Internet portals such as Yahoo!, which organize,
categorize and even personalize information and also provide a single
point of access to things like online shopping, news feeds and stock trading,
is beginning to be reflected in the corporate world with the emergence
of corporate portals.
According to the Delphi Group, a corporate portal sits on top of the
intranet architecture, providing a single point of access to internal
as well as external data. "Corporate portals go far beyond integrating
information spread across a company's intranet only," said Larry Hawes,
senior analyst at Delphi Group, Boston. "Portals also provide a unified
view of information from the Internet and from internal legacy systems,
including ERP and mainframe systems."
Although the market is still very young, corporate portals promise to help
turn data into information by organizing and categorizing it, putting some
meaning around it and then getting it to the
people who need it to do their jobs. "Information is data put together
in revealing ways," said Ian Black, head of communications and public
affairs for British Aerospace, Farnborough, Hampshire, England.
Corporate portals represent the coming together of various "knowledge"
technologies. "I see the corporate portal as an evolution from document
management, encompassing document management and knowledge management,"
said Jim Wessely, senior consultant, Central Research & Development
at DuPont, Wilmington, Del. "Document management never did hit the mainstream,
and knowledge management has been vague at best. The concepts embodied
in those discrete market areas are encompassed in a portal. To make centralized
information available, and to receive the information you need when you
need it, you've got to have document management, data mining, text mining,
knowledge sharing and acquisition, and access to knowledge capital. It
means the portal really is the encompassing body."
A corporate portal's primary function is quite straightforward, according
to Derek Binney, marketing director of the corporate knowledge program at
Computer Sciences Corp., El Segundo, Calif. "A corporate portal is aimed
at integrating knowledge management with doing real work; you don't have
to go somewhere else to do the 'knowledge' thing," he noted.
Searchers on target
The Delphi Group projects that the corporate portal software market will
reach $740 million in 2001. Today, there are about 40 ISVs with applications
or tools that provide one or more elements of portal functionality, according
to Delphi. These ISVs come from a variety of established product categories:
collaboration, search, knowledge management, information aggregation and
publishing, Internet, document management, business intelligence and ERP.
Each brings something different to the table. In addition, a new category
of software, the Enterprise Information Portal (EIP), is emerging, with
new players such as Columbia, Md.-based Sequoia Software, which offers
an XML-based framework.
Three companies with a heritage in information search and retrieval
that are targeting the corporate portal market are Verity Inc., Semio
Corp. and Autonomy Corp. Where general purpose Internet search engines
go broad, often turning up an overwhelming number of "hits," these information-retrieval
products go deep, turning up more targeted and meaningful "hits." Delphi
also puts Dataware, Cambridge, Mass., Excalibur Technologies Corp., Vienna,
Va., and Toronto-based Hummingbird Communications Ltd. in this category.
As companies develop more sophisticated intranets, they are demanding more
from their retrieval products. "Two years ago, customers were poorly educated
about the power of retrieval technology," said Ronald Weissman, vice president
strategy, corporate marketing at Verity Inc., Sunnyvale, Calif. "The Web has
done a great job of educating people
in the basics.
"Everybody knows how to do a search in Lycos," he continued. "But in
the corporate setting, where you're dealing with legacy data and problems
of classification and security, that's a much more sophisticated [search]
than popping up Web pages. Most customers don't understand linguistics
or semantics, but they have a clear sense of what they want in a business
perspective. The corporate portal has redefined that territory."
What businesses want is a way to bring order and usability to chaos.
"We have tons of data all over the company in discrete islands, and in
different formats and technologies, from DB2 to Notes to document management
systems to individual file systems to paper archives," said DuPont's Wessely.
"A lot of this stuff isn't doing a heck of a lot of good; the trick is to get
it out where it can be used."
In response to this demand for more sophistication, search-and-retrieval
technology vendors like Verity, Semio and Autonomy have "moved well beyond
simple keyword indexing and searching to offer advanced full-text capabilities,
including automatic information categorization and concept-based analysis
of documents," said Delphi's Hawes. "They can classify single pieces of
information and then relate them to other information based on concepts
articulated in their content. The value that these products bring to the
corporate portal environment is their ability to automatically and transparently
analyze, organize and reveal structured and unstructured information from
multiple internal and external sources."
Each of these vendors takes a very different approach to search and
retrieval. Verity's heritage is in keyword searching and Boolean logic.
Semio uses a pattern recognition approach to match semantically related
terms. Autonomy, which has its roots in neural networking technology and
was spun out of Neurodynamics in 1996, uses adaptive pattern recognition
to find related ideas. And while Verity and Autonomy sell their software
solutions, Semio's model is one of a service provider.
"One of the key differentiators between Autonomy, Semio and Verity is
the level of user involvement needed to construct the initial taxonomy
that drives categorization and concept clustering," explained Hawes. "Semio
is zero-based; it automatically generates a taxonomy without user input.
Verity requires an administrator to develop an extensive taxonomy before
it can group information. Autonomy occupies a middle ground in that it
can start with a simple pre-built taxonomy and create a more elaborate
one as it analyzes document collections."
A critical differentiator, noted Hawes, "is the completeness of portal
functionality that each vendor supplies in its product. Autonomy's Portal-in-a-Box
and Verity's Knowledge Organizer provide much of the functionality portal
implementers need to begin to support a portal project. Semio's products
form a less complete toolkit and are not integrated into an out-of-the-box
portal solution."
Verity is the most established of these three players, while Autonomy
and Semio are relative newcomers. Verity's Weissman said the company's
philosophy behind its "topics" technology is a combination of human and
computer skills. "Our customers find that some human-authored business
rules give you a more precise determination of what you're looking for,"
he noted. Said Delphi's Hawes, "Verity's 'topics' approach allows for
highly accurate control of the categorization process and the application
of human library science."
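To make the idea of human-authored business rules concrete, the following is a minimal Python sketch of rule-based categorization. It is an illustration only and does not reproduce Verity's "topics" syntax or any Verity API; the rule format, the topic names and the functions `tokenize` and `categorize` are all assumptions made for this example.

```python
import re

# Hypothetical rules, written by a person rather than learned from the data.
# A document matches a topic when it contains every term in "all" and,
# if "any" is non-empty, at least one term in "any".
TOPIC_RULES = {
    "lending": {"all": ["loan"], "any": ["mortgage", "interest", "credit"]},
    "aircraft": {"all": ["aircraft"], "any": ["maintenance", "engine"]},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=TOPIC_RULES):
    """Return the sorted names of all topics whose rule the text satisfies."""
    words = tokenize(text)
    matched = []
    for topic, rule in rules.items():
        has_all = all(term in words for term in rule["all"])
        has_any = not rule["any"] or any(term in words for term in rule["any"])
        if has_all and has_any:
            matched.append(topic)
    return sorted(matched)
```

In this sketch, an administrator controls precision by tightening or loosening the term lists, which is the trade-off the rule-based approach makes against fully automatic taxonomy generation.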
Verity's knowledge retrieval suite of products includes the following
pieces: Information Server, Intranet Spider, Agent Server, Knowledge Organizer,
CD Web Publisher, Verity HTML Expert and KeyView Pro.
At Computer Sciences Corp. (CSC), Verity technology plays a key role
in the knowledge environment, which consists of the knowledge base, collaborative
technologies and communities of interest, and a corporate portal. CSC
supports 55,000 users on IBM Netfinity equipment, hosted in four major
centers around the world. CSC's Binney said the company wanted an engine
that provided a universal search of CSC's knowledge base without needing
to know where the data was. At CSC, that knowledge-base data resides in
Lotus Notes databases and Domino-enabled Web content, relational databases,
the company intranet and outside news feeds. In addition, Verity "drives
one of the major taps off the portal." Binney said CSC used Cold Fusion
to develop the portal, and has plugged in best-of-breed technology.
CSC is also in the process of negotiating an alliance agreement with
Verity, which would give CSC access to Verity development and research.
While CSC uses Verity to enhance internal users' experience, New York
City-based Chase Manhattan uses Verity technology to enhance its external
customers' experience. Chase recently relaunched its Web site to be more
customer-centric. Chase liked Verity because of its functionality out
of the box, said Jennifer Baliotti, product manager for the search engine
at Chase.
"We only had four months to gather the bank-wide requirements, select
the product and implement. We didn't have a lot of time to customize,
and we had specific needs," she said. "For example, we've broken the site
into avenues that speak to different customer segments." These segments
are personal, small business, and corporate and institutional. "When customers
use the search engine, we want to provide them with results specific to
their avenue of interest."
A feature unique to Verity among the products Chase evaluated was the
ability to highlight the words in the search results. "So every time a
customer searched for 'loan,' the word 'loan' would be highlighted each
time it appeared on a result page; you would have had to customize other
apps to get that feature," said Baliotti.
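The highlighting behavior Baliotti describes can be approximated in a few lines. The Python sketch below is not Verity's implementation; the function name `highlight` and the default `<b>` tags are assumptions chosen for illustration.

```python
import re


def highlight(text, term, open_tag="<b>", close_tag="</b>"):
    """Wrap every whole-word, case-insensitive match of term in tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    # Reuse the matched text so the original capitalization is preserved.
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)
```

Because the pattern uses word boundaries, a search for "loan" in this sketch marks "Loan" and "loan" but leaves "loans" untouched; a production search engine would more likely apply stemming so that related word forms are highlighted as well.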
DuPont's experience
For users who know what they are looking for, such as loan information, text-retrieval
technology such as Verity's enables "you to retrieve as much as you can about
what you know is there," said DuPont's Wessely. For those who wish to discover
which information is relevant, Redwood Shores, Calif.-based Semio
fits the bill, said Wessely.
For the past five years, Wessely has been researching ways to mine textual
information, and to show primary, secondary and even tertiary relationships
among that information, as DuPont seeks to become a knowledge company
internally. Wessely said he has researched neural network-type technology,
AI rules-based natural language technology, semantic analysis and more,
filling a file cabinet with enough technical esoterica "to choke a horse."
For DuPont's purposes, Wessely "found two approaches that can make sense:
some of the neural [network technology] and what Semio is doing.
"Semio takes into account how the language is used," he said, "as opposed
to most neurals, which rely upon the proximity relationships of terms
to determine association. Both are valid approaches."
Wessely's first use of Semio was as an add-on to a departmental virtual
library, which he developed several years ago for the business development
group. His challenge at the time was to corral all the different sources
feeding into the company's library for this discipline, from commercial
reports and databases, to FoxPro and Excel files to books on shelves,
and to "put it into one coherent repository that was easily approached"
so people could gain insight without physically going to the library.
The virtual library provided numerous ways to access information, from
a table of contents to a search to various links. Wessely chose the Verity
search engine, he said, because of its robustness, scalability and flexibility.
Then he discovered SemioMap, which creates visual, navigable representations
of text-based content, and added it to the virtual library so people could
"get info they didn't know about.
"When you have a search engine, you know what you're looking for and
what you want extracted," said Wessely. "On the other side of the coin,
you have a body of information you want to turn into knowledge; that's
where mining and a discovery engine come in." SemioMap shows the relationships
between ideas or phrases so "I'm able to 'connect the dots,'" he added.
Semio has since rolled out a taxonomy-generation tool, which it is providing
as part of a monthly service. According to President and CEO Roger Ferguson,
Semio consultants will go on-site and work with project leaders to define
where the data is coming from and what the company wants the taxonomy
to do. Semio will then build the taxonomy and update and maintain it over
time.
"What we produce looks like a Yahoo!-style directory," explained Ferguson.
"You would look for things by browsing the directory, which goes from
general to specific. The difference between us and a search engine is
that the user generates his own notion of what he's looking at. We present
you with a list of concepts found in the documents, and you select the
concepts meaningful to you. You can start with not knowing very much and
quickly hone in on what's meaningful."
He added, "A unique feature is that you can define and build multiple
taxonomies for the same set of documents. For example, the HR department
might want to look at a document in a different way than engineering."
While Semio will rent its taxonomy-generation software, Ferguson said
they do not encourage it because "there's still art involved" with building
and maintaining a taxonomy.
DuPont's Wessely said Semio's service model is a good one, because "Semio
understands what it takes to generate a taxonomy. In general, most people
will be stumped. However, there will be an occasion [for some companies]
where nobody external to the company is allowed to see the data, whether
because it's private or a matter of national security, for example. At
that point, Semio will have to give out the tool. If I were them, I would
not target those types of application areas right off the bat."
As a Fortune 10 company, though, DuPont is working closely with Semio
and learning how to generate a taxonomy, said Wessely. He expects this
feature to play a role in the company's planned corporate portal, for
which he is the principal architect. While the portal is under development
and subject to change, he said it will be aimed at internal employees
and appropriate contractors. DuPont is developing a portal prototype because
"we need to empower the knowledge worker. Without the enabling technology
to do that, it will never happen."
Even with the technology, moving users to a knowledge mode and getting
them to take advantage of things like corporate portals takes some education
and training, as well as a cultural shift, suggests British Aerospace's
Black. "The single biggest issue with all this technology is behavioral,
not IT. It doesn't matter how good the technology is built to understand
the user if the user refuses to take advantage." Getting users to understand
how to interact with all this information coming back through such a small
window takes time, he said. "The technology is scalable, but the users
are not."
Portal-in-a-Box
British Aerospace is using San Francisco-based Autonomy's Portal-in-a-Box
as the customer-facing portal in its intranet, as well as a portal to the
company's virtual university. The company moved toward portals because "we
were looking at an intelligent way to make sense of the huge amount of
unstructured data that sat on our own networks, and that we could also [use
to] make sense of structured data as well," said Black. "Last year, we
started as a way to find a more intelligent process for searching, but now
it's more about the management of information and data; it's much more than
searching."
The goal is simple, he said -- to get the right information to the right
person, at the right time. A unique feature of Autonomy's technology is
that registered users can have their own Web page, with information tailored
to them. Users create agents that go out and bring back information; the
agents are then automatically retrained as the users' searches become more
refined.
The tool "brings information to someone about something they didn't even
know they wanted to know," said Black.
Black selected Autonomy's product, which he considers second-generation
technology, after looking at first-generation search-and-text tools. He
said the cost of operation for Portal-in-a-Box is low, and it is easy
to set up and use. "It has been written well, very tight, and it hasn't
suffered years of bits and bolts being added to it. It literally comes
out of the box and runs." However, he added, "it would add a lot of configuration
[time] if you really want to get it bolted into your organization" by
integrating it with applications.
That is what the Department of Defense is doing, said First Lieutenant
George Hellstern with the Chief Technical Task Force at the DoD in Crystal
City, Va. The DoD is integrating Autonomy's product with the Joint Reserve
Intelligence Support System. "For our application, we're doing a lot of
customized things. For general use, it's not something you need to customize
at all."
The goal is twofold: to develop a list of all IT skills throughout the
DoD, and to have a knowledge management tool for the department. Hellstern
said when a user fills out a resume on the Web site, it will produce a
portal tailored to the individual's needs or what they do. Then each user
will have agents, as well as the ability to add new agents, which will
be constantly looking for the latest news on a particular topic, say, aircraft
maintenance.
Portal-in-a-Box gives users control over what gets returned to them,
which reduces the chance of getting information back that may not relate.
However, Hellstern did say that a downside is that the user has to point
at what he or she wants to look at. "You have to have an idea of where
you want to look. Say if I was searching for colleges and didn't put in
.edu, then nothing would get searched at .edu."
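Hellstern's .edu example amounts to a search that is scoped to the sources the user has named. The Python sketch below illustrates that behavior under stated assumptions: it is not Autonomy's API, and the functions `in_scope` and `scoped_search`, the in-memory `pages` dictionary and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def in_scope(url, allowed_suffixes):
    """Return True if the URL's host falls under one of the allowed suffixes."""
    host = urlparse(url).hostname or ""
    for suffix in allowed_suffixes:
        bare = suffix.lstrip(".")
        if host == bare or host.endswith("." + bare):
            return True
    return False


def scoped_search(term, pages, allowed_suffixes):
    """Return URLs of in-scope pages whose text contains the term.

    pages maps each URL to the text of that page.
    """
    term = term.lower()
    return [
        url
        for url, text in pages.items()
        if in_scope(url, allowed_suffixes) and term in text.lower()
    ]
```

With a page at `http://www.example.edu/admissions` and another at `http://www.example.com/guide`, a search for "college" restricted to `[".com"]` returns only the second page in this sketch, even though the first page also contains the word. That is the downside Hellstern points to: sources the user did not name are never consulted.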
Personal and critical
Hellstern, however, praises Autonomy's personalization features. Verity
officials said the company is working on such features, and Semio said it
is considering them.
"Personalization is a critical element of both knowledge management
and corporate portal applications," said Delphi's Hawes. "End users need
to be exposed to relevant information only, rather than wading through
the overwhelming amount of information available to them. The rise of
e-business demands that knowledge workers be able to control their computing
interface so they can quickly access relevant information and participate
in the business processes germane to their role in the organization. An
interface is not a portal unless it can be personalized manually by the
user and/or automatically by leveraging attributes contained in a user
profile."
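Hawes' two routes to personalization, manual selection by the user and automatic selection from profile attributes, can be sketched together. The Python example below is a simplified illustration, not any vendor's method; the function `personalize`, the tag-based item format and the scoring by tag overlap are assumptions made for this example.

```python
def personalize(items, profile_interests, manual_picks=()):
    """Return titles of items relevant to the user, most relevant first.

    items: list of (title, tags) pairs, where tags is a set of strings
    profile_interests: tags drawn from the user's profile (automatic route)
    manual_picks: tags the user added by hand (manual route)
    """
    interests = set(profile_interests) | set(manual_picks)
    scored = []
    for title, tags in items:
        score = len(interests & set(tags))
        if score > 0:  # drop items with no overlap at all
            scored.append((score, title))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]
```

Items with no overlapping tags are filtered out entirely, which mirrors the point that end users should see relevant information only rather than wade through everything available.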
Added Michael Lynch, Autonomy CEO, managing director and founder, "With
older technology, you can only put in a query. With new technology, it
can read whole pages. While you're working, the system reads what's on
your [personal] page and pops up with information that might help you
with what you're working on. It brings that work to your work."
And getting work done more efficiently and effectively is what corporate
portals and knowledge management are all about. But different companies
have different needs, and information-retrieval tools like Autonomy, Verity
and Semio all bring different features to the table. Some companies, like
DuPont, may end up using two or more of these products together as part
of their knowledge architecture.
Delphi's Hawes adds that "the search-and-retrieval tool is only one
of many necessary to provide a comprehensive portal application. Specifically,
in addition to categorization and search capabilities, a portal will need
integration with other enterprise applications, collaborative facilities,
publishing and distribution features, and business process support."
He also warns that few portal products at this early stage are complete.
"A company must choose a specific product as a base for a corporate portal
development project and integrate the functionality of other software
offerings to complete the picture."
DuPont's Wessely said his company will use open technology and take
a best-of-breed approach, believing that no one product can be everything
to everybody, especially a Fortune 10 company like DuPont. "If someone
has a complete portal to plug in, I know it can't do everything I have
to do," he said. It will come down to a business decision for many companies,
he believes: "Doing it right vs. just getting by."
In the meantime, as companies begin building their corporate portals,
they are hoping to achieve the promise that "knowledge is power."