XML seen as godsend to researchers
Unstructured data on the Internet confronts business users with too much
information and not enough business intelligence, according to Sundar Kadayam,
chief technology officer at Intelliseek Inc., Cincinnati.
How, for example, is the product manager for an automobile manufacturer going
to make sense of all the comments drivers post on various Web bulletin boards?
How many e-mails must the marketing group read to find out that buyers hate a
particular color or love one model's five-speed transmission?
Kadayam has been tapped to lead Intelliseek's new Applied Research Center
(ARC) in a major R&D effort to provide a technology answer for the age-old
question: ''What are people really saying about us?''
He said a combination of data mining and XML is the key to taking such
unstructured data and providing business users with what he calls ''nuggets of
''The purpose of the ARC team is to help us maintain leadership in this whole
area of mining unstructured data,'' Kadayam said. ''That basically means how do I
read free text, discern key topics within it, categorize them, and then identify
the entities, relationships, comparative information, sentiment and so on.''
Intelliseek's current technology uses XML tagging to help make sense of the
unstructured data and then deliver it to customers via Web interfaces or
directly into their applications via SOAP, Kadayam said.
''The method that we have used to get the greatest value from these processes
is to take unstructured data, put it through our proprietary mining processes
and essentially lend structure to the data in the form of XML,'' he said.
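
To make the idea of lending structure to free text concrete, here is a minimal sketch in Python. It is not Intelliseek's proprietary mining process, which the article does not describe. The keyword lists, the tag_post function and the XML element names are all hypothetical, and simple keyword matching stands in for real topic and sentiment analysis.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword lists, for illustration only. A production system
# would rely on trained classifiers rather than fixed word lists.
TOPIC_KEYWORDS = {
    "transmission": ["transmission", "five-speed", "gearbox"],
    "color": ["color", "paint"],
}
POSITIVE_WORDS = {"love", "great", "like"}
NEGATIVE_WORDS = {"hate", "awful", "dislike"}


def tag_post(text, source):
    """Wrap one free-text post in XML with naive topic and sentiment tags."""
    # Lowercase each word and strip surrounding punctuation before matching.
    words = [w.strip(".,!?'\"").lower() for w in text.split()]

    post = ET.Element("post", source=source)
    ET.SubElement(post, "text").text = text

    # Record every topic whose keywords appear in the post.
    topics = ET.SubElement(post, "topics")
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(k in words for k in keywords):
            ET.SubElement(topics, "topic").text = topic

    # Score sentiment as positive word count minus negative word count.
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    ET.SubElement(post, "sentiment").text = label
    return post


post = tag_post("I love the five-speed transmission!", "bulletin-board")
xml_string = ET.tostring(post, encoding="unicode")
print(xml_string)
```

Once each post carries tags like these, a Web interface or a downstream application can filter and count posts by topic and sentiment instead of reading raw text.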
If the unstructured data, including both posts on Internet chat boards and
e-mails to a company Web site, can be transformed and summarized, Kadayam said,
it can provide valuable insights into competitive consumer-based businesses.
''There are hundreds of Internet communities where discussions are happening
about various automotive industry products, brands and models,'' he said, using
as an example an unnamed Intelliseek customer who is a carmaker. ''In this
particular context there is quite a rich amount of insight into what consumers
like and do not like, and what competitive comparisons are being drawn up. That
is extremely useful for marketing and brand managers, who are essentially
responsible for the formulation of new features and capabilities and new vehicle
models. Our technology allows for users in the auto industry to get a very
highly qualified view of what all this buzz is indicating.''
Kadayam's team at ARC, which will be working to advance these XML, Internet
data mining, and business intelligence technologies, includes the following
experts from the academic world:
Greg Bishop, Ph.D. (Ohio State),
specializing in information retrieval, information filtering and classification;
Natalie Glance, Ph.D. (Stanford), specializing in personalization, information
extraction and collaborative systems; Matthew Hurst, Ph.D. (University of
Edinburgh), specializing in text mining, message understanding and document
analysis; Kamal Nigam, Ph.D. (Carnegie Mellon), specializing in information
extraction, data mining and machine learning; Matthew Siegler, Ph.D. (Carnegie
Mellon), specializing in information retrieval and speech recognition; Robert
Stockton, MS (Carnegie Mellon), specializing in software architecture, machine
learning and computer vision; and Takashi Tomokiyo, MS (Kyushu University and
Carnegie Mellon), specializing in information extraction and statistical
For more information, visit http://www.intelliseek.com/.
Rich Seeley is Web Editor for Campus Technology.