XML seen as godsend to researchers

Unstructured data on the Internet confronts business users with too much information and not enough business intelligence, according to Sundar Kadayam, chief technology officer at Intelliseek Inc., Cincinnati.

How, for example, is the product manager for an automobile manufacturer going to make sense of all the comments drivers post on various Web bulletin boards? How many e-mails must the marketing group read to find out that buyers hate a particular color or love one model's five-speed transmission?

Kadayam has been tapped to lead Intelliseek's new Applied Research Center (ARC) in a major R&D effort to provide a technology answer to the age-old question: ''What are people really saying about us?''

He said a combination of data mining and XML is the key to taking such unstructured data and providing business users with what he calls ''nuggets of information.''

''The purpose of the ARC team is to help us maintain leadership in this whole area of mining unstructured data,'' Kadayam said. ''That basically means how do I read free text, discern key topics within it, categorize them, and then identify the entities, relationships, comparative information, sentiment and so forth.''

Intelliseek's current technology uses XML tagging to help make sense of the unstructured data and then delivers it to customers via Web interfaces or directly into their applications via SOAP, Kadayam said.

''The method that we have used to get the greatest value from these processes is to take unstructured data, put it through our proprietary mining processes and essentially lend structure to the data in the form of XML,'' he explained.
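As a rough illustration of what "lending structure" to free text can look like, the sketch below tags a single consumer post with topics and a sentiment label and emits the result as XML. Intelliseek's mining processes are proprietary and are not described here; the keyword lists, the element names (`post`, `text`, `topic`, `sentiment`) and the `tag_post` function are all hypothetical, and simple keyword matching stands in for whatever analysis a real system would perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword lists for illustration only; a production system
# would rely on trained classifiers rather than fixed vocabularies.
TOPICS = {
    "transmission": ["five-speed", "transmission", "gearbox"],
    "color": ["color", "paint"],
}
POSITIVE = {"love", "great", "smooth"}
NEGATIVE = {"hate", "ugly", "awful"}


def tag_post(text: str) -> ET.Element:
    """Wrap one free-text post in an XML element with topic and sentiment tags."""
    # Crude tokenization: split on whitespace, trim punctuation, lowercase.
    words = [w.strip(".,!?").lower() for w in text.split()]

    post = ET.Element("post")
    ET.SubElement(post, "text").text = text

    # Add one <topic> element for each topic whose keywords appear in the post.
    for topic, keywords in TOPICS.items():
        if any(k in words for k in keywords):
            ET.SubElement(post, "topic", name=topic)

    # Sentiment is the count of positive words minus the count of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    ET.SubElement(post, "sentiment").text = label
    return post


if __name__ == "__main__":
    element = tag_post("I love the smooth five-speed transmission.")
    print(ET.tostring(element, encoding="unicode"))
```

Once posts carry tags like these, counting how many mention a given topic with negative sentiment becomes an ordinary query over structured data rather than a reading exercise.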

If the unstructured data, including both posts on Internet chat boards and e-mails to a company Web site, can be transformed and summarized, Kadayam said, it can provide valuable insights into competitive consumer-based businesses.

''There are hundreds of Internet communities where discussions are happening about various automotive industry products, brands and models,'' he said, using as an example an unnamed Intelliseek customer who is a carmaker. ''In this particular context there is quite a rich amount of insight into what consumers like and do not like, and what competitive comparisons are being drawn up. That is extremely useful for marketing and brand managers, who are essentially responsible for the formulation of new features and capabilities and new vehicle models. Our technology allows for users in the auto industry to get a very highly qualified view of what all this buzz is indicating.''

Kadayam's team at ARC, which will be working to advance these XML, Internet data mining, and business intelligence technologies, includes the following experts from the academic world:
Greg Bishop, Ph.D. (Ohio State), specializing in information retrieval, information filtering and classification; Natalie Glance, Ph.D. (Stanford), specializing in personalization, information extraction and collaborative systems; Matthew Hurst, Ph.D. (University of Edinburgh), specializing in text mining, message understanding and document analysis; Kamal Nigam, Ph.D. (Carnegie Mellon), specializing in information extraction, data mining and machine learning; Matthew Siegler, Ph.D. (Carnegie Mellon), specializing in information retrieval and speech recognition; Robert Stockton, MS (Carnegie Mellon), specializing in software architecture, machine learning and computer vision; and Takashi Tomokiyo, MS (Kyushu University and Carnegie Mellon), specializing in information extraction and statistical language processing.

About the Author

Rich Seeley is Web Editor for Campus Technology.

