Data mining comes of age

A decade-long effort to build useful data marts and data warehouses has set the stage for a new generation of analytical application software tools that find patterns in the vast quantities of corporate data available for consumption. Some call it data mining.

At its heart are algorithms that pursue data, find trends and, sometimes, predict events. It's increasingly an area of interest. Analyst firm IDC estimates that the data mining tools market will reach $1.85 billion in 2006.

Data mining is neither an altogether popular term nor an entirely new one. Critics point out that the real goal is to mine “information,” not data. Moreover, finding nuggets in hills of data has long been an objective. Mining has always been the point behind information access tools, OLAP data stages and the like.

But, as data integration best practices become better known, and useful infrastructures are put in place, attention turns to the analytical tools that one applies to the data. Both successes and failures in the fast-moving CRM segment drive interest in analytics.

Once composed mainly of statistical packages, software tools that support data mining have come to include a variety of advanced neural net and fuzzy logic solutions. As analytic systems get faster, real-time data feeds start to look like the next frontier for such software (see “Analyzing data in real time,” ADT, April 2002).

In fact, Wall Street brokers have been forging software-based “calculators and functions” for quite some time. Although out of the reach of mom-and-pop organizations, real-time analytical options may filter through more corporations in the years ahead.

The slew of possible math filters forms a heady list. It includes clustering, regression, decision trees and more (see “Mining gives meaning to complex data,” ADT, October 2000).
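For readers who have not met these techniques, here is a minimal sketch of two of them, regression and clustering, in plain Python. It is only an illustration, not any vendor's implementation: the function names `fit_line` and `kmeans_1d`, the starting centers and the sample figures are all made up for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def kmeans_1d(values, centers, rounds=10):
    """Plain k-means on one numeric column, starting from the given centers."""
    centers = list(centers)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to the closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical data: monthly ad spend vs. orders, and customer order sizes.
ad_spend = [1, 2, 3, 4, 5]
orders = [12, 14, 16, 18, 20]
slope, intercept = fit_line(ad_spend, orders)

order_sizes = [10, 12, 11, 95, 100, 105]
centers, groups = kmeans_1d(order_sizes, centers=[0, 50])
```

The regression finds the trend line through the points; the clustering splits the order sizes into a group of small orders and a group of large ones. Commercial tools wrap the same ideas in data preparation, validation and reporting layers.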

Who are the players? Like OLAP and BI before it, the data mining world attracts many companies. Some are long-time algorithmic specialists who may have enhanced their overall packages for use with today’s data marts. Some are companies that have broader data handling solutions to which they have added increasingly specialized analytical tools.

In the first group are Numerical Algorithms Group, HNC Software, MathLab, SAS, SPSS, Data Distilleries and a large cast of others. In the second group are E.piphany, Angoss, Hyperion, Microsoft, Informatica, Business Objects S.A., MicroStrategy, Ascential, Cognos, IBM and a large cast of others.

As ever, students and professors will place their home-brewed algorithms at the disposal of others, and programmers in organizations will write these “free” components into their solutions. So managers should pencil in the freeware option when they make their data mining lineups -- in many cases, their developer team surely will.

ADT recently talked with a host of data mining players and tools users. What are the trends they see?

  • Neural processors have moved from the realm of the lab into real-world uses for risk analysis and the like. There is no longer any real war between statistical and neural-type tools. However, some people are still uncomfortable with aspects of the neural mode. In some industries, users must know how results were obtained, and the “black box” neuron is out of place -- unless, perhaps, it is matched with a statistical method.
  • The tools are being used by a broader set of people, working on a broader set of data types. Notable among new types of data for integration is textual data. This unstructured data could come from support desk reports or complex Web searches.
  • Data mining is casting off its role as a standalone discipline. It is being integrated into overall business processes. “Remember,” a development manager told us, “it’s all about business-centric activity. Your results must feed back to a real business architecture. Data mining should be part of the overall business.”
  • Verticals have merit. Some vendors are very deliberately going at specific industries with specific solutions. IDC data access guru Henry Morris points to this as among the most vibrant of tributaries contributing to data mining growth. Insightful, Spotfire, HNC and Searchspace are among the players working this vein.

Trends aside, software developers face the classic problems of development when they venture into data mining. The whole view is important -- the data must be correctly prepared and kept current. The notion that we are ready for an age of data mining assumes that companies have correctly set up data marts and warehouses -- that the infrastructure that allows fancier computing is in place.

And the powerful algorithms need to be integrated into user-friendly applications. Specialized algorithms fresh out of academia are just not written for commercial purposes, noted an end user.

Finally, there is a classic pitfall to avoid: Building a Taj Mahal to house a pizza stand. “Create a model that is as sophisticated as the problem requires -- but not more so,” said a data miner. Physicist Einstein and philosopher William of Occam would concur.

Data mining profiles

HNC Software Inc.
Informatica
IBM
SAS Institute
Spotfire Inc.
SPSS user -- Daniele Micci-Barreca of ClearCommerce

About the Author

Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.