In-Depth

Data Warehousing Special Report: Data mining comes of age

A decade-long effort to build useful data marts and data warehouses has set the stage for a new generation of analytical application software tools that find patterns in the vast quantities of corporate data available for consumption. Some call it data mining.

At its heart are algorithms that peruse data, find trends and even predict events. Analyst firm IDC estimates the data mining tools market will reach $1.85 billion in 2006.

Data mining is not an altogether popular term, nor is it new. Its goal, critics point out, is the mining of "information." Finding nuggets in hills of data has long been an objective. Mining has always been the point of information access tools, OLAP data stages and the like.

But as data integration best practices become better known, and useful infrastructures are put in place, attention turns to the analytical tools that one applies to the data. Both successes and failures in the fast-moving CRM segment are driving interest in analytics.

Once composed mainly of statistical packages, software tools that support data mining now include various advanced neural net and fuzzy logic solutions. As analytic systems get faster, real-time data feeds look like the next frontier (see "Analyzing data in real time").

Wall Street brokers have been forging software-based "calculators and functions" for years. Although out of the reach of mom-and-pop organizations, real-time analytical options may filter through more corporations in the years ahead.

The slew of possible math filters forms a heady list. Included are clustering, regression, decision trees and more (see "Mining gives meaning to complex data").
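To make one of these techniques concrete, here is a minimal sketch of the simplest possible decision tree: a one-split "stump" over a single numeric field. The function names, the field values and the "stay"/"churn" labels are all hypothetical, invented for illustration; commercial tools build far deeper trees over many fields.

```python
from collections import Counter


def fit_stump(values, labels):
    """Find the single threshold that misclassifies the fewest training rows.

    Returns (threshold, left_label, right_label); the rule is
    "value <= threshold -> left_label, otherwise right_label".
    """
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("need at least two distinct values to split on")

    best = None
    for threshold in candidates:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        # Each side predicts its own most common label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, value):
    """Apply a fitted stump to one new value."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Hypothetical data: support calls per month vs. whether the customer left.
calls = [1, 2, 3, 10, 11, 12]
outcome = ["stay", "stay", "stay", "churn", "churn", "churn"]
stump = fit_stump(calls, outcome)
```

With this made-up data the stump splits between 3 and 10 calls, so a customer with 4 calls is classed as "stay" and one with 9 as "churn". The rule can also be read aloud, which matters in industries where users must know how a result was obtained.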

Like OLAP and BI before it, data mining attracts many companies. Some are long-time algorithmic specialists that have enhanced their overall packages for use with today's data marts. Some have broader data handling solutions to which they've added increasingly specialized analytical tools.

In the first group, mark Numerical Algorithms Group, HNC Software, MathLab, SAS, SPSS, Data Distilleries and a large cast of others. In the second group, place E.piphany, Angoss, Hyperion, Microsoft, Informatica, Business Objects S.A., MicroStrategy, Ascential, Cognos, IBM and many others.

As ever, students and professors will place their home-brewed algorithms at the disposal of others, and programmers will write these "free" components into their solutions. Managers should pencil in the freeware option when making their data mining lineups -- in many cases, their developer teams will.

Trends and pitfalls
ADT recently talked with a host of data mining players and tools users. What trends do they see?

Neural processors may have moved from the realm of the lab into real-world uses for risk analysis and the like. There is no longer any real war between statistical and neural-type tools. But some are still uncomfortable with aspects of the neural mode. In some industries, users must know how results were obtained, and the "black box" neuron is out of place unless it is matched with a statistical method.

Tools are being used by a broader set of people, working on a broader set of data types. Notable among new types of data for integration is textual data. This unstructured data could come from support desk reports or from complex Web searches.

Data mining is casting off its role as a standalone discipline and is being integrated into overall business processes. One development manager tells us, "It's all about business-centric activity. Your results must feed back to a real business architecture. Data mining should be part of the overall business."

Verticals have merit. Some vendors are very deliberately going at specific industries with specific solutions. IDC data access guru Henry Morris points to this as among the most vibrant of tributaries contributing to data mining growth. Insightful, Spotfire, HNC and Searchspace are among the players working this vein.

Trends aside, software developers face the classic problems of development when they venture into data mining. The whole view is important -- the data must be correctly prepared and kept current. The idea that we are ready for an age of data mining is based on the idea that companies have correctly set up data marts and warehouses -- that the infrastructure that allows fancier computing is in place.

And the powerful algorithms need to be integrated into user-friendly applications. Specialized algorithms fresh out of academia are just not written for commercial purposes, notes an end user.

Finally, there is a classic pitfall to avoid: Building a Taj Mahal to house a pizza stand. "Create a model that is as sophisticated as the problem requires -- but not more so," said a data miner. Physicist Einstein and philosopher William of Occam would concur.
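One way to act on that advice can be sketched in a few lines: try candidate models from simplest to most complex and stop at the first one that is good enough on data held back from fitting. This is an illustrative sketch only. The two candidate models, the error tolerance and every name below are assumptions, not anything the quoted practitioners described.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Simplest candidate: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Next step up: a least-squares straight line through the training data."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def simplest_adequate(candidates, train, holdout, tolerance):
    """Return the name of the first candidate whose holdout error is acceptable.

    `candidates` must be ordered from simplest to most complex. If none meets
    the tolerance, the last (most complex) candidate's name is returned.
    """
    for name, fit in candidates:
        model = fit(*train)
        if mse(model, *holdout) <= tolerance:
            return name
    return candidates[-1][0]


CANDIDATES = [("constant", fit_constant), ("linear", fit_linear)]

# Hypothetical data: a flat series and a steadily rising one.
flat = simplest_adequate(CANDIDATES, ([1, 2, 3, 4], [5, 5, 5, 5]),
                         ([5, 6], [5, 5]), tolerance=0.1)
rising = simplest_adequate(CANDIDATES, ([1, 2, 3, 4], [2, 4, 6, 8]),
                           ([5, 6], [10, 12]), tolerance=0.1)
```

On the flat series the constant model is already adequate, so nothing fancier is built. Only the rising series justifies the step up to a line.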

For a more extensive version of this story, with profiles of data mining players and their tools, as well as tips from end users, go to http://www.adtmag.com/article.asp?id=6286.

About the Author

Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.