In-Depth
Data Warehousing Special Report: Data mining comes of age
- By Jack Vaughan
- May 1, 2002
A decade-long effort to build useful data marts and data warehouses has set
the stage for a new generation of analytical application software tools that
find patterns in the vast quantities of corporate data available for consumption.
Some call it data mining.
At its heart are algorithms that comb through data, find trends and even predict
events. Analyst firm IDC estimates the data mining tools market will reach $1.85
billion in 2006.
Data mining is not an altogether popular term, nor is it new. Its real goal,
critics point out, is the mining of "information." Finding nuggets in hills
of data has long been an objective; mining has always been the point of information
access tools, OLAP engines, data staging and the like.
But as data integration best practices become better known, and useful infrastructures
are put in place, attention turns to the analytical tools that one applies to
the data. Both successes and failures in the fast-moving CRM segment are driving
interest in analytics.
Once composed mainly of statistical packages, software tools that support data
mining now include various advanced neural net and fuzzy logic solutions. As
analytic systems get faster, real-time data feeds look like the next frontier
(see "Analyzing data
in real time").
Wall Street brokers have been forging software-based "calculators and
functions" for years. Although out of the reach of mom-and-pop organizations,
real-time analytical options may filter through more corporations in the years
ahead.
The slew of possible math filters forms a heady list. Included are clustering,
regression, decision trees and more (see "Mining
gives meaning to complex data").
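To give one of those filters concrete shape, here is a minimal sketch of k-means clustering, one of the techniques named above. The transaction amounts, cluster count and initialization scheme are illustrative assumptions, not drawn from the article or any particular vendor's tool.

```python
# Minimal k-means clustering on 1-D data (hypothetical figures).
def kmeans(points, k, iters=20):
    """Cluster points into k groups by iteratively refining centroids."""
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups of transaction amounts -- clustering recovers them.
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.3]
centroids, clusters = kmeans(data, k=2)
print(sorted(round(c, 1) for c in centroids))  # -> [1.0, 10.1]
```

Commercial tools wrap the same idea in smarter initialization, distance metrics for many dimensions and stopping criteria, but the assign-then-update loop is the whole algorithm.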
Like OLAP and BI before it, data mining attracts many companies. Some are long-time
algorithmic specialists that have enhanced their overall packages for use with
today's data marts. Some have broader data handling solutions to which they've
added increasingly specialized analytical tools.
In the first group, mark the Numerical Algorithms Group, HNC Software, The MathWorks,
SAS, SPSS, Data Distilleries and a large cast of others. In the second group,
place E.piphany, Angoss, Hyperion, Microsoft, Informatica, Business Objects
S.A., MicroStrategy, Ascential, Cognos, IBM and many others.
As ever, students and professors will place their home-brewed algorithms at
the disposal of others, and programmers will write these "free" components
into their solutions. Managers should pencil in the freeware option when making
their data mining lineups -- in many cases, their developer teams will.
Trends and pitfalls
ADT recently talked with a host of data mining players and tools users.
What trends do they see?
Neural processors may have moved from the realm of the lab into real-world
uses for risk analysis and the like. There is no longer any real war between
statistical and neural-type tools. But some are still uncomfortable with aspects
of the neural mode. In some industries, users must know how results were obtained,
and the "black box" neuron is out of place unless it is matched with
a statistical method.
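The auditability gap is easy to see in miniature. An ordinary least-squares fit, sketched below on hypothetical data, yields two explicit coefficients a regulator or reviewer can read off directly; a trained neural net offers no such line-by-line account of how a score was produced. This is a generic illustration, not any vendor's method.

```python
# Closed-form ordinary least squares: the model IS its two coefficients.
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by the closed-form OLS solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical risk scores driven by account age; the fitted slope and
# intercept state exactly how any prediction was obtained.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 2x + 1
slope, intercept = least_squares(xs, ys)
print(slope, intercept)  # -> 2.0 1.0
```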
Tools are being used by a broader set of people, working on a broader set of
data types. Notable among new types of data for integration is textual data.
This unstructured data could come from support desk reports or from complex
Web searches.
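A first pass over such unstructured text can be as simple as counting the terms that recur across documents. The support tickets and stopword list below are hypothetical; real text mining tools layer stemming, phrase detection and categorization on top of this kind of tally.

```python
# Minimal term-frequency pass over free-text support tickets.
from collections import Counter
import re

STOPWORDS = {"the", "a", "is", "on", "and", "my", "to", "in", "when"}

def term_counts(docs):
    """Count non-stopword terms across a set of free-text documents."""
    counts = Counter()
    for doc in docs:
        for word in re.findall(r"[a-z']+", doc.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

tickets = [
    "The printer driver crashes on startup",
    "Printer crashes when a driver update is applied",
    "Cannot log in to my account",
]
# The recurring terms surface the theme buried in the free text.
print(term_counts(tickets).most_common(3))
```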
Data mining is casting off its role as a standalone discipline and is being
integrated into overall business processes. One development manager tells us,
"It's all about business-centric activity. Your results must feed back
to a real business architecture. Data mining should be part of the overall business."
Verticals have merit. Some vendors are very deliberately going at specific
industries with specific solutions. IDC data access guru Henry Morris points
to this as among the most vibrant of tributaries contributing to data mining
growth. Insightful, Spotfire, HNC and Searchspace are among the players working
this vein.
Trends aside, software developers face the classic problems of development
when they venture into data mining. The whole view is important -- the data
must be correctly prepared and kept current. The notion that we are ready for
an age of data mining rests on the assumption that companies have correctly set
up data marts and warehouses -- that the infrastructure that allows fancier
computing is in place.
And the powerful algorithms need to be integrated into user-friendly applications.
Specialized algorithms fresh out of academia are just not written for commercial
purposes, notes an end user.
Finally, there is a classic pitfall to avoid: Building a Taj Mahal to house
a pizza stand. "Create a model that is as sophisticated as the problem
requires -- but not more so," said a data miner. Physicist Einstein and
philosopher William of Occam would concur.
For a more extensive version of this story, with profiles of data mining players
and their tools, as well as tips from end users, go to
http://www.adtmag.com/article.asp?id=6286.
About the Author
Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.