In-Depth

ROI: Data mining's bottom line

Easier-to-use, embeddable data mining tools with shrinking price tags are enabling many more IT organizations to sift through complex data for kernels of gold.

Data mining tools first came on the scene about five years ago as a way to extract pertinent data from data warehouses. Before these tools emerged, finding the data you wanted required knowing the right questions to ask of the data warehouse. Data mining tools, however, analyze data statistically and surface patterns that would take a human far too many hours to track down.

In the past, the problem with data mining tools was their cost. Although the tools seemed useful, many companies could not afford their up-front expense. And because data mining tools employ mathematical algorithms to track data, they were difficult for non-technical end users to operate, often requiring someone with a mathematical and/or statistical background to run them.

This has changed as more tools have entered the market, become easier to use and dropped in price. Data mining tools today are priced anywhere from $2,000 to $500,000, with good tools available at both ends of the spectrum and everywhere in between. The differences among the tools lie in what they can do, the size of the database they can work with and the complexity of the problems they can handle.

Regardless of price and complexity, the tools are all cost-effective. "In data mining, the payoffs are very clear," noted data management guru Herb Edelstein, president of Two Crows Corp., Potomac, Md. "You can predict your payoff and measure whether you got it."

Aaron Zornes, research director for application delivery strategies at Meta Group Inc., Burlingame, Calif., agrees that data mining tools are cost-effective in terms of risk avoidance and determining when customers will "churn." "They're good at predicting customer behaviors," said Zornes. "They're most effective where you have millions of customers and are competing on the basis of a sliver of a percentage point," such as retail grocers and telecommunications companies. "Every little sliver then drops to the bottom line," he added.

Reaping such an advantage has data mining tools growing in popularity. With e-commerce as significant as it is today, companies are using data mining tools to generate a return on investment that can keep them ahead of the game. For example, Minnetonka, Minn.-based Fingerhut Companies Inc. spent more than $1 million building a solution to optimize the sequence of the 35 different catalogs it sends to its 6 million customers. The project has resulted in $3.5 million in annual profits for the mail-order merchandise seller.

Because of its extensive customer base and the number of catalogs it mails out, Fingerhut suspected its mailings were not as profitable as they could be. "There were trillions of combinations of catalogs that could go to those millions of customers," explained Randy Erdahl, director of business intelligence.

Using Orchestrate from Torrent Systems Inc., Cambridge, Mass., Fingerhut spreads the computation across four simultaneously running nodes. The company updates all its data with the previous week's customer activity, runs that data through the system, scores the 6 million customers against the 35 catalogs, and chooses which catalogs to mail to which customers and in which sequence - all in a three-day process. Before adopting Orchestrate, the process took 22 days.

Orchestrate is an application environment for building, deploying and managing large-scale applications. It is essentially an application framework and a set of component libraries that let companies build an application in one environment and run it on anywhere from one to 400 processors without changing a line of code. "You don't have to write logic for a neural network," explained Rob Utzschneider, Torrent's founder. "It's just piecing together the Lego building blocks you need to do data mining, but in an environment that will run a massive data environment."
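
Torrent does not spell out Orchestrate's internals here, but the partition-and-score pattern Fingerhut describes can be sketched in ordinary Python. Everything below - the customer fields, the toy scoring formula, the four-way round-robin split - is invented for illustration; Orchestrate's component libraries work differently and at far larger scale.

from multiprocessing import Pool

CATALOGS = range(35)  # one score per catalog per customer

def score_customer(customer):
    # Toy scoring formula; a real job would apply a mined model here.
    return [customer["recency"] * 0.4 + customer["frequency"] * 0.6 + c * 0.01
            for c in CATALOGS]

def score_partition(customers):
    # Each worker ("node") scores its own slice of the customer file.
    return [score_customer(c) for c in customers]

if __name__ == "__main__":
    # Stand-in customer file; in practice this would stream from the warehouse.
    customers = [{"recency": i % 12, "frequency": i % 5} for i in range(1000)]
    nodes = 4  # Fingerhut's job ran on four nodes
    partitions = [customers[i::nodes] for i in range(nodes)]  # round-robin split
    with Pool(nodes) as pool:
        scored = pool.map(score_partition, partitions)
    print(sum(len(p) for p in scored), "customers scored")

The point of a framework such as Orchestrate is that the same scoring logic runs unchanged whether the node count is one or 400.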

IBM Corp. is also realizing a return on investment from data mining tools. When the yield on its microprocessor production dropped, the company put data mining tools to work to figure out why.

"Data mining looked at the whole [production process from] end to end and showed us correlations we couldn't see as humans," noted Dan Graham, IBM's business intelligence solutions executive in the Data Management Division, Somers, N.Y. "When this and that happened, that's when the yield was down. It saved us millions. When you do [data mining] right, the return on investment is unbelievable."

Scoring for dollars
Data mining consists of two major parts: building models and executing them. Building the models is the easy part. Executing them is a bit more complex.
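
A minimal sketch of the two phases, using scikit-learn as a modern stand-in for the commercial tools named in this article (the customer data and response labels below are synthetic):

import numpy as np
from sklearn.linear_model import LogisticRegression

# --- Build: fit a model on historical, labeled customer data ---
rng = np.random.default_rng(0)
X_history = rng.normal(size=(5000, 3))  # e.g. recency, frequency, spend
y_history = (X_history[:, 0] + X_history[:, 2] > 1).astype(int)  # 1 = responded
model = LogisticRegression().fit(X_history, y_history)

# --- Execute: score current customers with the built model ---
X_current = rng.normal(size=(10, 3))
scores = model.predict_proba(X_current)[:, 1]  # probability of responding
for i, s in enumerate(scores):
    print(f"customer {i}: response score {s:.2f}")

Building happens once, offline; executing is what has to scale to millions of customers and, increasingly, run close to the data.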

The Bank of Montreal in Toronto relies on IBM's DB2 Intelligent Miner Scoring to execute models. Intelligent Miner Scoring uses the Predictive Model Markup Language (PMML), an XML-based language that provides a vendor-independent way to define predictive models. It also allows companies to rank customers according to a set of predetermined criteria.
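
Real PMML documents carry much more structure (a data dictionary, a mining schema and so on), but a heavily simplified, PMML-like fragment illustrates the idea of a vendor-independent model definition that any conforming scorer can execute. The field name and decision tree below are invented:

import xml.etree.ElementTree as ET

PMML = """
<PMML version="2.0">
  <TreeModel functionName="classification">
    <Node score="stay">
      <Node score="churn">
        <SimplePredicate field="accountsClosedThisWeek"
                         operator="greaterThan" value="2"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

def score(record):
    # Walk the tree: keep the root's score unless a child predicate matches.
    root = ET.fromstring(PMML).find("TreeModel/Node")
    result = root.get("score")
    for child in root.findall("Node"):
        pred = child.find("SimplePredicate")
        if pred is not None and pred.get("operator") == "greaterThan" \
                and record.get(pred.get("field"), 0) > float(pred.get("value")):
            result = child.get("score")
    return result

print(score({"accountsClosedThisWeek": 3}))  # -> churn
print(score({"accountsClosedThisWeek": 0}))  # -> stay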

The bank uses Intelligent Miner Scoring to track changes in customer profiles, such as those who recently purchased certain products—indicating a likelihood they might be interested in a new product—or those who closed several accounts in one week—indicating they are about to churn. When there is a change in a customer profile, Intelligent Miner Scoring allows users to recalculate the score without waiting for a report at the end of the month. "Each time something changes on one customer profile, you can immediately rerun scores," explained Jan Mrazek, senior manager of Business Intelligence Solutions at Bank of Montreal.

Intelligent Miner Scoring is an extender to DB2 and works directly against the relational database, which helps to speed up the data mining process. Scoring - determining, for example, which customers are most likely to respond to some form of marketing - is integrated into the database management system itself.
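
IBM's actual scoring interface is not described here, but the idea of pushing a scoring function into the database engine - so a score can be computed inside an ordinary SQL query - can be sketched with SQLite's user-defined functions. The schema and the scoring formula are hypothetical:

import sqlite3

def churn_score(accounts_closed, years_as_customer):
    # Toy propensity-to-churn formula; a real deployment applies a mined model.
    return min(1.0, accounts_closed * 0.3 / max(years_as_customer, 1))

con = sqlite3.connect(":memory:")
con.create_function("CHURN_SCORE", 2, churn_score)  # register scorer with the engine
con.execute("CREATE TABLE customers (id INTEGER, accounts_closed INTEGER, years INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, 3, 2), (2, 0, 10), (3, 1, 1)])

# Score on demand, inside the query itself - say, when customer 1 calls.
for row in con.execute(
        "SELECT id, CHURN_SCORE(accounts_closed, years) FROM customers WHERE id = ?",
        (1,)):
    print(row)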

Putting the industry-standard model inside the database makes it available for near real-time access to data mining results, noted IBM's Graham. "Once you've created a cluster analysis or a cross-sell analysis model, you don't have to put a score on every customer every night. Instead, you can wait until a customer calls and get the score then," he said.

"Applied to the right business problems," Graham continued, "you're looking at a 100% to 500% return on investment. If you can drive a new decision point that was never possible before, it's truly a competitive advantage and cost-effective."

The Bank of Montreal has found this to be true. It can build a new model and get a score faster and more smoothly, the firm's Mrazek said. "The return on investment comes from better responses, maintenance of the client base and, likely, more customers."

What makes Intelligent Miner Scoring so appealing is its integration with DB2. Data miners want integration; it makes their jobs easier. But most shops, including Bank of Montreal and Fingerhut, use a number of tools to get the job done.

Fingerhut uses five or six data mining tools other than Orchestrate and has a library of more than 200 models that have been built for different kinds of catalogs or offers. "We have six or seven different types of analytical tools we'll call into play depending on the situation," said Fingerhut's Erdahl. "No single tool or technique works the best in every single case. Using a number of tools, we've learned the best ones to use in certain circumstances."

Integrating data mining with mainstream applications is becoming an important part of e-commerce applications. Meta Group's Zornes noted that data mining tools are difficult to sell as standalone tools; they are better as part of a solution.

"Customers don't want data mining toolkits to be separate from other parts of their application infrastructure," added IBM's Graham. "Clients want us to embed data mining algorithms—the process—into other applications. We don't want a complete, separate tool bench as much as we want data mining to do its magic."

Torrent Systems has found the same truth. "Exploitation of data mining techniques will not really start to take off until packaged applications have these techniques embedded in them," said the firm's Utzschneider.

Data mining tools of the future will be embedded in larger packages. Not only does this make data mining quicker, it also makes the products easier to use for the typical user, who does not have a doctorate in mathematics or statistics. "Products are increasingly trying to simplify the process and make the process part of the product," said Two Crows' Edelstein. "People are taking a more systematic approach to solving data mining problems and that makes it easier to use."

In addition to embedding data mining tools in packaged applications, look for data mining to be integrated with analysis and reporting mechanisms. A telecommunications company, for example, could track which customers are likely to churn. With analysis and reporting integrated with data mining, the company's system could produce a report every night listing the customers most likely to leave, allowing the firm to offer a pricing discount or something similar in an effort to keep those people.
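
Such a nightly report might be sketched as follows; the churn model, the threshold and the report format are all invented for illustration:

# Stand-in for a mined model's churn score.
def churn_probability(customer):
    return min(1.0, 0.1 * customer["complaints"] + 0.2 * customer["accounts_closed"])

customers = [
    {"id": 1, "complaints": 4, "accounts_closed": 2},
    {"id": 2, "complaints": 0, "accounts_closed": 0},
    {"id": 3, "complaints": 1, "accounts_closed": 3},
]

THRESHOLD = 0.5  # flag anyone with at least a 50% churn risk
at_risk = sorted((c for c in customers if churn_probability(c) >= THRESHOLD),
                 key=churn_probability, reverse=True)

with open("nightly_churn_report.txt", "w") as report:
    for c in at_risk:
        report.write(f"customer {c['id']}: churn risk "
                     f"{churn_probability(c):.0%} - consider a discount\n")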

That is the real bread and butter of data mining tools—being able to predict customers' behaviors and responses before they happen. And that, according to Edelstein, is what data mining is all about. For him, data mining is simply making predictions. "It's not about algorithms. It's not about describing or exploring data, although it requires these. Data mining is the tools and techniques for making good predictions," he noted.

Peeling back the onion
Jeffrey Canter, executive vice president of operations at Pittsburgh-based Innovative Systems Inc., has his own description of data mining. According to Canter, data mining is "the process of peeling back the onion in some sort of data repository in order to get to that one kernel of information that's buried somewhere in there that is going to drive your decision-making process. It's poring over what is clearly a sea of information and finding the little nuggets that will drive some sort of decision-making process with respect to the customer." It is about making decisions and making them in a way that retains customers and keeps a company profitable.

Moving forward, we will also see more emphasis on making good predictions in real time. "What these companies want to do is to be able to deliver targeted content to a customer when [they are] at the [Web] site," said Torrent Systems' Utzschneider. There will also be a movement toward real-time and continuous processing. "A Web site is very dynamic. Subject matter changes," he added. "We need to be able to build these models constantly, so we understand how visitor behavior is changing, and do a much better job of scoring records and delivering the right content to the right visitors."
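
In outline, the rebuild-and-score loop Utzschneider describes might look like the following sketch, where the event stream, the window size and the "model" (click rates by content type) are all invented:

from collections import deque

WINDOW = 1000                      # keep only the most recent visitor events
events = deque(maxlen=WINDOW)

def rebuild_model():
    # Toy "model": the click rate of recent visitors for each content type.
    counts, clicks = {}, {}
    for e in events:
        counts[e["content"]] = counts.get(e["content"], 0) + 1
        clicks[e["content"]] = clicks.get(e["content"], 0) + e["clicked"]
    return {c: clicks[c] / counts[c] for c in counts}

def choose_content(model):
    # Deliver whichever content type recent visitors respond to best.
    return max(model, key=model.get) if model else "default"

# Simulated stream: visitor interest shifts midway, and because the model is
# rebuilt over a sliding window, the chosen content shifts with it.
for i in range(5000):
    interest = "sports" if i < 2500 else "finance"
    for content in ("sports", "finance"):
        # Visitors mostly click the content they are currently interested in.
        events.append({"content": content,
                       "clicked": content == interest and i % 2 == 0})
    if i % 500 == 0:
        print(i, choose_content(rebuild_model()))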

Something else on the path ahead is mining streaming video. "At this point, there's a lot of text information and very little video information," noted Two Crows' Edelstein. "What happens when we go to videoconferencing?" As videoconferencing gains in significance, people will want to better understand where their resources lie and be able to analyze and predict where their research is taking them. Being able to find vital information in videoconferences, in addition to budgets and products, could prove very beneficial to retaining customers.

As data volumes continue to grow, companies will rely on data mining tools to sift through stores of information and find the nuggets that will make them the most profitable. Data mining tools are proving over and over again that they have the ability and the robustness to help companies retain customers, thus paying for themselves in the long run. Regardless of which tools you choose, you are almost guaranteed a return on investment.