In-Depth

Analyzing data in real time


In recent years, business intelligence systems have played a pivotal role in helping organizations pursue business goals such as improving customer retention, market penetration, profitability and efficiency.

In most cases, these insights are driven by analyses of historic data. Therefore, it is human nature to pose the next logical question: If historic data can help us make better decisions going forward, could real-time data improve decision making here and now?

Trimac Transportation, a Calgary, Alberta-based company that is North America's largest freight hauler of commodity goods, has a business intelligence system that uses Cognos PowerPlay reporting to analyze "trip standards" data and ensure that its 3,300-truck fleet is utilized efficiently. While this helps Trimac make strategic decisions, it does not necessarily provide immediate answers about how best to use a specific truck that is going out half full or with a one-way load.

"It would take transaction data to another level," admitted Al Kobie, Trimac's director of operations development. Current information would have to be massaged with historic data and rules-based analyses to optimize the juggling of freight routings. Preferably, all this should be done while the company still has the customer's attention.

"If you could pull something like that off, it would be worth pretty big dollars," Kobie conceded.

Kobie's goal has proven elusive for most companies. Aside from scattered customer-scoring and fraud-detection applications, real-time analytics are far from commonplace. Of course, the first question to ask is whether real-time analysis is absolutely necessary. Is there any information that has emerged in the last five minutes that might change decisions about how a company should respond to a customer, revamp its sales and marketing tactics, or fine-tune operations?

For the City of Richmond, British Columbia, the answer is yes, for one easily understandable reason. Because the city is situated on a coastal island with an average elevation of three feet above sea level, it must know instantly whether its network of flood-control pumps is operating, how well, and why. The answer was also yes for Bank of Montreal, which required real-time predictive analytics for fraud detection and credit scoring for customers phoning into its call centers.

If real-time analytics spell the difference between a city or company staying afloat and sinking, the next pertinent question is whether the analysis is best performed by an automated system or inside a business analyst's head. For the City of Richmond, the answer again was clear: a system-based solution was preferable to relying on humans alone. The city installed an Information Builders WebFocus reporting system that draws instant trend analyses from a real-time database fed by the city's SCADA (supervisory control and data acquisition) system, which manages the pumps.

However, for many companies, the relevance of real-time analytics is not so cut-and-dried. "The more sophisticated the analysis, the less real-time it may be because it might involve an approval chain," noted Peter Grambs, analytics practice leader for outsourcing firm Cognizant Technology Solutions Corp., Teaneck, N.J.

Furthermore, live production data might prove distracting. "Real-time analysis is overrated because real-time data is not necessarily clean data," said Rohit Amarnath, a New York-based consultant who specializes in building large OLAP cubes for global financial industry clients. Even if the data in question is accurate, he added, it might actually throw off the analysis because it may not have all the necessary pieces to provide a full picture.

More than speeds and feeds
What do we mean by real time anyway? The issue is clouded because it is easy to confuse real-time or interactive access to analytic data with analyses performed in real time using live production data.

For instance, many analytic systems already provide rapid access to trend information and reporting. In these cases, availability depends on policy decisions and available resources. In some organizations, whether analysis is delivered interactively or generated through a batch process later in the day may depend on the user's role and on whether the request arrives during a slow period or at a peak time.

Most analytic systems are engineered differently from transaction systems to improve responsiveness. It starts with denormalizing relational databases into star schemas, which add numerous summary tables that make the right information more accessible. In extreme cases, where analyses are extremely complex but predictable, multidimensional OLAP databases, or "cubes," are designed to pre-package specific data views for fast retrieval. And where the analyses require an iterative approach just to spot hidden patterns, vendors such as IBM, SAS Institute and others have designed data mining tools that are easier for non-statisticians to use, as well as faster to run.
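
To make the pre-aggregation idea concrete, here is a minimal sketch in Python, with SQLite standing in for a warehouse database: a detail-level fact table is rolled up into a summary table so common questions can be answered without scanning every row. All table and column names are hypothetical.

```python
# Minimal sketch of pre-aggregation: a detail-level fact table plus a summary
# table that answers common questions without scanning every row.
# Table and column names are hypothetical; sqlite3 stands in for a warehouse DBMS.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_fact (             -- one row per line item (detail grain)
        sale_date   TEXT,
        region_id   INTEGER,
        product_id  INTEGER,
        revenue     REAL
    );
    CREATE TABLE sales_by_region_month (  -- denormalized summary table
        sale_month  TEXT,
        region_id   INTEGER,
        revenue     REAL,
        PRIMARY KEY (sale_month, region_id)
    );
""")

con.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [("2003-05-02", 1, 10, 1200.0), ("2003-05-17", 1, 11, 800.0),
     ("2003-05-09", 2, 10, 450.0)],
)

# Batch step: roll the detail rows up into the summary table.
con.execute("""
    INSERT INTO sales_by_region_month
    SELECT substr(sale_date, 1, 7), region_id, SUM(revenue)
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 7), region_id
""")

# Analysts' queries hit the small summary table, not the big fact table.
for row in con.execute("SELECT * FROM sales_by_region_month ORDER BY region_id"):
    print(row)
```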

And if producing analyses rapidly is not enough, new tools are emerging to help analysts deal with the resulting glut of information. Based on research conducted in U.S. Air Force cockpits, Oakland, Calif.-based iSpheres has developed a tool that provides, in essence, real-time analysis of analytic data. The firm's tool checks for exception conditions and redirects users to the right data at the right moment. Currently deployed with Hyperion Essbase OLAP cubes, the iSpheres system is used to help analysts in financial services and energy trading.
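
The exception-driven approach can be illustrated with a toy monitor (not iSpheres' actual product): it compares the latest summary figures against simple threshold rules and flags which metrics deserve the analyst's attention first. The metrics and thresholds below are invented for illustration.

```python
# Illustrative only: a toy exception monitor in the spirit of the alerting tools
# described above. It scans fresh summary figures against thresholds and tells
# the analyst where to look first.
from dataclasses import dataclass

@dataclass
class Rule:
    metric: str
    threshold: float
    direction: str  # "above" or "below"

RULES = [
    Rule("gross_margin_pct", 15.0, "below"),
    Rule("open_position_usd", 5_000_000.0, "above"),
]

def exceptions(latest: dict[str, float]) -> list[str]:
    """Return human-readable alerts for any metric breaching its rule."""
    alerts = []
    for rule in RULES:
        value = latest.get(rule.metric)
        if value is None:
            continue
        breached = value > rule.threshold if rule.direction == "above" else value < rule.threshold
        if breached:
            alerts.append(f"{rule.metric} = {value:,.1f} ({rule.direction} {rule.threshold:,.1f})")
    return alerts

print(exceptions({"gross_margin_pct": 12.4, "open_position_usd": 3_200_000.0}))
```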

When it comes to building data warehouses and repositories of analytic data, the processes for populating them have accelerated. Techniques such as parallel processing, multithreading, streaming, pipelining and, in some cases, continuous trickle feeds from operational data stores or original transaction system sources are populating data warehouses faster and, indirectly, making analytic systems more current.
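
A trickle feed can be sketched in a few lines: track a high-water mark in the warehouse and copy over only the rows that arrived since the last pass, rather than waiting for a nightly full extract. The schema below is hypothetical, with SQLite standing in for both the operational store and the warehouse.

```python
# A minimal sketch of a "trickle feed": copy only the rows added since the last
# load, tracked with a high-water mark, so the warehouse stays close to current
# without a nightly full extract. All names are hypothetical.
import sqlite3

source = sqlite3.connect(":memory:")       # operational data store (stand-in)
warehouse = sqlite3.connect(":memory:")    # analytic warehouse (stand-in)

source.execute("CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 99.5), (2, 12.0), (3, 47.3)])

warehouse.execute("CREATE TABLE txn_fact (txn_id INTEGER PRIMARY KEY, amount REAL)")
warehouse.execute("CREATE TABLE load_watermark (last_id INTEGER)")
warehouse.execute("INSERT INTO load_watermark VALUES (0)")

def trickle_once() -> int:
    """Pull rows newer than the high-water mark and append them to the fact table."""
    last_id = warehouse.execute("SELECT last_id FROM load_watermark").fetchone()[0]
    new_rows = source.execute(
        "SELECT txn_id, amount FROM transactions WHERE txn_id > ? ORDER BY txn_id",
        (last_id,),
    ).fetchall()
    if new_rows:
        warehouse.executemany("INSERT INTO txn_fact VALUES (?, ?)", new_rows)
        warehouse.execute("UPDATE load_watermark SET last_id = ?", (new_rows[-1][0],))
    return len(new_rows)

print(trickle_once())   # 3 rows on the first pass
print(trickle_once())   # 0 rows until the source changes; in practice run on a timer
```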

But fast and current does not necessarily equate with real time. Regardless of whether analytic systems exploit any of these technologies, most rely on historic, not live, production data. That data is typically extracted from production systems during batch windows, usually at about 3:00 in the morning. So the question of real-time analytics revolves around whether it is necessary to supplement these managed query environments, star schema databases or OLAP cubes with applications that also draw some real-time transactional data into the mix.

But could this be technology searching for a problem?

"A couple of years ago, our customers began pushing us on the latency issue," recounted Faisal Shah, chief technology officer at Chicago-based analytic system consulting firm Knightsbridge Solutions LLC. Probably the most obvious app is real-time credit scoring or fraud detection, where the need exists to perform an event-driven sophisticated analysis, such as when a customer phones a call center.

Tread gently
Building analytic systems that drill down to live transaction systems remains easier said than done. A modest approach is to embellish the existing transaction application with simple analytics, such as adding a profitability analysis to a customer history screen in a CRM app. Packaged applications from vendors such as Siebel, SAP, Oracle and PeopleSoft are increasingly adding modest analytic capabilities to supplement their core systems.
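
The "profitability analysis on a customer history screen" idea amounts to simple arithmetic over records the transaction application already holds, as in this hypothetical sketch (the fields and margin figures are made up):

```python
# A toy version of the "profitability on the customer history screen" idea:
# simple arithmetic over order records the transaction app already has on hand.
def customer_profitability(orders: list[dict], cost_to_serve: float) -> dict:
    """Summarize gross margin for one customer from their order history."""
    revenue = sum(o["amount"] for o in orders)
    cost = sum(o["amount"] * (1 - o["margin_pct"]) for o in orders)
    profit = revenue - cost - cost_to_serve
    return {"revenue": revenue,
            "profit": round(profit, 2),
            "profit_pct": round(100 * profit / revenue, 1) if revenue else 0.0}

history = [{"amount": 1500.0, "margin_pct": 0.22}, {"amount": 640.0, "margin_pct": 0.18}]
print(customer_profitability(history, cost_to_serve=85.0))
```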

But if the goals are more ambitious, the live production database should be touched as little as possible. Theoretically, such a system would either replicate current transaction data to a separate, mirrored data store, or extract the data quickly and efficiently and move it to a repository or operational data store, minimizing the impact on the transaction database. If you are considering touching a live system, indexing is critical, according to Richard Winter, president of Winter Corp., a Boston-based consulting firm specializing in very large databases. "Some databases can do a lot [of indexing], but not necessarily while they are updating," he warned.
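
In the same spirit, a light-touch extract might run against a replica rather than the primary, rely on an indexed filter column, and pull changes in small, capped batches so no single query runs long. The sketch below is illustrative only; the table, index and batch size are assumptions.

```python
# A sketch of the "touch the live system as little as possible" advice: point the
# extract at a replica (here just a second connection), make sure the filter column
# is indexed, and pull small capped batches so no single query runs long.
import sqlite3

replica = sqlite3.connect(":memory:")   # pretend this is a read replica, not the primary
replica.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, updated_at TEXT, total REAL);
    CREATE INDEX idx_orders_updated ON orders (updated_at);  -- keeps the extract predicate cheap
""")
replica.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(i, f"2003-06-01T00:{i:02d}", 10.0 * i) for i in range(1, 60)])

def extract_since(cutoff: str, batch_size: int = 100):
    """Yield changed rows in short, bounded queries instead of one long scan."""
    last_id = 0
    while True:
        batch = replica.execute(
            "SELECT order_id, updated_at, total FROM orders "
            "WHERE updated_at >= ? AND order_id > ? ORDER BY order_id LIMIT ?",
            (cutoff, last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        yield from batch
        last_id = batch[-1][0]

print(sum(1 for _ in extract_since("2003-06-01T00:30")))   # rows changed since the cutoff
```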

But even if your transaction database can do the equivalent of walking and chewing gum at the same time, the rest of the infrastructure must also measure up. "You'll have to have robust hardware and, in some cases, you might need to mirror it," said Mike Murphy, systems manager at Chicago-based telecom provider Alltel, which operates fraud detection and other real-time analytics applications from consolidated online network operational data.

Knightsbridge's Shah added that, even if the analytic infrastructure is designed to fuse transaction and analysis, there is the question of maintaining or balancing service-level requirements. By nature, transaction systems are supposed to be live whenever the business is open, which could mean 24x7 for many large enterprises. However, because analytic systems are by nature far more complex, they typically require significant downtime to aggregate data and regenerate themselves.

Bridging the varying operational behaviors of the transaction and analytic worlds proves challenging and costly, Shah noted. "As these disparate systems get connected, service levels become hard to maintain, especially when it comes to budgets and costs. The first thing you may have to do is re-address service levels or reengineer systems."

Dividing the labor
Combining the best of both worlds -- the updates of current transaction data with the insights gained from analyzing historic trends -- requires compromises that avoid bringing systems to their knees. Shah suggests limiting live updates to well-contained, narrow slices of production data, and performing most of the modeling ahead of time to narrow the choices down when the final analysis is performed.

That is the strategy employed by two large banking institutions that are currently implementing applications that identify, in real time, which products to sell to existing accountholders while they are talking to the bank's call centers.

For instance, at Bank of Montreal, a new call center app is being developed that will help customer service representatives identify additional bank products or services to sell to existing customers. Such applications, often described as "cross-selling" or "up-selling," are also being used and extended at Postbank, a leading Dutch retail bank with more than 7 million customers, where a call center analytic application is currently under development. In addition, Bank of Montreal is using its system to make real-time decisions regarding potential credit card fraud.

At Bank of Montreal, the system will perform a mix of online and offline analysis. On a periodic basis, the bank runs data mining processes on its highly distributed, DB2-based data warehouse, which resides on a massively parallel IBM SP2 server and is fed data from the bank's various account-based transaction sources. The mining routines analyze aggregate patterns in customer behavior to identify which sets of products and services are penetrating which demographic groups, and generate models that score factors such as the customer's propensity to buy and whether selling such products or services would meet bank profitability targets. Then, when a customer calls, an analysis is run to identify the appropriate demographic category, which in turn drives the screens returned to the call center representative, showing which offerings to promote and in which order.
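
Roughly, the split has the shape of the following sketch (not the bank's actual system): a periodic batch run assigns segments and scores offers per segment, while the call-time path is reduced to a cheap, indexed lookup of those precomputed results. The segments, offers and propensity scores here are invented for illustration.

```python
# Offline/online split, sketched: batch scoring writes precomputed results;
# the call center lookup at phone time is a single indexed read.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer_segment (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE segment_offers (
        segment TEXT, offer TEXT, propensity REAL,
        PRIMARY KEY (segment, offer)
    );
""")

# --- Offline batch run (periodic): assign segments and score offers per segment.
db.executemany("INSERT INTO customer_segment VALUES (?, ?)",
               [(1001, "young_urban"), (1002, "small_business")])
db.executemany("INSERT INTO segment_offers VALUES (?, ?, ?)",
               [("young_urban", "travel_card", 0.31),
                ("young_urban", "line_of_credit", 0.12),
                ("small_business", "merchant_services", 0.44)])

# --- Online path (while the customer is on the phone): a single indexed lookup.
def offers_for_caller(customer_id: int) -> list[tuple[str, float]]:
    return db.execute("""
        SELECT o.offer, o.propensity
        FROM customer_segment s JOIN segment_offers o ON o.segment = s.segment
        WHERE s.customer_id = ?
        ORDER BY o.propensity DESC
    """, (customer_id,)).fetchall()

print(offers_for_caller(1001))   # highest-propensity offers first
```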

Similarly, at Postbank, an analytic app based on DD Series, a rules engine provided by Netherlands-based DataDistilleries, intercepts data from the Siebel CRM system, which provides a unified view of the customer, and runs a similar form of scoring. The application was built using many of the same models developed for an earlier direct mail application -- which improved sales conversion rates by 25% -- and most of the heavy lifting is performed offline. Those models balance factors such as the propensity of customer groups to buy specific categories of products against the customers' current account balances, credit history, and whether the bank could realistically make a decent profit selling the given product or service to those customers. Models that determine product sales preferences by customer demographic are run periodically, on a frequency ranging from a few times a year to every month or two. The results of those runs are stored in a heavily indexed DB2 database.

Nonetheless, noted Ronald Oudendijk, vice president of Postbank's database marketing center, while the call center heavily leveraged the knowledge base built for the direct mail application, the real-time nature of phone interactions makes structuring the models far more complex. "There is a strong dependency between the models," he said, adding, "if there is something unusual about the account balance or activity pattern, this provides a more challenging situation." And, he noted, while a call center representative is on the phone with the customer, there is less chance to second-guess incorrect assumptions.

The value of production data
Is the addition of current transaction data worth all the headaches? The answers are lukewarm. "You cannot build predictive models on primary data, even if the data is cleansed and all in one place," noted Jan Mrazek, who formerly directed the Bank of Montreal customer profiling app and has since joined Toronto-based consulting firm Adastra Corp. as president. "The whole idea of analysis is looking at the big picture, not just what's happened over the past five minutes," he said.

But for consultant Amarnath, who has developed more than 100 OLAP cubes for his current client, it makes sense to close the business cycle before aggregating the data. Holding off is even more critical because some of the cubes he has developed take hours to populate, drawing from data sources as large as 30 million rows. Although his client does not need live data, it does want the data as current as possible. "We are working toward updating the cubes three times per day so each world region can see the previous shift," he said.

But for a small, select group of organizations where large chunks of revenue may be at stake, it can be worthwhile to add modest real-time elements, such as adjusting a customer profile based on a current transaction. The key is having a large enough number of customer transactions to make a real difference. But do not get carried away, suggests Adastra's Mrazek. "Some vendors of real-time decisioning may try to convince you that you need to keep rebuilding models and recalculating scores all the time, but that's nonsense," he said, noting that a mix of offline batch processes can be used to perform 95% of the work.
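
Such a modest real-time element might look like the sketch below: the expensive model remains a precomputed batch score, while the live transaction merely updates a couple of running features for that one customer, which a cheap rule then inspects. The feature names, weights and threshold are illustrative assumptions.

```python
# A sketch of a "modest real-time element": the heavy model stays as a
# precomputed batch score, and the live transaction only nudges a few running
# features for one customer, which a cheap rule then checks.
from dataclasses import dataclass

@dataclass
class Profile:
    customer_id: int
    batch_score: float        # produced by the offline modeling run
    avg_txn_amount: float     # running feature maintained in real time
    txn_count: int

def apply_transaction(p: Profile, amount: float) -> Profile:
    """Incrementally update the running average; no model rebuild, no rescoring."""
    new_count = p.txn_count + 1
    new_avg = p.avg_txn_amount + (amount - p.avg_txn_amount) / new_count
    return Profile(p.customer_id, p.batch_score, new_avg, new_count)

def needs_attention(p: Profile, amount: float) -> bool:
    """Cheap real-time rule layered on top of the offline score."""
    unusual = amount > 5 * p.avg_txn_amount       # sudden change in behavior
    return unusual and p.batch_score > 0.7        # escalate only high-scoring profiles

profile = Profile(42, batch_score=0.85, avg_txn_amount=120.0, txn_count=200)
print(needs_attention(profile, amount=900.0))     # True: worth a closer look now
profile = apply_transaction(profile, 900.0)       # profile stays current for the next call
```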

For Postbank, the solution involved running most of the analyses offline and limiting real-time work to individual customers, whose records are retrieved from the Siebel CRM system and run against a pre-selected subset of models. "We don't poll the entire database, we only use narrow snapshots," said Marcel Holsheimer, president of DataDistilleries, Postbank's analytics vendor.

Stated coldly, the decision about whether to conduct real-time analysis comes down to what the customer is worth to the business. "For a small minority of customers who undergo marked changes in behavior, you need to ask the question whether to wait until the end of the month to recalculate everything," said Mrazek, who warned, "by then, the customer might be gone."

About the Author

Tony Baer is principal with onStrategies, a New York-based consulting firm, and editor of Computer Finance, a monthly journal on IT economics. He can be reached via e-mail at [email protected].