In-Depth
Analyzing data in real time
- By Tony Baer
- April 1, 2002
In recent years, business intelligence systems have played pivotal roles in
helping organizations pursue business goals such as improving customer
retention, market penetration, profitability and efficiency.
In most cases, these insights are driven by analyses of historic data. Therefore,
it is human nature to pose the next logical question: If historic data can help
us make better decisions going forward, could real-time data improve decision
making here and now?
Trimac Transportation, a Calgary, Alberta-based company that is North America's
largest freight hauler of commodity goods, has a business intelligence system
that uses Cognos PowerPlay reporting to analyze "trip standards" data
to ensure that its 3,300-truck fleet is utilized efficiently. While this helps
Trimac make strategic decisions, it does not necessarily provide immediate answers
about how to best use a specific truck that is going out half full or with a
one-way load.
"It would take transaction data to another level," admitted Al Kobie,
Trimac's director of operations development. Current information would have
to be massaged with historic data and rules-based analyses to optimize the juggling
of freight routings. Preferably, all this should be done while the company still
has the customer's attention.
"If you could pull something like that off, it would be worth pretty big
dollars," Kobie conceded.
Kobie's goal has proven elusive to most companies. Excluding scattered customer
scoring and fraud-detection applications, real-time analytics are far from commonplace.
Of course, the first question to ask is whether real-time analysis is absolutely
necessary. Is there any information that has broken in the last five minutes
that might change decisions about how a company should respond to a customer,
revamp its sales and marketing tactics, or fine-tune operations?
For the City of Richmond, British Columbia, the answer is yes, for one easily
understandable reason. Because the city is situated on a coastal island with
an average elevation of three feet above sea level, it must know instantly whether
its network of flood-control pumps is operating, how well and why. The answer
was also yes for Bank of Montreal, which required real-time predictive analytics
for fraud detection and credit scoring for customers phoning into its call centers.
If real-time analytics spell the difference between a city or company staying
afloat and sinking, the next pertinent question is whether the analysis is best
performed by an automated system or inside a business analyst's head. For the
City of Richmond, the answer was again clear: a system-based solution was
preferable to relying on humans alone. The city installed an Information Builders
WebFocus reporting system that draws instant trend analyses from a real-time database
fed by the SCADA (supervisory control and data acquisition) system that manages the pumps.
However, for many companies, the relevance of real-time analytics is not so
cut-and-dried. "The more sophisticated the analysis, the less real-time
it may be because it might involve an approval chain," noted Peter Grambs,
analytics practice leader for outsourcing firm Cognizant Technology Solutions
Corp., Teaneck, N.J.
Furthermore, live production data might prove distracting. "Real-time
analysis is overrated because real-time data is not necessarily clean data,"
said Rohit Amarnath, a New York-based consultant who specializes in building
large OLAP cubes for global financial industry clients. Even if the data in
question is accurate, he added, it might actually throw off the analysis because
it may not have all the necessary pieces to provide a full picture.
More than speeds and feeds
What do we mean by real time anyway? The issue is clouded because it is easy
to confuse real-time or interactive access to analytic data with analyses performed
in real time using live production data.
For instance, many analytic systems already provide rapid access to trend information
and reporting. In these cases, availability depends on policy decisions and
available resources. In some organizations, whether analysis is available
interactively or generated through a batch process later in the day may depend
on the user's role and whether the request is made during a slow period or at
a peak time.
Most analytic systems are engineered differently from transaction systems to
improve responsiveness. It starts with denormalizing relational databases using
star schemas, which add numerous summary tables to make the right information
more accessible. In extreme cases, where analyses are extremely complex but
predictable, multidimensional OLAP databases, or "cubes," are designed
to pre-package specific data views, fast. And where the analyses require an
iterative approach just to spot hidden patterns, vendors such as IBM, SAS Institute
and others have designed tools that make data mining easier for non-statisticians,
as well as faster to run.
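To make the star-schema idea concrete, here is a minimal sketch using Python's standard sqlite3 module: a fact table and dimension tables feed a pre-computed summary table, so a predictable question never touches the detail rows. All table and column names are hypothetical, invented only for illustration.

```python
import sqlite3

# In-memory database as a stand-in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Dimension and fact tables of a simple star schema.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);

-- Pre-computed summary table, refreshed in batch, so that interactive
-- queries answer the predictable question without scanning detail rows.
CREATE TABLE summary_sales_by_segment AS
SELECT c.segment, d.month, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_customer c ON c.customer_id = f.customer_id
JOIN dim_date d     ON d.date_id = f.date_id
GROUP BY c.segment, d.month;
""")

# An analyst's report reads the small summary table, not the fact table.
for row in conn.execute("SELECT segment, month, total_amount FROM summary_sales_by_segment"):
    print(row)
```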
And if producing analyses rapidly is not enough, new tools are emerging to
help analysts rapidly deal with the glut of information. Based on research conducted
in U.S. Air Force plane cockpits, Oakland, Calif.-based iSpheres has developed
a tool that provides, in essence, real-time analysis of analytic data. The firm's
tool checks for exception conditions and redirects users to the right data at
the right moment. Currently deployed with Hyperion Essbase OLAP cubes, the iSpheres
system is used to help analysts in financial services and energy trading.
When it comes to building data warehouses and repositories of analytic data,
the processes for populating them have accelerated. Techniques such as parallel
processing, multithreading, streaming, pipelining and, in some cases, continuous
trickle feeds from operational data stores or original transaction system sources
are populating data warehouses faster and, indirectly, making analytic systems
more current.
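A rough illustration of the trickle-feed idea, again in Python with sqlite3 standing in for both the operational store and the warehouse; the tables, columns and polling cadence are all hypothetical assumptions, not any vendor's actual mechanism.

```python
import sqlite3

# In-memory stand-ins for a hypothetical operational store and warehouse.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.execute("CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
warehouse.execute("CREATE TABLE fact_transactions (txn_id INTEGER, customer_id INTEGER, amount REAL)")

def trickle_feed(last_seen_id: int) -> int:
    """Copy only rows added since the last run -- a continuous 'trickle'
    rather than a nightly batch extract -- and return the new high-water mark."""
    rows = source.execute(
        "SELECT txn_id, customer_id, amount FROM transactions WHERE txn_id > ?",
        (last_seen_id,),
    ).fetchall()
    warehouse.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return max((r[0] for r in rows), default=last_seen_id)

# A scheduler would call this every minute or so; here it runs once.
source.execute("INSERT INTO transactions VALUES (1, 42, 19.95)")
print(trickle_feed(0))  # -> 1
```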
But fast and current does not necessarily equate with real time. Regardless
of whether analytic systems exploit any of these technologies, most rely on
historic, not live, production data. That data is typically extracted from production
systems during batch windows, usually at about 3:00 in the morning. So the question
of real-time analytics revolves around whether it is necessary
to supplement these managed query environments, star schema databases or OLAP
cubes with applications that also draw some real-time transactional data into
the mix.
But could this be technology searching for a problem?
"A couple of years ago, our customers began pushing us on the latency
issue," recounted Faisal Shah, chief technology officer at Chicago-based
analytic system consulting firm Knightsbridge Solutions LLC. Probably the most
obvious app is real-time credit scoring or fraud detection, where there is a need
to perform a sophisticated, event-driven analysis, such as when a customer phones
a call center.
Tread gently
Building analytic systems that drill down to live transaction systems remains
easier said than done. A modest approach is to embellish the existing transaction
application with simple analytics, such as adding a profitability analysis to
a customer history screen in a CRM app. Vendors of packaged applications such as
Siebel, SAP, Oracle and PeopleSoft are increasingly adding modest analytic
capabilities to supplement their core systems.
But if the goals are more ambitious, the live production database should be
touched as little as possible. Theoretically, such a system would either replicate
current transaction data to a separate, mirrored data store, or extract data
quickly and efficiently to minimize the impact on the transaction database,
moving it to a repository or operational data store. If you are considering
touching a live system, indexing is critical, according to Richard Winter, president
of Winter Corp., a Boston-based consulting firm specializing in very large databases.
"Some databases can do a lot [of indexing], but not necessarily while they
are updating," he warned.
But even if your transaction database can do the equivalent of walking and
chewing gum at the same time, the rest of the infrastructure must also measure
up. "You'll have to have robust hardware and, in some cases, you might
need to mirror it," said Mike Murphy, systems manager at Chicago-based telecom
provider Alltel, which operates fraud detection and other real-time analytics
applications from consolidated online network operational data.
Knightsbridge's Shah added that, even if the analytic infrastructure is designed
to fuse transaction and analysis, there is the question of maintaining or balancing
service-level requirements. By nature, transaction systems are supposed to be
live whenever the business is open, which could mean 24x7 for many large enterprises.
However, because analytic systems are by nature far more complex, they typically
require significant downtime to aggregate data and regenerate themselves.
Bridging the varying operational behaviors of the transaction and analytic
worlds proves challenging and costly, Shah noted. "As these disparate systems
get connected, service levels become hard to maintain, especially when it comes
to budgets and costs. The first thing you may have to do is re-address service
levels or reengineer systems."
Dividing the labor
Combining the best of both worlds -- the updates of current transaction data
with the insights gained from analyzing historic trends -- requires compromises
that avoid bringing systems to their knees. Shah suggests limiting live updates
to well-contained, narrow slices of production data, and performing most of
the modeling ahead of time to narrow the choices down when the final analysis
is performed.
That is the strategy employed by two large banking institutions that are currently
implementing applications that identify, in real time, which products to sell
to existing accountholders while they are talking to the bank's call centers.
For instance, at Bank of Montreal, a new call center app is being developed
that will help customer service representatives identify additional bank products
or services to sell to existing customers. Such applications, often described
as "cross-selling" or "up-selling," are also being used
and extended at Postbank, a leading Dutch retail bank with more than 7 million
customers, where a call center analytic application is currently under development.
In addition, Bank of Montreal is using its system to make real-time decisions
regarding potential credit card fraud.
At Bank of Montreal, the system will perform a mix of online and offline analysis.
On a periodic basis, the bank runs data mining processes on its highly distributed,
DB2-based data warehouse, residing on a massively parallel IBM SP2 server, which
is fed data from the bank's various account-based transaction sources. The mining
routines analyze aggregate patterns in customer behavior to identify which sets
of products and services are penetrating which demographic groups, and generate
models that score factors such as the customer's propensity to buy and whether
selling such products or services would meet bank profitability targets. Then,
when the customer calls, an analysis is run to identify the appropriate demographic
category that, in turn, returns screens to the call center representative showing
which offerings to promote and in which order.
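The division of labor can be sketched roughly as follows. This is an illustration of the general pattern, not the bank's actual system; the segments, products and scores are invented. The expensive mining happens offline, and the step performed while the customer is on the phone is little more than a lookup and a sort.

```python
# Produced periodically by the offline mining runs; values are invented.
propensity_by_segment = {
    "young_urban": {"credit_card": 0.62, "mortgage": 0.08, "mutual_fund": 0.31},
    "established": {"credit_card": 0.22, "mortgage": 0.44, "mutual_fund": 0.51},
}

def segment_of(customer: dict) -> str:
    """Map the caller to a demographic segment (a stand-in for the real scoring model)."""
    return "young_urban" if customer["age"] < 35 else "established"

def offers_for(customer: dict, top_n: int = 2) -> list:
    """Real-time step: look up precomputed scores and rank offers for the rep's screen."""
    scores = propensity_by_segment[segment_of(customer)]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(offers_for({"age": 29}))  # -> ['credit_card', 'mutual_fund']
```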
Similarly, at Postbank, an analytic app based on DD Series, a rules engine
provided by Netherlands-based DataDistilleries, intercepts data from the Siebel
CRM system, which provides a unified view of the customer, and runs a similar
form of scoring. The app was built using many of the same models developed for
an earlier direct mail application -- which improved sales conversion rates by
25% -- and most of the heavy lifting is performed offline. Those models balance factors such as
the propensity of customer groups to buy specific categories of products against
the customers' current account balances, credit history, and whether the bank
could realistically make a decent profit selling the given product or service
to those customers. Models that determine product sales preferences by customer
demographic are run periodically, on a frequency ranging from a few times a
year to every month or two. The results of those runs are stored in a DB2 database
that is heavily indexed.
Nonetheless, noted Ronald Oudendijk, vice president of Postbank's database
marketing center, while the call center heavily leveraged the knowledge base
built for the direct mail application, the real-time nature of phone interactions
makes structuring the models far more complex. "There is a strong dependency
between the models," he noted, adding, "if there is something unusual
about the account balance or activity pattern, this provides a more challenging
situation." And, he noted, while a call center representative is on the
phone with the customer, there is less chance to second-guess incorrect assumptions.
The value of production data
Is the addition of current transaction data worth all the headaches? The answers
are lukewarm. "You cannot build predictive models on primary data, even
if the data is cleansed and all in one place," noted Jan Mrazek, who formerly
directed the Bank of Montreal customer profiling app and has since joined Toronto-based
consulting firm Adastra Corp. as president. "The whole idea of analysis
is looking at the big picture, not just what's happened over the past five minutes,"
he said.
But for consultant Amarnath, who has developed more than 100 OLAP cubes for
his current client, it makes sense to close the business cycle before aggregating
the data. Holding off is even more critical because some of the cubes he has
developed take hours to populate from data sources as large as 30 million rows.
Although his client does not need live data, it does want the data to be as current as possible.
"We are working toward updating the cubes three times per day so each world
region can see the previous shift," he said.
But for a small, select group of organizations where large chunks of revenue
may be at stake, it can be worthwhile to add modest real-time elements, such
as adjusting a customer profile based on a current transaction. The key is having
a large enough number of customer transactions to make a real difference. But
do not get carried away, suggests Adastra's Mrazek. "Some vendors of real-time
decisioning may try to convince you that you need to keep rebuilding models
and recalculating scores all the time, but that's nonsense," he said, noting
that a mix of offline batch processes can be used to perform 95% of the work.
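A minimal sketch of the kind of "modest real-time element" Mrazek describes, with hypothetical fields and thresholds: the models are rebuilt offline, and the live step merely adjusts one customer's profile and flags anything unusual for immediate attention.

```python
customer_profile = {        # built by the offline batch run; values are invented
    "avg_txn_amount": 85.0,
    "txn_count": 120,
}

def on_new_transaction(profile: dict, amount: float) -> bool:
    """Update the running average in place and flag unusually large transactions;
    everything heavier waits for the next offline batch cycle."""
    profile["txn_count"] += 1
    profile["avg_txn_amount"] += (amount - profile["avg_txn_amount"]) / profile["txn_count"]
    return amount > 5 * profile["avg_txn_amount"]   # crude real-time anomaly check

print(on_new_transaction(customer_profile, 1200.0))  # -> True: worth a second look
```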
For Postbank, the solution involved running most of the analyses offline and
limiting real-time work to data for individual customers, whose records are
retrieved from the Siebel CRM system and run against a pre-selected subset of
models. "We don't poll the entire database, we only use narrow snapshots,"
said Marcel Holsheimer, president of DataDistilleries, Postbank's analytics
vendor.
Stated coldly, the decision about whether to conduct real-time analysis comes
down to what the customer is worth to the business. "For
a small minority of customers who undergo marked changes in behavior, you need
to ask the question whether to wait until the end of the month to recalculate
everything," said Mrazek, who warned, "by then, the customer might
be gone."
About the Author
Tony Baer is principal with onStrategies, a New York-based consulting firm, and editor of Computer Finance, a monthly journal on IT economics. He can be reached via
e-mail at [email protected].