In-Depth
Real Time Is the Right Time
- By Alan R. Earls
- August 1, 2005
Talking Points
NEED FOR SPEED
- There’s an undeniable trend toward faster and better access to ever-increasing
amounts of data. This one is evolutionary, more often based on existing
investments and familiar products.
- Instead of extract, transform and load, data warehouses need to do something
that is more like data integration, transform and load.
- Driven by demand for real time, ETL vendors have begun to move away from
the idea of microbatches to real time by exposing their ETL engines as a service.
Although the U.S. Air Force might be best known for its awe-inspiring fighter
jets and precision-guided rockets,
it’s also a vast global organization that depends on smooth and efficient
business practices. So, in the late 1990s, when the service was faced with pressures
to squeeze more performance from a massive $37 billion annual budget, the Air
Force hatched plans to create a global data warehouse backed by financial management
analysis and executive decision support tools called CRIS, short for the Commanders’
Resource Integration System.
Because the Air Force’s data was widely scattered among many systems,
the system engineers chose a scale-out architecture based on independent modules
running on top of SQL Server, according to David Reeves, marketing manager
at Teksouth, which built the system.
The system had to deliver all kinds of information to a huge and varied user
audience, and it had to do it as much as possible in real time. Today, in its
current refined form, CRIS provides for ad hoc queries as well as flash Web
views for managers who want a broad, up-to-date overview and extensive drill-down
capability. Most important, CRIS fulfills the mantra of faster and better, more
than doubling the number of users served between 2003 and 2004, and handling
a five-fold increase in individual queries while reducing response time to seconds.
Says Reeves, “What it boils down to is that because queries can now be
performed for pennies, in addition to being fast, this is also data warehousing
for the masses.” The Air Force’s CRIS is part of an undeniable trend
toward faster and better access to ever-increasing amounts of data. For many
organizations, the benefits are obvious.
That’s because unlike some other IT trends, which have turned out to be built
on shaky new technology, this one is evolutionary, more often based on existing
investments and familiar products. In short, it takes money and time, but it’s
not rocket science. (See related story, “On-demand
data points.”)
Right time is the right time
Still, cautionary voices point out that there can be some potentially daunting
complexities and problems. The goal isn’t necessarily on-demand or real-time
data access but right-time access, which highlights the notion that the
business cycle differs from enterprise to enterprise and that data warehousing
speed need only match that cycle to
bring benefits. Therefore, daily data may be perfectly fine for some businesses
and anything faster a waste of money.
Of course, some industries move so fast that minutes count, and they must seek
out the best available technology,
regardless of cost. Noel-Levitz is a good example of that. For a top player in
the highly competitive business of serving higher education, faster is always better.
Tim Thein, senior VP at Noel-Levitz, explains that his company uses its proprietary
information and modeling expertise to give its higher-ed clients the ability
to score student records on the fly, which helps them determine who among their
thousands of applicants is most likely to enroll, obviously
a key factor in admission decisions.
“One of the challenges in college admissions is being first in the mailbox,”
Thein says. Previously, Noel-Levitz’s
clients had to wait at least 72 hours before receiving a scored file of student
records, on which the mailings are based. Now, using software from SAS Institute
delivered through Web services, it only takes a matter of minutes to access
key data, making it possible for schools to communicate more quickly with qualified
student candidates.
“These records can number into the hundreds of thousands, and SAS makes
it easy to not only predict potential student interest, but [also] to do so
much faster and more inexpensively than previously possible,” Thein says.
Thein’s experience also illustrates the way in which evolving business demands
and ever more capable technology are conspiring to rewrite the once straightforward
definitions that used to describe data warehousing. On-demand is one of a number
of related terms that describe a faster, better variant of yesterday’s typical
data warehouse. (See related story, “Words
to live by.”)
As Mark Robinson, a consultant at Greenbrier & Russel, points out: “Now
instead of extract, transform and load,
data warehouses need to do something that is more like data integration, transform
and load.”
One version of truth
What’s driving on-demand is the trend toward harnessing BI applications
and analysis tools to tactical, day-to-day
decisions, says Gary O’Connell, market manager for IBM Information Integration
Solutions (formerly Ascential).
“It used to be the data warehouse was an organization’s central
view and often the official version of the truth,” he says. On the other
hand, he adds, BI applications usually were not mission critical: if they stopped
running, it didn’t have an immediate impact on the business.
Now, O’Connell says, businesses are working to get a single view of data
and to make that view available for BI tools for operational as well as strategic
purposes.
“That is leading to a shift in how data warehouses are designed and even
in how organizations are operating,” he says.
Sense of urgency
Forrester analyst Phil Russom says although the concept may not be new, the
urgency is. “People are being driven by the need to react faster to information,”
he says. “It is the whole compression that business is experiencing.” Thus,
in the past, data warehousing tended to be focused on historic rather than operational
data. Refresh rates were low. Increasingly, however, daily refresh is the standard,
and hourly, or even more frequent, refresh rates are common.
Russom says the average data warehouse is not designed to receive a real-time
information push. So if a company
has a data warehouse and follows the common model of orienting it toward historical
information, the enterprise
can’t suddenly go to real time—it must redesign the warehouse, usually
by adding a whole new layer of processing
functionality, he says.
Although the field has its leading players, such as Teradata, the unique aspect
of on-demand is that anyone can
build it using familiar technologies. Jazzed-up Oracle, DB2 and SQL Server back
ends (as well as open-source alternatives) can become real-time focused with
the right investments.
“A real-time or right-time data warehouse is one in which the updates
to data are occurring in less time than the standard business cycle,”
says Mark Beyer, an analyst at Gartner. Beyer says businesses don’t need
to build anything faster than their standard business cycle, which he defines
as the time it takes to complete a successful transaction that generates revenue
or avoids cost.
Beyer says the differences in what constitutes real time can be seen in contrasting
examples of operating cycles. Battlefield situations and emergency response
activities must be analyzed within seconds. Online furniture shopping normally
has a multi-day cycle while shoppers mull choices and consider options.
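Beyer's test can be written down in a few lines. The sketch below is illustrative only: the function name and the specific cycle lengths are assumptions chosen to mirror the examples above, not figures from Gartner.

```python
from datetime import timedelta


def is_right_time(refresh_interval: timedelta, business_cycle: timedelta) -> bool:
    """True when warehouse updates arrive in less time than one business cycle."""
    return refresh_interval < business_cycle


# Hypothetical cycle lengths, loosely based on the examples in the text.
emergency_response = timedelta(seconds=30)
online_furniture = timedelta(days=3)

nightly_load = timedelta(hours=24)

# A nightly load is fast enough for a multi-day shopping cycle...
print(is_right_time(nightly_load, online_furniture))    # True
# ...but far too slow for a cycle measured in seconds.
print(is_right_time(nightly_load, emergency_response))  # False
```

By this measure the same nightly load is "right time" for one business and hopelessly late for another, which is why the business cycle, not the technology, sets the target.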
Doable on demand
Although there is a need for speed, the key is often extract, transform and
load functions. Driven by demand for real
time, Beyer says ETL vendors have begun to move away from the idea of microbatches
that simply use the same basic
batch processing engine. “Some are now moving toward honest real time,
such as Informatica, Ascential and possibly Pervasive and Ab Initio,”
he says. All are exposing their ETL engines as a service.
On-demand is entirely doable, Beyer says, but those moving to real time need
to recognize that architectural issues are paramount. For instance, if an enterprise
wants to analyze in real time—using up-to-the-minute source data—it
can’t force source systems to keep older data, too. “If you want
to be optimized and effective, you can’t say you must keep four years
of data,” he says.
To drive accuracy, Beyer suggests building an infrastructure that optimizes
data for accuracy based on an SLA. “In
the case of a database structure, I could choose to have a big table with all
the old historic information, and a small table that follows the same structure
but with just the newer data,” he says.
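The two-table layout Beyer describes might look something like the following sketch, which uses Python's built-in sqlite3 module purely for illustration. The table, column and function names are hypothetical, and a production warehouse would more likely rely on its database's own partitioning features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables with the same structure: a big historical one and a small recent one.
# Table and column names are hypothetical.
SCHEMA = "(txn_id INTEGER PRIMARY KEY, txn_date TEXT, amount REAL)"
conn.execute(f"CREATE TABLE txn_history {SCHEMA}")
conn.execute(f"CREATE TABLE txn_recent {SCHEMA}")

# A view gives queries one combined table when they need the full time span.
conn.execute(
    "CREATE VIEW txn_all AS "
    "SELECT * FROM txn_history UNION ALL SELECT * FROM txn_recent"
)


def roll_off(conn, cutoff_date):
    """Move rows older than cutoff_date from the small table into the big one."""
    with conn:  # one transaction: copy first, then delete
        conn.execute(
            "INSERT INTO txn_history SELECT * FROM txn_recent WHERE txn_date < ?",
            (cutoff_date,),
        )
        conn.execute("DELETE FROM txn_recent WHERE txn_date < ?", (cutoff_date,))


conn.executemany(
    "INSERT INTO txn_recent VALUES (?, ?, ?)",
    [(1, "2005-06-30", 120.0), (2, "2005-07-31", 75.5), (3, "2005-08-01", 42.0)],
)
conn.commit()
roll_off(conn, "2005-08-01")

recent = conn.execute("SELECT COUNT(*) FROM txn_recent").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM txn_all").fetchone()[0]
print(recent, total)  # 1 3
```

Frequent updates touch only the small table, so they stay fast, while long-range analysis reads the combined view.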
By the dashboard lights
The on-demand data warehouse might be thought of primarily as an offspring of
traditional data warehouses, but its
characteristics are also inviting other approaches. For instance, some organizations
are pursuing much the same vision using a dashboard. UNICCO Service, one of
North America’s largest integrated facilities services outsourcing companies
($700 million in annual sales), recently selected Bowstreet’s Corporate
Performance Suite for its executive dashboard. According to Bill Jenkins, senior
director of IT, the executive dashboard will drive UNICCO’s decision making
and provide customers with improved process visibility, too.
Jenkins says UNICCO is using the dashboard to access batch data, track balanced
scorecard initiatives and identify
trends from top executives down. “It will provide information for decision
makers to help them better focus and to identify areas of opportunity,”
Jenkins says.
Eric Lofstrom, business intelligence manager at Quicken Loans, also uses a
dashboard for an on-demand data warehouse. Quicken Loans deployed SQL Server
for a 64-bit data warehouse to build development and production data marts.
It also employs SQL Server Analysis Services and Reporting Services for real-time
BI.
Lofstrom says when the project started about three years ago, it was not intended
to produce real-time results. “When we started, it was more of a traditional
BI project that pulled information from the main production system,” he says.
“As the project evolved, though, it gradually turned into a dashboard system.”
Quicken Loans approached data warehousing to yield an analytical point of view.
“As we built, we realized the types of information that we wanted to analyze
and that we also had a tactical component—we knew we wanted to analyze not only
the last 18 months but also the last 18 minutes,” he says.
At the same time, the production systems were getting clobbered by reporting
requests, he says. At first, Quicken
Loans looked at purchasing new hardware, Lofstrom says, but decided to look
instead at accelerating its BI system by running it during the day. Initially,
it ran once an hour, then every half hour and then every 10 minutes. “Now
we are almost up to real time,” he says.
“We do a classic change data capture method,” Lofstrom explains.
“We get data throughout the day. If certain attributes, or the
status of the loan changes, we apply that information as it occurs.” The
source system posts the
change using HTTP to send it to the ETL engine.
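The receiving end of such a feed can be sketched in a few lines. What follows is not Quicken Loans' implementation: the JSON event format, the table and the function name are all assumptions, and the HTTP transport is left out so that only the step of applying a posted change is shown.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding the current state of each loan.
conn.execute(
    "CREATE TABLE loan_status (loan_id TEXT PRIMARY KEY, status TEXT, changed_at TEXT)"
)


def apply_change(conn, body: bytes) -> None:
    """Apply one change event, as it might arrive in an HTTP POST body."""
    event = json.loads(body)
    with conn:  # commit each change as it occurs
        # Insert the loan if it is new; otherwise replace its tracked attributes.
        conn.execute(
            "INSERT OR REPLACE INTO loan_status (loan_id, status, changed_at) "
            "VALUES (?, ?, ?)",
            (event["loan_id"], event["status"], event["changed_at"]),
        )


# Two changes to the same (made-up) loan, 18 minutes apart.
apply_change(
    conn,
    b'{"loan_id": "A-100", "status": "submitted", "changed_at": "2005-08-01T09:00:00"}',
)
apply_change(
    conn,
    b'{"loan_id": "A-100", "status": "approved", "changed_at": "2005-08-01T09:18:00"}',
)

row = conn.execute(
    "SELECT status FROM loan_status WHERE loan_id = 'A-100'"
).fetchone()
print(row[0])  # approved
```

Because each event carries only what changed, the warehouse never has to rescan the production system, which is what relieves the reporting load on it. This sketch assumes events arrive in order; a real feed would also have to handle late or duplicate posts.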
The initial system rollout included about 1,500 users, but will eventually
roll out to 2,500 users, he says.
Right time to make the right decision
But as systems get faster, some, like TDWI’s
Wayne Eckerson, wonder if the human element will keep up. Information latency
may soon be a thing of the past, but what he calls “decision latency”
may remain. [Note: TDWI and ADT share the same parent company.]
Indeed, he notes, as the last stop in the chain, the business user may just
turn into the biggest bottleneck. So, as users
move toward on-demand data warehousing, Eckerson says, enterprises need to look
at their business processes and
their people. It may be that on-demand data warehousing will only reach its
full potential when business processes are optimized and when business users
receive proper training, too.
Finally, Ray Johnson at consulting firm Greenbrier & Russel warns: “The closer
you get to real time, the more expensive it gets.”
Sidebar: On-demand data points
Sidebar: Words to live by
PHOTO BY JAN CAUDRON/ANAKLASIS