In-Depth
Q&A: Eckerson measures state of data warehousing
Wayne Eckerson has been in the middle of the data warehousing business
for years as director of education and research at the well-respected Data
Warehousing Institute in Seattle. Wayne sat down over dinner recently with
ADT Editor-at-Large Jack Vaughan to answer some questions on the state
of the industry.
Q: Why has data warehousing stuck around? It was a big-ticket item
and there were a lot of bad experiences. In this downturn, it seems like the
people closest to the data are getting most of the budgets. Your
comments?
A: If you had asked anyone who started out with data warehousing in the
mid-'90s, when it came to the forefront, whether it would still be here --
whether the Institute would still be around in 2002 -- I don't think anyone
would have said yes. I think a lot of our
faculty are still surprised that we're running four conferences a year and
getting 500, 600 or 700 people there each time. It is amazing. What has happened
is that data warehousing has indeed become part of the IT infrastructure. It
really is, in my mind, a complement to the transaction processing infrastructure
that we built during the '70s and '80s. And during the '90s and the next decade,
we're building the information insight infrastructure. When you look back on it,
it makes a lot of sense to me.
Q: Is it true that there's a distinct data person or data warehouse
person? How do they fit in within a larger organization?
A: We just published our salary survey, The Salary Roles and Responsibility
Report. There are a lot of different roles that people are playing. The
primary role -- our primary audience -- is data warehousing project managers.
Q: It seems that people are still trying to mine their data -- find
gold -- but are still sending out flyers that go to people that are
deceased.
A: Well, we haven't perfected the art of leveraging our
information yet. We still have a long way to go, which is good news for the data
warehousing and business intelligence (BI) market. I use the term business
intelligence because it's a broader term that encompasses the data warehousing
infrastructure as well as the analytical tools and applications that run on top
of those two things. The BI market, from a Wall Street perspective, has done
better than most other IT markets in the last two years. It's up 2% if you look
at the publicly traded companies and track their revenue. But the CRM market,
ERP market and other IT markets are experiencing significant downturns compared
to BI. BI is a hotter market because people understand that in a competitive
market, or in an economy that's in a downturn, information and insight about
customers, supply chain, operations and employees can spell the difference. It
can drive significant costs out of your budget, and it can bring additional
revenue in to the tune of millions and millions of dollars.
Q: A couple of years ago, people were focused on visible failures.
Have we gotten over that, and did it really happen?
A: It happens.
Every new technology goes through this cycle where there's initial hype. It's a
new technology, everyone focuses on the business advantages it can provide, and
then the reality sets in because technology is hard to implement and derive
business value from. It's just hard to do. So there's always a lot of
disappointment following the initial hype. Gartner publishes what I think it
calls the "hype curve" and tracks where technologies are on the hype curve.
There's the initial hype, and then most technologies go into this trough of
disappointment. Those that have real value eventually pull out of that trough.
They don't ever measure up to the initial hype, but they begin to increase the
value they provide companies. Data warehousing took its stings early on. Early
adopters made some fundamental mistakes. That was bound to happen. That's why
they're early adopters. They're willing to take more risks than the majority of
people. The big problem with early adopters is that they just try to do too much
at one time. They let the scope of their projects get way out of hand, pull data
from too many sources, develop too many applications at once and try to model
too many subject areas. Those "big bang" projects had a high risk ratio and some
of them didn't succeed. A lot of them were reined in, and they had to redirect
the scope. I've known a few of those. This is where the data mart phenomenon
came from. People didn't want to take that much risk. They wanted to start small
and get something done quickly. There's very little tolerance these days for the
"big bang" IT project -- the two- or three-year project that costs multiple
millions of dollars. People want quick hits. It's not only cheaper to do it that
way, it's also less risky and more intelligent because you learn a lot as you
go. If you're trying to do too much at once, you don't give yourself room to
adjust and adapt.
Q: Well, we're a few years past the data mart phenomenon. Did it
create islands of automation?
A: That has been a problem. I'm not
sure it's a problem with data marts as much as the way companies are organized.
A lot of companies decentralized their organizations to give more autonomy to
groups to make decisions at a local level; to make decisions that are in the
best interests of customers and the profit goals of the company. That means a
lot of them have their own IT group, so a lot of them built data warehouses and
data marts isolated from other groups in the company. But at a certain point,
someone in the organization typically wants to look across all these diverse
units and look for trends across these data marts and they can't do it. That's
when they begin to realize they're spending a lot of money duplicating effort,
staff and technology; and oftentimes, they put their foot down and say we need a
single version of the truth. We need to stamp out the costs from all these
redundant systems and come up with a centralized architecture that supports
decision making and insight. A lot of companies are now in the process of
pulling that together.
Q: Is it too early to see if people are using XML to make data hubs
or to make data formats more interchangeable?
A: There have been a
couple of XML initiatives in our space. One was an XML interchange for meta data
that's part of the CWM -- Common Warehouse Metamodel meta data standard -- now
under the jurisdiction of the Object Management Group. I can't say that's really
gone anywhere, and it may not go anywhere too fast too soon. Something more
promising, though, is XML for Analysis. It's a new standard for OLAP access, and
it's one of the OLAP equivalents of SQL. Basically, it's the Web services or XML
equivalent of ODBO, which stands for OLE DB for OLAP. Now they're turning that
into more of a Web service, making the syntax XML-compliant, which would open
up the OLAP market a lot more so that any client could go against any server.
And all the major OLAP vendors are actually behind it.
Q: Have people truly built data warehouse infrastructures that are good enough
now to capitalize on by diving down to analytics and business intelligence?
A: Absolutely. It's
kind of an evolution. When companies first build warehouses, they are often
merely replacing their operational reporting systems or end-user reporting
systems. But if they've got any vision and any information pain, they start
giving users tools to do more ad hoc queries and more analysis to find out why
something happened, not just what happened. Then the leading companies,
especially if they're doing CRM, are applying statistical analysis, statistical
modeling, to delve into what's driving customer behavior.
Q: Is that still something that you need a Ph.D. to do? Do you need
somebody who understands mathematics and statistics?
A: To build
accurate models, you need someone who knows the business, the data and
statistics. And that's not going to change. What we're seeing in the area of
data mining is that if you can narrow the domain and really specify the
applications, the parameters of the applications -- fraud detection is an
appropriate example -- you can build algorithms that are designed to detect
fraud for those types of applications. The narrower the application, the more
likely you can build an algorithm into an application that any average business
user can apply and tweak.
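The idea of a narrow algorithm that a business user can tweak can be sketched as follows. This is a minimal illustration, not a real fraud model: the `Transaction` fields, the three rules and every threshold in `FraudRules` are assumptions made up for the example. The point is only that the detection logic stays fixed while the parameters are exposed for adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    # Hypothetical fields, chosen only for illustration.
    amount: float
    country: str
    home_country: str
    txns_last_hour: int


@dataclass
class FraudRules:
    # Parameters a business user could adjust without touching the logic.
    max_amount: float = 5000.0
    max_txns_per_hour: int = 10
    flag_threshold: int = 2  # number of rules that must fire


def fraud_score(txn: Transaction, rules: FraudRules) -> int:
    """Count how many of the simple rules this transaction trips."""
    score = 0
    if txn.amount > rules.max_amount:
        score += 1
    if txn.country != txn.home_country:
        score += 1
    if txn.txns_last_hour > rules.max_txns_per_hour:
        score += 1
    return score


def is_suspicious(txn: Transaction, rules: Optional[FraudRules] = None) -> bool:
    """Flag the transaction when enough rules fire."""
    rules = rules or FraudRules()
    return fraud_score(txn, rules) >= rules.flag_threshold
```

A user who wants a stricter screen would pass, say, `FraudRules(flag_threshold=1)` rather than edit the scoring code. A production system would replace these hand-written rules with a statistical model, which still needs the business, data and statistics expertise described above.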
Q: Now to the industry. It seems like verticals are good because you
have customers whose business you understand and benefit from as long as they
stay in business. But whether it's data warehousing, BI or DSS, it seems immune
to real consolidation. There are some kinds of consolidation that happen, but
there are always hundreds of vendors.
A: It's kind of a healthy
market, but everything is going vertical. You will always have applications for
horizontal functions like finance and sales and marketing, and most of the
leading vendors have addressed those areas and will continue to supply analytic
products as well as applications that combine both operational and analytical
components. We've seen that in CRM. But going forward, we think you'll see a lot
of full-blown solutions that will be offered on a vertical basis. You'll see a
lot more vendors, including BI and ERP vendors, figure out how to build
end-to-end applications that bundle in data warehouses and analytical tools, as
well as processes for dealing with applications that are specific to a
vertical industry or even a subset of a vertical industry.
Q: Does that mean spinning off the results of the reports into
operational systems?
A: Yes. Once you understand what's going on,
you can send an alert to someone: inventory is down, or you have to go check
inventory at various distribution centers.
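The inventory alert described here can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `low_inventory_alerts`, the distribution center names, the stock figures and the single `reorder_point` threshold are all hypothetical, and a real system would read stock levels from the warehouse and deliver the messages by e-mail or pager rather than print them.

```python
def low_inventory_alerts(stock_by_center, reorder_point):
    """Return one alert message per distribution center below the reorder point."""
    alerts = []
    for center, units in sorted(stock_by_center.items()):
        if units < reorder_point:
            alerts.append(
                f"Inventory is down at {center}: {units} units on hand, check stock"
            )
    return alerts


# Hypothetical stock levels, as an analytical query might return them.
stock = {"Reno": 40, "Atlanta": 500, "Columbus": 90}
for message in low_inventory_alerts(stock, reorder_point=100):
    print(message)
```

The analytical side decides what counts as "down"; the operational side only receives the resulting messages.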
Q: What does the future hold for data warehousing?
A: I
think what we're seeing now is that the majority of companies have built their
first warehouse. They've established themselves and have most of the
architectural bugs out. They've reached that first plateau and are delivering
some value to the company. But now we see a lot of companies step back and say
what's next? Where do we go from here? How do we make this more valuable to the
company? We know there's a lot of value in this data, but we know users aren't
using it as much as they could. We know that not all users who should be using
this are using it. We know we could add more value if we made it available to
customers and suppliers. We know we could probably add more value if we added
different types of data to the warehouse, like e-mail or documents. We know we
could add more value if we could get more data in there at a more detailed
transaction level, more quickly. We know we can do this more cheaply. We know we
could probably improve query performance. So there are a lot of people stepping
back right now, saying: What's next? Where do I get the biggest value for the
investment? How can I make this more valuable for people in the company? It's
quite clear that in some companies this BI infrastructure runs the company.
Companies like Best Buy, Bank of Montreal, Wal-Mart and any number of others
that invested significant dollars -- we're not talking cheap stuff here -- have
built something they could not operate without; it's a piece of the company.
It's a multimillion-dollar investment over time. Companies are stepping back
to say: How can we get
to that level?