In-Depth

Q&A: Eckerson measures state of data warehousing

Wayne Eckerson has been in the middle of the data warehousing business for years as director of education and research at the well-respected Data Warehousing Institute in Seattle. Wayne sat down over dinner recently with ADT Editor-at-Large Jack Vaughan to answer some questions on the state of the industry.

Q: Why has data warehousing stuck around? It was a big-ticket item and there were a lot of bad experiences. In this downturn, it seems like the people closest to the data are getting most of the budgets. Your comments?
A: If you had asked anyone who started out with data warehousing in the mid-'90s, when it came to the forefront, whether it would still be here -- whether the Institute would still be around in 2002 -- I don't think anyone would have said yes. I think a lot of our faculty are still surprised that we're running four conferences a year and getting 500, 600 or 700 people there each time. It is amazing. What has happened is that data warehousing has indeed become part of the IT infrastructure. It really is, in my mind, a complement to the transaction processing infrastructure that we built during the '70s and '80s. And during the '90s and the next decade, we're building the information insight infrastructure. When you look back on it, it makes a lot of sense to me.

Q: Is it true that there's a distinct data person or data warehouse person? How do they fit in within a larger organization?
A: We just published our salary survey, The Salary Roles and Responsibility Report. There are a lot of different roles that people are playing. The primary role -- our primary audience -- is data warehousing project managers.

Q: It seems that people are still trying to mine their data -- find gold -- but are still sending out flyers to people who are deceased.
A: Well, we haven't perfected the art of leveraging our information yet. We still have a long way to go, which is good news for the data warehousing and business intelligence (BI) market. I use the term business intelligence because it's a broader term that encompasses the data warehousing infrastructure as well as the analytical tools and applications that run on top of those two things. The BI market, from a Wall Street perspective, has done better than most other IT markets in the last two years. It's up 2% if you look at the publicly traded companies and track their revenue. But the CRM market, ERP market and other IT markets are experiencing significant downturns compared to BI. BI is a hotter market because people understand that in a competitive market, or in an economy that's in a downturn, information and insight about customers, supply chain, operations and employees can spell the difference. It can drive significant costs out of your budget, and it can bring additional revenue in to the tune of millions and millions of dollars.

Q: A couple of years ago, people were focused on visible failures. Have we gotten over that, and did it really happen?
A: It happens. Every new technology goes through this cycle where there's initial hype. It's a new technology, everyone focuses on the business advantages it can provide, and then the reality sets in because technology is hard to implement and derive business value from. It's just hard to do. So there's always a lot of disappointment following the initial hype. Gartner publishes what I think it calls the "hype curve" and tracks where technologies are on the hype curve. There's the initial hype, and then most technologies go into this trough of disappointment. Those that have real value eventually pull out of that trough. They don't ever measure up to the initial hype, but they begin to increase the value they provide companies. Data warehousing took its stings early on. Early adopters made some fundamental mistakes. That was bound to happen. That's why they're early adopters. They're willing to take more risks than the majority of people. The big problem with early adopters is that they just try to do too much at one time. They let the scope of their projects get way out of hand, pull data from too many sources, develop too many applications at once and try to model too many subject areas. Those "big bang" projects had a high risk ratio and some of them didn't succeed. A lot of them were reined in, and they had to redirect the scope. I've known a few of those. This is where the data mart phenomenon came from. People didn't want to take that much risk. They wanted to start small and get something done quickly. There's very little tolerance these days for the "big bang" IT project -- the two- or three-year project that costs multiple millions of dollars. People want quick hits. It's not only cheaper to do it that way, it's also less risky and more intelligent because you learn a lot as you go. If you're trying to do too much at once, you don't give yourself room to adjust and adapt.

Q: Well, we're a few years past the data mart phenomenon. Did it create islands of automation?
A: That has been a problem. I'm not sure it's a problem with data marts as much as the way companies are organized. A lot of companies decentralized their organizations to give more autonomy to groups to make decisions at a local level; to make decisions that are in the best interests of customers and the profit goals of the company. That means a lot of them have their own IT group, so a lot of them built data warehouses and data marts isolated from other groups in the company. But at a certain point, someone in the organization typically wants to look across all these diverse units and look for trends across these data marts and they can't do it. That's when they begin to realize they're spending a lot of money duplicating effort, staff and technology; and oftentimes, they put their foot down and say we need a single version of the truth. We need to stamp out the costs from all these redundant systems and come up with a centralized architecture that supports decision making and insight. A lot of companies are now in the process of pulling that together.

Q: Is it too early to see if people are using XML to make data hubs or to make data formats more interchangeable?
A: There have been a couple of XML initiatives in our space. One was an XML interchange for meta data that's part of the CWM -- the Common Warehouse Metamodel meta data standard -- now under the jurisdiction of the Object Management Group. I can't say that's really gone anywhere, and it may not go anywhere too fast too soon. Something more promising, though, is XML for Analysis. It's a new standard for OLAP access, and it's one of the OLAP equivalents of SQL. Basically, it's the Web services or XML equivalent of ODBO, which stands for OLE DB for OLAP. Now they're turning that into more of a Web service, making the syntax XML-compliant, which would open up the OLAP market a lot more so that any client could go against any server. And all the major OLAP vendors are actually behind it.

Q: Have people truly built data warehouse infrastructures that are good enough now to capitalize on by diving down to analytics and business intelligence?
A: Absolutely. It's kind of an evolution. When companies first build warehouses, they are often merely replacing their operational reporting systems or end-user reporting systems. But if they've got any vision and any information pain, they start giving users tools to do more ad hoc queries and more analysis to find out why something happened, not just what happened. Then the leading companies, especially if they're doing CRM, are applying statistical analysis, statistical modeling, to delve into what's driving customer behavior.

Q: Is that still something that you need a Ph.D. to do? Do you need somebody who understands mathematics and statistics?
A: To build accurate models, you need someone who knows the business, the data and statistics. And that's not going to change. What we're seeing in the area of data mining is that if you can narrow the domain and really specify the applications, the parameters of the applications -- fraud detection is an appropriate example -- you can build algorithms that are designed to detect fraud for those types of applications. The narrower the application, the more likely you can build an algorithm into an application that any average business user can apply and tweak.

Q: Now to the industry. It seems like verticals are good because you have customers whose business you understand and benefit from as long as they stay in business. But whether it's data warehousing, BI or DSS, it seems immune to real consolidation. There are some kinds of consolidation that happen, but there's always hundreds of vendors.
A: It's kind of a healthy market, but everything is going vertical. You will always have applications for horizontal functions like finance and sales and marketing, and most of the leading vendors have addressed those areas and will continue to supply analytic products as well as applications that combine both operational and analytical components. We've seen that in CRM. But going forward, we think you'll see a lot of full-blown solutions that will be offered on a vertical basis. You'll see a lot more vendors, including BI and ERP vendors, figure out how to build end-to-end applications that bundle in data warehouses and analytical tools, as well as processes for dealing with applications that are specific to a vertical industry or even a subset of a vertical industry.

Q: Does that mean spinning off the results of the reports into operational systems?
A: Yes. Once you understand what's going on, you can send an alert to someone: inventory is down, or you have to go check inventory at various distribution centers.

Q: What does the future hold for data warehousing?
A: I think what we're seeing now is that the majority of companies have built their first warehouse. They've established themselves and have most of the architectural bugs out. They've reached that first plateau and are delivering some value to the company. But now we see a lot of companies step back and say: What's next? Where do we go from here? How do we make this more valuable to the company? We know there's a lot of value in this data, but we know users aren't using it as much as they could. We know that not all users who should be using this are using it. We know we could add more value if we made it available to customers and suppliers. We know we could probably add more value if we added different types of data to the warehouse, like e-mail or documents. We know we could add more value if we could get more data in there at a more detailed transaction level, more quickly. We know we can do this more cheaply. We know we could probably improve query performance. So there are a lot of people stepping back right now, saying: What's next? Where do I get the biggest value for the investment? How can I make this more valuable for people in the company? It's quite clear that in some companies this BI infrastructure runs the company. Companies like Best Buy, Bank of Montreal, Wal-Mart and any number of others that invested significant dollars -- we're not talking cheap stuff here -- have built something that's a piece of the company. It's a multimillion-dollar investment over time. Other companies are stepping back to say: How can we get to that level?