
Conversation with Ken Orr

Q: Within the data warehousing industry there have been a few alignments or purchases of middleware companies, and there seems to be a paradigm shift where people are moving away from batch and toward trickle-feed or more immediate updates. If that is the case, where are we in this evolution?
A: I think what we're seeing is that people are now recognizing that reporting and business intelligence are not really separate activities. Historically, we always built a lot of our operational reporting into our operational systems. As we start looking at building real data warehouses, that makes less and less sense. What you want to do, by and large, is get your data out of your operational systems and into reporting form as soon as you can.

I think the timeframe implied by this whole concept of real-time enterprises and straight-through processing continues to get to some people. A lot of the people I see right now are pausing mid-stream and re-evaluating their warehouses and marts. This is not a new activity. We do this every five to 10 years, whether we need to or not. Part of it is because the warehouse message has always been a hard sell, since a warehouse takes a long time to build. The problem is that with all the shortcuts, we end up with dozens or hundreds of data marts, and keeping all of that in sync and up-to-date is hard.

The other thing is that some of the COTS vendors have been catching up, in the sense that they've been delivering better and better warehousing tools, but their packages are still ordinarily hard to get data out of. One of my customers told me one time: 'We bought this ERP package because they came in and made a presentation to management and said, "Hey, we've got a thousand reports in here. No matter what you ask for, we can do it." But the first thing we discovered was that none of the thousand was what we wanted. And even though it was sort of easy to get stuff into these ERP packages, it was very hard to get stuff out.' We've been struggling with that in a lot of big companies for the last five or six years.


Q: What do you think the next year or two will bring? ERP people have been successful, to a point.
A: They've been working hard. I've done some consulting, for example, at both SAP and PeopleSoft sites, and SAP, even four or five years ago, had a strong data warehousing direction. They understand what has to be done to get this stuff out; it just wasn't their highest priority for a long time. The problem that still plagues them, and I think it's only in the last two or three years that they've really understood it, is that whatever warehouse they build also has to be able to easily import or integrate data from legacy systems or other COTS packages. It's not enough to be able to report just the stuff in the ERP; you have to add the HR and CRM data.


Q: Analytics is being discussed as moving into a commodity area. A company that has certainly been in the forefront there, and has hardly been a commodity company, is SAS Institute. Are any kind of off-the-shelf analytics a threat to a company like SAS?
A: I've worked closely with IBI and SAS over the years, and SAS holds a position that would be very hard to dislodge them from, because they're the vendor of choice for statistical analysis. At least in my perception, there isn't a second choice there. Plus, over the years they have carved out a niche of people who are serious analysts, as opposed to the guys who are just doing fancy reports. I think their real strength has always been on the statistical side, and that's heavy lifting for anybody.


Q: You talked quite interestingly earlier this year about how one goes about making decisions in the realm of technology. Could you touch a little upon the Orr framework, perhaps with an example of how it might fit with the real-time enterprise?
A: The thing that I keep trying to tell people is that we know a lot about the diffusion of innovation. Ever since Everett Rogers wrote this stuff down, we've known about visionaries and early adopters, and the early majority, etc. So there's a predictable path that people go through. I have this chart that shows an initial burst of enthusiasm at the front end; then it tends to flatten out for a long time before it takes off again, if it takes off at all.

Gartner has something called the Hype Cycle, and it's a similar concept. The problem is that in the first part of that curve we typically make decisions based on technology leaders. At the beginning, it's usually, 'This is a really good technology,' and the people deciding are early-adopter types who like this stuff. In the middle, you have to start asking: What's the depth of the support? Who's going to win the marketing battle? And then when you get to the deployment side, you have to re-evaluate.

Typically, what happens with technical guys is that they stay with their early choices too late in the game. They fall in love with the first technology they see, learn to use it really well, and then somebody comes along and says, 'By the way, those guys have 5% market share. What's the chance that they're going to survive?' That's a continuing pattern. People fail to watch what some of the big guys, like Microsoft or Oracle, are actually doing.

I get into discussions all the time when I'm teaching data warehousing design or data architecture. People will say, 'We haven't made a choice yet on our multidimensional database or on our data cubes.' And I say, 'Let's just use pivot tables in Excel.' They look at me, and I say, 'Let me show you why you want to use this stuff. It's already sitting on your desktop.' And people say, 'That's not real. That's not a real multidimensional database.' And that's right. It's not industrial-grade like Hyperion or some of these other products, but it also has a billion users, right?
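To make the pivot-table point concrete, here is a minimal Python sketch of the core operation a pivot table shares with a multidimensional cube: rolling a measure up over two dimensions into a cross-tab. The sample rows and the `pivot` helper are hypothetical, chosen only for illustration; this is not how Excel or Hyperion implement it internally.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales amount).
rows = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 80),
    ("West", "Q1", 20),
    ("West", "Q2", 60),
]

def pivot(rows):
    """Sum the measure into one cell per (region, quarter) pair,
    the same roll-up a spreadsheet pivot table performs."""
    cells = defaultdict(int)
    for region, quarter, amount in rows:
        cells[(region, quarter)] += amount
    return dict(cells)

table = pivot(rows)
regions = sorted({r for r, _ in table})
quarters = sorted({q for _, q in table})

# Print the cross-tab: regions down the side, quarters across the top.
print("Region", *quarters, sep="\t")
for region in regions:
    print(region, *(table.get((region, q), 0) for q in quarters), sep="\t")
```

Swapping which dimension runs down the side and which runs across is the "pivot"; the aggregated cells themselves do not change.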


Q: What about data in context? And what is your opinion of the semantic Web? That's sort of a large, decade-long issue.
A: Just as aviation right after World War II bumped into the sound barrier, I think we're bumping into a semantic barrier. We've gone almost as far as we can go with current database technologies, given the current sophistication of our meta data. We are basically still trapped in this fixed-field, fixed-definition, 'it has to be exactly five bytes long' phase. And that's probably the largest problem we have in terms of being able to build more intelligent systems. We have to build in more semantics, so that systems are smart enough to recognize that client name and customer name are probably the same, and can then look at the fields themselves and the data in the fields and infer structure from that.
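The idea of recognizing that client name and customer name are probably the same field can be illustrated with a deliberately naive Python sketch. It assumes a hand-built synonym list and combines two kinds of evidence Orr mentions: the field names themselves and the data in the fields (measured here as Jaccard overlap of distinct values). The synonym table, the 0.5 threshold, and all function names are hypothetical; real schema matching needs far richer semantics than this.

```python
# Hypothetical synonym groups; a real system would need far richer semantics.
SYNONYMS = [{"client", "customer"}]

def normalize(token):
    """Map a token to a canonical representative of its synonym group."""
    for group in SYNONYMS:
        if token in group:
            return min(group)  # deterministic representative
    return token

def name_match(field_a, field_b):
    """True if two field names agree once synonyms are normalized."""
    tokens_a = {normalize(t) for t in field_a.lower().split("_")}
    tokens_b = {normalize(t) for t in field_b.lower().split("_")}
    return tokens_a == tokens_b

def value_overlap(values_a, values_b):
    """Jaccard similarity of the distinct values held in two fields."""
    a, b = set(values_a), set(values_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def probably_same(field_a, values_a, field_b, values_b, threshold=0.5):
    """Require both name evidence and data evidence (threshold is an assumption)."""
    return name_match(field_a, field_b) and value_overlap(values_a, values_b) >= threshold

clients = ["Acme Corp", "Globex", "Initech"]
customers = ["Acme Corp", "Globex", "Umbrella"]
print(probably_same("client_name", clients, "customer_name", customers))   # → True
print(probably_same("client_name", clients, "product_name", ["Widget"]))   # → False
```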

We have some experiments, but what Tim Berners-Lee and others are calling the semantic Web is a stretch. The RDF stuff is pretty primitive. It's not very smart; it's a better description of what we have on a Web site, in a report or in XML, but it's not semantics yet.


Q: One of the biggest developments in the last five years has been XML. Your thoughts?
A: XML is, in a sense, a self-defining language, and to that extent XML is a breakthrough. In some respects, though, it is really old. It's almost a return to Cobol in that it allows you to embed the structure in your data. But it separates the physical form of the data from the logical. You can send out the data itself, and then send another file along that says: This is the way we want to present it. That makes much of what we do in data analytics or warehousing easier, because we don't send just the report; we send the data beneath that report as one piece and the structure of the report as another, so that the guy at the other end can do what he wants with the data.
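A minimal Python sketch of that separation, using the standard library's `xml.etree.ElementTree`: one XML document carries the data beneath the report, a second carries how the sender would like it presented, and the receiver can either apply that layout or ignore it and work with the data directly. Both documents and their element names (`sales`, `row`, `layout`, `column`) are hypothetical; in practice the presentation piece would more likely be an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

# Piece 1: the data beneath the report (hypothetical element names).
DATA_XML = """
<sales>
  <row region="East" quarter="Q1" amount="100"/>
  <row region="West" quarter="Q1" amount="80"/>
</sales>
"""

# Piece 2: how the sender would like the data presented.
LAYOUT_XML = """
<layout>
  <column field="region" heading="Region"/>
  <column field="amount" heading="Sales"/>
</layout>
"""

def render(data_xml, layout_xml):
    """Apply the layout document to the data document; return report lines."""
    rows = ET.fromstring(data_xml.strip()).findall("row")
    columns = ET.fromstring(layout_xml.strip()).findall("column")
    lines = ["\t".join(c.get("heading") for c in columns)]
    for row in rows:
        lines.append("\t".join(row.get(c.get("field")) for c in columns))
    return lines

# The receiver can follow the sender's layout...
print("\n".join(render(DATA_XML, LAYOUT_XML)))

# ...or ignore it and compute something of its own from the raw data.
total = sum(int(r.get("amount")) for r in ET.fromstring(DATA_XML.strip()).findall("row"))
print("Total:", total)
```

Because the layout travels separately, changing how the report looks never requires resending or reshaping the data.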


Q: What is a big future issue in your opinion?
A: I've been doing a lot of work in enterprise architecture recently, and this model-driven stuff. And I think that one of the dangers is that in the last 10 years we've kind of de-emphasized requirements and design, and kind of overemphasized the tricks of the trade.

I think the reason people are suddenly kind of interested in architecture largely stems from the fact that we don't have a lot of design anymore. And so we have to re-invent design and call it architecture. I think that's going to be a big issue for the next five to 10 years.

About the Author

Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.