Conversation with Ken Orr
- By Jack Vaughan
- January 1, 2004
Q: Within the data warehousing industry there
have been a few alignments or purchases of middleware companies, and there seems
to be a paradigm shift where people are moving away from batch and toward
trickle-feed or more immediate updates. If that is the case, where are we in
this evolution?
A: I think what
we're seeing is that people are now recognizing that reporting and business
intelligence are not really separate activities. Historically, we always built a
lot of our operational reporting into our operational systems. As we start
looking at building real data warehouses, that makes less and less sense. What
you want to do, by and large, is get your data out of your operational systems
and into reporting form as soon as you can.
I think the timeframe of this whole concept of real-time enterprises and
straight-through processing continues to get after some people. A lot of the
people I see right now are pausing mid-stream and re-evaluating their warehouses
and marts. This is not a new activity. We do this every five to 10 years,
whether we need to or not. And part of it is because the warehouse message has
always been hard to buy as it takes a long time. The problem is that with all
the shortcuts, we end up with dozens or hundreds of data marts. Keeping all
of that in sync and up-to-date is hard.
The other thing is that some of the COTS vendors have been catching up in the
sense that they've been delivering better and better warehousing tools, but
they're still hard to get data out of, ordinarily. One of my customers told me
one time: 'We bought this ERP package because they came in and made a
presentation to management and said, "Hey, we've got a thousand reports in here.
No matter what you ask for, we can do it." But the first thing we discovered was
that none of the thousand was what we wanted. And even though it was sort of
easy to get stuff into these ERP packages, it was very hard to get stuff out.'
We've been struggling with that in a lot of big companies for the last five or
six years.
Q: What do you think the next year or two will bring? ERP people have
been successful, to a point.
A: They've been working hard. I've done some
consulting, for example, with both SAP and PeopleSoft sites. SAP, even four
or five years ago, had a strong data warehousing direction. They understand what
has to be done to get this stuff out; it just wasn't their highest priority for
a long time. The problem that still plagues them -- and I think it's only been
in the last two or three years that they've really understood -- is that
whatever warehouse they build has to also be able to easily import or integrate
data from legacy systems or other COTS packages. It's not enough to be able to
just report the stuff in the ERP; you have to add the HR and CRM data.
Q: Analytics is being discussed as moving into a commodity area. A
company that has certainly been in the forefront there, and has hardly been a
commodity company, is SAS Institute. Is any kind of off-the-shelf analytics a
threat to a company like SAS?
A: I've worked closely with IBI and SAS over
the years, and SAS is very hard to dislodge from its position, as they're the
vendor of choice for statistical analysis. And, at least in my perception, there
isn't a second choice in that. Plus, over the years they have carved out this
niche of people who are serious analysts vs. the guys who are just doing fancy
reports. I think their real strength has always been on the statistical side and
that's heavy lifting for anybody.
Q: You talked quite interestingly earlier this year about how one goes
about making decisions in the realm of technology. Could you touch a little upon
the Orr framework, perhaps with an example of how it might fit with the
real-time enterprise?
A: The thing that I keep trying to tell people is that
we know a lot about the diffusion of innovation. Ever since Everett
Rogers wrote this stuff down, we've known about visionaries, early adopters,
the early majority, etc. So there's a predictable path that people go through. I
have this chart that says at the front end you have this initial enthusiasm and
then it tends to flatten out for a long time before it takes off again, if it
takes off at all.
Gartner has something called the hype curve or something like that. But it's
a similar concept. And the problem that happens is that in the first part of
that curve we make decisions typically based on technology leaders. At the
beginning, it's usually 'This is a really good technology. These are early
adopter types. They like this stuff.' In the middle, you start having to
understand what's the depth of the support? Who's going to win the marketing
battle? And then when you get to the deployment side, then you have to
re-evaluate.
Typically, what happens with technical guys is that they stay too late into
the game. They fall in love with the first technology that they see, learn to
use it really well, and then somebody comes along and says, 'By the way, those
guys have 5% of the market share. What's the chance that they're going to
survive?' And that's a continuing pattern. People fail to watch what some of the
big guys, like Microsoft or Oracle, are actually doing.
I get into discussions all the time when I'm teaching data warehousing design
or data architecture. People will say, 'We haven't made a choice yet on our
multidimensional database or on our data cubes.' And I say, 'Let's just use
pivot tables in Excel.' They look at me and I say, 'Let me show you why you want
to use this stuff. It's already sitting on your desktop.' And people look at you
and say, 'That's not real. That's not a real multidimensional database.' And
that's right. It's not industrial-grade like a Hyperion or some of these other
things, but it also has a billion users, right?
Q: What about data in context? And what is your opinion of the semantic
Web? That's sort of a large, decade-long issue.
A: Just as aviation right
after World War II bumped into the sound barrier, I think we're bumping into a
semantic barrier. We've gone almost as far as we can go in the current database
technologies in terms of the current sophistication of our metadata. We are
basically still trapped in this fixed-field, fixed-definition, 'it has to be
exactly five bytes long' [phase]. And that's probably the largest problem we have
in terms of being able to build more intelligent systems. We have to put in
increasing semantics, where the systems are smart enough to recognize that
client name and customer name are probably the same, and can then look at the
fields themselves and the data in the fields, and do structuring from that.
We have some experiments. What Tim Berners-Lee and others are calling the
semantic Web, that's a stretch. The RDF stuff is pretty primitive. It's not very
smart; it's a better description of what we have on a Web site, on a report or
in XML, but it's not semantics yet.
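Editor's sketch: the field matching Orr describes above, where a system guesses that "client name" and "customer name" are probably the same field and then checks the data in the fields, might look roughly like the following in Python. This is an illustration only, not anything Orr or a vendor proposed. The synonym table, the 50/50 weighting, the threshold and all function names are assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real system would need a far richer vocabulary.
SYNONYMS = {"client": "customer", "cust": "customer"}


def normalize(field_name):
    """Lower-case a field name, split it into words, and map known synonyms."""
    tokens = field_name.lower().replace("_", " ").split()
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


def name_similarity(name_a, name_b):
    """Return a 0..1 similarity score for two normalized field names."""
    return SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()


def value_overlap(values_a, values_b):
    """Return the Jaccard overlap (0..1) of the values found in two fields."""
    set_a = {value.strip().lower() for value in values_a}
    set_b = {value.strip().lower() for value in values_b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def probably_same_field(name_a, values_a, name_b, values_b, threshold=0.5):
    """Combine name and data evidence; the equal weights are an arbitrary assumption."""
    score = 0.5 * name_similarity(name_a, name_b) + 0.5 * value_overlap(values_a, values_b)
    return score >= threshold


if __name__ == "__main__":
    print(probably_same_field(
        "client_name", ["Acme Corp", "Globex"],
        "customer name", ["acme corp", "Initech"],
    ))
```

Even this toy version relies on a hand-built synonym list, which is one way to read Orr's point: the meaning has to come from somewhere other than the fixed field definitions.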
Q: One of the biggest developments in the last five years has been XML. Your
thoughts?
A: XML is a self-defining language, in a sense. And to that point, XML
is a breakthrough. It is, in some respects, really old. It's almost a return to
Cobol in that it allows you to embed the structure in your data. But it
separates the physical form of the data from the logical. You can send out the
data itself, and then you can send another file along that says: This is the way
we want to present it. It makes much of what we do in data analytics or
warehousing easier, because we don't send just the report, we send the data
beneath that report as one piece and then we send the structure of the report as
another, so that the guy at the other end can do what he wants to with the
data.
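Editor's sketch: the separation Orr describes, with the data sent as one piece and the report structure as another, can be illustrated in Python with the standard library's XML parser. The element names, the sample figures and the layout format below are invented for the example; in practice the presentation file would more likely be a stylesheet such as XSLT, which is not used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: only the facts, with self-describing tags.
DATA_XML = """
<sales>
  <row><region>East</region><product>Widgets</product><amount>1200</amount></row>
  <row><region>West</region><product>Widgets</product><amount>800</amount></row>
  <row><region>East</region><product>Gears</product><amount>300</amount></row>
</sales>
"""

# Hypothetical layout file: which fields to show, in what order, under what headings.
LAYOUT_XML = """
<report title="Sales by Region">
  <column field="region" heading="Region"/>
  <column field="amount" heading="Amount"/>
</report>
"""


def render(data_xml, layout_xml):
    """Render the data as a tab-separated text report, following the layout."""
    data = ET.fromstring(data_xml.strip())
    layout = ET.fromstring(layout_xml.strip())
    columns = [(col.get("field"), col.get("heading")) for col in layout.findall("column")]
    lines = [layout.get("title"), "\t".join(heading for _, heading in columns)]
    for row in data.findall("row"):
        lines.append("\t".join(row.findtext(field, default="") for field, _ in columns))
    return "\n".join(lines)


def total_by(data_xml, field):
    """Ignore the sender's layout and total the 'amount' element by any field."""
    totals = {}
    for row in ET.fromstring(data_xml.strip()).findall("row"):
        key = row.findtext(field)
        totals[key] = totals.get(key, 0) + int(row.findtext("amount"))
    return totals


if __name__ == "__main__":
    print(render(DATA_XML, LAYOUT_XML))
    print(total_by(DATA_XML, "product"))
```

The second function is the point of the exercise: because the receiver gets the underlying data and not just the finished report, he can set the sender's layout aside and summarize the data however he likes.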
Q: What is a big future issue in your opinion?
A: I've been doing a
lot of work in enterprise architecture recently, and this model-driven stuff.
And I think that one of the dangers is that in the last 10 years we've kind of
de-emphasized requirements and design, and kind of overemphasized the tricks of
the trade.
I think the reason people are suddenly kind of interested in architecture
largely stems from the fact that we don't have a lot of design anymore. And so
we have to re-invent design and call it architecture. I think that's going to be
a big issue for the next five to 10 years.
About the Author
Jack Vaughan is former Editor-at-Large at Application Development Trends magazine.