Q&A: IBM's Janet Perna

Longtime IBM data management chief Janet Perna talks about the state of her operation now, as well as its plans for the future in a wide-ranging interview in her Somers, N.Y., offices with Editor-in-Chief Mike Bucken and Editor-at-Large Jack Vaughan.



Q: What is the focus of IBM’s Data Management Group today? I’m interested in how the database stands now. You’ve gone from the mainframe, and then you were selling the database as a product. But is that still your primary focus?

A: If you start in terms of what we’re doing with software, we’re building a platform for supporting the next generation of applications in support of On-Demand. We’re building on Service-Oriented Architectures [SOAs] for the next wave of applications, and the WebSphere platform with Java and J2EE is the runtime platform for supporting this.

However, the data infrastructure is also a key runtime part of the platform because there are no transactions without information. So the whole information infrastructure plays a key role in the middleware platform that we’re delivering, and it is one of the core business assets that enterprises are trying to exploit. They’re trying to leverage their information to do things like improve their customer support, the productivity of their people and their overall operational efficiency -- all of this stuff is information-driven and information-based. So when you say what are we doing with databases, we’ve seen an evolution here over the last eight or 10 years from database companies, per se.


With our data technology, we’ve taken our database -- we have relational database technology -- from the mainframe up to a wide variety of platforms; our strategy around core database technology is in supporting both transaction-oriented applications as well as analytical applications. Think data warehousing and business intelligence when I say analytical applications because these are the types of applications that companies are beginning to invest in to do things like segmenting their customers.

One investment area that we’re making in information management is our traditional one, which is databases. However, when you look at the universe of information that enterprises want to begin to leverage, record-oriented or transaction-type data is a very small piece of that information. Eighty-five percent of business information is not record-oriented -- it’s e-mail, documents, images, computer-generated reports, multimedia-type information, and Web information, and most of this information when you look at it is text of some kind. And so we’re also investing in a content repository for managing this 85% of information. And we’ve shipped the DB2 Content Manager, which is our content repository.

Q: When did that happen?

A: Probably two or two-and-a-half years ago. And this whole area of enterprise content management is evolving and consolidating very, very quickly. If you look at where it’s evolving from, there have been companies who’ve had solutions for production imaging. A document comes in, you scan it; or you have check processing where you scan checks and you store those images. Companies like FileNet were early pioneers in this area. IBM has a product called Image Plus that’s been out there for well over 10 or 15 years, but it was focused on this problem of production imaging.

Another area it’s converging from is computer-generated reports. When you get your phone bill or financial statements from your bank, those are computer-generated reports that are typically stored on laser disk in a very efficient way. Companies like Mobius and other much smaller players provide those solutions. IBM also had solutions in this area with our On-Demand product.

The third area that we began to see converging into this was document management. This runs the gamut from personal documents, to workgroup documents, to what are called compound or complex documents in which teams of people are working on a document. So if you think of the construction of an airplane, for example, going into the documentation around that airplane are multiple sections, multiple drawings and different components coming from multiple people that need strict version control, check-in/check-out capabilities and things like that. Companies like Documentum were strong in that part of the market. At IBM we’ve done well in the simple document management part of that market with our Lotus products, but not so well in that very complex compound document management. In fact, we made an acquisition last year of a company called Green Pasture Software, which had that technology that we’ve now integrated as part of our content management offering.

There are two other areas of convergence here. The next to the last one is Web content management. We saw Interwoven, Vignette and others come to the forefront, and this was around the creation of Web content, HTML and storing that content. IBM didn’t have an offering in this area, so last year we acquired a company called Aptrix that provided us with that technology.

The last piece is digital media -- multimedia-type support -- and IBM had a product called Digital Library in which we supported rich media. There were other companies out there like Bulldog and a few other smaller companies in this area that had those solutions. Most of these companies are either gone now, or have been acquired by somebody else.

All of that stuff is converging into enterprise content management, where content repositories are going to evolve, much as databases evolved over the last 15 years. There will be a standard interface for putting content in and taking content out of the content repository. These solutions also have workflow elements associated with them, so if we’re working on a document together, we can electronically flow that document -- that approval process around a piece of content -- as well as hierarchical storage management of the actual content. And if you want to move that content into secondary or tertiary storage, there’s the ability to do that as well. We have about 10,000 customers now on our content management offerings; in fact, IBM is the market leader, according to Gartner, in enterprise content management.

Q: What IBM technologies are used in the content management effort?

A: The database group is leading this enterprise content management charge. We are using technologies from other parts of the software group and other parts of IBM, so these content management solutions have a collaborative aspect to them as well. We use Lotus technology, and we use the WebSphere portal as the window into our content management offerings. We use the Tivoli storage manager as the hierarchical storage component, and the WebSphere application server as the runtime environment for our content offering. The content repository and the related services around the content are part of DB2. When you buy the content manager, what you’re getting under the hood is some Tivoli componentry, some database componentry, some Lotus componentry and some WebSphere componentry.

If you look at other companies in this space, EMC has just acquired Documentum. Each of those companies was making acquisitions in this area to try to fill out that total environment we talked about. If you look at a company like an Oracle, a traditional database company, they have no capability or very limited capability in this type of enterprise content management that we’re talking about. When Microsoft talks about Longhorn and Yukon, they talk about elements of this type of content management but, again, they have nothing today beyond pretty simple Web content management for the Microsoft environment.

If you look at companies today, their problem around information is being able to leverage the information that they have -- and leverage means search, access, share and analyze all of the information they have regardless of where it is.

The real value to enterprises of being able to consolidate this is to be able to provide you with a more holistic service as a client. It also saves them money in the sense that if you call and have a question about something, and the information is accessible at the first touchpoint for them, it costs them a quarter of what it would if they had to call you again to respond to your question. Or if they can make that information available electronically, that’s even better.

What’s so difficult about this is that providing that kind of integration and search capability today requires application developers to hand-code connections to multiple data sources. There’s a lot of money being spent in trying to integrate information. In fact, it’s been estimated that 40% of IT budgets is spent on this kind of integration activity. And so the next area of investment that we’re making in information management is this information integration layer, which essentially abstracts or virtualizes, if you will, all of those data sources and makes them available through a standard programming interface.

Q: Is that where the products you’re selling have aspects of WebSphere or Web services technology?

A: No, this is aspects of what we call federated database technology, which we’ve talked about for a while.

Q: Is that XML now?

A: We do support XML data types, but it’s just one of the types. Typically, in the simplest case, if you have an application and it wants to access data, it will issue, in the relational sense, a SQL statement and that will enable it to go get data that’s in a relational database. That’s a very simplistic view of the world, but that’s what it is. However, if this relational database is DB2, the application speaks one dialect of SQL to DB2, a different dialect to Oracle and yet another to Microsoft. Even though there are standards, these dialects are all slightly different. And this is the simple case, because at least SQL is a standard.

What happens, though, if you want to go after e-mail and the information that you want to aggregate or bring together now includes e-mail? E-mail doesn’t know anything about relational database, it doesn’t know anything about SQL. Or if you want to take information that’s in a file system, how do you do that? You’d have to have a way to code whatever the interface into the e-mail is or file system is, and you would hand-code those things. And that begins to show you the integration problem. What if you want to join information across these things? To find out all of the information about a particular client that spans these things?

What we’ve done is we’ve said that through the DB2 dialect of SQL, we’re going to provide access to and integration of relational data, files, e-mail, Web, unstructured content, and other types of content and data stores as if it were all in one place, through the DB2 SQL API. This product is called DB2 Information Integrator. We’re also going to support an XQuery interface into this information, and into the aggregation of the information, as XQuery is adopted; we haven’t shipped that support yet. What we are working on in terms of our up-and-coming technology is something code-named Masala, which will have a free-form search capability into these data sources. Imagine being able to do a Google-like search across Oracle databases, Microsoft databases, some e-mail and files, and being able to search all of those data sources in that way.
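[Editor’s illustration: the idea of reaching relational and non-relational sources through one interface can be sketched in a few lines of Python. This is a toy under stated assumptions, not DB2 Information Integrator or its API. SQLite stands in for a relational database, a list of dictionaries stands in for an e-mail store, and the names `relational_wrapper`, `email_wrapper` and `client_view` are hypothetical.]

```python
import sqlite3

# Source 1: a relational store (SQLite stands in for any SQL database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (client TEXT, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("acme", 1200.0), ("globex", 450.0)])

# Source 2: non-relational data (hypothetical e-mail-like records).
emails = [
    {"sender": "acme", "subject": "Invoice question"},
    {"sender": "acme", "subject": "Renewal"},
    {"sender": "initech", "subject": "Hello"},
]


def relational_wrapper():
    """Yield rows from the SQL source as dictionaries keyed by 'client'."""
    query = "SELECT client, balance FROM accounts ORDER BY client"
    for client, balance in db.execute(query):
        yield {"client": client, "balance": balance}


def email_wrapper():
    """Yield e-mail records mapped onto the same 'client' key."""
    for msg in emails:
        yield {"client": msg["sender"], "subject": msg["subject"]}


def client_view(client):
    """Combine what both sources know about one client."""
    account = [r for r in relational_wrapper() if r["client"] == client]
    subjects = [r["subject"] for r in email_wrapper() if r["client"] == client]
    return {
        "client": client,
        "balance": account[0]["balance"] if account else None,
        "subjects": subjects,
    }


print(client_view("acme"))
```

[The caller asks one question and never touches the source-specific access code; each wrapper hides that. A real federation layer would also push filters down to each source and optimize the join, which this sketch does not attempt.]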

Q: Do you have the XQuery capability yet?

A: We don’t have this one yet, but it is coming into beta pretty quickly, [but] the XQuery is not coming this year.

Q: So there’s no staging database?

A: Under the hood of this thing is the DB2 Query Optimizer because we have to have optimization technology to know how to go get this stuff. There is also a DB2 store that we use to store metadata, and this is a very powerful capability. We just had a customer choose this product that has 100 Oracle databases and no DB2. Their problem is the integration of that information. Through the Information Integrator, they are able to do that kind of federation and integration of the Oracle databases.

Q: Is the XQuery coming along slower than people thought?

A: The standard hasn’t been nailed down, but it’s moving forward.

Q: How long will it take for it to be a mature technology?

A: I think when you look at what has to be done here, there are a couple of things. One is the language piece of it, supporting the XQuery language. That’s the API. The other part is storing XML data natively. If you look at what companies do today, including Microsoft, IBM and Oracle, we store XML data in our databases, but we store it as if it were relational. We all do one of two things. We either store it as one large object in a column, or we do what’s called shredding and spread it out across multiple columns. And then when we want to go get it, we kind of re-materialize it by pulling it out of the columns.

What we’re working on is being able to store that XML data in its native representation, which is hierarchical, and that means you need ways to index it in a hierarchy vs. having it stored in a table. That’s the kind of work that needs to go on in this area. And it’s not a maturity problem at this point; it’s work that we’re doing and that other database providers are doing as well.
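[Editor’s illustration: the two relational approaches described above, storing the document as one large object or shredding it across columns and re-materializing it on the way out, can be sketched as follows. This is a minimal example using Python’s standard `sqlite3` and `xml.etree.ElementTree` modules, not any vendor’s implementation. The table names, the sample document and the `rematerialize` function are assumptions made for the example.]

```python
import sqlite3
import xml.etree.ElementTree as ET

# A small sample document (hypothetical).
doc = "<order id='7'><customer>acme</customer><total>99.5</total></order>"

db = sqlite3.connect(":memory:")

# Approach 1: keep the whole document as one large object in a column.
db.execute("CREATE TABLE orders_lob (id INTEGER, body TEXT)")
db.execute("INSERT INTO orders_lob VALUES (?, ?)", (7, doc))

# Approach 2: "shred" the document across ordinary relational columns.
root = ET.fromstring(doc)
db.execute("CREATE TABLE orders_shredded (id INTEGER, customer TEXT, total REAL)")
db.execute(
    "INSERT INTO orders_shredded VALUES (?, ?, ?)",
    (int(root.get("id")), root.findtext("customer"), float(root.findtext("total"))),
)


def rematerialize(order_id):
    """Rebuild an XML document from the shredded columns."""
    oid, customer, total = db.execute(
        "SELECT id, customer, total FROM orders_shredded WHERE id = ?",
        (order_id,),
    ).fetchone()
    order = ET.Element("order", {"id": str(oid)})
    ET.SubElement(order, "customer").text = customer
    ET.SubElement(order, "total").text = str(total)
    return ET.tostring(order, encoding="unicode")


rebuilt = rematerialize(7)
print(rebuilt)
```

[The large-object table returns the document untouched but cannot be queried by element without parsing it. The shredded table is queryable with plain SQL but must know the document’s shape in advance, which is the limitation native hierarchical storage is meant to remove.]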

Q: Do you all have to agree on how it’s done?

A: Well, at the interface level to standardize it, yes. We have an IBMer who’s instrumental in leading that standards activity; in fact, he’s the same person who did the SQL standard. But there’s agreement, and now it’s dotting the i’s and crossing the t’s on the final specification.

Q: Is this related to the Masala search technology?

A: We’ve talked about the federated part of this technology. The other thing you could have out here is a Web service. Suppose you had information that was available as a Web service, like a real-time feed of stock prices. You can now integrate that real-time information with portfolio information about a particular client, and you could get in real time what the value of that portfolio is given what the market is doing right now.
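[Editor’s illustration: the portfolio example above amounts to joining stored holdings with prices fetched at query time. In this minimal sketch, `fetch_prices` is a hypothetical stand-in for a real-time Web service and returns fixed values so the example is self-contained. The table layout and ticker symbols are likewise assumptions.]

```python
import sqlite3

# Stored data: client holdings in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE holdings (client TEXT, symbol TEXT, shares INTEGER)")
db.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?)",
    [("acme", "AAA", 10), ("acme", "BBB", 5), ("globex", "AAA", 2)],
)


def fetch_prices():
    """Hypothetical stand-in for a real-time price feed (fixed values here)."""
    return {"AAA": 20.0, "BBB": 8.0}


def portfolio_value(client):
    """Value a client's holdings at the prices returned right now."""
    prices = fetch_prices()
    rows = db.execute(
        "SELECT symbol, shares FROM holdings WHERE client = ?", (client,)
    )
    return sum(prices[symbol] * shares for symbol, shares in rows)


print(portfolio_value("acme"))  # 10 * 20.0 + 5 * 8.0 = 240.0
```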

The other thing we’re doing with this technology is replication. Sometimes we want to be able to replicate information, move information closer to the application. We might want to be able to transform it in some way, and so part of our information integrator also includes replication technology.

Q: Does that come from Lotus?

A: That actually comes from the data management area, so it’s replication that we’ve built in information management. Some of it has come from Informix, actually, as we look at the high-speed data replication problem. So that’s part of this as well. When we think about how we’re evolving our information management products, we’ve gone from relational databases to an information infrastructure. And this information infrastructure has middleware for integrating, placing, analyzing and accessing all forms of information, whether it’s in an IBM data store or not.

Information is all over, and it’s in many different types of forms, data stores and so on. It will continue to be there, and the real value and benefit to enterprises is being able to utilize all of these information assets. That’s the grand challenge that we’ve taken on. The technology we’re building on is called Garlic. It was developed at IBM Research.

Q: I thought that was sort of an XML-centered thing.

A: Right. That’s our federation technology that we’re using.

Q: How is IBM repackaging database technology for SMB? Do you have to look at that from a different point of view, dumb it down a bit or come up with packages?

A: We came up with our Express offerings: WebSphere Express, WebSphere Portal Express and DB2 Express. Essentially what we did was simplify what it takes to install and administer the environments. If you look at DB2, it has scalability up to hundreds of processors, but with that scalability there are a lot of tuning options, parameters and things that one can do. Because we know DB2 Express is going to be running at max on a four-way processor, we were able to pre-set a lot of the tuning, hide a lot of the configuration parameters and options, and significantly simplify the overall installation and up-and-running aspects of the product without dumbing down the function.

Q: What else has come out of the database development labs?

A: We’re in a beta with the next version of DB2 UDB, which has been code-named Stinger. One of the key technologies there is around the area of high availability, and one of the aspects of that is high-speed data replication. That’s part of the technology we acquired with Informix and what we’re able to do now is very, very quickly replicate, synchronously and asynchronously, data to a remote site or a standby site. That’s one of the technology areas we were able to pick up. The other area is in the spatial and geodetic extender area. Informix had geodetic and spatial data blades, and in Stinger we’re taking advantage of that technology on the DB2 base for geodetic and spatial support.

Q: Did you get as much as you expected out of it?

A: Absolutely. This was about the customers more so than the technology, although because of the tremendous skills that we picked up, the people who came over as part of Informix have been able to help us accelerate some of the technology work. We have more people working on the Informix database today than Informix did prior to the acquisition. We have shipped new releases of the Informix products. There’s been tremendous benefit to us and, I hope, to the Informix customers and partners as a result of the acquisition.

Q: What do you think will happen to the database business? Are Oracle and Sybase dead?

A: Well, Sybase is not dead, but it’s clearly living on its installed base. To some degree Oracle is living on its installed base as well. If you look at Oracle, about 58% of its revenue now comes from maintenance or the annuity stream on the installed base. Its new license revenue went into a decline from the end of 1999 up until the beginning of this year, and it is now back up to where it was in 1999. And DB2 new license revenue has been growing. We’ve been picking up share. In fact, we picked up about 18 points of share between 1997 and the end of 2002 on the distributed platforms.

Q: Is that mostly IBM platforms?

A: Mostly, but not exclusively. So if you look at it, probably 80% is on IBM platforms, 20% is on other platforms. And of course Informix was skewed the other way. As we mix this together, we now have a pretty good mix of IBM and non-IBM platforms.

But when you ask about the relational database business, I think it is not growing at the strong double-digit numbers it was growing at in the client/server heyday. And if you look at what analysts are projecting for relational database, they have it pegged at single-digit growth.

If you look at what’s driving it, the transaction processing types of applications are growing in the low single digits, and the data warehousing/business intelligence types of applications -- which all require databases -- are growing in the higher single digits. That type of application is where we see the growth, and that’s an area where IBM has an advantage from a technology point of view because analytical applications require a robust query optimizer. IBM has more than 70 patents just in query optimizing technology. Of course, Oracle made its mark in client/server Unix databases. That market was its claim to fame with the ERP vendors and with its own applications. But as you move up into analytical applications, Oracle becomes a bit challenged from a technology point of view.