
EII — The return of the virtual data warehouse?

If you have been around the high-tech industry long enough, you know that information technologies never die; they simply morph into something else.

Almost a decade ago, the nascent data warehousing industry fought and won a pitched battle against the reactionary forces of various enterprise reporting, middleware and gateway vendors that claimed data warehousing was an inefficient and costly way to give users direct access to data.

Their argument: Why spend time and money building a redundant data store when you can create a virtual data warehouse (VDW) and grab the data when you need it? Their products provided users with a unified, global view of distributed systems that shielded them from the complexity of locating, querying and joining data from multiple systems on the fly.

Although many CIOs favored this concept because they were reluctant to undertake large data warehousing projects, the VDW concept ultimately failed. CIOs decided to build data warehouses to avoid bogging down operational systems with end-user queries, deliver custom reports more quickly and satisfy the growing user demand for direct access to data.

In addition, data warehousing purists successfully convinced CIOs that operational and analytical processing systems require different architectures to optimize processing. Operational systems are optimized for processing transactions against small volumes of similar data, whereas data warehouses are optimized to handle complex queries against large volumes of dissimilar data. Using one system to handle both types of processing is a compromise solution that ultimately undermines performance and availability.


EII -- The new VDW

However, like all good information technologies, VDW never died. It is now re-emerging under a new guise and with a slightly new mission. Its new name: Enterprise Information Integration (EII), an acronym that plays off the acronyms ETL and EAI, two key technologies in the emerging data integration market. EII vendors would like to promote their technology as the third major component in a data integration toolkit.

But what is EII? And does it have a future?

To me, EII is distributed query processing on steroids. Many categories of products have incorporated this capability for years. Take reporting vendors, for example. Information Builders, Business Objects and Crystal Decisions all build complex production reports by issuing queries against multiple databases. In EII terms, these reports -- especially if run on demand -- represent virtual data marts. But are reporting vendors EII vendors?

Now take ETL vendors. They also pull data from multiple sources using SQL, native database APIs and other connectors. Without much effort, most could collect data in response to an ODBC request, integrate it and deliver the output to the requesting program on the fly. In fact, ETL vendor Sagent Technology recently declared itself an EII vendor by doing just this.

By the same token, database and Web application server vendors can do the same thing. Most contain SQL query functions, logic and delivery mechanisms to manage distributed queries. And the advent of Web services will make it easier for these types of vendors to connect and deliver data to requesting entities. So do BI, ETL and database vendors make EII obsolete?

The answer lies in what truly constitutes an EII product. The key differentiators are a vendor’s ability to map geographically distributed, heterogeneous data (including unstructured data) in a single logical model and to process complex distributed queries on the fly with fast performance. That’s a tall order. Let’s now look at each of these attributes.

Global maps -- The best EII tools create logical models or global maps of distributed, heterogeneous data. These maps hide the complexity of accessing back-end systems from developers or end users. Some tools define data components in business terms and rules that ordinary business users can understand and use to query a “virtual” database. In the BI world we call this model a “semantic layer.” Before the advent of data warehouses, some desktop query and reporting vendors (notably Business Objects) created semantic layers so that regular users could query distributed legacy systems without knowing SQL.
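To make the idea of a global map concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the business terms, source-system names and physical field names are all hypothetical. The map translates business terms into the physical fields behind them and groups those fields by source system, which is roughly the first step a tool would take before issuing queries.

```python
# Hypothetical global map: business term -> (source system, physical field).
# Every system, table and column name here is an illustrative assumption.
GLOBAL_MAP = {
    "Customer Name": ("warehouse", "dim_customer.cust_nm"),
    "Lifetime Purchases": ("warehouse", "fact_sales.total_amt"),
    "Pending Order": ("order_entry", "orders.order_id"),
}


def resolve(business_terms):
    """Group the physical fields behind each business term by source system."""
    plan = {}
    for term in business_terms:
        source, field = GLOBAL_MAP[term]
        plan.setdefault(source, []).append(field)
    return plan


# A user asks for two business terms; the map says which systems to query.
plan = resolve(["Customer Name", "Pending Order"])
```

The user works only with the names on the left; the back-end locations on the right stay hidden in the map.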

Until recently, data warehouses obviated the need for rich semantic layers. (Many consider Business Objects’ semantic layer to be overkill in a data warehousing environment.) But organizations now seem to be consuming information more rapidly than IT developers can stuff operational data into a data warehouse. Consumers want to see everything in the data warehouse and then some.

And here’s where EII vendors spot an opportunity. Having learned their lessons, smart EII vendors no longer pitch their technology as a data warehousing replacement. Rather, they offer their tools as a way to complement a data warehouse. In fact, EII might be described as “data warehouse plus.”

As the worlds of operational and analytical computing converge around real-time data warehousing and business performance management (BPM), organizations are (consciously or not) building VDWs. Most firms need EII tools that allow them to capture and integrate historical, transactional and external data. While most of this data may be in a data warehouse, some may not be.

For example, a customer service application may use EII software to deliver a 360-degree view of its customers. The application may use EII software to combine customer demographics and purchasing history from the data warehouse, their Web activity from a Web server and their pending order from an order-entry system. Or an executive may want to analyze the seasonality of today’s inventory levels and manufacturing output; this involves integrating historical data from the warehouse with up-to-the-minute data from inventory and manufacturing systems.
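The customer service example above can be sketched in a few lines of Python. This is a toy under stated assumptions: two in-memory SQLite databases stand in for the data warehouse and the order-entry system, a dictionary stands in for Web server activity, and all table, column and function names are invented for illustration. The point is only that each source is queried separately and the results are merged at request time rather than copied into one store.

```python
import sqlite3

# Stand-in for the data warehouse (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer (id INTEGER, name TEXT, lifetime_spend REAL)"
)
warehouse.execute("INSERT INTO customer VALUES (1, 'Acme Corp', 1200.0)")

# Stand-in for the order-entry system (hypothetical schema).
order_entry = sqlite3.connect(":memory:")
order_entry.execute("CREATE TABLE pending_order (customer_id INTEGER, item TEXT)")
order_entry.execute("INSERT INTO pending_order VALUES (1, 'widgets')")

# Stand-in for Web server activity, keyed by customer id.
web_activity = {1: ["/pricing", "/support"]}


def customer_view(customer_id):
    """Query each source separately and merge the results on the fly."""
    name, spend = warehouse.execute(
        "SELECT name, lifetime_spend FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    pending = [
        row[0]
        for row in order_entry.execute(
            "SELECT item FROM pending_order WHERE customer_id = ?", (customer_id,)
        )
    ]
    return {
        "name": name,
        "lifetime_spend": spend,
        "pending_orders": pending,
        "pages_viewed": web_activity.get(customer_id, []),
    }


view = customer_view(1)
```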

High performance -- Although much of the effort with EII involves creating global maps, the tools must also execute in a high-performance manner. Tools must have a query optimizer that knows the most efficient join paths and how to exploit the processing capacity of existing databases and systems without bogging them down. The unfortunate fact about EII technology is that it is only as fast as its slowest component or join process.
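A back-of-the-envelope Python sketch shows why the slowest component dominates. It assumes the tool queries all sources in parallel and then spends a fixed time merging the results; the source names and latency figures are made up for illustration.

```python
def federated_latency(source_latencies_ms, merge_ms=0):
    """Return the slowest source and the total response time in milliseconds.

    Assumes all sources are queried in parallel, so the request waits only
    for the slowest one before the merge step runs.
    """
    slowest = max(source_latencies_ms, key=source_latencies_ms.get)
    return slowest, source_latencies_ms[slowest] + merge_ms


# Hypothetical per-source query times in milliseconds.
latencies = {"warehouse": 40, "order_entry": 15, "inventory": 900}
slowest, total_ms = federated_latency(latencies, merge_ms=10)
```

Under these assumptions, speeding up the warehouse or order-entry queries changes nothing; only the inventory system matters to the user waiting for an answer.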

To ensure fast processing, some EII tools distribute their “brains” or processing to various nodes on the network, while others centralize everything. Distributed processing is complex to deploy and maintain, while centralized systems create potential bottlenecks and points of failure. A word to the wise: carefully examine the query processing architecture of an EII tool before taking the plunge.

EII vendors -- Which vendors offer pure-play EII products? You can check out Giga Information Group or other analyst firms for complete listings, but here is a short list that I’ve compiled: MetaMatrix by MetaMatrix, XAware’s XA-Suite, BEA Systems’ Liquid Data, Enosys Software’s New Media, Ipedo’s Ipedo3, Nimble Technology’s Nimble Integration Suite, IBM’s Information Integrator, Decision Support’s DQpowersuite, Sagent’s OpenLink and Composite Software’s Composite Data Management System.

Bottom line -- Earlier in this article, I asked whether EII has a future. It does: EII can and will be an important tool in any company’s IT toolbox. It is an important vehicle for delivering analytic services in any type of application in which users need to access distributed data.

However, I believe most organizations will be able to exploit their existing tools (e.g., BI, ETL, databases, Web application servers) to support EII requirements. Given the current economic climate and the surfeit of IT technologies within corporations, most companies will not shell out extra cash to bring in yet another technology.

Like vendors of most new technologies, EII vendors will need to find and exploit a market niche that contains a distinct business application with obvious business value, or they risk being acquired. For example, Enterworks has transformed itself from a VDW vendor in the early 1990s to a provider of “catalog creation and management software for e-procurement buyers and suppliers.” Information Builders has spun off its Enterprise Data Access product into a successful subsidiary named iWay that focuses on the larger data integration market.

On the acquisition side, nQuire Software, which provided a distributed OLAP product, was acquired by Siebel Systems and forms the basis of the Siebel Analytics product line. Top Tier was acquired by SAP and now forms the basis of SAP’s “drag and relate” technology, which is a key component of the firm’s portal technology and user interfaces.

Ultimately, I don’t see an independent market for pure-play EII vendors. But EII will undoubtedly resurface in another shape or form. Stay tuned.

About the Author

Wayne W. Eckerson is director of education and research for The Data Warehousing Institute, where he oversees TDWI's educational curriculum, member publications, and various research and consulting services. He has published and spoken extensively on data warehousing and business intelligence subjects since 1994.