In-Depth

Who's minding the meta data?

A chip design software firm opts for an "enterprise mart" strategy to improve reporting from its SAP R/3 system and other systems. A variety of off-the-shelf packages are used. The project succeeds, but the I/S team finds itself writing a lot more code than expected. The culprit? Coordination of the meta data. Under the hood, data warehouses and data marts live or die by the data about data.

The issue grows more critical as organizations opt for data mart strategies that avoid the time and expense of data warehouses, not to mention the problems of premature obsolescence that afflict warehouse projects that take too long to complete. Instead of taking years to build a data warehouse, why not build a "quick start" data mart in weeks while the iron is still hot?

"As technology barriers have gotten lower, we have seen more and more end-user departments building their own data marts that are not integratable," said Dave Gleason, a director within Platinum Technology's data warehouse consulting practice. "The problem is that I/S has not given end-user departments reasonable, quick alternatives."

Is technology the problem?
One school of thought says "Don't get hung up with technology." Some observers, such as data warehousing consultant Ralph Kimball, argue that defining business objects and metrics is the more burning issue. According to Kimball, it is rare to find a company that has defined a set of basic business objects ("conformed dimensions") or the metrics to assess them ("fact definitions") in advance. "If I walked into an organization that had common definitions of customer, product, regions and calendar, I would be very impressed," he said.
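Kimball's "conformed dimensions" and "fact definitions" can be made concrete with a small sketch. The names and structures below are illustrative only, not drawn from any vendor's tooling: two fact definitions (orders and shipments) agree in advance to share a single definition of the customer dimension, so reports built from different marts can be compared on the same keys.

```python
# Illustrative sketch of Kimball-style conformed dimensions: both fact
# definitions point at the SAME customer dimension, so any two marts
# built from them can be compared apples-to-apples.

CUSTOMER_DIM = {  # conformed dimension: defined once, shared everywhere
    "name": "customer",
    "key": "customer_id",
    "attributes": ["customer_name", "region", "industry"],
}

ORDERS_FACT = {  # fact definition: metrics plus the shared dimensions
    "name": "orders",
    "metrics": ["order_amount", "order_count"],
    "dimensions": [CUSTOMER_DIM],
}

SHIPMENTS_FACT = {
    "name": "shipments",
    "metrics": ["units_shipped", "freight_cost"],
    "dimensions": [CUSTOMER_DIM],
}

def conforms(fact_a, fact_b, dim_name):
    """True if both facts carry an identical definition of the dimension."""
    def get(fact):
        return next(d for d in fact["dimensions"] if d["name"] == dim_name)
    return get(fact_a) == get(fact_b)
```

Because both facts reference the same dimension definition, `conforms(ORDERS_FACT, SHIPMENTS_FACT, "customer")` holds; a departmental mart that privately redefined `customer` would fail the check, which is exactly the situation Kimball says he rarely finds avoided.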

Cognos Inc., Burlington, Mass., has developed the "24 Ways" methodology, a quick-start program based on a series of generic report templates. The company claims that its two dozen generic reports, such as multi-dimensional income statements, customer and product profitability and a supplier scorecard, let users see returns within 90 days if the organization can identify the proper data sources. But without a repository, any report-driven method becomes labor-intensive to maintain, especially if the structure of the organization changes. So Cognos is hedging its bets: like most of its counterparts, it has announced support for Microsoft's repository. "In the meantime, we're staying focused on an approach that gives the business manager the reports they need to make business decisions," said Paul Hill, a marketing vice president at the firm.

Enterprise data marts might be that quick alternative. Based on the idea of incrementally building central warehouses or repositories to feed satellite data marts on an as-needed basis, enterprise marts combine the planning of enterprise warehouses with the rapid execution of data marts.

There are plenty of tools available on the market to assemble enterprise marts, but many have a technology hole in the middle. While the tools used to build enterprise marts provide "meta data" -- information and parameters regarding the structure and operation of enterprise marts -- there are few standards for accessing this meta data and no central storage place.

"The lack of integrated standards is probably the biggest detriment to us and to most data warehouse tool vendors," conceded Sandy Walsh, product specialist for query
tool vendor Andyne Computing Ltd., Kingston, Ontario. In the meantime, users are left holding the bag. "Meta data is very important, but it's a niche that no vendor is addressing," noted Rick Thrasher, data warehouse manager for chip design automation giant Synopsys, Mountain View, Calif. "I'm still writing code and I'd rather not have to do that."

Without meta data, maintenance is practically impossible. Organizations hoping to roll out new marts will probably have to start from scratch because each new mart will require the copying or memorization of a long list of meta data types. First, the source/target data mappings and data transformation processes must be specified, usually in a data extraction tool or program. A data model is then necessary for the database, and if OLAP tools are used, dimensional models are required. Finally, the end-user reporting environment parameters must be set up, including directories of business objects, presentation formats and report-generation schedules that are generally part of the managed reporting tool.
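The checklist above can be sketched as a data structure. This is a hypothetical illustration of the kinds of meta data each new mart drags along, not any product's actual schema; every field name below is invented.

```python
from dataclasses import dataclass, field

@dataclass
class MartMetadata:
    """The meta data one data mart carries, per the checklist in the text."""
    # Source/target mappings and transformations (extraction tool)
    source_target_mappings: dict
    transformations: list
    # Data model for the mart's database
    data_model: dict
    # Dimensional models, if OLAP tools are in use
    dimensional_models: list = field(default_factory=list)
    # End-user reporting parameters: business-object directories,
    # presentation formats, report-generation schedules
    reporting_params: dict = field(default_factory=dict)

# With no shared repository, cloning a mart means re-entering every one
# of these by hand -- the maintenance burden the article describes.
sales_mart = MartMetadata(
    source_target_mappings={"R3.VBAK.NETWR": "MART.SALES.AMOUNT"},
    transformations=["currency_to_usd"],
    data_model={"SALES": ["AMOUNT", "CUSTOMER_ID", "DATE"]},
)
```

Each of the five fields corresponds to one item in the paragraph above; the point of grouping them in one record is that a central repository could hold this record once, instead of scattering its contents across four different tools.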

Lacking a hub repository, frazzled data warehouse administrators are left to search through multiple tools in order to synchronize any change or expansion of a data mart's functionality or coverage. Not surprisingly, they learn a hard fact: While it might be easy to get an enterprise mart off the ground, the devil remains in the details.

Hard work

Synopsys opted for an enterprise mart strategy to improve reporting from its new SAP R/3 and other transaction systems. It was a logical conclusion that the packaged approach used for enterprise resource planning (ERP) would also be applied to designing the marts. And, for most of the project, the strategy worked. The firm used Oracle as its central data warehouse and the Tapestry product from D2K, San Jose, Calif. -- equipped with the necessary R/3 API -- to move data from SAP to Oracle. End-user data marts were then built using Aperio, a Web-based data warehouse front end from Influence Software, Sunnyvale, Calif., which operates atop a Java application server from NetDynamics, Menlo Park, Calif.

Nonetheless, Synopsys found itself writing a lot more code than it expected, according to the company's Thrasher. The heart of the problem, he said, was the difficulty in coordinating the needed meta data. Because of this, Synopsys had no choice but to write its own central repository, complete with APIs to each tool that automatically update the listings.
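Synopsys' home-grown solution amounts to a hub-and-spoke design. The sketch below is a guess at the general shape of such a system, not the company's actual code: every tool gets an adapter that pushes its meta data changes into one central store, so administrators can look in one place instead of searching tool by tool.

```python
class MetadataRepository:
    """Central hub; every tool's adapter writes through here."""
    def __init__(self):
        self.entries = {}  # (tool, object_name) -> definition

    def publish(self, tool, name, definition):
        self.entries[(tool, name)] = definition

    def lookup(self, name):
        """All tools' views of one object, for synchronization checks."""
        return {t: d for (t, n), d in self.entries.items() if n == name}

class ToolAdapter:
    """One adapter per tool (extraction, OLAP, reporting front end)."""
    def __init__(self, tool_name, repo):
        self.tool_name, self.repo = tool_name, repo

    def on_change(self, object_name, definition):
        # Called whenever the tool alters its own meta data.
        self.repo.publish(self.tool_name, object_name, definition)

repo = MetadataRepository()
extract = ToolAdapter("extraction", repo)
olap = ToolAdapter("olap", repo)
extract.on_change("customer", {"source": "KNA1", "target": "DIM_CUSTOMER"})
olap.on_change("customer", {"dimension": "DIM_CUSTOMER"})
```

After the two `on_change` calls, `repo.lookup("customer")` returns both tools' views of the same object side by side, which is precisely the cross-tool visibility the off-the-shelf products lacked.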

The fact that meta data elements are maintained separately is not necessarily a problem, as long as the data warehouse or data mart remains unchanged. In practice, however, that assumption rarely holds. For instance, at MCI Corp. the expectation is that the telecommunications field is changing rapidly and that the decision support needs of key managers are constantly in flux. For proof, just look at the unexpected chain of events that found MCI in the hands of a new owner, WorldCom.

Therefore, MCI builds data marts as business needs dictate, such as for the introduction of a new sales promotion, and there is no requirement that the marts be permanent fixtures. In the past three years, MCI has built 10 data marts, with eight more currently under development. And, like Synopsys, MCI has built a central data warehouse to feed its marts in order to avoid reinventing the wheel each time a new mart is rolled out.

MCI built its repository on the Illustra (now Informix) object relational database, along with NaviServer, a Web server later purchased by America Online and renamed AOL Server. The repository is linked to Prism and other homegrown data extraction tools, and to MicroStrategy's DSS Server front-end relational OLAP tools. MCI must currently update its repository manually, something that Chief Data Warehouse Architect Ron Patterson characterizes as "not the ideal method."

When Synopsys built its Oracle-based repository, the firm designed interfaces that would automatically replicate meta data updates from Tapestry and Aperio. However, automating the interfaces between tool and repository did not necessarily eliminate all of the housekeeping headaches. At the end of the day, there was still the burden of software maintenance. Nonetheless, compared to MCI's situation of multiple legacy data sources, Synopsys is lucky in at least one respect. With the ERP system as the main data source, the data is well documented and reasonably clean, and the business processes are well-defined due to the reengineering often associated with ERP implementations.

The advantages of doing all of the engineering and reengineering in advance were evident at Dow Chemical, Midland, Mich. By the time it had installed SAP's R/2 (the predecessor to R/3) in the early 1990s, Dow had already mapped its business processes and built a repository for all of the codes that ran the transaction systems governing its 15 global business units. Better yet, the organization had just completed an enterprise-wide standardization of its desktop hardware and software, including Windows 95, which involved 30,000 desktops by 1996.

"We had a very good view of our business processes at the outset," noted Mike Costa, Dow's global controller, business reporting and analysis, who manages the data warehouse. Relying on input from more than 400 users, Dow rolled out a series of data marts within a year after the project was conceived. Today, the enterprise mart system, including an Oracle-based master data warehouse for extracted R/2 data, feeds a series of local Cognos PowerPlay OLAP- and Business Objects-managed query environments, and serves more than 6,000 users. And, like MCI and Synopsys, Dow uses an internally developed repository, which in this case is Oracle-based.

Unlike Dow Chemical, most organizations are probably in MCI's place, populating their data warehouses with legacy data that may or may not be in relational databases. And many of these firms are contending with data sources that are not necessarily well documented, noted Greg Jones, data warehouse practice manager for Talus, an Alexandria, Va.-based consulting firm. "You're lucky if the [people] who wrote them are still there."

The Holy Grail

So far, the best that anyone can do is interface one vendor's meta data to another. Most vendors provide some means of exporting their meta data if a user wishes to make the effort (see sidebar, "Approaches to coordinating data marts"). There has also been a half-hearted effort at standardization. Three years ago, the Meta Data Coalition was formed by 57 vendors. The coalition has since issued a preliminary specification, the Meta Data Interchange Specification (MDIS), which offers a low-level translation format.

However, the scope of the MDIS specification has been quite limited: it is confined to identifying the elements of files being migrated, and does not address relational data structures or higher level end-user data. The coalition's work has had relatively little impact so far. "A drawback is that the Meta Data Coalition's work was physically oriented, giving us little if any help with mapping business requirements," said MCI's Patterson. According to Talus' Jones, MDIS has barely registered with his client base for an even more basic reason. "Meta data is not on the critical path to delivering business value," he said.
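The limitation is easy to picture. Below is a hypothetical, MDIS-flavored interchange record; the format is invented for illustration and is not the actual specification. It names the fields of a migrated file, and little more.

```python
# A hypothetical file-level interchange record, in the spirit of what
# the article describes: it identifies the elements of a migrated file,
# but has no place for relational structure (keys, joins) or for
# business-level meaning -- Patterson's "physically oriented" complaint.
interchange_record = {
    "file": "CUSTOMER.DAT",
    "fields": [
        {"name": "CUST_ID", "type": "CHAR", "length": 10},
        {"name": "REGION",  "type": "CHAR", "length": 4},
    ],
}

# Questions a warehouse team actually needs answered, which a record
# like this cannot express:
unanswered = [
    "Which warehouse table and primary key does CUST_ID map to?",
    "What business object is REGION a rollup of?",
]
```

The record travels between tools well enough for "quick and dirty file transfers," but everything in `unanswered` still has to be coordinated by hand, which is why the spec leaves the larger problem untouched.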

According to Kay Hammer, president of Austin, Texas-based Evolutionary Technologies Inc., and chairman of the Meta Data Coalition, the group intentionally kept its goals modest in the interest of getting something accomplished. "Our approach was to address quick and dirty file transfers, which is 90% of the data that we deal with," she conceded.

Instead, the coalition -- and most of the data warehousing vendor community -- seems to be awaiting Microsoft's next move. The Meta Data Coalition has announced a bridge to the new Microsoft Repository, whose second version is currently in beta. With a repository initially based on technology from the former Texas Instruments Software, Microsoft is attempting to go where no vendor has successfully gone before: developing a general-purpose, multivendor repository. The level of the challenge becomes clear when you recall the lack of success that accompanied IBM's AD/Cycle, the last major initiative to develop such an all-encompassing tool.

However, Microsoft's repository initiative has an air of the Holy Grail to it. Virtually all major data warehousing tool vendors have promised to support the repository once it develops the right extensions. According to Lloyd Arrow, Microsoft Repository product manager, Version 2 will store OLAP parameters, such as the type of aggregations and data elements involved in building an OLAP dimension. It will also include date and time stamping to provide rudimentary version control. However, the repository still falls short in one very important area: defining business objects. At this point, business objects can only be described through text descriptions in a "remarks" field. And, because the Microsoft Repository will be based on Windows NT, some will question whether it will ever scale past the workgroup level, regardless of what Platinum does to port the repository to non-NT platforms.

But Microsoft's is not the only answer. IBM, which previously traveled down the single repository road with AD/Cycle, has seen the error of its ways. "There are several different usages [of the repository] that would interfere with each other," said George Zagelow, business intelligence segment manager for IBM. A few examples include build-time meta data for application developers, operational database meta data for DBAs and business definitions for end users. IBM is therefore promoting the idea of multiple repositories with meta data exchange mechanisms that would be automatically synchronized. This would involve the firm's DataGuide front-end meta data tag language in conjunction with DataAtlas, its back-end counterpart.
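IBM's multiple-repository idea implies an exchange mechanism between special-purpose stores. Here is a minimal sketch under an assumed merge rule (the article does not describe IBM's actual mechanism, and every name below is illustrative): each repository keeps its own facts about an object, and synchronization copies the union of those facts into both stores.

```python
class Repository:
    """One of several special-purpose meta data stores."""
    def __init__(self, name):
        self.name = name
        self.records = {}  # object name -> dict of meta data facts

    def put(self, obj, **facts):
        self.records.setdefault(obj, {}).update(facts)

def synchronize(a, b):
    """Exchange meta data both ways; on a shared key, b's value wins
    (an assumed conflict rule -- the article specifies none)."""
    for obj in set(a.records) | set(b.records):
        merged = {**a.records.get(obj, {}), **b.records.get(obj, {})}
        a.records[obj] = dict(merged)
        b.records[obj] = dict(merged)

dev = Repository("build-time")    # application developers' view
ops = Repository("operational")   # DBAs' view
dev.put("customer", ddl="CREATE TABLE CUSTOMER ...")
ops.put("customer", row_count=50000, load_schedule="nightly")
synchronize(dev, ops)
```

After the exchange, each repository still serves its own audience but also carries the other's facts about `customer`, which is the "multiple repositories, automatically synchronized" arrangement Zagelow describes; the awkward part in practice is the conflict rule, which is why a real mechanism needs more than a dictionary merge.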

IBM's and Microsoft's efforts demonstrate that there is probably no single answer to the meta data morass, and that there is no way to avoid the labor-intensive task of synchronizing the management of all of the disparate information. According to Talus' Jones, the best thing to do is to bite the bullet, decide where to store the information -- such as within a spreadsheet, the data model, the front end or a CASE tool -- and then get on with defining the business views.

"At the end of the day, all the definitions in the world aren't worth anything if you can't produce reports," said Jones.