In-Depth
Who's minding the meta data?
- By Tony Baer
- June 26, 2001
A chip design software firm opts for an "enterprise mart" strategy to improve
reporting from its SAP R/3 system and other systems. A variety of off-the-shelf packages are used. The project
succeeds, but the I/S team finds itself writing a lot more code than expected. The culprit? Coordination of the
meta data. Under the hood, data warehouses and data marts live or die by the data about data.
The issue grows more critical as organizations opt for data mart strategies that avoid the time and expense of
data warehouses, not to mention the problems of premature obsolescence that afflict warehouse projects that take
too long to complete. Instead of taking years to build a data warehouse, why not build a "quick start"
data mart in weeks while the iron is still hot?
"As technology barriers have gotten lower, we have seen more and more end-user departments building their
own data marts that are not integratable," said Dave Gleason, a director within Platinum Technology's data
warehouse consulting practice. "The problem is that I/S has not given end-user departments reasonable, quick
alternatives."
Sidebar: Is technology the problem?
One school of thought says "Don't get hung up with technology." Some observers, such as data warehousing
consultant Ralph Kimball, argue that defining business objects and metrics is the more burning issue. According
to Kimball, it is rare to find a company that has defined a set of basic business objects ("conformed
dimensions") or the metrics to assess them ("fact definitions") in advance. "If I walked into an organization
that had common definitions of customer, product, regions and calendar, I would be very impressed," he said.
But without a repository, any report-driven method becomes labor-intensive to maintain, especially if the structure
of the organization changes. Cognos Inc., Burlington, Mass., has developed the "24 Ways" methodology,
a quick-start program based on a series of generic report templates. The company claims that its two dozen generic
reports, such as multi-dimensional income statements, customer product profitability and a supplier scorecard,
let users see returns within 90 days if the organization can identify the proper data sources. But Cognos is hedging
its bets. Like most of its counterparts, it too has announced support of Microsoft's repository. "In the meantime,
we're staying focused on an approach that gives the business manager the reports they need to make business decisions,"
said Paul Hill, a marketing vice president at the firm.
Enterprise data marts might be that alternative. Based on the idea of incrementally building central warehouses
or repositories to feed satellite data marts on an as-needed basis, enterprise marts combine the planning of enterprise
warehouses with the rapid execution of data marts.
There are plenty of tools available on the market to assemble enterprise marts, but many have a technology hole
in the middle. While the tools used to build enterprise marts provide "meta data" -- information and
parameters regarding the structure and operation of enterprise marts -- there are few standards for accessing this
meta data and no central storage place.
"The lack of integrated standards is probably the biggest detriment to us and to most data warehouse tool
vendors," conceded Sandy Walsh, product specialist for query
tool vendor Andyne Computing Ltd., Kingston, Ontario. In the meantime, users are left holding the bag. "Meta
data is very important, but it's a niche that no vendor is addressing," noted Rick Thrasher, data warehouse
manager for chip design automation giant Synopsys, Mountain View, Calif. "I'm still writing code and I'd rather
not have to do that."
Without meta data, maintenance is practically impossible. Organizations hoping to roll out new marts will probably
have to start from scratch, because each new mart requires copying or re-keying a long list of meta
data types. First, the source/target data mappings and data transformation processes must be specified, usually
in a data extraction tool or program. A data model is then necessary for the database, and if OLAP tools are used,
dimensional models are required. Finally, the end-user reporting environment parameters must be set up, including
directories of business objects, presentation formats and
report-generation schedules that are generally part of the managed reporting tool.
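Gathered in one place, those scattered pieces amount to a single registry record per mart. A minimal sketch in Python of what such a record might hold (the class and field names are illustrative, not drawn from any of the tools mentioned in this article):

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMapping:
    """One source-to-target mapping plus the transformation applied in between."""
    source: str            # e.g. an R/3 table.field reference
    target: str            # e.g. a warehouse fact-table column
    transform: str = ""    # e.g. "sum by order, convert to USD"

@dataclass
class MartMetadata:
    """Everything a new data mart needs, collected in one registry entry."""
    name: str
    mappings: list[ColumnMapping] = field(default_factory=list)
    data_model: dict[str, list[str]] = field(default_factory=dict)    # table -> columns
    dimensions: dict[str, list[str]] = field(default_factory=dict)    # OLAP dimension -> levels
    report_schedules: dict[str, str] = field(default_factory=dict)    # report -> schedule

# Without a hub repository, each of these pieces lives in a different tool,
# and rolling out a new mart means re-entering all of them by hand.
sales = MartMetadata(
    name="sales_mart",
    mappings=[ColumnMapping("R3.VBAK.NETWR", "SALES_FACT.NET_VALUE", "sum by order")],
    data_model={"SALES_FACT": ["ORDER_ID", "NET_VALUE", "REGION"]},
    dimensions={"REGION": ["country", "state", "city"]},
    report_schedules={"regional_income": "monthly"},
)
```

Each field above corresponds to one of the tool-bound meta data types in the list: extraction mappings, the database model, OLAP dimensions and reporting parameters.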
Lacking a hub repository, frazzled data warehouse administrators are left to search through multiple tools in order
to synchronize any change or expansion of a data mart's functionality or coverage. Not surprisingly, they learn
a hard fact: While it might be easy to get an enterprise mart off the ground, the devil remains in the details.
Hard work
Synopsys opted for an enterprise mart strategy to improve reporting from its new SAP R/3 and other transaction
systems. It was a logical conclusion that the packaged approach used for enterprise resource planning (ERP) would
also be applied to designing the marts. And, for most of the project, the strategy worked. The firm used
Oracle as its central data warehouse and the Tapestry product from D2K, San Jose, Calif. -- equipped with the necessary
R/3 API -- to move data from SAP to Oracle. End-user data marts were then built using Aperio, a Web-based data
warehouse front end from Influence Software, Sunnyvale, Calif., which operates atop a Java application server from
NetDynamics, Menlo Park, Calif.
Nonetheless, Synopsys found itself writing a lot more code than it expected, according to the company's Thrasher.
The heart of the problem, he said, was the difficulty in coordinating the needed meta data. Because of this, Synopsys
had no choice but to write its own central repository, complete with APIs to each tool that automatically update
the listings.
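The pattern Synopsys arrived at, a hub that each tool pushes its meta data changes into, can be sketched roughly as follows. The adapter interface and tool names here are hypothetical, not Synopsys's actual APIs:

```python
class MetadataHub:
    """Central store of meta data listings, keyed by (tool, object name)."""

    def __init__(self):
        self.listings = {}

    def publish(self, tool, name, definition):
        # Each tool-specific adapter calls publish() whenever its own
        # meta data changes, so the hub never drifts out of date.
        self.listings[(tool, name)] = definition

    def lookup(self, name):
        # Return every tool's view of the same object, side by side.
        return {tool: d for (tool, n), d in self.listings.items() if n == name}

hub = MetadataHub()
# An extraction tool and a reporting tool each register their view of one field:
hub.publish("extraction", "net_value", {"source": "R3.VBAK.NETWR", "type": "DECIMAL"})
hub.publish("reporting", "net_value", {"label": "Net Order Value", "format": "$#,##0"})
```

With adapters pushing updates automatically, a change made in one tool shows up in `hub.lookup("net_value")` without the manual copying that frazzles administrators; the remaining cost, as Synopsys found, is maintaining the adapters themselves.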
The fact that meta data elements are maintained separately is not necessarily a problem, as long as the data warehouse
or data mart remains unchanged. That assumption, however, rarely holds. For instance, at MCI Corp. the
expectation is that the telecommunications field is changing rapidly and that the decision support needs of key
managers are constantly in flux. For proof, just look at the unexpected chain of events that found MCI in the hands
of a new owner, Worldcom.
Therefore, MCI builds data marts as business needs dictate, such as for the introduction of a new sales promotion,
and there is no requirement that the marts be permanent fixtures. In the past three years, MCI has built 10 data
marts, with eight more currently under development. And, like Synopsys, MCI has built a central data warehouse
to feed its marts in order to avoid reinventing the wheel each time a new mart is rolled out.
MCI built its warehouse on the Illustra (now Informix) object-relational database, along with NaviServer, a Web server
later purchased by America Online and renamed AOL Server. The repository is linked to Prism and other homegrown data extraction tools, and
to MicroStrategy's DSS Server front-end relational OLAP tools. MCI must currently update its repository manually,
something that Chief Data Warehouse Architect Ron Patterson characterizes as "not the ideal method."
When Synopsys built its Oracle-based repository, the firm designed interfaces that would automatically replicate
meta data updates from Tapestry and Aperio. However, automating the interfaces between tool and repository did
not necessarily eliminate all of the housekeeping headaches. At the end of the day, there was still the burden
of software maintenance. Nonetheless, compared to MCI's situation of multiple legacy data sources, Synopsys is
lucky in at least one respect. With the ERP system as the main data source, the data is well documented and reasonably
clean, and the business processes are well-defined due to the reengineering often associated with ERP implementations.
The advantage of doing all of the engineering and reengineering in advance was evident at Dow Chemical, Midland,
Mich. By the time it had installed SAP's R/2 (the predecessor to R/3) in the early 1990s, Dow had already mapped
its business processes and built a repository for all of the codes that ran the transaction systems governing its
15 global business units. Better yet, the organization had just completed an enterprise-wide standardization of its
desktop hardware and software, including Windows 95, which involved 30,000 desktops by 1996.
"We had a very good view of our business processes at the outset," noted Mike Costa, Dow's global controller,
business reporting and analysis, who manages the data warehouse. Relying on input from more than 400 users, Dow
rolled out a series of data marts within a year after the project was conceived. Today, the enterprise mart system,
including an Oracle-based master data warehouse for extracted R/2 data, feeds a series of local Cognos PowerPlay
OLAP- and Business Objects-managed query environments, and serves more than 6,000 users. And, like MCI and Synopsys,
Dow uses an internally developed repository, which in this case is Oracle-based.
Unlike Dow Chemical, most organizations are probably in MCI's place, populating their data warehouses with legacy
data that may or may not be in relational databases. And many of these firms are contending with data sources that
are not necessarily well documented, noted Greg Jones, data warehouse practice manager for Talus, an Alexandria,
Va.-based consulting firm. "You're lucky if the [people] who wrote them are still there."
The Holy Grail
So far, the best that anyone can do is interface one vendor's meta data to another. Most vendors provide some
means of exporting their meta data if a user wishes to make the effort (see sidebar, "Approaches to coordinating
data marts"). There has also been a half-hearted effort at standardization. Three years ago, the Meta Data
Coalition was formed
by 57 vendors. The coalition has since issued a preliminary specification, the Meta Data Interchange Specification
(MDIS), which offers a low-level translation format.
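In spirit, a low-level translation format of this sort does little more than describe the fields of a file being moved, so an exporting tool can write the description and an importing tool can read it back. A hypothetical rendering of the idea (this illustrates the level at which such a format operates; it is not actual MDIS syntax):

```python
import json

# One file's worth of element definitions -- names, types, lengths --
# with nothing about relational structure or business meaning, which is
# exactly the limitation critics raised against the early interchange specs.
customer_file = {
    "file": "CUSTOMER.DAT",
    "elements": [
        {"name": "CUST_ID",   "type": "CHAR",    "length": 10},
        {"name": "CUST_NAME", "type": "CHAR",    "length": 40},
        {"name": "BALANCE",   "type": "DECIMAL", "length": 12},
    ],
}

# The exporting tool serializes the description; the importing tool parses it.
interchange = json.dumps(customer_file)
imported = json.loads(interchange)
```

Everything above the file-and-field level -- star schemas, OLAP dimensions, business definitions -- falls outside such a format, which is the gap the article's sources complain about.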
However, the scope of the MDIS specification has been quite limited, as it is confined to identifying the elements
of files being migrated. It does not address anything pertaining to relational data structures or higher level
end-user data. The coalition's work has had relatively little impact so far. "A drawback is that the Meta
Data Coalition's work was physically oriented, giving us little if any help with mapping business requirements,"
said MCI's Patterson. According to Talus' Jones, MDIS has barely registered on the radar screens of his clients, for
an even more basic reason. "Meta data is not on the critical path to delivering business value," he said.
According to Kay Hammer, president of Austin, Texas-based Evolutionary Technologies Inc., and chairman of the Meta
Data Coalition, the group intentionally kept its goals modest in the interest of getting something accomplished.
"Our approach was to address quick and dirty file transfers, which is 90% of the data that we deal with,"
she conceded.
Instead, the coalition -- and most of the data warehousing vendor community -- seems to be awaiting Microsoft's
next move. The Meta Data Coalition has announced a bridge to the new Microsoft Repository, whose second version
is currently in beta. Initially based on technology from the former Texas Instruments Software, the repository
represents Microsoft's attempt to go where no vendor has successfully gone before -- developing a general-purpose,
multivendor repository. The
level of the challenge becomes clear when you recall the lack of success that accompanied IBM's AD/Cycle, the last
major initiative to develop such an all-encompassing tool.
However, Microsoft's repository initiative has an air of the Holy Grail to it. Virtually all major data warehousing
tool vendors have promised to support the repository once it develops the right extensions. According to Lloyd
Arrow, Microsoft Repository product manager, Version 2 will store OLAP parameters, such as the type of aggregations
and data elements involved in building an OLAP dimension. It will also include date and time stamping to provide
rudimentary version control. However, the repository still falls short in one very important area: defining business
objects. At this point, business objects can only be described through text descriptions in a "remarks"
field. And, because the Microsoft Repository will be based on Windows NT, some will question whether it will ever
scale past the workgroup level, regardless of what Platinum does to port the repository to non-NT platforms.
But Microsoft's is not the only answer. IBM, which previously traveled down the single repository road with AD/Cycle,
has seen the error of its ways. "There are several different usages [of the repository] that would interfere
with each other," said George Zagelow, business intelligence segment manager for IBM. A few examples include
build time for application developers, database operational meta data for DBAs and business definitions for end
users. IBM is therefore promoting the idea of multiple repositories with meta data exchange mechanisms that would
be automatically synchronized. This would involve the firm's DataGuide front-end meta data tag language in conjunction
with Data Atlas, its back-end counterpart.
IBM's and Microsoft's efforts demonstrate that there is probably no single answer to the meta data morass, and
that there is no way to avoid the labor-intensive task of synchronizing the management of all of the disparate
information. According to Talus' Jones, the best thing to do is to bite the bullet, decide where to store the information
-- such as within a spreadsheet, the data model, the front end or a Case tool -- and then get on with defining
the business views.
"At the end of the day, all the definitions in the world aren't worth anything if you can't produce reports,"
said Jones.