Ignore Meta Data Strategy At Your Peril
Although meta data is a hot topic in data warehousing circles, few data warehousing managers
feel compelled to do more than talk about the subject. A survey bears this out:
54% of data warehousing managers have "no plans" to develop a meta data
strategy, while 21% have developed a plan but have not deployed it. Only 25% of respondents
have deployed or are currently deploying a meta data strategy. The survey is based on responses
from 175 members of The Data Warehousing Institute, a data warehousing educational association.
Meta data describes the information in the data warehouse: what it means, where it came from,
how it was calculated, when it was loaded, who owns it and so on. In practice, meta data is
what most databases, applications and information processes use to define, relate and manipulate
data objects within their environments.
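As a concrete illustration of that definition, here is a minimal sketch of a meta data record for a single warehouse column. The class name, field names and sample values are hypothetical, chosen only to mirror the questions listed above (what it means, where it came from, how it was calculated, when it was loaded, who owns it); no product's schema is implied.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ColumnMetaData:
    """Hypothetical meta data record for one warehouse column."""
    name: str            # the column being described
    meaning: str         # what it means
    source: str          # where it came from
    calculation: str     # how it was calculated
    loaded_at: datetime  # when it was loaded
    owner: str           # who owns it


# Illustrative values only; not taken from any real system.
net_sales = ColumnMetaData(
    name="net_sales",
    meaning="Invoiced revenue minus returns, in US dollars",
    source="orders.invoice_lines",
    calculation="SUM(amount) - SUM(returns)",
    loaded_at=datetime(2000, 1, 31, 2, 0),
    owner="Finance",
)


def describe(meta):
    """Render a one-line, human-readable summary of a column."""
    return f"{meta.name}: {meta.meaning} (from {meta.source}, owned by {meta.owner})"


print(describe(net_sales))
```

Even a record this small answers the questions an analyst typically asks before trusting a number in a report.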
In recent years, managing meta data has become a top concern for data warehousing managers.
Many vendors now offer meta data management tools and open meta data interfaces, and there is
ongoing debate in the industry about establishing meta data standards.
Although many companies are aware of the importance of meta data, few are willing to invest
the time, money and resources required to implement a robust meta data management system.
Part of the problem is that the benefits are difficult to quantify. In addition, designing
a data warehouse typically takes precedence over maintaining one, even though maintenance
is the task that requires robust meta data to do well. Consequently, most companies poorly
document the nature and origins of the data they are warehousing, and few leverage this
information to automate management tasks.
Another obstacle is the lack of meta data standards. The absence of standards, in effect,
creates islands of incompatible meta data. Applications that define data using different
semantics, structures and syntax are difficult to integrate. This impedes the free flow
of information across application boundaries, which is necessary to support complex business
processes and solutions. For example, meta data integration is required to pass dimensional
data sets from one business intelligence tool to another without losing context.
To meld apps together, firms must integrate or synchronize meta data from disparate products.
Ultimately, proper meta data management requires firms to build a second data warehouse:
a data warehouse about the data warehouse, if you will. Few firms have the skill, energy
or time to build two data warehouses at the same time.
Applications and analysis
The lack of meta data interoperability hampers the development and efficient deployment
of numerous business solutions. These include data warehousing, business intelligence,
business-to-business exchanges, enterprise information portals and software development.
For example, developers use extraction, transformation and loading (ETL) tools to extract
data from operational systems, map that data to a target data model, transform the data,
load it into a data warehouse and update end-user or application views of the data. Each
part of this process requires various applications or systems to interact to exchange and
manipulate data. Because these applications generally do not use compatible meta data, it
is difficult to automate these processes from end to end.
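The ETL steps described above can be sketched as a small pipeline in which every stage records what it did. This is a toy illustration under stated assumptions, not how any particular ETL product works: the function names, the `COLUMN_MAP` table and the in-memory list standing in for the warehouse are all hypothetical.

```python
# Hypothetical mapping from source column names to the target data model.
COLUMN_MAP = {"cust_nm": "customer_name", "amt": "amount"}


def extract(rows):
    """Copy rows out of the (simulated) operational system."""
    return [dict(row) for row in rows]


def map_columns(rows, column_map):
    """Rename source columns to their target-model names."""
    return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]


def transform(rows):
    """Convert the amount field from text to a rounded number."""
    return [{**row, "amount": round(float(row["amount"]), 2)} for row in rows]


def load(rows, warehouse):
    """Append rows to the (simulated) warehouse and report the count."""
    warehouse.extend(rows)
    return len(rows)


def run_pipeline(source_rows, warehouse):
    """Run all stages and return a lineage log, one entry per stage."""
    lineage = []
    rows = extract(source_rows)
    lineage.append(("extract", len(rows)))
    rows = map_columns(rows, COLUMN_MAP)
    lineage.append(("map", sorted(COLUMN_MAP.values())))
    rows = transform(rows)
    lineage.append(("transform", "amount -> float, 2 decimals"))
    lineage.append(("load", load(rows, warehouse)))
    return lineage


warehouse = []
for step in run_pipeline([{"cust_nm": "Acme", "amt": "12.5"}], warehouse):
    print(step)
```

The lineage log is the meta data here. When every stage runs in a different tool with its own incompatible format, no single log like this exists, which is why end-to-end automation is hard.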
Many ETL vendors have created open APIs in their repositories, and are working with other
vendors to establish interfaces for data modeling, business intelligence, scheduling and
administrative tools, among others. Unfortunately, not every vendor supports all the APIs
and these APIs generally offer lowest-common-denominator functionality. In addition, ETL
repositories generally focus on "back-end" meta data for populating and managing data
warehouses, rather than "front-end" application and tools meta data.
Some vendors that have created highly integrated suites of business intelligence tools or
analytic apps may offer seamless integration among tools. However, this is the exception
rather than the rule. In addition, most firms purchase multiple types of business intelligence
tools, none of which are well integrated.
Meta data interoperability is also critical for managing data warehouses. For example,
administrators need to be able to add and delete users, change permissions and maintain
authentication in a central repository rather than in multiple apps. Administrators also
need to know the parts of various apps that will be affected if they add or delete a column
from the data warehouse or alter a job schedule. Ideally, administrators should be able to
analyze the impact of changes, and execute updates and changes in an automated fashion.
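Impact analysis of this kind amounts to walking a dependency graph held in the meta data. The sketch below assumes a hypothetical `DEPENDENTS` table in which each object lists the objects that read from it; the object names are invented for illustration, and a real repository would populate such a table from the tools themselves.

```python
from collections import deque

# Hypothetical dependency meta data: object -> objects that depend on it.
DEPENDENTS = {
    "warehouse.sales.region": ["etl.load_sales", "report.regional_sales"],
    "etl.load_sales": ["schedule.nightly"],
    "report.regional_sales": ["portal.sales_dashboard"],
}


def impacted_by(changed, dependents):
    """Return every object reachable from `changed`, in sorted order."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # guard against revisits and cycles
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Which jobs, reports and schedules does dropping a column touch?
for name in impacted_by("warehouse.sales.region", DEPENDENTS):
    print(name)
```

The traversal itself is trivial. The hard part, as the column argues, is getting every tool to contribute its dependencies to one shared table.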
Integrated meta data can help companies create a "lights out" data warehousing environment.
A key benchmark for success should be the number of administrators required to maintain and
manage a data warehouse. The fewer administrators the better. Unfortunately, most companies
rely on high-priced systems analysts to maintain their data warehouses. Therefore, the lack
of a common meta data language, access method and interchange mechanism increases the cost
of a data warehouse and reduces its overall effectiveness.
Many companies are now beginning to deploy enterprise information portals (EIPs). EIPs
retrieve data from many different applications and systems, and present the data to
users within a unified, Web-enabled graphical interface. Currently, companies must build
a separate interface between the EIP and each application, service or information resource
that users want to access. This interface pulls data into the EIP, but does not relate
data among applications. To correlate data, users must manually tag each data object using
a consistent set of semantics or develop a hard-wired app relating data among resources.
On the other hand, interoperable meta data would enable an EIP to automatically identify all
documents related to a user request, such as "the impact of last week's promotions on sales
in the southwest." The EIP's search engine could then dynamically create a complex report
that integrates a sales revenue chart, an image of the promotional coupon, a written
description of the campaign and competing promotions from other companies' Web sites.
Emerging standards
There are currently two prospective standards that promise to alleviate the pain involved
in creating a meta data infrastructure for data warehousing that can support the seamless
integration of multiple applications.
The Object Management Group's (OMG) proposed Common Warehouse Metamodel (CWM) would greatly
enhance meta data sharing and interoperability in data warehousing environments. CWM is a
meta model designed by a coalition of vendors — including IBM, Unisys, NCR, Hyperion
Solutions and Oracle — that complies with the OMG's Meta Object Facility (MOF) for
defining meta models and modeling languages. The group's proposal also uses the XML
Metadata Interchange (XMI) format to interchange data warehousing meta data.
The proposed CWM specifies meta models for management, transformation and operational
processes within a data warehousing environment. It also specifies meta models for
various data resources, including object-oriented, relational, record-oriented,
multidimensional and XML data. Adoption of the CWM standard is expected shortly.
Competing with the proposed CWM is the Meta Data Coalition's (MDC) Open Information Model
(OIM), which was originally developed by Microsoft with help from 20 software vendor partners.
The MDC is a coalition of about 50 companies (mostly vendors) dedicated to providing an
easy-to-deploy solution for accessing and interchanging corporate meta data.
The OIM specifies a meta model for component modeling, knowledge management and application
development, as well as data warehousing. OIM's schema consists of 200 object types and
100 relationships described in the Unified Modeling Language (UML). It uses SQL as a query
language and calls for XML as an interchange format
between OIM-compliant repositories, such as the Microsoft Repository.
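To show what an XML interchange format buys in practice, here is a minimal round-trip sketch using Python's standard library. The element and attribute names (`metadata`, `column`, `name`, `type`, `owner`) are invented for illustration; they are not the OIM's actual XML format, nor XMI, both of which are far richer.

```python
import xml.etree.ElementTree as ET


def to_xml(columns):
    """Serialize a list of column descriptions to an XML string."""
    root = ET.Element("metadata")
    for col in columns:
        ET.SubElement(
            root, "column",
            name=col["name"], type=col["type"], owner=col["owner"],
        )
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML string back into a list of column descriptions."""
    root = ET.fromstring(text)
    return [dict(element.attrib) for element in root.findall("column")]


# Illustrative meta data exported by one repository...
exported = [
    {"name": "region", "type": "VARCHAR", "owner": "Sales"},
    {"name": "net_sales", "type": "DECIMAL", "owner": "Finance"},
]
document = to_xml(exported)

# ...and imported by another that agrees on the same format.
imported = from_xml(document)
print(imported == exported)
```

The round trip only works because both sides agree on the vocabulary, which is exactly what a shared standard is meant to guarantee.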
Since gaining control of the OIM, the MDC has worked to make it vendor-neutral. Working with
Microsoft, the MDC eliminated the OIM's dependency on a COM API and is revising the model's
XML interchange format, which was developed for use with the Microsoft Repository.
The MOF model is general enough to encompass OIM, and some vendors have already represented
OIM as a MOF meta model. However, members of the CWM working group are currently trying to
achieve an even tighter integration between CWM and OIM meta models. This would provide the
same set of interfaces to repositories using these meta models, as well as the same
interchange format, making life easier for programmers.
Finally, a group of vendors has teamed up to provide a Java API for meta data based on the
OMG's MOF. The Java API for meta data will allow Java applications to specify, store,
access and interchange meta data using standard meta data services. The Java API is
likely to increase the adoption of standards-based meta data and, hence, accelerate the
creation of robust applications and solutions in which there are no barriers to information
exchange.
The way companies manage meta data can spell the difference between a mediocre data warehouse
and a stellar one that is easy to manage and continues to provide business value over time.
Now that many companies have deployed their first or second data warehouse, they need to
step back and create a meta data management plan that provides insurance against unexpected
changes in the business environment. A fine-tuned meta data strategy can help administrators
rebuild their data warehouses or applications to support new requirements without undertaking
a major, expensive overhaul.
Meanwhile, companies should also follow emerging meta data standards, even though these
standards may provide more hope than help at present. For now, managers may need to select
products that can be integrated through point-to-point interfaces managed by individual
vendors. In the long term, an ascendant meta data standard may free managers to deploy
commercially available meta data repositories that provide robust integration or interchange
with the majority of data warehousing and analytic tools on the market.