Ignore Meta Data Strategy At Your Peril

Although meta data is a hot topic in data warehousing circles, few data warehousing managers feel compelled to do more than talk about it. A survey bears this out: 54% of data warehousing managers have "no plans" to develop a meta data strategy, while 21% have developed a plan but have not deployed it. Only 25% of respondents have deployed or are currently deploying a meta data strategy. The survey is based on responses from 175 members of The Data Warehousing Institute, a data warehousing educational association.

Meta data describes the information in the data warehouse: what it means, where it came from, how it was calculated, when it was loaded, who owns it and so on. In practice, meta data is what most databases, applications and information processes use to define, relate and manipulate data objects within their environments.
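As a rough illustration of that definition, the sketch below models the meta data for a single warehouse column as a small record. The class name, field names and sample values are all hypothetical, chosen only to mirror the questions listed above (what it means, where it came from, how it was calculated, when it was loaded, who owns it).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ColumnMetaData:
    """Hypothetical meta data record for one warehouse column."""
    name: str          # column name in the warehouse
    meaning: str       # what it means (business definition)
    source: str        # where it came from
    calculation: str   # how it was calculated
    loaded_on: date    # when it was loaded
    owner: str         # who owns it


# Sample values are invented for illustration only.
net_revenue = ColumnMetaData(
    name="net_revenue",
    meaning="Revenue after returns and discounts",
    source="orders.gross_amount",
    calculation="gross_amount - returns - discounts",
    loaded_on=date(2000, 1, 31),
    owner="Finance",
)


def describe(column: ColumnMetaData) -> str:
    """Render the record as a one-line answer for a business user."""
    return (
        f"{column.name}: {column.meaning}; derived as {column.calculation} "
        f"from {column.source}; loaded {column.loaded_on.isoformat()}; "
        f"owned by {column.owner}"
    )
```

Calling `describe(net_revenue)` returns a single sentence that answers all five questions for that column. A real repository would hold many such records and relate them to one another.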

In recent years, managing meta data has become a top concern for data warehousing managers. Many vendors now offer meta data management tools and open meta data interfaces, and there is ongoing debate in the industry about establishing meta data standards.

Although many companies are aware of the importance of meta data, few are willing to invest the time, money and resources required to implement a robust meta data management system. Part of the problem is that the benefits are difficult to quantify. In addition, designing a data warehouse typically takes precedence over maintaining one, even though maintaining it well requires robust meta data. Consequently, most companies poorly document the nature and origins of the data they are warehousing, and few leverage this information to automate management tasks.

Another obstacle is the lack of meta data standards. The absence of standards, in effect, creates islands of incompatible meta data. Applications that define data using different semantics, structures and syntax are difficult to integrate. This impedes the free flow of information across application boundaries, which is necessary to support complex business processes and solutions. For example, meta data integration is required to pass dimensional data sets from one business intelligence tool to another without losing context.

To meld apps together, firms must integrate or synchronize meta data from disparate products. Ultimately, proper meta data management requires firms to build a second data warehouse - a data warehouse about the data warehouse, if you will. Few firms have the skill, energy or time to build two data warehouses at the same time.

Applications and analysis

The lack of meta data interoperability hampers the development and efficient deployment of numerous business solutions. These include data warehousing, business intelligence, business-to-business exchanges, enterprise information portals and software development.

For example, developers use extraction, transformation and loading (ETL) tools to extract data from operational systems, map that data to a target data model, transform the data, load it into a data warehouse and update end-user or application views of the data. Each part of this process requires various applications or systems to interact to exchange and manipulate data. Because these applications generally do not use compatible meta data, it is difficult to automate these processes from end to end.
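The ETL steps described above can be sketched in a few lines. This is a toy example under stated assumptions, not any vendor's tool: the function `run_etl`, the field names and the lineage record layout are hypothetical. The point is that the same mapping that drives the load can also be written out as meta data for other tools to read.

```python
from typing import Any, Callable

Row = dict[str, Any]


def run_etl(
    source_rows: list[Row],
    mapping: dict[str, str],
    transforms: dict[str, Callable[[Any], Any]],
) -> tuple[list[Row], list[dict[str, str]]]:
    """Extract, map and transform rows; return (loaded rows, lineage records)."""
    warehouse: list[Row] = []
    for row in source_rows:  # extract
        target: Row = {}
        for source_field, target_field in mapping.items():  # map
            value = row[source_field]
            transform = transforms.get(target_field)
            # transform the value if a rule exists, otherwise copy it
            target[target_field] = transform(value) if transform else value
        warehouse.append(target)  # load
    # Lineage meta data: one record per target column.
    lineage = [
        {
            "target": target_field,
            "source": source_field,
            "transform": transforms[target_field].__name__
            if target_field in transforms
            else "copy",
        }
        for source_field, target_field in mapping.items()
    ]
    return warehouse, lineage


def cents_to_dollars(cents: int) -> float:
    return cents / 100


rows, lineage = run_etl(
    source_rows=[{"cust": "A17", "amt_cents": 1250}],
    mapping={"cust": "customer_id", "amt_cents": "amount_usd"},
    transforms={"amount_usd": cents_to_dollars},
)
```

Here `rows` holds the loaded data and `lineage` records, for each target column, the source field and the transformation applied. If every tool in the chain could read such records in a common form, end-to-end automation would be far easier.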

Many ETL vendors have created open APIs in their repositories, and are working with other vendors to establish interfaces for data modeling, business intelligence, scheduling and administrative tools, among others. Unfortunately, not every vendor supports all the APIs and these APIs generally offer lowest-common-denominator functionality. In addition, ETL repositories generally focus on "back-end" meta data for populating and managing data warehouses, rather than "front-end" application and tools meta data.

Some vendors that have created highly integrated suites of business intelligence tools or analytic apps may offer seamless integration among tools. However, this is the exception rather than the rule. In addition, most firms purchase multiple types of business intelligence tools, none of which are well integrated.

Meta data interoperability is also critical for managing data warehouses. For example, administrators need to be able to add and delete users, change permissions and maintain authentication in a central repository rather than in multiple apps. Administrators also need to know the parts of various apps that will be affected if they add or delete a column from the data warehouse or alter a job schedule. Ideally, administrators should be able to analyze the impact of changes, and execute updates and changes in an automated fashion.
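Impact analysis of this kind amounts to walking a dependency graph stored as meta data. The sketch below assumes a hypothetical `USED_BY` table that records which objects directly use each column, job or cube; the object names are invented for illustration.

```python
from collections import deque

# Hypothetical dependency meta data: each key is used directly by the listed objects.
USED_BY = {
    "sales.region": ["etl.load_sales", "cube.sales_by_region"],
    "etl.load_sales": ["cube.sales_by_region"],
    "cube.sales_by_region": ["report.weekly_sales", "portal.sales_dashboard"],
    "sales.discount": ["report.margin"],
}


def impacted_by(changed: str, used_by: dict[str, list[str]]) -> list[str]:
    """Return every object that depends on `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:  # visit each object only once
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted
```

With this table, `impacted_by("sales.region", USED_BY)` returns the load job, the cube, the weekly report and the portal dashboard, while `impacted_by("sales.discount", USED_BY)` returns only the margin report. The traversal is trivial; the hard part, as the article argues, is getting every application to contribute its dependencies to one shared table.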

Integrated meta data can help companies create a "lights out" data warehousing environment. A key benchmark for success should be the number of administrators required to maintain and manage a data warehouse. The fewer administrators the better. Unfortunately, most companies rely on high-priced systems analysts to maintain their data warehouses. Therefore, the lack of a common meta data language, access method and interchange mechanism increases the cost of a data warehouse and reduces its overall effectiveness.

Many companies are now beginning to deploy enterprise information portals (EIPs). EIPs retrieve data from many different applications and systems, and present the data to users within a unified, Web-enabled graphical interface. Currently, companies must build a separate interface between the EIP and each application, service or information resource that users want to access. This interface pulls data into the EIP, but does not relate data among applications. To correlate data, users must manually tag each data object using a consistent set of semantics or develop a hard-wired app relating data among resources.

On the other hand, interoperable meta data would enable an EIP to automatically identify all documents related to a user request, such as "the impact of last week's promotions on sales in the southwest." The EIP's search engine could then dynamically create a complex report that integrates a sales revenue chart, an image of the promotional coupon, a written description of the campaign and competing promotions from other companies' Web sites.
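To make the idea concrete, the sketch below assumes that every resource reachable from the portal carries tags drawn from one shared vocabulary. The `DOCUMENTS` list, the tag names and the week label are hypothetical; a real EIP would obtain this meta data from the underlying applications rather than from a hard-coded list.

```python
# Hypothetical catalog: each resource is tagged with the same vocabulary.
DOCUMENTS = [
    {"title": "Sales revenue chart",
     "tags": {"subject": "sales", "region": "southwest", "week": "W05"}},
    {"title": "Promotional coupon image",
     "tags": {"subject": "promotion", "region": "southwest", "week": "W05"}},
    {"title": "Campaign description",
     "tags": {"subject": "promotion", "region": "southwest", "week": "W05"}},
    {"title": "Northeast inventory report",
     "tags": {"subject": "inventory", "region": "northeast", "week": "W05"}},
]


def related(documents: list[dict], **criteria: str) -> list[str]:
    """Return titles of documents whose tags match every given criterion."""
    return [
        doc["title"]
        for doc in documents
        if all(doc["tags"].get(key) == value for key, value in criteria.items())
    ]
```

A request such as `related(DOCUMENTS, region="southwest", week="W05")` picks up the chart, the coupon image and the campaign description in one pass, because all three share the same tag semantics. Without that shared vocabulary, each match would have to be hard-wired.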

Emerging standards

Two prospective standards currently promise to ease the pain of creating a meta data infrastructure for data warehousing that can support the seamless integration of multiple applications.

The Object Management Group's (OMG) proposed Common Warehouse Metamodel (CWM) would greatly enhance meta data sharing and interoperability in data warehousing environments. CWM is a meta model designed by a coalition of vendors, including IBM, Unisys, NCR, Hyperion Solutions and Oracle. It complies with the OMG's Meta Object Facility (MOF) for defining meta models and modeling languages. The group's proposal also uses XML Metadata Interchange (XMI) to interchange data warehousing meta data.

The proposed CWM specifies meta models for management, transformation and operational processes within a data warehousing environment. It also specifies meta models for various data resources, including object-oriented, relational, record-oriented, multidimensional and XML data. Adoption of the CWM standard is expected shortly.

Competing with the proposed CWM is the Meta Data Coalition's (MDC) Open Information Model (OIM), which was originally developed by Microsoft with help from 20 software vendor partners. The MDC is a coalition of about 50 companies (mostly vendors) dedicated to providing an easy-to-deploy solution for accessing and interchanging corporate meta data.

The OIM specifies a meta model for component modeling, knowledge management and application development, as well as data warehousing. OIM's schema consists of 200 object types and 100 relationships described in the Unified Modeling Language (UML). It uses SQL as a query language and calls for XML as an interchange format between OIM-compliant repositories, such as the Microsoft Repository.
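The value of an XML interchange format is easiest to see in a round trip between two tools. The sketch below uses Python's standard `xml.etree.ElementTree` module with made-up element and attribute names (`table`, `column`, `name`, `type`). It does not reproduce the actual OIM or XMI schemas, which are far richer.

```python
import xml.etree.ElementTree as ET


def export_columns(table: str, columns: dict[str, str]) -> str:
    """Serialize a table definition to a (hypothetical) XML interchange document."""
    root = ET.Element("table", {"name": table})
    for name, data_type in columns.items():
        ET.SubElement(root, "column", {"name": name, "type": data_type})
    return ET.tostring(root, encoding="unicode")


def import_columns(document: str) -> tuple[str, dict[str, str]]:
    """Parse the interchange document back into a table name and column types."""
    root = ET.fromstring(document)
    columns = {col.get("name"): col.get("type") for col in root.findall("column")}
    return root.get("name"), columns


# One tool exports its definition; another imports it without losing structure.
document = export_columns("sales", {"region": "varchar", "amount_usd": "decimal"})
table_name, table_columns = import_columns(document)
```

After the round trip, `table_name` and `table_columns` match what the first tool exported. Two repositories that agree on the document layout can exchange definitions this way without a custom point-to-point interface.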

Since gaining control of the OIM, the MDC has worked to make it vendor-neutral. Working with Microsoft, the MDC eliminated the OIM's dependency on a COM API and is revising the model's XML interchange format, which was developed for use with the Microsoft Repository.

The MOF model is general enough to encompass OIM, and some vendors have already represented OIM as a MOF meta model. However, members of the CWM working group are currently trying to achieve an even tighter integration between CWM and OIM meta models. This would provide the same set of interfaces to repositories using these meta models, as well as the same interchange format, making life easier for programmers.

Finally, a group of vendors has teamed up to provide a Java API for meta data based on the OMG's MOF. The Java API for meta data will allow Java applications to specify, store, access and interchange meta data using standard meta data services. The Java API is likely to increase the adoption of standards-based meta data and, hence, accelerate the creation of robust applications and solutions in which there are no barriers to information exchange.

The way companies manage meta data can spell the difference between a mediocre data warehouse and a stellar one that is easy to manage and continues to provide business value over time. Now that many companies have deployed their first or second data warehouse, they need to step back and create a meta data management plan that provides insurance against unexpected changes in the business environment. A fine-tuned meta data strategy can help administrators rebuild their data warehouses or applications to support new requirements without undertaking a major, expensive overhaul.

Meanwhile, companies should also follow emerging meta data standards, even though these standards may provide more hope than help at present. For now, managers may need to select products that can be integrated through point-to-point interfaces managed by individual vendors. In the long term, an ascendant meta data standard may free managers to deploy commercially available meta data repositories that provide robust integration or interchange with the majority of data warehousing and analytic tools on the market.