Meta data repository architecture techniques

Over the next few years, many companies will have the unenviable task of completely rebuilding their data warehousing systems. Why? Because many of these systems were built on flawed architectures. The architecture used to build a meta data repository is every bit as critical to its long-term viability as the architecture used for the data warehouse. If you take the time to build a sound architecture, your repository will be able to grow and mature over time to support all of your company's meta data requirements.

A meta data repository is the logical place to uniformly retain and manage corporate knowledge (meta data) within or across the different organizations in a company. Over the past few years, several meta data repository architectures have emerged to address the challenges of administering and sharing meta data within an enterprise. Centralized and decentralized meta data architectures are two such approaches to building a repository. A third approach, a distributed meta data architecture, is an advanced architecture that can be used in conjunction with a centralized or decentralized architecture.

A centralized meta data repository architecture is the most common architecture implemented by corporations. The key concept in this type of architecture is a uniform and consistent meta model that mandates the schema for defining and organizing the various meta data stored in a global meta data repository. The strength of this approach is that it integrates all the meta data and stores it in one meta model schema that can be accessed easily. In addition, because of its popularity, many of the meta data integration tools utilize this architecture.
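To make the idea concrete, here is a minimal sketch of a centralized repository in Python. The class names (`MetaModel`, `CentralRepository`) and the attribute names (`definition`, `source_system`, `owner`) are assumptions chosen for illustration and do not come from any particular repository tool. The sketch shows only the core idea: one meta model governs every entry, and everything lives in one global store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaModel:
    """The single schema that every entry in the repository must follow."""
    required_attributes: frozenset

    def validate(self, entry: dict) -> None:
        # Reject any entry that does not supply every mandated attribute.
        missing = self.required_attributes - set(entry)
        if missing:
            raise ValueError(f"entry is missing attributes: {sorted(missing)}")


@dataclass
class CentralRepository:
    """One global store; all meta data passes through the same meta model."""
    meta_model: MetaModel
    entries: dict = field(default_factory=dict)

    def register(self, name: str, entry: dict) -> None:
        self.meta_model.validate(entry)
        self.entries[name] = dict(entry)

    def lookup(self, name: str) -> dict:
        return self.entries[name]


# Hypothetical meta model and a single meta data element.
model = MetaModel(frozenset({"definition", "source_system", "owner"}))
repository = CentralRepository(model)
repository.register(
    "customer_id",
    {
        "definition": "Unique identifier assigned to a customer",
        "source_system": "CRM",
        "owner": "Sales operations",
    },
)
```

Because every element is validated against the same meta model before it is stored, any consumer can query the repository without knowing which tool or business unit supplied the entry.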

A decentralized meta data architecture creates a uniform and consistent meta model that mandates the schema for defining and organizing the various meta data stored in a global meta data repository, as well as the shared meta data elements that appear in the local meta data repositories. All meta data that is shared and reused among the various repositories must first go through the central global repository; however, sharing and access to the local meta data is independent of the central repository. Therefore, the global repository holds a subset of the meta data held in the local repositories.

This type of architecture is the preferred approach for large, decentralized corporations that have very heterogeneous lines of business. A decentralized architecture provides a means of centrally managing and sharing common meta data across multiple local repositories while allowing each business unit to have an autonomous repository for its own requirements. Large organizations, such as Fortune 500 companies and government entities, tend to use this approach.
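The relationship between the global and local repositories can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference design: the class names, method names, and the sample business units (`finance`, `marketing`) are all hypothetical, and validation against the global meta model is omitted to keep the example short.

```python
class GlobalRepository:
    """Holds only the meta data elements that business units share."""

    def __init__(self):
        self._shared = {}

    def publish(self, name, entry):
        self._shared[name] = dict(entry)

    def fetch(self, name):
        return dict(self._shared[name])

    def shared_names(self):
        return set(self._shared)


class LocalRepository:
    """An autonomous repository owned by one business unit."""

    def __init__(self, unit, global_repository):
        self.unit = unit
        self._global = global_repository
        self._entries = {}

    def add_local(self, name, entry):
        self._entries[name] = dict(entry)

    def share(self, name):
        # Shared meta data must first go through the global repository.
        self._global.publish(name, self._entries[name])

    def import_shared(self, name):
        self._entries[name] = self._global.fetch(name)

    def lookup(self, name):
        # Local access never touches the global repository.
        return self._entries[name]

    def names(self):
        return set(self._entries)


hub = GlobalRepository()
finance = LocalRepository("finance", hub)
marketing = LocalRepository("marketing", hub)

finance.add_local("fiscal_quarter", {"definition": "Three-month reporting period"})
finance.add_local("ledger_code", {"definition": "Internal general ledger account"})

finance.share("fiscal_quarter")          # offered for reuse through the hub
marketing.import_shared("fiscal_quarter")
```

In this sketch `ledger_code` stays private to finance, while `fiscal_quarter` reaches marketing only by way of the global repository, so the set of shared names is always a subset of what the local repositories hold.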

A distributed architecture is an advanced meta data repository technique that can be used in conjunction with the centralized or decentralized approaches (see my column, "Advanced meta data architecture," Application Development Trends, August 2000, for additional advanced architectures). This architecture includes several disjoint and autonomous repositories, each with its own meta model to dictate internal meta data content and organization. Each repository is solely responsible for the sharing and administration of its meta data. Thus, local changes have no effect on any other repository. As a result, users and administrators of meta data can freely control the meta model and content of each local repository, as well as how and by whom it is altered.

While this architecture provides autonomy of meta data to its local repositories, the disparate meta models pose a challenge to maintaining the consistency and uniformity of meta data across different repositories. This challenge must, in turn, be addressed on a repository-to-repository level, given that there is neither a centrally governing meta model nor a central set of management guidelines.

As a result, a distributed architecture relies heavily on locally developed protocols for meta data sharing and administration while alleviating the constraints associated with the central management of information. While this architecture has a great deal of value, a corporation will not be able to store all of its meta data in a distributed environment, nor should all meta data be stored in a distributed fashion.
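One way to picture a locally developed sharing protocol is as a pairwise attribute mapping agreed between two repositories. The Python sketch below assumes exactly that; the class `AutonomousRepository`, its methods, and the sample tools and attribute names are hypothetical, and real protocols may look quite different.

```python
class AutonomousRepository:
    """A repository with its own meta model and no central authority."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = set(attributes)  # this repository's own meta model
        self._entries = {}
        self._mappings = {}                # peer name -> {peer attr: our attr}

    def add(self, element, entry):
        unknown = set(entry) - self.attributes
        if unknown:
            raise ValueError(f"not in {self.name} meta model: {sorted(unknown)}")
        self._entries[element] = dict(entry)

    def export(self, element):
        # Hand out a copy so the peer cannot alter local content.
        return dict(self._entries[element])

    def agree_mapping(self, peer_name, mapping):
        """Record how one peer's attribute names translate into ours."""
        self._mappings[peer_name] = dict(mapping)

    def import_from(self, peer, element):
        mapping = self._mappings.get(peer.name)
        if mapping is None:
            raise LookupError(f"no sharing protocol agreed with {peer.name}")
        foreign = peer.export(element)
        translated = {mapping[k]: v for k, v in foreign.items() if k in mapping}
        self.add(element, translated)


# Two tools with disparate meta models (attribute names are made up).
etl_tool = AutonomousRepository("etl_tool", {"col_desc", "src"})
modeling_tool = AutonomousRepository("modeling_tool", {"description", "origin"})

etl_tool.add("order_total", {"col_desc": "Total value of an order", "src": "ORDERS"})

# The protocol exists only between this pair of repositories.
modeling_tool.agree_mapping("etl_tool", {"col_desc": "description", "src": "origin"})
modeling_tool.import_from(etl_tool, "order_total")
```

Every additional pair of repositories that wants to exchange meta data needs its own mapping, which is the consistency cost that comes with giving up a central meta model.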

I believe there is a place and a need for each of the three architectures presented here. The centralized approach works very well for the majority of corporations, while a decentralized approach is best suited for large companies with disparate lines of business. A distributed approach can be used with a centralized or decentralized architecture for those meta data sources that are located in tools. Define your requirements and choose your architecture wisely, and your repository will support your company's requirements for many years to come.

About the Author

David Marco is the author of Building and Managing the Meta Data Repository: A Full Life-Cycle Guide from John Wiley & Sons. He is founder and president of Enterprise Warehousing Solutions Inc. (EWS), a Chicago-based system integrator. He can be reached at 708-233-6330 or via E-mail at [email protected].