Meta data repository architecture techniques
By David Marco
March 1, 2002
Over the next few years, many companies will have the unenviable task of completely
rebuilding their data warehousing systems. Why? Because many of these systems
were built on flawed architectures. The architecture used to build a meta data
repository is every bit as critical to its long-term viability as the architecture
used for the data warehouse. If you take the time to build a sound architecture,
your repository effort will be able to grow and mature over time to support
all of your company's meta data requirements.
A meta data repository is the logical place to uniformly retain and manage
corporate knowledge (meta data) within or across different organizations in
a company. Over the past several years, a number of meta data repository architectures
have emerged to address the challenges in administering and sharing meta data
within an enterprise. Centralized and decentralized meta data architectures
are two such approaches to building a meta data repository.
A third approach, a distributed meta data architecture, is an
advanced architecture that can be used in conjunction with a centralized or
decentralized architecture.
A centralized meta data repository architecture is the most common architecture
implemented by corporations. The key concept in this type of architecture is
a uniform and consistent meta model that mandates the schema for defining and
organizing the various meta data stored in a global meta data repository. The
strength of this approach is that it integrates all the meta data and stores
it in one meta model schema that can be accessed easily. In addition, because
of its popularity, many of the meta data integration tools utilize this architecture.
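As a rough illustration of the centralized idea, the sketch below models one global repository that accepts an entry only if it conforms to a single meta model. The class name `CentralRepository`, the `REQUIRED_ATTRIBUTES` tuple and the sample entry are all hypothetical, chosen for illustration rather than taken from any particular repository product.

```python
from dataclasses import dataclass, field

# Hypothetical meta model: the attributes every meta data entry must supply.
REQUIRED_ATTRIBUTES = ("name", "source_system", "definition")


@dataclass
class CentralRepository:
    """One global store; every entry must conform to the single meta model."""

    entries: dict = field(default_factory=dict)

    def register(self, entry: dict) -> None:
        # Reject anything that does not follow the mandated schema.
        missing = [attr for attr in REQUIRED_ATTRIBUTES if attr not in entry]
        if missing:
            raise ValueError(f"entry violates meta model, missing: {missing}")
        self.entries[entry["name"]] = entry

    def lookup(self, name: str) -> dict:
        # All consumers read from the same place and see the same schema.
        return self.entries[name]


repo = CentralRepository()
repo.register(
    {
        "name": "customer_id",
        "source_system": "CRM",
        "definition": "Unique customer key",
    }
)
```

Because every entry passes through the same `register` check, any consumer can rely on the same attributes being present, which is the easy access that this approach trades autonomy for.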
A decentralized meta data architecture creates a uniform and consistent meta
model that mandates the schema for defining and organizing the various meta
data stored in a global meta data repository, as well as the shared meta data
elements that appear in the local meta data repositories. All meta data that
is shared and reused among the various repositories must first go through the
central global repository; however, sharing and access to the local meta data
is independent of the central repository. The global repository therefore holds
only the subset of meta data that the local repositories share.
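The relationship between the global and local repositories can be sketched as follows. This is a minimal model under assumed names (`GlobalRepository`, `LocalRepository`, `share`, and the two business units are all hypothetical): shared meta data is published through the global repository, while purely local meta data never leaves its business unit.

```python
class GlobalRepository:
    """Holds only the meta data elements shared across business units."""

    def __init__(self):
        self.shared = {}

    def publish(self, name, definition):
        self.shared[name] = definition


class LocalRepository:
    """Autonomous repository for one business unit (hypothetical model)."""

    def __init__(self, unit, global_repo):
        self.unit = unit
        self.global_repo = global_repo
        self.local = {}

    def define_local(self, name, definition):
        # Local meta data is managed without involving the global repository.
        self.local[name] = definition

    def share(self, name):
        # Anything reused by other units must first go through the global repository.
        self.global_repo.publish(name, self.local[name])

    def lookup(self, name):
        # Prefer the unit's own definition, then fall back to shared meta data.
        if name in self.local:
            return self.local[name]
        return self.global_repo.shared[name]


global_repo = GlobalRepository()
retail = LocalRepository("retail", global_repo)
insurance = LocalRepository("insurance", global_repo)

retail.define_local("customer_id", "Unique customer key")
retail.define_local("shelf_code", "Store shelf location")
retail.share("customer_id")
```

Here the insurance unit can resolve `customer_id` through the global repository, but `shelf_code` stays private to the retail unit because it was never shared.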
This type of architecture is the preferred approach for large, decentralized
corporations that have very heterogeneous lines of business. A decentralized
architecture provides a means of centrally managing and sharing common meta
data across multiple local repositories while allowing each business unit to
have an autonomous repository for its own requirements. Large organizations,
such as Fortune 500 companies and government entities, tend to use this approach.
A distributed architecture is an advanced meta data repository technique that
can be used in conjunction with the centralized or decentralized approaches
(see my column, "Advanced meta data architecture," Application Development Trends,
August 2000, for additional advanced architectures). This architecture includes several
disjoint, autonomous repositories, each with its own meta model to dictate internal
meta data content and organization. Each repository is solely responsible for
the sharing and administration of its meta data. Thus, local changes have no
effect on any other repository. As a result, users and administrators of meta
data can freely control the meta model and content of each local repository,
as well as how and by whom they are altered.
While this architecture provides autonomy of meta data to its local repositories,
the disparate meta models pose a challenge to maintaining the consistency and
uniformity of meta data across different repositories. This challenge must,
in turn, be addressed at the repository-to-repository level, because there
is neither a centrally governing meta model nor a set of management guidelines.
As a result, a distributed architecture relies heavily on locally developed
protocols for meta data sharing and administration while alleviating the constraints
associated with the central management of information. While this architecture
has a great deal of value, a corporation will not be able to store all of its
meta data in a distributed environment, nor should all meta data be stored in
a distributed fashion.
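To make the repository-to-repository idea concrete, here is a minimal sketch in which two autonomous repositories each use their own meta model (modeled here simply as different field names) and exchange meta data through a locally agreed translation function. The names `AutonomousRepository`, `translate`, `etl_tool` and `modeling_tool` are assumptions made for illustration only.

```python
class AutonomousRepository:
    """A repository with its own meta model, reduced here to two field names."""

    def __init__(self, name, key_field, text_field):
        self.name = name
        self.key_field = key_field
        self.text_field = text_field
        self.records = []

    def add(self, key, text):
        # Each repository stores records in its own vocabulary.
        self.records.append({self.key_field: key, self.text_field: text})


def translate(record, source, target):
    """Pairwise protocol: re-express a source record in the target's meta model."""
    return {
        target.key_field: record[source.key_field],
        target.text_field: record[source.text_field],
    }


etl_tool = AutonomousRepository("etl_tool", "column", "description")
modeling_tool = AutonomousRepository(
    "modeling_tool", "attribute", "business_definition"
)

etl_tool.add("cust_id", "Unique customer key")

# Sharing happens only through the pairwise mapping; neither repository is changed.
shared = translate(etl_tool.records[0], etl_tool, modeling_tool)
```

Every pair of repositories that needs to exchange meta data requires its own mapping of this kind, which is why consistency becomes harder to maintain as the number of repositories grows.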
I believe there is a place and a need for each of the three architectures presented
here. The centralized approach works very well for the majority of corporations,
while a decentralized approach is best suited for large companies with disparate
lines of business. A distributed approach can be used with a centralized or
decentralized architecture for meta data sources that reside within individual tools.
Define your requirements and choose your architecture wisely, and your repository
will support your company's requirements for many years to come.
About the Author
David Marco is the author of Building and Managing the Meta Data Repository: A Full Life-Cycle Guide from John Wiley & Sons. He is founder and president of Enterprise Warehousing Solutions Inc. (EWS), a Chicago-based system integrator. He can be reached at 708-233-6330 or via E-mail at [email protected].