Which comes first, the meta data repository or the data warehouse?

Which application did most organizations build first: a data warehouse/data marts or a meta data repository? The obvious answer is a data warehouse. While most Global 2000 corporations have some form of a data warehouse (typically several) in-house, many companies still have not built a meta data repository. A much more interesting question is "If a company only has the time, money or resources to build one of these applications, which should it build first: the meta data repository or the data warehouse?"

But before we address this question, I want to make it clear that almost every firm needs both a meta data repository and a data warehouse to compete effectively in today's marketplace. Companies that neglect these applications, or do not build them properly, will be replaced by competitors that do.

Over the years, I've had the opportunity to work with more than two dozen Enterprise Warehousing Solutions clients, and have given more than a hundred keynotes and seminars on data warehousing and meta data repositories. During this time, I've often been asked whether a company should build a meta data repository or a data warehouse first. Based on my experience, I have concluded that the optimal approach is to build the meta data repository first. Let's examine the reasons why.

When a corporation undertakes a major information technology (IT) initiative, such as a customer relationship management (CRM), enterprise resource planning (ERP), data warehouse or e-commerce solution, the likelihood of project failure ranges between 65% and 80%, depending on which study you look at. This is especially alarming when you consider that these same initiatives traditionally have executive management support and cost companies many millions of dollars.

For example, I have a large client that is looking to roll out a CRM system (such as Siebel or Oracle) and an ERP system (such as SAP or PeopleSoft) globally in the next four years. Their initial project budget is more than $125 million. Now think about this: When was the last time you saw an ERP or CRM initiative delivered on time or on budget?

Meta data repositories enable all IT apps
When we examine why so many projects fail, several themes become apparent. First, the projects that failed did not address a definable and measurable business need. This is the number one reason for the failure of any type of project, whether data warehouse, CRM, meta data repository or otherwise. As IT professionals, our goal is to solve business problems or capture business opportunities. In other words, every application we build must increase revenue or decrease costs.

Second, projects often fail when they aren't based on a clear understanding of the company's existing IT environment. This includes custom apps, vendor applications, external data sources, government regulations, security protocols, data elements, entities, data flows, data heritage and data lineage. A meta data repository (and, specifically, technical meta data) allows an organization to decipher its IT environment and shorten the systems development life cycle for ERP, CRM, data warehouse and e-commerce applications.
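
To make the idea of technical meta data more concrete, here is a minimal sketch in Python of how recorded data flows can answer two common questions: where does a data element come from (lineage), and what is affected if it changes (impact). This is only an illustration, not a description of any particular repository product. The class name, method names and element names are all hypothetical, and a real repository would hold far more than source-to-target mappings.

```python
from collections import defaultdict


class MetaDataRepository:
    """Toy in-memory store of technical meta data (hypothetical, for illustration)."""

    def __init__(self):
        # target element -> set of source elements that feed it directly
        self._sources = defaultdict(set)
        # source element -> set of target elements it feeds directly
        self._targets = defaultdict(set)

    def record_flow(self, source, target):
        """Record that data moves from `source` to `target`."""
        self._sources[target].add(source)
        self._targets[source].add(target)

    @staticmethod
    def _walk(start, edges):
        """Collect every element reachable from `start` by following `edges`."""
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, element):
        """All upstream elements that directly or indirectly feed `element`."""
        return self._walk(element, self._sources)

    def impact(self, element):
        """All downstream elements that directly or indirectly depend on `element`."""
        return self._walk(element, self._targets)


# Hypothetical element names in "system.table.column" form.
repo = MetaDataRepository()
repo.record_flow("crm.customer.id", "staging.customer.cust_id")
repo.record_flow("erp.account.cust_no", "staging.customer.cust_id")
repo.record_flow("staging.customer.cust_id", "warehouse.dim_customer.customer_key")
repo.record_flow("warehouse.dim_customer.customer_key", "mart.sales_report.customer_key")

print(sorted(repo.lineage("warehouse.dim_customer.customer_key")))
print(sorted(repo.impact("crm.customer.id")))
```

With flows like these on record, a team planning a CRM replacement could list every staging, warehouse and data mart element that depends on the CRM customer identifier before changing anything.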

For most of these systems (specifically, data warehouses), a meta data repository is a critical project enabler and long-term sustainer of the application. But in their enthusiasm to build a data warehouse, many companies did so at the expense of architecture and quality—and without a meta data repository to support their efforts. Not surprisingly, most Global 2000 companies will spend the better part of this decade completely rebuilding their data warehouse investments.

Vendor tools
As I mentioned earlier, most companies select data warehousing tools and build a data warehouse before implementing a meta data repository. While data warehousing tools have certainly matured over the years, organizations that select them without addressing their meta data repository requirements will most likely end up with tools that do not support their meta data repository. Conversely, the tools used to build the meta data repository typically do not hamper the development of the data warehouse, although an incorrectly built meta data repository does.

Often, a corporation will not want to wait to attain the substantial benefits of a meta data repository and a data warehouse, and will look to build both of these apps in parallel. This approach makes sense, because a meta data repository is an absolute necessity for the success of the data warehouse. In turn, data warehouses and the tools that build them typically provide some of the most valuable meta data for the repository.

The number of companies looking to build a meta data repository is growing at a rapid pace. While meta data repository initiatives are certainly not without their fair share of project failures, firms that have worked hard and been methodical in their approach have built repositories that provide them with a tremendous competitive advantage in their marketplace.

About the Author

David Marco is the author of Building and Managing the Meta Data Repository: A Full Life-Cycle Guide from John Wiley & Sons. He is founder and president of Enterprise Warehousing Solutions Inc. (EWS), a Chicago-based system integrator. He can be reached at 708-233-6330 or via E-mail at [email protected].