
Tame global meta data

A common question data warehousing and business intelligence professionals ask me is, “What are companies doing about managing global meta data?” My usual answer is, “Not much.”

Mastering meta data is one of the most vexing challenges DW/BI managers face today. Why? First, coming up with global definitions for the metrics and dimensions used in a data warehouse requires getting representatives from each constituency in the same room and not letting them out until they reach consensus. Usually, you need the CEO to stand guard at the door to ensure results.

Second, there are few good options for maintaining meta data once it is defined. Typically, organizations put meta data definitions into an Access database and point users to it. The administrative group rarely updates this repository in a timely fashion, if at all. Despite vendor pronouncements, few tools exist today that can automatically synchronize meta data across your DW/BI environment.

Ideally, users should be able to right-click on any field or metric in a BI report and get its technical and business definition along with other relevant information, like the business owner of that data and contact information. Unfortunately, we’re still years away from achieving this.
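
To make that ideal concrete, here is a minimal sketch of the kind of record such a right-click lookup would return. The repository shape, field names and sample values are assumptions for illustration, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldDefinition:
    """Business and technical meta data for one report field or metric."""
    name: str                   # field or metric name as shown in reports
    technical_definition: str   # how the value is derived
    business_definition: str    # what the value means to the business
    business_owner: str         # who is accountable for the definition
    contact: str                # how to reach that owner

# Hypothetical repository keyed by (subject area, field name).
REPOSITORY = {
    ("sales", "net_revenue"): FieldDefinition(
        name="net_revenue",
        technical_definition="SUM(order_total) - SUM(returns) in the sales mart",
        business_definition="Revenue after returns, before discounts",
        business_owner="Finance",
        contact="finance-data@example.com",
    ),
}

def lookup(subject_area: str, field: str) -> Optional[FieldDefinition]:
    """What a right-click in a BI tool would call behind the scenes."""
    return REPOSITORY.get((subject_area, field))
```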

Duplicate records, twice the trouble

There is some encouraging news. In the past few years, many firms have made headway in maintaining consistent reference data -- meta data about core sets of records, such as customers, products and suppliers. Typically, these records are stored in multiple operational systems, each of which maintains the data in a unique format with both overlapping and unique fields. The share of duplicate records among these systems typically ranges from 30% to 50%.

Duplicate, non-conforming reference data wreaks havoc on efficiency and profitability. For example, duplicate customer records increase direct mailing costs and annoy customers who receive multiple copies of the same direct mail piece. Duplicate product records create problems on invoices and often lead to inventory shortages. Duplicate or non-matched supplier information causes organizations to lose millions of dollars in volume discounts because purchasing managers don’t understand relationships among suppliers, their subsidiaries and distributors.

To consolidate reference data, or master data, organizations are taking three approaches. The first is to standardize the entire corporation on a single vendor's ERP app suite. Many companies have tried this, but none has succeeded entirely: they usually can't turn off every legacy system, and different groups upgrade to different versions of the ERP software, creating internal inconsistencies. Nonetheless, the often painful process of implementing an ERP app worldwide forces a company to come to terms with its meta data problem. Invariably, the company takes gargantuan steps toward aligning itself on a standard set of terms and definitions. That alone is worth the tens of millions of dollars in implementation costs.

Designate a system of record
The second approach organizations are taking to standardize reference data is to designate a single app as the repository and originator of that data. For example, companies may anoint a new packaged supplier management app to serve as the system of record for supplier profiles. The application distributes this information to other apps upon request. The business owners of this application are responsible for maintaining the data, which includes updating records when changes occur and notifying all downstream systems of those changes.
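
As a schematic illustration of this pattern, the sketch below shows a system of record that hands out supplier profiles on request and notifies downstream subscribers of changes. The class and method names are hypothetical, not any particular product's interface.

```python
from typing import Callable, Dict, List

# A downstream system registers a callback; the system of record
# invokes every callback whenever a supplier profile changes.
Subscriber = Callable[[str, dict], None]

class SupplierSystemOfRecord:
    """Single repository and originator of supplier profiles (schematic)."""

    def __init__(self) -> None:
        self._profiles: Dict[str, dict] = {}     # supplier_id -> profile
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)

    def get(self, supplier_id: str) -> dict:
        """Other apps request profiles here instead of keeping their own copies."""
        return self._profiles[supplier_id]

    def update(self, supplier_id: str, profile: dict) -> None:
        """Business owners maintain the data; every downstream
        system is notified when a record changes."""
        self._profiles[supplier_id] = profile
        for notify in self._subscribers:
            notify(supplier_id, profile)
```

A procurement or ERP app would call subscribe() once and refresh its local copy inside the callback, which is what "notifying all downstream systems" amounts to in practice.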

The third approach, used when there is no obvious app to anoint as a system of record, is a data integration hub -- a server engine that parses, matches, standardizes and synchronizes reference data among multiple, distributed operational systems. In many cases, a hub works like an operational data store, since it also stores a master copy of the standardized records. Many vendors now offer customer data integration software of this kind.

For example, Microsoft has created an internal synchronization hub to consolidate and standardize data about corporate customers. The hub currently consists of 1.8TB of customer data culled from 60 different source systems.

The hub uses a matching engine to identify duplicate records across the source systems and assigns a unique key to each matched set. It then standardizes a few fields across all the source systems and stores them alongside the fields that are unique to each source system. Authorized groups can download this data from the hub to carry out marketing campaigns or other activities.
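
The match-and-key step might look like the sketch below. This is not Microsoft's engine; it just illustrates the logic, using a crude lowercased-name-plus-postal-code rule where a real matching engine would apply fuzzy, probabilistic matching.

```python
import uuid

def match_key(record: dict) -> tuple:
    """Crude match rule: lowercased name plus postal code."""
    return (record["name"].strip().lower(), record.get("postal_code", ""))

def assign_master_keys(source_records: list) -> dict:
    """Cluster records that refer to the same customer across source
    systems and stamp each cluster with one unique master key."""
    clusters = {}
    for record in source_records:
        clusters.setdefault(match_key(record), []).append(record)
    for records in clusters.values():
        master_key = str(uuid.uuid4())    # one key per real-world customer
        for record in records:
            record["master_key"] = master_key
    return clusters

def master_record(records: list, shared_fields: list) -> dict:
    """Standardize a few shared fields (here, naively taking the first
    source's values) and keep each source's unique fields alongside."""
    master = {f: records[0].get(f) for f in shared_fields}
    master["per_source"] = [
        {k: v for k, v in r.items() if k not in shared_fields} for r in records
    ]
    return master
```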

In addition, three operational systems at Microsoft synchronize their data with the hub on a regular basis. For example, Microsoft's sales app sends new or updated customer records to the hub each night. The hub matches and standardizes the records against its reference set and sends a cleansed version back to the sales app the next day, along with any other attribute data in the hub that the sales application has subscribed to.
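
Schematically, one nightly cycle might look like this; match_and_standardize is a hypothetical stand-in for the hub's engine, not a real interface.

```python
def nightly_sync(hub, changed_records: list, subscriptions: list) -> list:
    """One nightly cycle: send new or updated customer records to the
    hub; receive cleansed versions back, plus any hub attributes the
    app has subscribed to."""
    cleansed = []
    for record in changed_records:
        master = hub.match_and_standardize(record)   # hypothetical hub call
        record.update({k: v for k, v in master.items()
                       if k in record or k in subscriptions})
        cleansed.append(record)
    return cleansed
```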

“Our [customer data integration hub] keeps customer data in synch for our key operating groups, such as sales, service and CRM,” says Kevin Mackey, IT manager at Microsoft. “We expect more groups to become active subscribers to [the hub] in the future and, eventually, I expect our system to work in near real-time using Web services.”

There are no easy answers to the meta data question. That's because meta data -- or a lack of consistent meta data -- is a symptom of a deeper problem: our organizations are continually changing. Although some organizations are starting to master meta data, most are not. Even early adopters are still feeling their way to success, and the software available to support such initiatives is so new that most organizations are building their own solutions. Despite signs of progress, our industry still has a way to go before it masters meta data.

About the Author

Wayne W. Eckerson is director of education and research for The Data Warehousing Institute, where he oversees TDWI's educational curriculum, member publications, and various research and consulting services. He has published and spoken extensively on data warehousing and business intelligence subjects since 1994.