In-Depth

Wake up and smell the meta data

Several months ago, I was talking to one of the leading analysts in the area of Enterprise Application Integration (EAI) about meta data. His comment? "Americans don't get meta data. Europeans do, but Americans don't. Say the word meta data, and they fall asleep."

I am not so sure that this is true nowadays, as large corporations like IBM, Oracle and Microsoft have all launched major meta data initiatives in the last 18 months. It might be more accurate to say that most IT customers have a fair amount of skepticism about meta data solutions, and with good reason -- most of the data dictionary and repository efforts to date have failed to take hold.

If one had to choose a single reason why repository-centric architectures have not succeeded in larger organizations, it would be that vendors assumed too much could be achieved through analysis. Because most operational databases reside on legacy systems, it is almost impossible to reasonably analyze the meta data in place at a large organization. The difficulties include the following:

  • These systems are frequently extremely complex. As a vendor of products in the data integration management space, we have run into customers with 400-page IDMS schemas and 100,000-line COBOL copybooks.
  • The meta data required is often distributed across multiple environments. In most production IT shops, it is not possible to rebuild historical data when an application requires new data -- either because the organization does not have the necessary additional data or because there is simply not enough downtime. As a result, DBAs frequently deploy legacy DBMS products in such a way as to reduce the need to rebuild the database. In the IMS world, this is frequently done by using the Data Base Definition language to define data only at the SEGMENT level; COBOL or PL/1 copybooks are then used to flesh out the details of each record. The REDEFINES option in COBOL lets people use the same space in a COBOL record in multiple ways. As a result, understanding what a legacy database really looks like may require collating information from the DBMS data definition language, one or more file descriptions and the application code, as the sketch following this list illustrates.
  • Operational schemas change with some regularity -- if not with respect to the layout itself, then with respect to the legal values that can be stored in a particular field.
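To make the collation problem concrete, here is a minimal, purely hypothetical Python sketch (none of the names come from an actual system) of the REDEFINES situation: the DBMS definition says only that a segment is 30 bytes long, while the rule for which layout applies lives in application code, not in the database definition.

```python
# Hypothetical sketch: the same 30-byte record area interpreted two ways,
# mimicking a COBOL REDEFINES clause. The DBMS definition alone says only
# "SEGMENT, 30 bytes"; the real layouts live in the copybooks, and the rule
# for choosing between them lives in the application code.
import struct

record = b"01" + b"SMITH, JOHN".ljust(20) + b"00012500"   # 2 + 20 + 8 = 30 bytes

def as_customer(rec: bytes) -> dict:
    """Layout 1: record-type '01' -- customer name and balance."""
    rtype, name, balance = struct.unpack("2s20s8s", rec)
    return {"type": rtype.decode(), "name": name.decode().strip(),
            "balance_cents": int(balance.decode())}

def as_invoice(rec: bytes) -> dict:
    """Layout 2 (REDEFINES): record-type '02' -- invoice number, date, amount."""
    rtype, invoice_no, inv_date, amount = struct.unpack("2s10s8s10s", rec)
    return {"type": rtype.decode(), "invoice_no": invoice_no.decode().strip(),
            "date": inv_date.decode(), "amount": amount.decode().strip()}

def interpret(rec: bytes) -> dict:
    # The rule "01 means customer, 02 means invoice" exists only in COBOL code,
    # which is why the true schema must be collated from several sources.
    return as_customer(rec) if rec[:2] == b"01" else as_invoice(rec)

print(interpret(record))
```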

Given these facts, is it any wonder that strategic initiatives listing their initial deliverable as an enterprise data model were often canceled before any implementation began? Or that no repository company or major repository initiative (such as IBM's AD/Cycle) has been wildly successful?

The importance of a strategy

Despite the failure of many of these earlier efforts, an enterprise meta data strategy is becoming a critical factor if one is to cut the growing cost of interfacing. For example, in its report on Enterprise Application Integration, Minneapolis-based investment banking and institutional brokerage firm Dain Rauscher Wessels estimates that organizations spent $85 billion writing and maintaining interfaces by hand in 1998. The company believes this figure will increase by 30% in 1999.

But why is the data integration management problem becoming so visible today? Two primary causes are at work: 1) The rush to use the Web to better serve customers and to maximize the efficiency of the supply chain; and 2) The volatility of today's business environment, which is characterized by the deregulation of industries and merger/acquisition mania. If companies do not keep meta data audit trails of how they have implemented interfaces, they run the risk of imploding under the weight of being so connected.

We have already determined that it is unreasonable to attack the meta data problem through analysis alone. A better approach is to adopt a strategy in which information is acquired and shared over time. Companies should therefore look for products that keep a meta data audit trail describing what the user has done; these products should also be able to exchange meta data with other related products in the IT infrastructure. However, this will benefit an organization only if there is a methodology and/or a set of processes that enforces the sharing of this information.
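To make the idea of an audit trail less abstract, the following is a minimal sketch only; the field names, the JSON export and the mapping example are assumptions for illustration, not any vendor's actual format. The point is simply that each mapping decision a user makes is captured as data that can later be exported and shared with other tools.

```python
# A minimal sketch of a meta data audit trail (hypothetical structure, not a
# real product's format): each mapping decision is recorded and can be
# exported for exchange with other tools in the IT infrastructure.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class MappingAuditEntry:
    source: str        # e.g. "CUSTMAST.CUST-BAL (COBOL copybook)"
    target: str        # e.g. "CRM.ACCOUNT.BALANCE (relational table)"
    rule: str          # the transformation applied, stated in user terms
    author: str
    recorded_at: str

def record_mapping(source: str, target: str, rule: str, author: str) -> MappingAuditEntry:
    return MappingAuditEntry(source, target, rule, author,
                             datetime.now(timezone.utc).isoformat())

trail = [record_mapping("CUSTMAST.CUST-BAL", "CRM.ACCOUNT.BALANCE",
                        "divide by 100 to convert cents to dollars", "jdoe")]

# Exporting the trail as a neutral file is what makes it shareable with other
# products, rather than locked inside one tool's private repository.
print(json.dumps([asdict(entry) for entry in trail], indent=2))
```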

Difficulties remain

When dealing in the abstract, it is easy to envision an architecture that shares a common set of APIs against a common meta model, or that utilizes an active repository that automates the percolation of related changes throughout the organization. These are interesting concepts, but to be successful, it is important to recognize that even with standards and automated meta data exchange between products, meta data management will remain difficult for a number of reasons.

IT WILL REMAIN COMPLEX. Legacy databases with their "secret" schemas are not the only complex definitions one will encounter. The schema for an enterprise resource planning manufacturing module may utilize anywhere from 3,000 to 5,000 table definitions. Moreover, a new release of an application may not be compatible with a previous release. As a result, as with legacy systems, versioning may be required to represent differences in the business rules applied.

IT WILL REMAIN HETEROGENEOUS AND DISTRIBUTED. Most tools and applications will require the meta data needed for execution to be stored in their own repositories. Legacy systems will remain in use for the foreseeable future.

THE DEFINITION OF META DATA WILL BE OWNED BY DIFFERENT ORGANIZATIONS. The concept of a data steward or data architect, or even an organization in charge of providing interface services, is a good way to enforce an enterprise's meta data strategy. However, it is not reasonable to expect an IT organization maintaining a mission-critical application to "ask permission" about whether or how it represents an application's key information. Rather, data stewards would be valued more if they were a resource for answering questions such as "Does anyone else use a look-up table that relates X to Y?" and were seen as keepers of what the division actually does.

DIFFERENT VENDORS WILL MAINTAIN PRIVATE META DATA THAT IS KEY TO THEIR VALUE PROPOSITION. For any standard to work, there must be some way for this information to be preserved as related common objects and relationships are exchanged and modified.

Because of these complexities, it seems a file-based mechanism for exchanging meta data between products is more likely to succeed -- at least for the short term -- than a more elegant but simplistic meta model. In other words, it is more important that products know how to map their internal meta model to an interchange format representing the appropriate entities and relationships than to insist that they share a common meta model.
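The sketch below illustrates the mapping idea in Python. It is not the MDIS file format (the entity names, the JSON layout and the "private" section are assumptions for illustration only): a tool translates its internal meta model into a neutral, file-based set of entities and relationships, carrying its private meta data along so it survives the round trip.

```python
# Illustrative only -- not the actual MDIS format. A tool maps its internal
# (proprietary) meta model onto generic interchange objects that another tool
# can import, while private details are carried through untouched.
import json

internal_model = {
    "tbl": "CUSTOMER",
    "cols": [{"nm": "CUST_ID", "ty": "CHAR(10)"},
             {"nm": "BALANCE", "ty": "DECIMAL(9,2)"}],
    "indexes": ["PK_CUSTOMER"],      # private detail other tools may not model
}

def export_interchange(model: dict) -> dict:
    """Map the internal model onto generic 'entity' and 'attribute' objects."""
    return {
        "entities": [{"name": model["tbl"], "type": "Table"}],
        "attributes": [{"entity": model["tbl"], "name": c["nm"], "datatype": c["ty"]}
                       for c in model["cols"]],
        # Private meta data rides along so it survives exchange and modification
        # even though the common model does not define it.
        "private": {"vendorX.indexes": model["indexes"]},
    }

with open("customer_interchange.json", "w") as f:
    json.dump(export_interchange(internal_model), f, indent=2)
```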

The Meta Data Coalition

To arrive at a short-term solution, the key is to get a sufficient number of vendors to agree on and support some interchange format. The Meta Data Coalition (MDC) may be just the vehicle needed to accomplish this task. Founded in 1995, the MDC exists to develop and provide a tactical means of standardized meta data exchange. In 1996, the group introduced the Meta Data Interchange Specification (MDIS), which "provides a metamodel that addresses the main types of commonly shared metadata objects and a standard import/export mechanism that enables the exchange of these metadata objects between tools."

The MDC initially received support (albeit with a little skepticism) from a number of industry analysts. Over the past two years, however, the attitude has been predominantly cynical -- and with good reason. While the MDC has always been able to attract and keep a membership of between 50 and 60 vendors, only seven have actually implemented support for MDIS. The other vendors -- some of which participated on the technical subcommittee -- have preferred to take a "wait-and-see" approach.

The addition of Microsoft to the MDC's roll call -- and the reciprocal membership established between the Coalition and the Object Management Group (OMG) -- has changed things. The MDC now appears to be gaining sufficient critical mass to achieve its goal of defining a tactical (but technically sound) meta model and exchange format.

Recently, the MDC completed a technical review of the MDC-OIM, a technology-independent and vendor-neutral information model describing the structure and semantics of meta data. The MDC-OIM is based on the Microsoft Open Information Model, a meta data model and specification that is part of Microsoft Repository, a meta data management product.

Note that in keeping with the MDC's initial charter, members do not expect the meta model to be static or sacrosanct. Changes will be required as the concept of what vendors and users consider to be key meta data evolves. What is important is not that the standard changes, but that the vendors associated with the MDC are committed to supporting the standard.

Business rules: The missing link

When discussing the meta data "wars," the irony is that the areas of technical disagreement to date are relatively small with respect to what must be represented. However, one very important aspect of meta data has been ignored -- the specification of business rules. While the GUIDE initiative has spent considerable effort studying this area, the group has focused more on the semantic definition and full range of what can be considered a business rule (such as "employees cannot smoke in the building"). The MDC's concern in this area, on the other hand, is to define what would constitute a user's language for specifying the kind of test and transformation logic most often used in interfacing applications. Note that this can range from simple math to complex conditional logic.

The problem today is that most software vendors require technical users to specify this kind of logic, whether in Visual Basic, COBOL, SQL, C or some proprietary language (except in the case of products that only support the specification of business rules as documentation). This situation has several drawbacks:

  • Business users are frequently best suited to specify this type of information.
  • Business users cannot readily understand business rules specified in one of the technical languages outlined above.
  • Because technical users can employ different techniques for referencing objects, it is usually very hard for the specification of a business rule in one product to be understood by another.

Given that 40% to 80% of the data values moved in interface programs are transformed, it is critical that any meta data interchange mechanism provide a standard means of expressing this kind of meta data that is also suitable for end users. Such a mechanism would need to define a subset of natural language that is sufficiently constrained for parsers to be written easily, yet rich enough to allow for the specification of arbitrarily complex logic.

In addition, such a language would need:

  • Context sensitivity. This would provide, for example, a means of indicating that the legality of choosing element X can be determined by looking at the value of some previous tokens in the statement. Another example would be the ability to ensure that the second numeric value specified in a range statement like "between X and Y" is larger than the first. Context sensitivity allows the parser to obtain the value for X in order to validate the value for Y (see the sketch following this list).
  • A set of conventions on how to establish reference -- the entities about which statements are being made.
  • A means for adding business-specific functions; for example, a user routine to compute return on a bond or fuel tank capacity.
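The following is a deliberately tiny Python sketch of these requirements; the MDC's eventual specification may look nothing like it, and the rule syntax, function names and registry are assumptions made purely for illustration. It parses a constrained "between X and Y" rule, uses context sensitivity to check that the second value exceeds the first, and adds a business-specific function through a simple registry.

```python
# Hypothetical sketch of a constrained business-rule language: a "between X
# and Y" rule is parsed, the upper bound is validated against the lower bound
# (context sensitivity), and user routines are added via a registry.
import re

FUNCTIONS = {}  # registry for business-specific routines

def register(name):
    def wrap(fn):
        FUNCTIONS[name] = fn
        return fn
    return wrap

@register("fuel_tank_capacity")
def fuel_tank_capacity(model: str) -> float:
    # Hypothetical lookup standing in for a real business routine.
    return {"truck": 150.0, "sedan": 50.0}.get(model, 0.0)

RULE = re.compile(r"^(\w+)\s+must be between\s+(\d+(?:\.\d+)?)\s+and\s+(\d+(?:\.\d+)?)$")

def parse_rule(text: str):
    m = RULE.match(text.strip())
    if not m:
        raise ValueError(f"not a recognized rule: {text!r}")
    field, low, high = m.group(1), float(m.group(2)), float(m.group(3))
    if high <= low:   # context-sensitive check: Y must be larger than X
        raise ValueError(f"upper bound {high} must exceed lower bound {low}")
    return lambda record: low <= record[field] <= high

check = parse_rule("capacity must be between 40 and 200")
print(check({"capacity": fuel_tank_capacity("truck")}))   # True
```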

As of Q4/1998, the Meta Data Coalition has said that business rules will be the technical subcommittee's next major topic. The initial draft of a specification will be available for review and input by mid-year 1999.

What IT can do

Many of the implementation and maintenance problems IT organizations face stem from the fact that software products are generally designed to offer some range of functionality under a set of assumptions that are simply not valid in the customer's environment. As a result, architects and programmers must devise processes and techniques for working around what is not provided. In earlier times, organizations would develop internal applications or utilities to fill these gaps. However, proprietary application development is no longer cost-effective, given the increased rate of change and the growing complexity of the solutions that must be "glued" together to make technology work. Consequently, it is important for vendors and users to work together to close the gap between the products being sold and customers' needs.

The Meta Data Coalition is a venue where vendors and users can work effectively to better understand the meta data problem space in a timely manner. The MDC actively encourages end-user companies to join and participate. Companies committed to developing an enterprise meta data strategy should consider membership and participation in these efforts. For more information, consult the Meta Data Coalition's Web site at www.mdcinfo.com.