In-Depth
Wake up and smell the meta data
- By Katherine Hammer
- June 7, 2001
Several months ago, I was talking to one of the leading analysts
in the area of Enterprise Application Integration (EAI) about meta data. His comment? "Americans don't get
meta data. Europeans do, but Americans don't. Say the word meta data, and they fall asleep."
I am not so sure that this is true nowadays, as large corporations like IBM, Oracle and Microsoft have all launched
major meta data initiatives in the last 18 months. It might be more accurate to say that most IT customers have
a fair amount of skepticism about meta data solutions, and with good reason -- most of the data dictionary and
repository efforts to date have failed to take hold.
If one had to choose a single reason why repository-centric architectures have not succeeded in larger organizations,
it would be that vendors assumed too much could be achieved through analysis. Because most operational databases
reside on legacy systems, it is almost impossible to reasonably analyze the meta data in place at a large organization.
The difficulties include the following:
- These systems are frequently extremely complex. As a vendor of products in the data integration management
space, we have run into customers with 400-page IDMS schemas and 100,000-line COBOL copybooks.
- The meta data required is often distributed across multiple environments. In most production IT shops, it is
not possible to rebuild historical data when an application requires new data -- either because the organization
does not have the necessary additional data or because there is simply not enough downtime. As a result, DBAs frequently
deploy legacy DBMS products in ways that reduce the need to rebuild the database. In the IMS world, this is often
done by using the Database Description (DBD) language to define data only at the SEGMENT level; COBOL or PL/1 copybooks
are then used to flesh out the details of each record. The REDEFINES clause in COBOL lets people use the same space
in a record in multiple ways. As a result, understanding what a legacy database really looks like may require collating
information from the DBMS data definition language, one or more file descriptions and the application code (a sketch
of this appears after this list).
- Operational schemas change with some regularity -- if not with respect to the layout itself, then with respect
to the legal values that can be stored in a particular field.
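To make the problem concrete, here is a rough sketch, in Python, of what it takes to interpret a single legacy record once the segment definition, the copybook and the application code have been collated. The record layout, field names and record-type convention are invented for illustration; the point is only that the same bytes carry different meanings, in the spirit of a COBOL REDEFINES clause.

```python
# Hypothetical sketch: interpreting one fixed-width legacy record two ways,
# as a COBOL REDEFINES clause would allow. Field names, offsets and the
# record-type convention are invented for illustration.

RECORD = b"02AC-4471  19990315000125000"   # 28-byte sample record

def parse_common(rec: bytes) -> dict:
    # Fields every record shares, per the (hypothetical) segment definition.
    return {"rec_type": rec[0:2].decode(), "account": rec[2:11].decode().strip()}

def parse_as_payment(rec: bytes) -> dict:
    # Layout used when rec_type == "02": the trailing bytes hold a date and
    # an amount with an implied decimal point (two assumed decimal places).
    return {**parse_common(rec),
            "post_date": rec[11:19].decode(),
            "amount": int(rec[19:28].decode()) / 100}

def parse_as_address_change(rec: bytes) -> dict:
    # Alternative layout (rec_type == "03"): the same bytes redefined
    # as free-form address text.
    return {**parse_common(rec), "new_address": rec[11:28].decode().rstrip()}

def parse(rec: bytes) -> dict:
    # The "real" schema only emerges by combining the segment-level record,
    # the copybook layouts and this dispatch rule buried in application code.
    return parse_as_payment(rec) if rec[0:2] == b"02" else parse_as_address_change(rec)

if __name__ == "__main__":
    print(parse(RECORD))
```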
Given these facts, is it any wonder that strategic initiatives listing their initial deliverable as an enterprise
data model were often canceled before any implementation began? Or that no repository company or major repository
initiative (such as AD/Cycle) has been wildly successful?
The importance of a strategy
Despite the failure of many of these earlier efforts, an enterprise meta data strategy is becoming a critical
factor if one is to cut the growing cost of interfacing. For example, in its report on Enterprise Application Integration,
Minneapolis-based investment banking and institutional brokerage firm Dain Rauscher Wessels estimates that organizations
spent $85 billion writing and maintaining interfaces by hand in 1998. The company believes this figure will increase
by 30% in 1999.
But why is the data integration management problem becoming so visible today? Two primary causes are at work:
1) The rush to use the Web to better serve customers and to maximize the efficiency of the supply chain; and 2)
The volatility of today's business environment, which is characterized by the deregulation of industries and merger/acquisition
mania. If companies do not keep meta data audit trails of how they have implemented interfaces, they run the risk
of imploding under the weight of being so connected.
We have already determined that it is unreasonable to attack the meta data problem by analysis. A better approach
would be to adopt a strategy in which information is acquired and shared over time. Companies should therefore
look for products that keep a meta data audit trail describing what the user has done; these products should also
be able to exchange meta data with other related products in the IT infrastructure. However, this will benefit
an organization only if there is a methodology and/or a set of processes that enforces the sharing of this information.
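As a rough illustration of what such an audit trail might capture, the sketch below records each interface mapping as a small, shareable entry. The field names and file format are assumptions made for this example, not any particular product's design.

```python
# Hypothetical sketch of a meta data audit-trail entry for an interface step.
# The fields and file format are assumptions for illustration only; no
# particular vendor's product works exactly this way.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class MappingAuditEntry:
    source: str          # e.g. "LEGACY.CUST_MASTER.CUST-STAT"
    target: str          # e.g. "CRM.CUSTOMER.STATUS"
    rule: str            # the transformation applied, as the user specified it
    author: str
    recorded_at: str

def record_mapping(source: str, target: str, rule: str, author: str,
                   trail_file: str = "metadata_trail.jsonl") -> None:
    # Append one entry so other tools (or a repository) can pick it up later.
    entry = MappingAuditEntry(source, target, rule, author,
                              datetime.now(timezone.utc).isoformat())
    with open(trail_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(entry)) + "\n")

if __name__ == "__main__":
    record_mapping("LEGACY.CUST_MASTER.CUST-STAT", "CRM.CUSTOMER.STATUS",
                   'map "A" to "ACTIVE", anything else to "INACTIVE"', "khammer")
```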
Difficulties remain
When dealing in the abstract, it is easy to envision an architecture that shares a common set of APIs against
a common meta model, or that utilizes an active repository that automates the percolation of related changes throughout
the organization. These are interesting concepts, but to succeed with them it is important to recognize that, even
with standards and automated meta data exchange between products, meta data management will remain difficult for
a number of reasons.
IT WILL REMAIN COMPLEX. Legacy databases with their "secret" schemas
are not the only complex definitions one will encounter. The schema for an enterprise resource planning manufacturing
module may utilize anywhere from 3,000 to 5,000 table definitions. Moreover, a new release of an application may
not be compatible with a previous release. As a result, as with legacy systems, versioning may be required to represent
differences in the business rules applied.
IT WILL REMAIN HETEROGENEOUS AND DISTRIBUTED. Most tools and applications will require the meta data needed for
execution to be stored in their own repositories. Legacy systems will remain in use for the
foreseeable future.
THE DEFINITION OF META DATA WILL BE OWNED BY DIFFERENT
ORGANIZATIONS. The concept of a data steward or data architect, or
even an organization in charge of providing interface services, is a good way
to enforce an enterprise's meta data strategy. However, it is not reasonable
to expect an IT organization maintaining a mission-critical application to "ask
permission" about whether or how it represents an application's key information.
Rather, data stewards would be valued more if they were a resource for answering
questions such as "Does anyone else use a look-up table that relates X
to Y?" and were seen as keepers of what the division actually does.
DIFFERENT VENDORS WILL MAINTAIN PRIVATE
META DATA THAT IS KEY TO THEIR VALUE PROPOSITION. For any standard
to work, there must be some way for this information to be maintained as related
common objects and relationships are exchanged and modified.
Because of these complexities, it seems a file-based mechanism for exchanging meta data between products is
more likely to succeed -- at least for the short term -- than a more elegant but simplistic meta model. In other
words, it is more important that products know how to map their internal meta model to an interchange format representing
the appropriate entities and relationships than to insist that they share a common meta model.
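To make the idea concrete, the following sketch shows one way a product might map a few of its internal table definitions onto a simple file-based interchange document that another tool could import. The element names and structure are invented for illustration; this is not MDIS or any vendor's actual format.

```python
# Hypothetical sketch of a file-based meta data interchange export.
# The internal model and the element names are invented for illustration;
# they are not the MDIS or OIM formats.
import xml.etree.ElementTree as ET

# A product's internal meta model (here, just tables and columns).
INTERNAL_MODEL = {
    "CUSTOMER": [("CUST_ID", "CHAR(9)"), ("STATUS", "CHAR(1)")],
    "ORDER":    [("ORDER_ID", "CHAR(12)"), ("CUST_ID", "CHAR(9)"),
                 ("AMOUNT", "DECIMAL(11,2)")],
}

def export_interchange(model: dict, path: str) -> None:
    # Map internal objects onto shared entity/attribute elements that another
    # tool can parse without knowing this product's internals.
    root = ET.Element("MetaDataInterchange", version="0.1")
    for table, columns in model.items():
        entity = ET.SubElement(root, "Entity", name=table, kind="Table")
        for name, datatype in columns:
            ET.SubElement(entity, "Attribute", name=name, type=datatype)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    export_interchange(INTERNAL_MODEL, "interchange.xml")
```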
The Meta Data Coalition
To arrive at a short-term solution, the key is to get a sufficient number of vendors to agree on and support
some interchange format. The Meta Data Coalition (MDC) may be just the vehicle needed to accomplish this task.
Founded in 1995, the purpose of the MDC is to develop and provide a tactical means of standardized meta data exchange.
In 1996, the group introduced the Meta Data Interchange Specification (MDIS), which "provides a metamodel
that addresses the main types of commonly shared metadata objects and a standard import/export mechanism that enables
the exchange of these metadata objects between tools."
The MDC initially received support (albeit with a little skepticism) from a number of industry analysts. Over
the past two years, however, the attitude has been predominantly cynical -- and with good reason. While the MDC
has always been able to attract and keep a membership of between 50 and 60 vendors, only seven have actually implemented
support for MDIS. The other vendors -- some of which participated on the technical subcommittee -- have preferred
to take a "wait-and-see" approach.
The addition of Microsoft to the MDC's roll call -- and the reciprocal membership established between the Coalition
and the Object Management Group (OMG) -- has changed things. The MDC now appears to be gaining sufficient critical
mass to achieve its goal of defining a tactical (but technically sound) meta model and exchange format.
Recently, the MDC completed a technical review of the MDC-OIM, a technology-independent and vendor-neutral information
model describing the structure and semantics of meta data. The MDC-OIM is based on the Microsoft Open Information
Model, a meta data model and specification that is part of Microsoft Repository, a meta data management product.
Note that in keeping with the MDC's initial charter, members do not expect the meta model to be static or sacrosanct.
Changes will be required as what vendors and users consider key meta data evolves. What is
important is not whether the standard changes, but that the vendors associated with the MDC are committed to supporting
it.
Business rules: The missing link
The irony in the meta data "wars" is that the areas of technical disagreement to date
are relatively small with respect to what must be represented. However, one very important aspect of meta
data has been ignored -- the specification of business rules. The GUIDE initiative has spent considerable
effort studying this area, but the group has focused more on the semantic definition and full range of what can be
considered a business rule (such as "employees cannot smoke in the building"). On the other hand, the
MDC's concern in this area is to define what would constitute a user's language for specifying the kind of test
and transformation logic most often used in interfacing applications. Note that this can range from simple math
to complex conditional logic.
The problem today is that most software vendors require technical users to specify this kind of logic, whether
in Visual Basic, COBOL, SQL, C or some proprietary language (except in the case of products that only support the
specification of business rules as documentation). This situation has several drawbacks:
- Business users are frequently best suited to specify this type of information.
- Business users cannot readily understand business rules specified in one
of the technical languages outlined above.
- Because technical users can employ different techniques for referencing objects, it is usually very hard for
the specification of a business rule in one product to be understood by another.
Given that 40% to 80% of the data values moved in interface programs are transformed, it is critical that any
meta data interchange mechanism provide a standard means of expressing this kind of meta data that is also suitable
for end users. Such a mechanism would need to define a subset of natural language that is constrained enough to
make parsers easy to write, yet rich enough to allow the specification of arbitrarily complex logic (a sketch of
such a language appears after the list that follows).
In addition, such a language would need:
- Context sensitivity. This would provide, for example, a means of indicating that
the legality of choosing element X can be determined by looking at the value of some previous tokens in the statement.
Another example would be the ability to ensure that the second numeric value specified in a range statement like
"between X and Y" is larger than the first. Context sensitivity allows the parser to obtain the value
for X in order to validate the value for Y.
- A set of conventions on how to establish reference -- the entities about which statements are being made.
- A means for adding business-specific functions; for example, a user routine to compute return on a bond or
fuel tank capacity.
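To suggest what such a constrained rule language might look like in practice, here is a minimal sketch that parses and validates a rule of the form "X between low and high" and exposes a registry for business-specific functions. The grammar, names and functions are invented for illustration only.

```python
# Hypothetical sketch of a constrained, end-user rule language.
# The grammar and the registered function are invented for illustration.
import re

# Business-specific functions can be registered and referenced by name.
FUNCTIONS = {
    "bond_return": lambda price, coupon: coupon / price,
}

BETWEEN = re.compile(r"^(\w+) between (-?\d+(?:\.\d+)?) and (-?\d+(?:\.\d+)?)$")

def parse_rule(text: str):
    """Parse a rule like 'AMOUNT between 100 and 500' into a predicate."""
    m = BETWEEN.match(text.strip())
    if not m:
        raise ValueError(f"unsupported rule: {text!r}")
    field, low, high = m.group(1), float(m.group(2)), float(m.group(3))
    # Context sensitivity: the second value in a range must exceed the first,
    # which the parser can only check by looking back at the earlier token.
    if high <= low:
        raise ValueError(f"upper bound {high} must exceed lower bound {low}")
    return lambda record: low <= float(record[field]) <= high

if __name__ == "__main__":
    rule = parse_rule("AMOUNT between 100 and 500")
    print(rule({"AMOUNT": "250.00"}))              # True
    print(FUNCTIONS["bond_return"](980.0, 55.0))   # business-specific helper
```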
As of Q4/1998, the Meta Data Coalition has said that business rules will be the technical subcommittee's next
major topic. The initial draft of a specification will be available for review and input by mid-year 1999.
What IT can do
Many of the implementation and maintenance problems IT organizations face stem from the fact that software
products are generally designed to offer some range of functionality under a set of
assumptions about the environment that are simply not valid. As a result, architects and programmers must devise
processes and techniques for working around what is not provided. In earlier times, organizations would develop internal applications or
utilities to fill these gaps. However, proprietary application development is no longer cost-effective due to the
increased rate of change and the growing complexity of solutions that must be "glued" together to make
technology work. Consequently, it is important for vendors and users to work together to try to close the gap between
the products being sold and customers' needs.
The Meta Data Coalition is a venue where vendors and users can work effectively to better understand the meta
data problem space in a timely manner. The MDC actively encourages end-user companies to join and participate.
Companies committed to developing an enterprise meta data strategy should consider membership and participation
in these efforts. For more information, consult the Meta Data Coalition's Web site at www.mdcinfo.com.