In-Depth
Meta data conundrum carries on
- By Richard Adhikari
- September 30, 2001
A new approach to managing meta data, currently in the works, will free CIOs from having to cope with constantly changing technological standards. The Needham, Mass.-based Object Management Group (OMG) is leading work on this approach with the support of major software vendors, including IBM, Sun Microsystems and Oracle. The new approach consists of a model-driven architecture that transcends individual technologies, so it will remain valid even when underlying technologies change.
To create the new architecture, its builders have expanded the meaning of meta data. They are also leveraging the object-oriented paradigm so that corporations can create reusable meta data objects from data about their existing resources—including computer languages, file systems and software architecture—by wrapping the data with Extensible Markup Language (XML) descriptions.
The new architecture will be the sole enterprise meta data management standard available. Last year the OMG merged with another standards body, the Meta Data Coalition, whose efforts were built around Microsoft's Open Information Model (OIM), and the OMG is now folding the OIM-based work into its own.
But XML is fragmented, and pulling together cross-industry definitions is a Herculean task. Users do not care how elaborate the standard is; all they want to know is whether it will help them do their jobs better.
The existing definition of meta data—that it is data about data—is dated, said Sridhar Iyengar, chief software architect at Blue Bell, Pa.-based Unisys Global Industries, one of four Unisys Fellows and a prime mover in the meta data management arena. According to Iyengar, meta data "is the missing link between data and meaningful information" and, as such, it can be a wrapper around anything as long as it "describes the essence of what data means, how it is used, how it is extracted and how it is manipulated."
Iyengar said that meta data is now defined as "any kind of information that is used in the development, design, deployment and management of a computing infrastructure." It also contains "descriptions of business, data, warehouses and other things."
This expanded definition of meta data will let developers use meta data as a tool for integrating computer systems, said Stephen Brodsky, software architect and development manager at IBM's Application Integration and Middleware (AIM) division in San Jose, Calif. While it does not matter which language the meta data is expressed in, XML has become the language of choice because it is a good means of transmitting data between different systems.
One example of leveraging XML meta data wrappers is an application created by custom software developers Interface Technologies Inc., Raleigh, N.C., for a Boston-based client. The client, startup firm Virtual Access Networks, wanted an application that would help end users migrate personalization data such as bookmarks and address books from their existing desktop systems to new ones during upgrades. Normally, such data is not migrated during hardware or operating system upgrades.
Interface developed a client/server application that created XML meta data wrappers around end users' personalization data so they could upload and store the data with its wrappers on Virtual Access Networks' servers. The end users could then download the data once their new desktops or operating systems had been installed.
Wrapping a data object with a description of that data in XML allowed Interface to "make a self-describing package that can be sent back and stored, and can be easily translated into any other format like HTML and WAP" because the meta data description gives developers the context to know what they are manipulating, said Interface president Kelly Campbell. So, Virtual Access Networks can provide users access to their meta data on its servers over any client, including cell phones. "The data itself is well represented so all they have to worry about is the presentation layer," Campbell said.
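The wrapper-plus-payload idea can be sketched in a few lines of Python with the standard library's xml.etree.ElementTree. The element names, the bookmark payload and the source application string below are invented for illustration; Interface's actual schema is not public.

```python
import xml.etree.ElementTree as ET

def wrap_with_metadata(payload, data_type, source_app):
    """Wrap raw personalization data in a self-describing XML package.

    Element and attribute names here are illustrative, not taken from
    any published Virtual Access Networks schema.
    """
    package = ET.Element("package")
    meta = ET.SubElement(package, "metadata")
    ET.SubElement(meta, "type").text = data_type
    ET.SubElement(meta, "source").text = source_app
    data = ET.SubElement(package, "data")
    for name, url in payload:
        item = ET.SubElement(data, "bookmark")
        item.set("name", name)
        item.set("url", url)
    return ET.tostring(package, encoding="unicode")

def describe(xml_text):
    """A receiver needs no prior knowledge of the payload: the wrapper
    itself says what the data is and where it came from."""
    root = ET.fromstring(xml_text)
    meta = root.find("metadata")
    return meta.findtext("type"), meta.findtext("source")

pkg = wrap_with_metadata([("OMG", "http://www.omg.org")],
                         "bookmarks", "Internet Explorer 5")
print(describe(pkg))
```

Because the package carries its own description, a server-side translator can decide how to render the payload (HTML, WAP and so on) by reading the metadata element alone.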
The OO paradigm reapplied
There are many kinds of meta data from various domains—computer resources, languages, file systems and databases, as well as software architecture like Enterprise JavaBeans (EJBs) and messaging. To integrate systems, developers need an architecture to consistently describe these different types of meta data, IBM's Brodsky said.
That is where object-orientation comes in. The Unified Modeling Language (UML), an OMG standard, is a common modeling language for application development. It was extended to create an object-oriented meta data standard, the Meta Object Facility (MOF). MOF consists of the core OO model from the Unisys repository, Urep, which was integrated with UML. MOF lets developers build meta data for the various domains in a consistent, object-oriented fashion, Brodsky said. IBM and other major vendors are working with the OMG to standardize these key meta data domains by creating models of the type of information to be obtained from each.
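The value of a common meta-metamodel can be shown with a toy sketch: if a relational table and an EJB are both described with the same few constructs, one generic tool can process either. This is a drastic simplification of MOF with invented class names, not the OMG specification.

```python
from dataclasses import dataclass, field

# A toy MOF-like core: every domain's meta data is described with the
# same few constructs. Real MOF is far richer; these names are invented.

@dataclass
class Attribute:
    name: str
    type: str

@dataclass
class MetaClass:
    name: str
    domain: str            # e.g. "relational", "EJB", "filesystem"
    attributes: list = field(default_factory=list)

# A relational table and an EJB belong to different domains, but both
# are described uniformly as MetaClass instances:
customer_table = MetaClass("Customer", "relational",
                           [Attribute("id", "INTEGER"),
                            Attribute("name", "VARCHAR")])
customer_bean = MetaClass("CustomerBean", "EJB",
                          [Attribute("id", "long"),
                           Attribute("name", "String")])

# Because both share one meta-metamodel, generic tooling can walk them:
def attribute_names(mc):
    return [a.name for a in mc.attributes]

print(attribute_names(customer_table))
print(attribute_names(customer_bean))
```

The point is the uniformity: `attribute_names` knows nothing about databases or EJBs, yet handles meta data from both domains.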
Model-driven architecture
The OMG's new approach will take meta data management to the next level. Called the Model-Driven Architecture (MDA), it is being built around MOF and UML.
"Models and meta data are at the core of this new architecture, not RPC-based architectures like Java or SOAP," said Unisys' Iyengar, who is the primary architect of MOF. That will make MDA middleware-neutral, because "instead of mapping middleware to small concepts like interfaces, you use UML to model interfaces, relationships and business rules, so you can work at a higher level of abstraction," Iyengar said.
CIOs will be insulated from changes in technology. "The problem with our industry is that the technology changes so fast that you spend more time changing to newer technology than getting your work done," said OMG chairman Richard Soley. The modeling approach "lets you bridge between technologies because, if you start from a common model, translating between technologies gets easier and that means meta data is stored in one way for all your applications while it may be expressed in different applications differently." For example, Soley said, meta data could be stored as Java code in one application and as a Sybase database schema in another.
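Soley's example can be sketched as a single platform-independent model emitted into two technology-specific forms. The model dictionary, type maps and generated output below are invented illustrations of the idea, not any OMG-specified mapping.

```python
# One platform-independent model, two technology mappings: the same
# definition is emitted as a Java class and as a SQL table definition.
# A toy sketch of the MDA idea; names and type maps are invented.

model = {"name": "Account",
         "fields": [("number", "string"), ("balance", "decimal")]}

JAVA_TYPES = {"string": "String", "decimal": "java.math.BigDecimal"}
SQL_TYPES  = {"string": "VARCHAR(255)", "decimal": "NUMERIC(18,2)"}

def to_java(m):
    lines = [f"public class {m['name']} {{"]
    lines += [f"    private {JAVA_TYPES[t]} {n};" for n, t in m["fields"]]
    lines.append("}")
    return "\n".join(lines)

def to_sql(m):
    cols = ",\n".join(f"    {n} {SQL_TYPES[t]}" for n, t in m["fields"])
    return f"CREATE TABLE {m['name']} (\n{cols}\n)"

print(to_java(model))
print(to_sql(model))
```

Changing the target technology means swapping the mapping, not the model, which is the insulation Soley describes.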
MDA will provide best practices and industry-standard models, standards and meta data formats, and as new technologies emerge, the OMG will provide mappings to them in MDA, Unisys' Iyengar said. MDA will be mapped to CORBA, Enterprise JavaBeans, DCOM, SOAP, the OMG's Common Warehouse Metamodel (CWM), Microsoft .NET and other standards. SOAP, which has been jointly agreed to by major vendors, including IBM and Microsoft, is described as "the middleware in the XML standard" by Mike Blechar, vice president, Internet and e-business area, Gartner Inc., Stamford, Conn.
The OMG is expected to make MDA its formal architecture in the third quarter of this year, and MDA "will be OMG's architecture for the next 10 years, and includes its previous architecture, OMA," Iyengar said.
A unified model
Until August 2000, there were two meta data management standards: the OMG's Common Warehouse Metamodel, and the Meta Data Coalition's Open Information Model (OIM), which was created by Microsoft. At the end of August, the OMG and Meta Data Coalition merged, and OIM is being subsumed by CWM.
CWM is built on UML, XML and the XML Metadata Interchange (XMI). It establishes a common meta model (a model about models) for warehousing and also standardizes the syntax and semantics needed for import, export and other dynamic data warehousing operations. CWM supports data mining, transformation, OLAP, information visualization and other end user processes. Its specifications include application programming interfaces (APIs), interchange formats and services that support the entire life cycle of meta data management, including extraction, transformation, transportation, loading, integration and analysis.
Standards spaghetti
A whole slew of standards revolves around MDA. At its core lies MOF. Then there is the Java Metadata Interface (JMI), a mapping from MOF to Java. Because MOF is an abstract model, there are also mappings from MOF to XML and to the Interface Definition Language (IDL), an OMG standard for CORBA environments similar to Microsoft's and DCE's IDLs. MOF is an extension of UML and ties in with CWM, which now includes OIM. Both MOF and JMI will be integrated with J2EE; a New York-based company called MetaMatrix is "the first company that said they will do this integration," Iyengar said.
Gartner's Blechar provides an overview: Evolving standards for communications and interoperability such as XML and J2EE components, Microsoft's SOAP, COM and .NET are beginning to converge "because of the need for companies doing B2B or B2C or having partnerships in the supply chain and value chain to communicate with their partners or suppliers."
For middleware interoperability, the de facto standard is XML, inside which is SOAP, which "is being jointly agreed to by major vendors like IBM and Microsoft as a means of passing objects back and forth," Blechar said. Beneath that, there will be the Universal Description, Discovery and Integration (UDDI) standard. This will be a sort of Yellow Pages in which companies can list their services and contact information. The services will be defined in Web Services Description Language (WSDL), an XML-based language that defines Web services and describes how to access them.
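To make the layering concrete, here is a minimal SOAP 1.1 envelope built with Python's standard library. The operation name, parameter and target namespace are made up; in practice they would come from a service's WSDL description.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, per the W3C SOAP 1.1 Note.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_envelope(operation, params, target_ns="urn:example"):
    """Build a minimal SOAP 1.1 envelope for an RPC-style call.

    The operation name and target namespace are invented for
    illustration; a real call is described by the service's WSDL.
    """
    ET.register_namespace("soap", SOAP_NS)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{target_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(env, encoding="unicode")

msg = soap_envelope("GetQuote", {"symbol": "IBM"})
print(msg)
```

The XML text produced is the "objects passed back and forth" Blechar refers to: any platform that can parse XML can unpack the call.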
Meanwhile, the market is dividing into Microsoft and non-Microsoft camps, with XML standing apart from both, Blechar said. Business partners will communicate through XML, but within their own organizations they will have to build components either as .NET or as J2EE/CORBA components, he said.
XML: Wishful thinking or reality?
While XML is an excellent means for exchanging information between systems, some industry experts have doubts about whether it can provide the backbone for standardization efforts of the magnitude that OMG is eyeing. There are "hundreds" of XML protocols, according to Razmik Abnous, chief architect and technology vice president at content management software vendor Documentum Corp., Pleasanton, Calif. These protocols need to converge to the point where there is "one protocol in a vertical industry for a specific domain of expertise" before XML can pay off as a common meta data language, Abnous said.
That standardization of business definitions and naming conventions sounds easier than it is, according to Art D'Silva, manager, data warehouse planning and integration at the Royal Bank Financial Group, Waterloo, Ontario, which is Canada's largest banking institution with approximately $230 billion in assets. "You have to figure out which businesses you are working on and how you map the definitions to different businesses," he said.
Data definitions can differ even within a business. For example, different groups within a bank will look at the same data, such as interest amounts, from different perspectives and know the data by different names, D'Silva said, so corporations have to make sure each business unit understands the information in the proper context. Wrapping meta data around the data to give it context is easier said than done because "you have a variety of contexts and I'm not sure anyone has stepped up to doing that just yet," D'Silva said.
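One common remedy for this is a canonical glossary that maps each business unit's local term onto a single shared definition. The unit names and terms in the sketch below are invented; it illustrates the mapping problem D'Silva describes, not Royal Bank's actual practice.

```python
# Each business unit knows the same underlying fact by a different
# name. A canonical glossary maps (unit, local term) pairs onto one
# shared definition. All names below are invented examples.

GLOSSARY = {
    ("retail",   "interest earned"):  "interest_amount",
    ("treasury", "accrued interest"): "interest_amount",
    ("lending",  "finance charge"):   "interest_amount",
}

def canonical(unit, term):
    """Resolve a unit-specific term to its canonical name, or fail
    loudly when no agreed definition exists."""
    try:
        return GLOSSARY[(unit, term.lower())]
    except KeyError:
        raise KeyError(f"No agreed definition for {term!r} in {unit}")

print(canonical("retail", "Interest Earned"))
```

The hard part, as D'Silva notes, is not the lookup table but getting every business unit to agree on what belongs in it.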
For meta data capture and management, Royal Bank Financial Group uses the ETI.Extract tool suite from Evolutionary Technologies International, Austin, Texas. The company has the standard corporate mix of mainframes, mid-range boxes and Windows NT boxes. Meta data captured from legacy applications consists of both data definitions and technical constructs describing what information exists. This meta data is then stored in Islandia, N.Y.-based Computer Associates' Platinum Repository.
The Royal Bank Financial Group's legacy applications include "a lot of fairly complex file structures that are hierarchical structures in a flat MVS environment," said D'Silva, as well as files in an NCR Teradata environment. ETI.Extract lets users extract, transform, consolidate and load data from incompatible data sources, generating conversions that automate data transformation, reorganization and integration. It also provides meta data management capabilities that let users document, track and manage progress as they develop repeatable processes to consolidate data.
Documentum's Abnous agreed that it will be difficult to enforce a set of all-encompassing XML standards. Instead, he sees a Unix-type compromise where there is one core definition in XML on which everyone agrees, with each vendor building its own specialized flavor of XML on top of that.
There is also the issue of performance. Addressing a lot of entities within XML does not allow for a high-performance repository, and, in the short term, there will not be high-scale, high-performance XML repositories, Abnous said. Instead, relational databases will continue to be a source of meta data because "we've spent years perfecting performance of meta data coming out of relational database engines."
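Abnous' point is easy to demonstrate: relational engines already serve up their own meta data through catalog queries, which can be re-expressed as XML only when it needs to travel. The sketch below uses SQLite's catalog as a stand-in for the commercial engines of the article; `PRAGMA table_info` is SQLite-specific.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational engines expose their meta data through catalog queries.
# SQLite stands in here for the commercial engines of the day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# Pull table and column meta data out of the catalog...
root = ET.Element("tables")
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"):
    t = ET.SubElement(root, "table", name=table)
    for cid, col, ctype, *rest in conn.execute(
            f"PRAGMA table_info({table})"):
        ET.SubElement(t, "column", name=col, type=ctype)

# ...and re-express it as XML for interchange.
xml_meta = ET.tostring(root, encoding="unicode")
print(xml_meta)
```

The database remains the high-performance source of truth; XML is used only at the boundary where the meta data must be exchanged.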
Even if an all-encompassing XML standard could be established, it would have to be dynamic because "the requirements of what's needed and the information that would be included in meta data today may not be the same tomorrow," said Charles Meyers, vice president of technology at Rockville, Md.-based Computer Technology Associates Inc., a company that provides Internet solutions and IT services to the private and public sectors nationwide. Also, the standards must be highly extensible so users can capture new meta data without having to revise them.
Pragmatism drives users
As a solutions provider, Computer Technology Associates takes a classic user's viewpoint. "Our focus and approach to projects is situational—it's based on users' requirements," said Meyers. "You can try to be all things to all users, but it starts becoming overwhelming and then you have recursive layers of management—meta data to manage meta data to manage meta data." If a corporation has an enterprise-level solution in mind, it can create a meta data layer across all the corporate applications; if it is a project-oriented solution, an application that tracks and captures statistical information for a historical view may be all that is necessary.
"There is no silver bullet—no ultimate meta data interface," Meyers said. XML is one standard, but only one, for meta data. Meyers said that the OMG's MDA will only be adopted if it provides increased efficiency to users. "Just because the definition has been designed doesn't necessarily mean it will be used," he said.
As the OMG and vendors move ahead with plans for XML and the MDA, they would do well to remember that users will buy what is useful, and, especially in these days of tight budgets, will not be blinded by technoglitter.
Nine tips for managing meta data
Christine Mandracchia, principal consultant at KPIUSA, a Flanders, N.J.-based consultancy focusing on data administration, business rules and other areas, co-developed a meta data framework with one of her colleagues some years ago. She says IT managers have to bear the following points in mind as they manage meta data:
- Whatever you have to do with the data, you also have to do with the meta data. "When you model and analyze data, you also have to model and analyze meta data, as well as build meta data stores and figure out how you get it into and out of the stores," Mandracchia said.
- Therefore, it takes as much effort to work with meta data as it does to work with the data itself.
- Prioritize the meta data as you would prioritize the data, and scope the meta data. "You may not need meta data about every part of your system, but only for the more mission-critical and volatile systems," she said. "Otherwise, there's too much work involved. Figure out from which systems you can get an IT and business benefit."
Una Kearns, XML architect at Pleasanton, Calif.-based Documentum and a member of the board of directors of the Organization for the Advancement of Structured Information Standards (OASIS), said corporations need to establish correct definitions for information being managed so they can reuse it across the organization. According to Kearns, to do this, corporations must:
- Understand what type of information is being managed in the organization.
- See how that information is used across the organization.
- Establish a steering committee across different departments and business units within the enterprise to help understand the type of data being managed and how it is used.
- Get together with different customers and partners in its supply chain if it is looking at vertical standardization.
- Provide effective ways of capturing the information correctly and entering it into its system once it has defined the data. This task includes making provisions for automatically checking new information entered into data dictionaries and updating data dictionaries.
- Maintain multiple data dictionaries.
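The automatic checking Kearns recommends can be as simple as validating each new data-dictionary entry before it is accepted. The required fields, naming convention and sample entries below are invented illustrations, not OASIS rules.

```python
import re

# A minimal check on new data-dictionary entries: required fields
# present, names follow a convention, and the term is not already
# defined. The specific rules here are invented examples.

REQUIRED = {"name", "definition", "owner"}
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")

def check_entry(entry, dictionary):
    """Return a list of problems with a proposed entry (empty if OK)."""
    errors = []
    missing = REQUIRED - entry.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    name = entry.get("name", "")
    if name and not NAME_RE.match(name):
        errors.append(f"bad name {name!r}: use lower_snake_case")
    if name in dictionary:
        errors.append(f"{name!r} is already defined")
    return errors

dictionary = {"interest_amount": "Interest accrued in the period"}
print(check_entry({"name": "Interest Amount"}, dictionary))
```

Run at the point of entry, checks like these keep a dictionary consistent without a human gatekeeper reviewing every addition.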
—Richard Adhikari |