In-Depth
Integrating Components in the Real World
- By John D. Williams
- September 1, 2001
Integrating components at the level of the language the API presents is usually a straightforward task. Integrating at higher levels of granularity can require several different approaches because you must typically deal with services, subsystems or whole systems. Integration becomes even more problematic when components are distributed throughout an environment. The last article in these pages on integrating homegrown and purchased components ("Can components solve integration conundrum?," ADT, July 2001, p. 27) examined several issues involved in integrating different types of components and stressed the importance of managing components. This article explores component integration issues that focus on different levels of granularity and the impact of distributed computing.
API
Integrating components at the API level is about as simple as component integration gets: include a component in a system, create an instance of the component and then call its methods as defined in its interface. This is just basic programming. If this were all there was to component integration, this article would be very short. However, life for developers is rarely so simple. The process becomes much harder when all of the components do not reside in the same environment. That problem gave rise to technologies such as CORBA, DCOM and RMI, which are essentially the API answer to distributed components. And while these technologies can help developers, they are not so simple to use.
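The three steps of API-level integration can be sketched in a few lines of Python. The `TaxCalculator` component, its method and the rate are hypothetical and exist only for illustration.

```python
class TaxCalculator:
    """Hypothetical component; its public methods are its interface."""

    def __init__(self, rate):
        self.rate = rate

    def tax_for(self, amount):
        # Return the tax owed on an amount, rounded to cents.
        return round(amount * self.rate, 2)


# 1. Include the component (here it is simply defined in the same module).
# 2. Create an instance of it.
calculator = TaxCalculator(rate=0.07)
# 3. Call its methods as defined in its interface.
tax = calculator.tax_for(200.00)
```

Everything here runs in one process, which is exactly the assumption that breaks down once components are distributed.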
Top developers do say that such distributed API-level technologies can be a real boon in building distributed systems. In the past, they were often the only alternative for distributed development. Each has been used to build robust distributed enterprise systems. However, these technologies can be difficult for developers to use and often require an additional infrastructure beyond the normal language deployment environment.
For example, CORBA uses Object Request Brokers (ORBs) to house distributed objects, and also provides a host of services that help programs find and manage what they need in the distributed environment. Not surprisingly, the added capabilities come with a price: increased complexity for developers and those setting up the operational environment.
These technologies also have limits. For example, many companies refuse, for security reasons, to open the additional ports needed to use DCOM or CORBA through a firewall, which means these approaches will not be viable candidates for integrating components across the Internet. For component integration, this type of technology is often best deployed behind a firewall. It can provide a powerful infrastructure for internal distributed computing. However, distributed API-level technologies often require more work on the part of developers and may not be appropriate in some technology environments.
Frameworks
In many ways, a framework is like an API. Developers can use different procedures to expand and use frameworks. Experts dubbed many of the early systems, which make source code available to the application developer, white-box frameworks. White-box frameworks are typically expanded by subclassing existing abstract classes, a method that requires developers to understand the internal workings of the abstract parent classes. The best way to understand the internal workings of a class is to look at its source code.
Other frameworks are constructed as so-called black boxes, and are customized through parameterized interfaces. Black-box frameworks are more like components, because they are accessed only through the interfaces and provide developers with no insight into the internal workings of the frameworks. Some developers view the issue of white- vs. black-box frameworks as a choice of good vs. bad, but it is not. Actually, the issue represents a range of choices for how developers can interact with a framework. White-box frameworks are easier to customize, so they represent a reasonable choice for relatively new frameworks that have not stabilized completely. Black-box frameworks need more stability, because developers need to anticipate all usages in the available interfaces. At the same time, black-box frameworks reflect the typical interface one would expect with components. Such frameworks are closer in approach to component development.
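The difference between the two styles is easier to see side by side. The following is a minimal sketch, assuming a hypothetical report framework; none of the class names come from a real product.

```python
from abc import ABC, abstractmethod


class WhiteBoxReport(ABC):
    """White-box style: extended by subclassing an abstract class."""

    def render(self, rows):
        # A subclass author has to know that render() calls format_row()
        # once per row and joins the results with newlines.
        return "\n".join(self.format_row(row) for row in rows)

    @abstractmethod
    def format_row(self, row):
        """Supplied by the developer's subclass."""


class CsvReport(WhiteBoxReport):
    def format_row(self, row):
        return ",".join(str(value) for value in row)


class BlackBoxReport:
    """Black-box style: customized only through a parameterized interface."""

    def __init__(self, separator=","):
        self._separator = separator

    def render(self, rows):
        return "\n".join(
            self._separator.join(str(value) for value in row) for row in rows
        )


rows = [(1, "widget"), (2, "gadget")]
white = CsvReport().render(rows)
black = BlackBoxReport(separator=",").render(rows)
```

Both produce the same output here. The white-box version can be bent to any row format, while the black-box version supports only what its designer anticipated, in this case a choice of separator.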
Framework interfaces usually work at the component API level. Typically, no additional underlying infrastructure is required to use them. Nonetheless, there are a couple of issues for developers.
There are two basic ways of interfacing to a framework. The first calls for the framework designer to create hotspots, which are typically abstract classes or templates into which developers can plug their own code. The framework designer defines the interface, and the developer adds the capability. The second method of interfacing to a framework allows the developer to utilize hooks. Typically, hooks are virtual methods in classes that developers are expected to override with their own capabilities. Abstract classes can also be used as hooks.
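One way to read the distinction, sketched under the assumption of a hypothetical `Exporter` framework class: the hotspot is an abstract method the developer must fill in, while the hook is an overridable method that already has a default.

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    """Hypothetical framework class with one hotspot and one hook."""

    def export(self, records):
        header = self.before_export()  # hook call
        lines = [header] if header else []
        lines.extend(self.encode(record) for record in records)  # hotspot call
        return lines

    @abstractmethod
    def encode(self, record):
        """Hotspot: the framework defines the interface, the developer
        adds the capability."""

    def before_export(self):
        """Hook: the default adds nothing; override to add a header."""
        return ""


class UpperExporter(Exporter):
    def encode(self, record):
        return record.upper()


class HeadedExporter(UpperExporter):
    def before_export(self):
        return "# export"
```

`UpperExporter` fills the hotspot and accepts the hook's default; `HeadedExporter` also overrides the hook.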
Treating a framework as a component and integrating non-related frameworks presents some significant challenges. For instance, a developer may need to use custom "glue" code to fill in a gap in the functionality of frameworks. At other times, frameworks can overlap, requiring developers to create code that can manage the overlap and ensure that each framework is updated appropriately. Integrating legacy frameworks can also present problems to development organizations. Developers often must create adaptors that can integrate both legacy and new components.
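An adaptor of the kind described above might look like the following sketch. The legacy component, its `QRY_STOCK` call and its zero-padded string results are all invented for illustration.

```python
class LegacyInventory:
    """Stand-in for a legacy component with its own calling convention."""

    def QRY_STOCK(self, part_code):
        stock = {"A100": "0012", "B200": "0000"}
        return stock.get(part_code, "0000")  # zero-padded string


class InventoryAdapter:
    """Presents the interface that newer components expect."""

    def __init__(self, legacy):
        self._legacy = legacy

    def units_in_stock(self, part_code):
        return int(self._legacy.QRY_STOCK(part_code))


def can_ship(inventory, part_code, quantity):
    # New code depends only on units_in_stock(), not on the legacy API.
    return inventory.units_in_stock(part_code) >= quantity


inventory = InventoryAdapter(LegacyInventory())
```

The glue code is confined to the adaptor, so the legacy component can later be replaced without touching `can_ship`.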
Finally, there is the issue of inversion of control. Inversion of control means the framework, not the application, controls the flow of processing. This can present a challenge when integrating unrelated frameworks, as developers must still determine which framework will control the flow of processing. Inversion of control integration is easier on developers if the source code for both frameworks is available, because it may then be possible to integrate the control loops. Otherwise, the developer may need to write synchronization code for the frameworks.
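Inversion of control can be illustrated with a toy event loop. This is a sketch only; `EventLoopFramework` and the handler names are hypothetical.

```python
class EventLoopFramework:
    """Hypothetical framework that owns the flow of processing."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_name, handler):
        self._handlers[event_name] = handler

    def run(self, events):
        # The framework, not the application, decides when handlers run.
        results = []
        for name, payload in events:
            handler = self._handlers.get(name)
            if handler is not None:
                results.append(handler(payload))
        return results


def handle_order(payload):
    return f"order:{payload}"


def handle_refund(payload):
    return f"refund:{payload}"


framework = EventLoopFramework()
framework.on("order", handle_order)
framework.on("refund", handle_refund)
log = framework.run([("order", 1), ("unknown", 2), ("refund", 3)])
```

The application only registers handlers and hands control to `run()`. Two frameworks built this way each expect to own that loop, which is the source of the integration difficulty.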
SOAP
The Simple Object Access Protocol, or SOAP, is the new kid on the block when it comes to integration. The SOAP specifications were initiated by Microsoft Corp. to create a means of sending transient XML documents to remote hosts to invoke operations. In response, the remote host would take some action and perhaps return a value to the invoking object. Responsibility for the SOAP specifications has since been assumed by the World Wide Web Consortium (W3C).
The initial SOAP goal seemed like a fairly straightforward method of doing Remote Procedure Calls (RPCs). The underlying approach has been aligned with HTTP, so this mechanism can work over the Internet without requiring firewalls to be modified. More technically, according to the W3C in its SOAP 1.1 document, "SOAP is a lightweight protocol for [the] exchange of information in a decentralized, distributed environment. It is an XML-based protocol that consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined data types, and a convention for representing remote procedure calls and responses. SOAP can potentially be used in combination with a variety of other protocols; however, the only bindings defined in this document describe how to use SOAP in combination with HTTP and HTTP Extension Framework."
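To make the envelope and RPC convention concrete, the sketch below builds and then reads back a SOAP 1.1 request using only Python's standard library. The envelope namespace is the one defined by SOAP 1.1; the application namespace `urn:example:stock` and the `GetLastTradePrice` call are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:stock"  # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)


def build_request(symbol):
    """Wrap a remote procedure call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetLastTradePrice")
    ET.SubElement(call, "symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the method name and parameters on the receiving side."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return method, params


request = build_request("DIS")
method, params = parse_request(request)
```

Under the HTTP binding, a document like `request` would travel as the body of an HTTP POST, which is why it can cross a firewall that already admits Web traffic.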
Ideally, SOAP can work with some form of meta data to provide added value over a simple wire protocol. For example, meta data could allow Web-based service providers to describe their services to potential clients. One example is the Web Services Description Language (WSDL), a new specification created to describe networked XML-based services. WSDL is a part of the effort of the Universal Description, Discovery and Integration (UDDI) initiative to provide directories and descriptions of such online services.
SOAP is not yet widely deployed and, as a relatively young technology, comes with its own set of challenges. For example, the specification does not address bi-directional communication. There are also concerns about security with SOAP. The protocol is more verbose than binary integration approaches like DCOM and CORBA, which often means that technologies such as DCOM and CORBA will be preferred within a firewall, while SOAP is a better candidate for use across firewalls. Still, with the move to component-based Web services, SOAP will likely have a strong role to play in component integration.
EAI
Developers must look at other options when components are whole systems, and when a simple API integration plan may not work. Clearly, such components make the integration task more complex. Tools like SOAP can still be part of the solution, but they cannot address all of the integration issues. Instead, developers must look at Enterprise Application Integration (EAI) technologies to gain the capabilities required for integrating such components with others.
EAI has been a buzzword for a few years now, but its definition is often unclear. According to Ovum Ltd., a London-based consulting firm, "EAI solutions are software products that completely or partially automate the process of enabling custom-built and/or packaged business applications to exchange business-level information in formats and contexts that each understands."
Such a technology certainly can help integrate system-level components. However, is that all there is to EAI? According to the Edina, Minn.-based IT investment banking firm of Cherry Tree and Co., "It is essential to keep in mind that it is the chaining together of discrete transactions, in the form of a business process, from one application to the next that constitutes EAI." This statement adds what we typically think of as a workflow component to the integration mix. But how does this all fit together?
Figure 1: An EAI framework
EAI not only helps integrate system-level components, it also "chains together discrete transactions in the form of a business process" or workflow.
For a variety of reasons, corporate developers can easily become confused about what constitutes middleware and what makes up EAI. Figure 1 shows a framework for understanding EAI and its different components. At the lowest level is the Transportation Layer. This is typically where you find middleware tools and what most people think of as EAI solutions. Figure 2 shows Gartner Group's five communication models applied to the Transportation Layer: Conversational, Request/Reply, Message Passing, Messaging and Queuing, and Publish and Subscribe. These models describe direct communication mechanisms between multiple systems. Any model can be used to integrate components and systems. In fact, the messaging models can be used in conjunction with the others. It is important to note these communication models can be used as higher levels of the EAI framework are adopted.
Figure 2: Five communication models
Gartner's five communication models, shown here applied to the Transportation Layer, describe direct communication mechanisms between multiple systems.
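Two of the five models can be contrasted in a short in-process sketch. It assumes the common meaning of the terms: in Request/Reply the caller waits for an answer, and in Messaging and Queuing the sender leaves a message for later processing. The `price_service` function and its data are hypothetical.

```python
import queue


def price_service(request):
    """Request/Reply: the caller waits until the reply comes back."""
    prices = {"A100": 9.5}
    return prices.get(request["part"])


reply = price_service({"part": "A100"})

# Messaging and Queuing: the sender enqueues and moves on; the receiver
# drains the queue later, so the two need not be active at the same time.
orders = queue.Queue()
orders.put({"part": "A100", "qty": 2})
orders.put({"part": "A100", "qty": 5})

processed = []
while not orders.empty():
    order = orders.get()
    unit_price = price_service({"part": order["part"]})
    processed.append(order["qty"] * unit_price)
```

The receiving loop also uses Request/Reply internally, a small instance of the messaging models being used in conjunction with the others.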
The next layer, the Data Transformation Layer, is shown in Figure 3. The Data Transformation Layer is said to cover the range of tools typically used to extract information from one database, transform it in some fashion and put it in a new database. Data Transformation tools are often used in data warehouse environments, and were widely used in Year 2000 projects. Clearly, such tools are not the first ones that come to mind for interfacing one component to another. But some experts maintain they are appropriate for transferring large chunks of information from one system to another.
Figure 3: Data transformation
The Data Transformation Layer is said to cover the range of tools typically used to extract information from one database, transform it in some fashion and put it in a new database.
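The extract, transform and load sequence can be sketched with two in-memory SQLite databases. The table layout, the sample rows and the date conversion are assumptions chosen for the example, not a description of any particular tool.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (name TEXT, joined TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("  ada lovelace ", "12/10/1999"), ("alan turing", "06/23/2000")],
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (name TEXT, joined TEXT)")


def transform(name, joined):
    # Tidy the name and convert MM/DD/YYYY to ISO 8601 (YYYY-MM-DD).
    month, day, year = joined.split("/")
    return name.strip().title(), f"{year}-{month}-{day}"


# Extract from the source database.
rows = source.execute("SELECT name, joined FROM customers").fetchall()
# Transform each row, then load the results into the target database.
target.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [transform(*row) for row in rows],
)
target.commit()

loaded = target.execute(
    "SELECT name, joined FROM customers ORDER BY joined"
).fetchall()
```

The transfer happens in bulk, which is why this style suits moving large chunks of information rather than individual component calls.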
The Business Rules Layer is the realm of what is now known as the traditional EAI tool (see Fig. 4). In this layer, one system is said to communicate with others based on business rules defined for the communication. In addition to the business rules, communication typically occurs through one of the models defined in the Transportation Layer. The Publish and Subscribe integration model is most frequently touted for this layer. It is often used in conjunction with one of the Messaging communication models. The Transportation Layer is touted for its power to simplify the number and complexity of interfaces needed to communicate between systems. Experts say this approach to integration can lead to significant cost savings.
Figure 4: Business rules
In the Business Rules layer, one system is said to communicate with others based on business rules defined for the communication.
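A rule-driven Publish and Subscribe exchange might be sketched as follows. `RuleBroker` is a hypothetical in-process stand-in; commercial EAI products differ in both interface and capability.

```python
class RuleBroker:
    """Hypothetical broker that routes messages by business rule."""

    def __init__(self):
        self._subscriptions = []  # (topic, rule, callback) tuples

    def subscribe(self, topic, rule, callback):
        self._subscriptions.append((topic, rule, callback))

    def publish(self, topic, message):
        # Deliver to every subscriber whose rule accepts the message.
        delivered = 0
        for sub_topic, rule, callback in self._subscriptions:
            if sub_topic == topic and rule(message):
                callback(message)
                delivered += 1
        return delivered


broker = RuleBroker()
billing, fraud_review = [], []

# Billing sees every order; fraud review sees only large ones.
broker.subscribe("order", lambda m: True, billing.append)
broker.subscribe("order", lambda m: m["total"] > 10_000, fraud_review.append)

small = broker.publish("order", {"id": 1, "total": 250})
large = broker.publish("order", {"id": 2, "total": 25_000})
```

The publishing system needs a single interface to the broker rather than one per receiver, and the routing rules live in one place.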
The highest level is the Business Process Layer (see Fig. 5). While previous layers defined mechanisms for communicating between one system and another, the Business Process Layer shows a component monitoring and managing the flow of integrated information across multiple systems or components. Such capabilities are becoming a critical requirement for operating distributed systems and components. Ensuring the flow of information across a whole process, not just between individual systems, is often important. Tools in this layer can provide this capability to make distributed systems more manageable, experts say.
Figure 5: Business process
The Business Process Layer shows a component monitoring and managing the flow of integrated information across multiple systems or components.
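What such a monitoring component does can be suggested with a small sketch. The `ProcessMonitor` class, the step names and the process identifier are hypothetical; real tools in this layer track far more than step order.

```python
class ProcessMonitor:
    """Hypothetical tracker for one business process across systems."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._done = {}  # process id -> number of completed steps

    def record(self, process_id, step):
        # Accept a step only if it is the next one the process expects.
        position = self._done.get(process_id, 0)
        expected = self._steps[position] if position < len(self._steps) else None
        if step != expected:
            raise ValueError(f"{process_id}: expected {expected!r}, got {step!r}")
        self._done[process_id] = position + 1

    def status(self, process_id):
        position = self._done.get(process_id, 0)
        if position == len(self._steps):
            return "complete"
        return f"waiting for {self._steps[position]}"


monitor = ProcessMonitor(["order_entry", "billing", "shipping"])
monitor.record("PO-17", "order_entry")
monitor.record("PO-17", "billing")
status = monitor.status("PO-17")
```

Each system reports only its own step, yet the monitor can say where the whole process stands, which no single system-to-system link can do.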
The view of systems as components typically offers three places where integration can occur. Assuming a three-tier architecture, integration can happen at the User Interface Level, the Business Logic Level or the Data Management Level. When integrating at the User Interface, developers will typically use tools from the Transportation and Business Rules framework layers. When integrating at the Business Logic level, developers will typically use tools from the Transportation, Business Rules and Business Process framework layers. Finally, to integrate at the Data Management level, developers use tools from the Data Transformation and Business Process layers.
And cutting across all the framework layers is meta data, or information about information. Meta data can also be a powerful tool for integrating components. Models capture information about what is exchanged between components and document all of the details of exchanged information. They can also provide the semantics that allow systems to work together. The meta data captured by the models can then support the mechanism for exchange. This can help to shape the way information is exchanged, as well as provide common semantics for components. Such an exchange can be more than just data transfer. It can also include an understanding about the data being transferred. This approach to using meta data for exchange supports the dynamic integration of components.
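As a minimal sketch of meta data supporting an exchange, the model below records the type and meaning of each field, and the receiving side uses it to interpret a raw record. The `ORDER_MODEL` structure and its field names are assumptions for illustration.

```python
# The shared model: information about the information being exchanged.
ORDER_MODEL = {
    "order_id": {"type": int, "meaning": "unique order number"},
    "total": {"type": float, "meaning": "order total", "unit": "USD"},
}


def read_with_model(raw, model):
    """Interpret a raw record of strings using the shared model."""
    record = {}
    for field, description in model.items():
        if field not in raw:
            raise KeyError(f"missing field: {field}")
        record[field] = description["type"](raw[field])
    return record


order = read_with_model({"order_id": "42", "total": "19.90"}, ORDER_MODEL)
```

Because the model travels separately from the code, a new field or a changed type can be introduced by updating the model rather than rewriting each component.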
The eXtensible Markup Language, or XML, has burst onto the scene as a significant technology for meta data exchange projects. XML can provide a mechanism for the dynamic interchange of meaningful information. In particular, specific components of XML can have tremendous value in system integration projects. The most useful components are Document Type Definitions (DTDs), XML Schema (and related variations), eXtensible Stylesheet Language Transformations (XSLT) and XML Metadata Interchange (XMI). DTDs and XML Schema define the structure of a document or information interchange. DTDs are the standard today, but they will quickly be replaced by Schema now that a standard is in place. DTDs do not use XML syntax and have some important limitations. For example, DTDs do not support the automatic validation of values, whereas XML Schema does. XML Schema also has the ability to define recurring blocks of elements or attributes once. XSLT allows users to transform one document type into another. XMI, from the Object Management Group (OMG), a Needham, Mass.-based standards consortium, is currently an OMG proposal for the open interchange of application components and assets. The combination of these capabilities can create a powerful approach to using meta data to enhance component integration.
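The kind of document-to-document transformation XSLT performs can be imitated in ordinary code. The sketch below is not XSLT: it uses Python's standard `xml.etree.ElementTree` module as a stand-in, and the `order` and `invoice` document shapes are invented for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = "<order><id>42</id><cust>Ada</cust></order>"


def to_invoice(order_xml):
    """Turn an <order> document into an <invoice> document."""
    order = ET.fromstring(order_xml)
    invoice = ET.Element("invoice", {"number": order.findtext("id")})
    ET.SubElement(invoice, "customer").text = order.findtext("cust")
    return ET.tostring(invoice, encoding="unicode")


invoice_xml = to_invoice(SOURCE)
```

An XSLT stylesheet would express the same mapping declaratively, so it could be changed or exchanged between systems without modifying program code.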
New approaches required
Clearly, component integration is not a simple issue. As granularity increases, different solutions for integrating the components become necessary. Simple APIs can work with the smallest components. However, particularly in a complex distributed environment, component integration requires new approaches. Tools like SOAP can provide a basic RPC and information exchange mechanism in the Internet environment. One of SOAP's strengths is its ability to pass through firewalls using the standard HTTP port. It requires much less setup than typical distributed component mechanisms. But SOAP does have its limits.
For integrating higher level components, such as systems, additional approaches to integration are useful. The area of EAI provides a framework for five levels of integration. These levels reach from the lowest type of integration to the management of overall process integration. Often, these levels can be used in conjunction with one another. Which integration mechanism is best? That depends on the application and the environment. However, there is a wide range of possibilities from which to choose. Developers are no longer forced to shoehorn every interface into one type of mechanism. Remember, though, one key to building robust component-based systems is to match the right type of interface to the need.