Unifying Data, Documents and Processes
Disparate systems make business process automation inefficient. Semantic integration of structured, unstructured and process data simplifies implementation of a robust SOA model.
- By Boris Lublinsky
- July 1, 2004
The three things that define any enterprise are its data, documents, and processes. Data is perceived to be the main asset of every enterprise. In the last 30 years, IT has captured terabytes of data, defining both the historical and current state of the enterprise. All of today's enterprise IT systems are geared toward entering, processing, and storing data.
Documents are legal entities that define the obligations of the enterprise and its partners at a certain point in time. Documents such as financial reports, insurance policies and claims, and government regulations typically contain data, but they are usually entered, processed, and stored in completely different systems.
Business processes define an enterprise's operations and constitute its most critical assets. Processes describe how to tie together existing applications and manual activities into a coherent enterprisewide execution.
Despite the similarities between data and documents, it is common practice today to handle data processing and document processing with completely different systems and business processes. This separation creates the following issues for today's enterprises:
- Infrastructure is duplicated; there are often two parallel middleware infrastructures, one for data and one for the documents.
- The links between data and documents containing data are often nonexistent, with the result that it is not easy to realign data with the documents.
- Enterprise business processes, which need to incorporate both data and document processing, become extremely difficult to implement.
Two major developments in today's software architecture point the way toward resolving these issues. First, services-oriented architecture (SOA) embraces business process as one of its major components. It also separates services' internal data representation from their exposed interfaces, thus allowing for the introduction of enterprise semantics with minimal impact on applications. Second, the introduction of XML as a universal data/documents representation allows for a unified model of documents and data.
Taking advantage of these technology trends, organizations can finally realize a strategy that begins to unify data, documents, and business processes into a single harmonious execution.
The Process-Centric Enterprise
SOA is an architectural style that promotes the concept of business process orchestration of enterprise-level business services. Figure 1 illustrates the relationship among the three major elements of SOA: services, processes, and organization. SOA models the enterprise as a collection of business services that are accessible across the enterprise. Monolithic stovepipe applications are broken down in favor of self-contained business services aligned with enterprise business artifacts. These services can be invoked using a standard protocol, thus ensuring their availability across the enterprise and beyond.
Business processes orchestrate the execution of enterprise services to fulfill required enterprise business functions such as order processing or claims processing. The organization owns all of the SOA artifacts (services and processes) and governs their creation, usage, access, and maintenance.
This definition is very different from the standard SOA as defined by the W3C (see Resources). Instead, this approach emphasizes that SOA is not purely a new approach to application integration, but a way of rationalizing IT functionality against business artifacts. Thus, it serves as a foundation for a convergence of IT with the business. That said, it should be noted that services themselves are readily integrated and rely on integration for their implementation.
Also, the definition stresses the enterprise nature of SOA. Applying SOA to specific applications does not ensure services reuse between applications and consequently leads to building yet another set of stovepipe applications based on the new technology. Only enterprisewide implementation of SOA allows its power to be exploited completely.
Business process should be considered to be one of the main underpinnings of SOA. Although it is technically feasible to access one service from another service, business process provides a way of orchestrating execution of the business services.
Applying an SOA usually leads to the use of an enterprise service bus (ESB) and a business process execution engine (BPE). The BPE allows for externalization of the business process implementation by orchestrating service execution. Use of the BPE separates business process definition and execution from services implementation. A service locator provides dynamic discovery of service end points, thus supporting service location transparency (see Figure 2).
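The division of labor among the BPE, the service locator, and the services can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ServiceLocator`, `ProcessEngine`, and the two order services are hypothetical names invented for the example, and services are modeled as plain functions rather than remote end points reached over an ESB.

```python
from typing import Callable, Dict, List

# Assumption: a service takes the current data as a dict and returns new fields.
Service = Callable[[dict], dict]


class ServiceLocator:
    """Hypothetical registry mapping logical service names to end points."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Service] = {}

    def register(self, name: str, endpoint: Service) -> None:
        self._endpoints[name] = endpoint

    def lookup(self, name: str) -> Service:
        # Resolved at call time, so an end point can be replaced
        # without touching the process definition.
        return self._endpoints[name]


class ProcessEngine:
    """Toy stand-in for a BPE: executes an ordered list of service names."""

    def __init__(self, locator: ServiceLocator) -> None:
        self._locator = locator

    def run(self, steps: List[str], data: dict) -> dict:
        for step in steps:
            service = self._locator.lookup(step)
            # Merge each service's output into the process data.
            data = {**data, **service(data)}
        return data


def validate_order(data: dict) -> dict:
    return {"valid": data["quantity"] > 0}


def price_order(data: dict) -> dict:
    return {"total": data["quantity"] * data["unit_price"]}


locator = ServiceLocator()
locator.register("validate", validate_order)
locator.register("price", price_order)

engine = ProcessEngine(locator)
# The process definition is just data: a list of service names.
result = engine.run(["validate", "price"], {"quantity": 3, "unit_price": 10})
```

The process definition lives outside the services, so reordering steps or registering a different end point under the same name changes behavior without changing any service code.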
In this architecture, the ESB is used only for communications between BPE and services. The services themselves may use other integration mechanisms, such as EAI, for accessing enterprise applications and implementing service functionality (see Figure 3).
This approach assumes an incremental implementation of SOA, in which services are introduced as a separate layer in the overall enterprise architecture. The responsibility of this layer is to rationalize an existing applications portfolio against meaningful business services. This approach allows for a gradual upgrade of the service implementations for eventual transition to a "true" services implementation.
As already noted, business process is one of the cornerstones of SOA and an inherent component of the enterprise business model. As an example in the insurance industry, the whole of the requirements model in IBM's Insurance Application Architecture Solution (see Resources) is based on the definition of the business processes.
Business process has two major roles in the SOA. It drives the definition of services, so that the service functionality and interface are defined to support the business process, and it is used to orchestrate service execution, invoking appropriate services as required.
Business process also sometimes serves as the service implementation mechanism. It is completely feasible to reuse business processes throughout the enterprise, in which case a process effectively becomes both a business artifact and a service implementation. Finally, because business process orchestrates service execution, it controls and implements business transactions, the basic units of business work.
This view positions business process as a centerpiece of the SOA (see Figure 4). When deciding which parts of the system should be implemented as services and which as processes, consider the expected rate of change. Business services support stable business artifacts, incorporating business processing and rules that change fairly rarely. Business processes support more fluid business processing and rules that may change every few months or weeks. Thus, a process-centric approach supports increased agility of enterprise IT systems.
SOA does not require that all services be codified. Modern business process engines allow for incorporation of business services that are executed manually. They just need to be defined using standard services interfaces. Putting business process in the middle of SOA allows seamless integration of the services execution with the manual processing.
Decoupled Data Models
The SOA implementation introduces two completely different data models: the underlying data models used by the services implementations, and the interface data model exposed by the services. The use of two data models can substantially simplify overall enterprise data strategy since the interface data model, which is used to access the service, can be decoupled completely from the underlying data model used by a service implementation.
Unfortunately, the most prevalent use of SOA technology today, Web services with automated generation of the required implementation and deployment artifacts, usually takes a bottom-up approach to Web services creation. For example, services artifacts might be generated based on existing EJBs or even Java classes. These generation tools make it difficult to take full advantage of the separation between the interface and the underlying data models.
Considering the number of applications (each with its own database) that exist in the typical company, the number of different representations of the same data (on the interface data level) within the company may become unmanageable. The situation worsens because the same data, although it exists in multiple applications, has a different syntactic representation in each.
This situation presents several major problems. Tight coupling between the underlying and interface data models makes maintenance of existing services very expensive. Also, reconciling the different representations requires massive data transformation support in the ESB and process engines, and a change to any service interface then forces either significant changes in the data model of the process or the introduction of additional transformations in the ESB and BPE.
A better approach is to define the interface data model from the top down by building a shareable enterprise data semantic. This approach starts with building a business semantic model of the enterprise by defining standard business objects for a given enterprise. For example, IBM's Insurance Application Architecture (IAA) sets standard definitions for elements such as insurance policy and claim.
These objects effectively create an ontology of the enterprise data by defining common concepts (and their content) that describe functions of the enterprise. Using such an enterprise data model for defining the enterprise services interfaces leads to creation of interoperable semantic interface definitions and implementation of a semantic SOA.
The advantage of a semantic SOA is significantly enhanced interoperability between services; all of them work with the same objects on the interface level. This approach also eliminates the necessity of mediating messages at the ESB and BPE levels since all communications now support the same semantic model. Also, this approach introduces decoupling between the enterprise data model and the encapsulated database model of any particular service. This decoupling simplifies management and upgrade of existing services, allowing changing of the underlying databases with minimal impact on the enterprise communications model.
Of course, this architecture does not come for free. If the service acts as a facade to a legacy application or there is any mismatch between the enterprise model and service database, then data transformation is required. This transformation has to be done by the service implementation itself, adding a level of complexity (see Figure 5). In such cases, the data transformations can occur at the level of the service facade (transformation of the interface data) and/or the level of the legacy adapters (transformation of the legacy data).
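A minimal sketch of such a service facade follows, assuming a made-up legacy record layout. The `Policy` object stands in for an enterprise-level business object. The field names `POLNO`, `HLDR`, and `PREM_CENTS`, and both classes, are hypothetical and chosen only to show where the transformation lives.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical enterprise business object (the interface data model)."""
    policy_number: str
    holder_name: str
    annual_premium: float


class LegacyPolicyStore:
    """Stand-in for a legacy application with its own record layout."""

    def __init__(self) -> None:
        self._rows = {
            "P-100": {"POLNO": "P-100", "HLDR": "DOE, JANE", "PREM_CENTS": 120000},
        }

    def fetch(self, polno: str) -> dict:
        return self._rows[polno]


class PolicyServiceFacade:
    """Exposes the enterprise model; hides the legacy representation."""

    def __init__(self, store: LegacyPolicyStore) -> None:
        self._store = store

    def get_policy(self, policy_number: str) -> Policy:
        row = self._store.fetch(policy_number)
        # The transformation happens inside the service, not in the ESB or BPE.
        last, first = [part.strip() for part in row["HLDR"].split(",")]
        return Policy(
            policy_number=row["POLNO"],
            holder_name=f"{first.title()} {last.title()}",
            annual_premium=row["PREM_CENTS"] / 100,
        )


facade = PolicyServiceFacade(LegacyPolicyStore())
policy = facade.get_policy("P-100")
```

Callers see only `Policy`. If the legacy store were replaced, only the body of `get_policy` would need to change.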
Based on the data semantic introduced previously, it is now possible to express the service's interfaces in the form of XML data documents containing semantic data. In this case, the service method's interfaces will be massively polymorphic and expressed as:
Service.method (XML in, XML out)
The service method's interface never changes, but the input and output document will change to reflect parameter changes. Because the underlying plumbing always deals with the same signature, it will not be impacted by the parameter changes, and it is up to service implementations themselves to be sufficiently resilient to the message change.
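As a rough illustration of this signature, the sketch below implements one hypothetical service method, `quote_premium`, using Python's standard `xml.etree.ElementTree` module. The element names and the pricing rule are assumptions made for the example. The point is that adding an optional `discount` element changes the document, not the method signature.

```python
import xml.etree.ElementTree as ET


def quote_premium(xml_in: str) -> str:
    """Hypothetical service method: XML document in, XML document out."""
    request = ET.fromstring(xml_in)
    coverage = float(request.findtext("coverage", default="0"))
    # Optional element: callers that omit it still get a valid answer.
    discount = float(request.findtext("discount", default="0"))
    premium = coverage / 100 * (1 - discount)  # made-up pricing rule

    response = ET.Element("quote")
    ET.SubElement(response, "premium").text = f"{premium:.2f}"
    return ET.tostring(response, encoding="unicode")


# An older caller and a newer caller use the same signature.
old_style = quote_premium(
    "<quoteRequest><coverage>100000</coverage></quoteRequest>"
)
new_style = quote_premium(
    "<quoteRequest><coverage>100000</coverage><discount>0.1</discount></quoteRequest>"
)
```

The resilience the text asks for shows up in the defaults passed to `findtext`: the implementation tolerates a missing element instead of failing on it.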
The Grand Unification
The majority of today's enterprises store digitized versions of their documents in native document formats (PDF, JPEG, and others) in dedicated document management systems. These documents usually have no "semantic linkage" either between themselves or with enterprise data objects, which significantly complicates incorporation of document processing in the SOA architecture. Preparing documents for inclusion in "unified" data processing involves several steps.
First, the document contents must be unlocked from proprietary formats. Some formats, such as Microsoft Office 2003, support direct mapping of documents into XML. This support is a great choice for creating new documents because it allows processing of data and documents in the same interchangeable format. Unfortunately, the bulk of existing documents are in "legacy" formats. A prevalent approach to XMLizing these documents is to create XML-based document metadata, which usually includes the document itself, either as an attachment or as a pointer to the document location.
Second, document taxonomies must be created to classify documents according to the metadata introduced during the previous step and aligned with the enterprise semantic models. Finally, links between documents and between documents and enterprise data supporting the documents must be created. Although these steps are complex and labor intensive, they must be performed for documents to be included in the SOA. These activities are also important if the enterprise wants to achieve any kind of useful enterprise content management.
Based on such work, enterprise data semantics can be enhanced to include enterprise documents. In the simplest case, there is a one-to-one correspondence between business objects and documents supporting these objects. For example, in the insurance industry, a policy data object is associated directly with an insurance policy document. In more complex situations, relationships can be one to many, many to one, or many to many.
After enterprise documents are rationalized and linked to enterprise data, they can be expressed in XML. This XML can be included in the XML documents and used for the service method's invocation, thus making documents and data indistinguishable to the service interface. To make this work, documents are passed around as references, enhancing the ESB architecture by adding access to the document repository (see Figure 6).
There are several practical considerations to keep in mind. Physical documents are not passed around between services; only document references (metadata) are included in the service interfaces. The service itself accesses documents on an as-needed basis. This characteristic is important because documents can be large, and passing them between services would increase network traffic and degrade the performance of the ESB. Also, like databases, document management systems usually provide quasi-ACID properties through a check-in/check-out mechanism. Removing the documents from the processes and minimizing their usage time can improve the throughput of the document management system.
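The pass-by-reference idea can be sketched as follows. Everything here is hypothetical: `DocumentRef` stands in for the XML document metadata, `DocumentRepository` for a document management system client, and the page-counting rule is a placeholder. One service works from metadata alone, and the other fetches the content only because it needs it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DocumentRef:
    """Hypothetical document metadata carried in service messages."""
    doc_id: str
    doc_type: str      # taxonomy category, e.g. "insurance-policy"
    business_key: str  # link to the related enterprise data object


class DocumentRepository:
    """Stand-in for a document management system client."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs
        self.fetch_count = 0  # counts how often content is actually pulled

    def fetch(self, ref: DocumentRef) -> bytes:
        self.fetch_count += 1
        return self._blobs[ref.doc_id]


def route_message(message: dict) -> str:
    # Needs only the metadata, so the document content is never transferred.
    if message["document"].doc_type == "insurance-policy":
        return "policy-review"
    return "general"


def count_pages(message: dict, repo: DocumentRepository) -> int:
    # This service needs the content, so it dereferences on demand.
    content = repo.fetch(message["document"])
    return content.count(b"\f") + 1  # placeholder rule: form feed ends a page


repo = DocumentRepository({"D-1": b"page one\fpage two\fpage three"})
message = {"document": DocumentRef("D-1", "insurance-policy", "P-100")}

queue = route_message(message)
fetches_after_routing = repo.fetch_count
pages = count_pages(message, repo)
```

Only the small reference travels with the message. The repository is touched once, by the one service that needs the document body.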
Although Figure 6 shows access to the document management systems from the ESB, that does not mean that services are used for document access. Most of the existing document systems provide remote client APIs. Building reusable components based on these APIs is usually a better approach.
Role Reversal
Implementation of business processes requires an underlying data model that the process operates on. Because the business processes and business data objects are developed as part of the same enterprise business model, the business processes and services interfaces (including document support) are driven by the same business model. This situation makes integration between services and business processes very straightforward. Since documents are already integrated in the enterprise data model, the same process can support both transactional data and document processing. This paradigm allows a business process to start from a business document and incorporate data processing, and vice versa.
Alignment of business process with business data also allows for rethinking the notion of the service interfaces. As discussed previously, services are driven by the semantic interfaces. Considering that the business process data model adheres to the same semantics as the service interface data model, it is possible to create a semantic message for services that uses the process data model as service interface documents both for input and output. Since both the process and the interface data model are driven by the same semantics, any particular service can extract data that it needs from the process model.
This solution reverses responsibilities: instead of the business process building a specific interface for a participating service, business services are responsible for accessing required information from the process data model and updating the data model with the results of their execution. Such an approach minimizes the impact of service interface changes, as long as the required data is available in the process data model. This approach puts an additional burden on the service implementations, but that burden is negligible compared to the expense of realigning the process implementations with service interface changes.
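A small sketch of this reversal, using Python's standard `xml.etree.ElementTree` module: each service receives the whole process document, reads what it needs, and writes its results back. The document layout, the two services, and their business rules are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def underwriting_service(process_doc: ET.Element) -> None:
    """Hypothetical service: reads the applicant, records a decision."""
    age = int(process_doc.findtext("applicant/age"))
    decision = ET.SubElement(process_doc, "underwriting")
    # Made-up rule, only to produce a result to write back.
    ET.SubElement(decision, "approved").text = "true" if age < 70 else "false"


def pricing_service(process_doc: ET.Element) -> None:
    """Hypothetical service: reads coverage and the prior decision."""
    coverage = int(process_doc.findtext("policy/coverage"))
    approved = process_doc.findtext("underwriting/approved") == "true"
    premium = coverage // 100 if approved else 0
    ET.SubElement(process_doc.find("policy"), "premium").text = str(premium)


# The process data model is the only message either service ever sees.
process_doc = ET.fromstring(
    "<process>"
    "<applicant><age>42</age></applicant>"
    "<policy><coverage>250000</coverage></policy>"
    "</process>"
)

# The process passes the same document to every step.
for service in (underwriting_service, pricing_service):
    service(process_doc)
```

If `pricing_service` later needs another input, it reads it from the process document. The loop that drives the process does not change.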
Putting all of this together leads to the extended SOA model, resulting in a highly robust enterprise architecture (see Figure 7). This approach requires significant up-front investment in designing an enterprise business model, driving design and execution of the enterprise business services, and creating the enterprise data ontology (see Resources).
However, the benefits realized from implementing a truly semantic enterprise make the effort worthwhile. Establishing a data ontology simplifies linking of business documents and enterprise data, and provides a foundation for integration of data and documents processing. Finally, usage of a process business data model as a service interface allows for achieving even looser coupling between services and processes.