Re-engineering legacy systems

In recent years, software architectures and measures of architectural "goodness" have attracted considerable interest.

The emphasis on object orientation during the 1990s intertwined our thinking about software architecture with an object philosophy for developing software. This article presents an approach for applying this contemporary architectural thinking to re-engineering legacy applications, mainly mainframe-based ones. It focuses on a scenario-based evaluation of architectural "goodness."

Desmond D'Souza and Alan Cameron Wills describe several architectural "view-ports" into architecture descriptions in their book, Objects, Components and Frameworks with UML (Reading, Mass.: Addison–Wesley, 1999), in which each view focuses on a specific part of a system. Some views are focused on design and development, some on testing and runtime, and others on deployment and upgrades. This article will focus on the "module" view that deals with design time units of development work. This view includes the design decisions hidden in the work units, or modules, as well as the refinements used for sub-module decompositions and justifications, including the rationale for choices made and choices rejected.

A hypothetical case study
Consider, for example, a project involving domain and requirements analysis and high-level design for the pricing system of a hypothetical car rental company. The company's IT organization manages an existing IMS-Assembler-COBOL-based legacy system running on an IBM mainframe. As with most mainframe-based legacy systems, there is little or no documentation, and most of the documentation on hand is outdated and does not reflect the implemented system. The only way to be sure what the system is and does is to examine the program code closely.

The system in this example is in the process of a re-engineering project to prepare the code for a migration to relational, client/server platforms. The goals of this phase of the project are to capture as much knowledge about the existing system as possible and to identify the high-level modules of the new system, that is, to develop the module view of the target system. Once this phase is done, the developers must sit with the business users to discuss the state of the existing system as well as any desired improvements.

Develop architectural vision and structure charts
To ensure common understanding and standardization, an architectural vision for the new rates system as a whole and its interface with other elements of the car rental system must be developed first and shared with the team. Each vision element should then be analyzed to come up with concrete design guidelines.

The overall architecture of the rates system and its translation into specific design guidelines are deliberately not included in this high-level phase. The architectural vision and structure charts are developed to guide the various design teams and to ensure a common shared architectural vision.

There are two objectives for developing the high-level structure charts. The first is to better understand the existing system from a business angle without getting bogged down in the specific implementation details. And the second is to identify the high-level system modules and develop the overall modular structure—the module architecture—of the new system.

Next, the high-level structure charts are manually "reverse engineered" in facilitated sessions with each design team based on the available knowledge of the legacy system. The facilitator of these sessions should be familiar with the car rental industry as well as IT systems architecture and design.

To facilitate development of the high-level structure charts, the rental car system is broken down into a few major processes (car reservation, check out, check in, display "best buys" and the like). All programs associated with these processes are looked at as one unit, since that level of granularity is sufficient at this stage of the process.

To help the design teams conduct self-reviews of their designs and also cross-team reviews of one another's designs, a scenario-based approach to architecture evaluation would be effective for this project. This approach can be used as a formal basis for technical reviews and feedback on an ongoing basis while ensuring a certain level of consistency across teams and systems.

Identify key quality attributes
The team's first task is to identify the few critical quality attributes against which the system's architecture will be evaluated. This is a mandatory step in scenario-based architecture evaluation. Achieving one quality attribute can have an effect, sometimes positive and sometimes negative, on the achievement of other quality attributes. For example, security and fault tolerance exist in a state of mutual tension: the most secure system has the fewest points of failure to protect, whereas the most fault-tolerant system has the most, because it relies on redundant components, any one of which can fail without bringing the system down.

There are no universally good or bad architectures. "Goodness" can only be measured relative to the set of quality attributes of importance in a specific context. It is imperative that these be identified up front to guide architectural tradeoffs.

The hypothetical car rental system requires modifiability. Typical changes to the system should involve a small set of distinct modules. In this method, the more local the changes are, the better it is for the system as a whole. The system should facilitate phased implementation of new functionality and gradual migration of all company applications to the new client/server platforms. Ease of modification when the database platform, the database design or the user interface changes is also important.

The specifications for the system also call for it to be reusable. A module or group of modules should be reusable within and outside the system. The car rental company might also have several other businesses in the travel industry, which would create ample potential for module reuse.

Identify architectural constraints
In addition to addressing the key quality attributes, the team should agree on a few key constraints to observe while attempting to meet them. Performance can be consciously traded off to boost modifiability and reusability. Our hypothetical team agreed that the existing program code would dictate how feasible and how easy it would be to revamp the existing system. Change complexity and feasibility were likewise consciously traded off, as necessary, to ensure that the key quality attributes were satisfied.

Thus, the goal was to meet the key quality goals while explicitly studying the constraints and evaluating possible tradeoffs. It was also decided that in the initial stages of design, the team would focus more on developing a high-level design that met the key quality attributes than on the constraints. Imposing the constraints was consciously deferred because of the objectives behind structure chart development.

Structure chart analysis
The segment of the structure chart used to illustrate the scenario-based evaluation approach is shown in Figure 1. It closely resembles a version that was initially developed by one of the design teams and was used as an example to explain the architecture evaluation process to the other design teams. It is used here because it is simple, yet it brings out the key elements of the approach. These are higher-level design structure charts, not program-level structure charts.

Figure 1
This section of the overall structure chart relates to the "Determine Insurance Rate" module, which computes insurance charges for each rental.

The overall structure chart relates to the program modules that facilitate online inquiry of rental car prices based on user-specified rental requirements (such as the check-out and check-in stations, duration of rental, pick-up date, and so on). The specific section of the structure chart used for this illustration relates to "Determine Insurance Rate," a module that computes insurance charges for the rental.

The ellipse in Figure 1 refers to two global segments used by the "Determine Insurance Rate" module—the location database and the customer database. These segments contain global variables from the location and customer databases, respectively. The "Determine Insurance Rate" module uses these global segments to determine the insurance types (such as personal accident insurance, personal effects, property insurance and so on), for which the amounts are to be displayed. The applicable insurance types are dependent on the checkout location and on the customer.

The other modules in Figure 1 retrieve the required insurance data for the applicable insurance and compute the insurance amounts for each type of insurance.

Consider the structure chart's modifiability. Under the development plan, the location and customer databases were to eventually be rewritten and migrated to the new platform, which would probably change the structure of these databases. So it is important to consider how the application design will be affected when other databases are redesigned.

With the existing structure, "Determine Insurance Rate" will need to be redesigned, since it depends on the global data structures, which in turn depend on the current implementation of the databases. A better alternative is to move the responsibility for determining the insurance types for which amounts are needed to a separate module, and to pass only these insurance types as a parameter to the "Determine Insurance Rate" module. This section of the structure chart would then be drawn differently (see Fig. 2). With this design, "Determine Insurance Rate" is buffered from changes to the location and customer databases.

Figure 2
An alternative structure for "Determine Insurance Rate" that improves reusability.

Obviously, the module "Rate Inquiry" (specifically "Determine Applicable Insurance Types") is now dependent on the location and customer databases, so it is susceptible to change. However, this course is better for three reasons. First, the higher-level module was already likely to be dependent on these databases (given the other responsibilities assigned to it). Second, in all probability, there would be fewer such control modules that would need to be changed. Third, more commonly used modules such as insurance, taxes, surcharges and so on could all be buffered from location and customer database changes.
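
The Figure 2 structure can be sketched in code. The following is a minimal Python sketch rather than the legacy system's actual design: the function names mirror the modules in the structure chart, while the record layouts, field names, insurance codes and daily rates are hypothetical values chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical daily rates per insurance type (illustrative values only).
DAILY_RATES: Dict[str, float] = {
    "personal_accident": 3.0,
    "personal_effects": 2.0,
    "property": 9.0,
}


def determine_applicable_insurance_types(location: dict, customer: dict) -> List[str]:
    """The only function that knows the layout of the location and customer
    records (the field names used here are assumptions)."""
    offered = location["insurance_offered"]
    waived = set(customer["insurance_waived"])
    return [t for t in offered if t not in waived]


def determine_insurance_rate(insurance_types: List[str], rental_days: int) -> Dict[str, float]:
    """Computes the amount for each insurance type it is given. The types
    arrive as a parameter, so this function never reads the databases."""
    return {t: DAILY_RATES[t] * rental_days for t in insurance_types}


def rate_inquiry(location: dict, customer: dict, rental_days: int) -> Dict[str, float]:
    """Control module: resolves the applicable types, then delegates."""
    types = determine_applicable_insurance_types(location, customer)
    return determine_insurance_rate(types, rental_days)


location = {"insurance_offered": ["personal_accident", "personal_effects", "property"]}
customer = {"insurance_waived": ["personal_effects"]}
amounts = rate_inquiry(location, customer, rental_days=3)
print(amounts)
```

In this sketch, a change to the location or customer record layout touches only determine_applicable_insurance_types; determine_insurance_rate is left alone.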

Now consider the other key quality attribute—reusability. Our team considered the following scenario: What if we need a new screen inquiry program for displaying the insurance amounts pertaining to insurance types that users input?

As the structure chart stood originally (in Fig. 1), "Determine Insurance Rate" was vested with the responsibility of determining the applicable insurance types, and of obtaining and processing the data. In the program described in the scenario above, there is no need to determine the applicable insurance types by reading the location and customer databases, since the user inputs them.

As drawn in Figure 1, "Determine Insurance Rate" cannot be reused in this new inquiry program, although the modules below it in Figure 1 are reusable. The alternative solution presented in Figure 2 facilitates reuse for the new program because the list of insurance types is passed as input into the "Determine Insurance Rate" module. In the new inquiry program, the insurance types input by the user will be passed to the "Determine Insurance Rate" module in place of those supplied by the "Determine Applicable Insurance Types" module; the module places no limit on the source of its input. This increases the granularity of reuse because it makes the entire "Determine Insurance Rate" module, not just its individual sub-modules, reusable.

Additional thoughts on the approach
Interestingly, increasing the cohesiveness of the "Determine Insurance Rate" module (by taking away the responsibility for determining applicable insurance types) and reducing coupling (by taking away the need for accessing global data) in our example enhanced both modifiability and reusability.
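
The reuse side of this observation can be illustrated with a short Python sketch. It assumes hypothetical insurance codes, daily rates and a comma-separated input format for the new inquiry screen; none of these come from the legacy system. Once the insurance types arrive as a parameter, the existing rate inquiry and the new inquiry screen can both call the same module.

```python
from typing import Dict, Iterable, List

# Hypothetical daily rates per insurance type (illustrative values only).
DAILY_RATES: Dict[str, float] = {
    "personal_accident": 3.0,
    "personal_effects": 2.0,
    "property": 9.0,
}


def determine_insurance_rate(insurance_types: Iterable[str], rental_days: int) -> Dict[str, float]:
    """Cohesive module: computes amounts for the types it is handed and
    does not care where the list of types came from."""
    return {t: DAILY_RATES[t] * rental_days for t in insurance_types}


def rate_inquiry(applicable_types: List[str], rental_days: int) -> Dict[str, float]:
    """Existing caller: the types were derived from the location and customer
    databases by a separate module (stubbed here as an argument)."""
    return determine_insurance_rate(applicable_types, rental_days)


def insurance_inquiry_screen(user_input: str, rental_days: int) -> Dict[str, float]:
    """New caller from the reuse scenario: the user types the insurance types
    as a comma-separated string (this input format is an assumption)."""
    requested = [t.strip() for t in user_input.split(",") if t.strip()]
    return determine_insurance_rate(requested, rental_days)


from_databases = rate_inquiry(["personal_accident", "property"], rental_days=2)
from_user = insurance_inquiry_screen("personal_effects, property", rental_days=2)
print(from_databases)
print(from_user)
```

Neither caller needs a modified copy of determine_insurance_rate, which is the whole-module reuse the scenario asks for.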

The approach used is similar to a "what-if" kind of analysis common to spreadsheet applications. Developing representative scenarios requires domain knowledge.

Design involves tradeoffs; what works for one scenario may not work for another. Ultimately, application needs should provide the basis for making intelligent tradeoffs. Experts recommend that two cross-reference tables be built: scenarios vs. modules affected, and modules vs. scenarios that affect them. And two situations need to be examined closely with a view to improving the modular structure—scenarios that affect many modules, and modules that are affected by a large number of scenarios.
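
The two cross-reference tables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the scenario names, the modules each scenario is said to affect, and the threshold of three are all hypothetical, and a real team would fill the first table in from its own review sessions.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Table 1: scenario -> modules affected (hypothetical impact data).
scenario_to_modules: Dict[str, Set[str]] = {
    "location database redesign": {"Determine Applicable Insurance Types"},
    "customer database redesign": {"Determine Applicable Insurance Types"},
    "new insurance type added": {
        "Determine Applicable Insurance Types",
        "Determine Insurance Rate",
        "Rate Inquiry",
    },
}


def invert(table: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Builds the second cross-reference table from the first by swapping
    the roles of rows and entries."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for key, values in table.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


def flag(table: Dict[str, Set[str]], threshold: int) -> List[str]:
    """Returns the rows whose entry count reaches the threshold, sorted by name."""
    return sorted(key for key, values in table.items() if len(values) >= threshold)


# Table 2: module -> scenarios that affect it.
module_to_scenarios = invert(scenario_to_modules)

THRESHOLD = 3  # assumed cut-off for "many"; tune to the size of the system
wide_scenarios = flag(scenario_to_modules, THRESHOLD)  # scenarios touching many modules
busy_modules = flag(module_to_scenarios, THRESHOLD)    # modules hit by many scenarios
print(wide_scenarios)
print(busy_modules)
```

The two flagged lists correspond to the two situations that deserve a closer look.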

These situations may be acceptable for most IT shops under the following conditions:

  • Only a few scenarios affect a large number of modules. These scenarios may be the system "hot spots," requiring extensive care and attention during development, testing, implementation and so on.
  • Modules are affected by scenarios of the same type, like location database changes or one-way fee changes. This may indicate good, cohesive modules.
  • Modules affected by many scenarios have a well-defined modular separation of responsibilities at the sub-module level. This separation can be determined by looking at the modules in the structure chart at a more granular level.
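
The second and third conditions above can be checked mechanically once each scenario is tagged with a type. The Python sketch below assumes hypothetical scenarios, type labels and impact data; it only separates modules hit by a single type of scenario from modules hit by several types, which are the ones worth examining at the sub-module level.

```python
from typing import Dict, List, Set

# Hypothetical scenarios, each tagged with a scenario type.
SCENARIO_TYPE: Dict[str, str] = {
    "location table gains a region column": "location database change",
    "location keys change format": "location database change",
    "one-way fee becomes distance-based": "one-way fee change",
}

# Hypothetical module -> scenarios table, as produced by an impact review.
module_to_scenarios: Dict[str, List[str]] = {
    "Determine Applicable Insurance Types": [
        "location table gains a region column",
        "location keys change format",
    ],
    "Rate Inquiry": [
        "location keys change format",
        "one-way fee becomes distance-based",
    ],
}


def scenario_types(module: str) -> Set[str]:
    """Collects the distinct scenario types that affect a module."""
    return {SCENARIO_TYPE[s] for s in module_to_scenarios[module]}


def needs_closer_look(module: str) -> bool:
    """True when a module is affected by more than one type of scenario,
    so its sub-module separation of responsibilities should be reviewed."""
    return len(scenario_types(module)) > 1


for module in sorted(module_to_scenarios):
    verdict = "inspect sub-modules" if needs_closer_look(module) else "single scenario type"
    print(f"{module}: {verdict}")
```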