Effective Test Strategies for Enterprise-Critical Applications

  From the Pages Kamesh Pemmaraju is a senior consultant at Reliable Software Technologies, Sterling, VA. He can be contacted at [email protected].

JAVA HAS QUICKLY evolved into more than a simple-minded language used to create cute animations on the Web. These days, a large number of serious Java developers are building enterprise-critical applications that are being deployed by large companies around the world. Applications currently being developed on the Java platform range from traditional spreadsheets and word processors to accounting, human resources, and financial planning packages. Because these applications are complex and use rapidly evolving Java technology, companies need a rigorous quality assurance program to produce a high-quality, reliable product. Quality assurance and test teams must get involved early in the product development life cycle, creating a sound test plan and applying an effective test strategy to ensure that these enterprise-critical applications provide accurate results and are free of defects. Accuracy is critical for users who apply these results to crucial decisions about their business, their finances, and their enterprise. I present a case study of how an effective testing strategy, focused on sub-system level automation, was applied to successfully test a critical real-world Java-based financial application.

The application is being developed by a leading financial services company (hereafter referred to as client) and is targeted toward both individual investors and institutional investors, particularly those investors that are interested in managing their finances and retirement plans. Reliable Software Technologies (RST) provided the QA/Test services and the test team (hereafter referred to as the QA/Test team) to the client. Because RST carried out third-party independent testing of the client application, some details about the development of the product were missing. However, RST received enough information to perform an effective test cycle on the product.

The application to be tested is a personal financial planning package. In addition to investment planning, the package features financial planning for college, retirement, and estate. The software is designed for, and tested on, the Windows 95, Windows 98, and Windows NT platforms, under both Sun's JDK 1.1.6 and Microsoft's JVM. The application is a complex, data-driven, and computationally intensive Java application capable of performing several thousand calculations while processing more than 1,000 input variables. The number of calculations and their complexity all but rules out manual validation, demanding both an automated solution to execute tests and an Oracle to verify the accuracy of the results. The Oracle is not to be confused with the popular database software of the same name; rather, it is a Reference Spreadsheet, implemented in Microsoft Excel, that performs the same calculations (for validation purposes) as the application under test.

The client design team decided early on to separate the user-interface software and the business software (referred hereafter as the backend). This decision was in part due to the client management's goal to eventually deploy the application on the Web. For future Web implementation of the application, the design team decided to re-use the backend software (e.g., on the application server) and re-implement the GUI as a thin-client. A thin-client could be a simple Java applet or application that handles user input and display while the server software could perform all the complex calculations on a dedicated application server.

For testing purposes, the decision to separate the backend from the GUI proved to be crucial. The application performs hundreds of calculations behind the scenes and the user interface only partially shows what happens inside. Sometimes, a very complicated internal processing produces only a single final result. Clearly, in this situation, testing only through the GUI is not enough. Separating the GUI from the backend allowed comprehensive automated verification of the backend calculations at a much higher level of granularity.

During the design phase, the client application designers used Rational Rose to design the backend software and to develop use cases and Object Models for the application. Use cases are used as a means of describing the functionality of a system in object-oriented software development. Each use case represented a specific flow of events in the system. The description of a use case defined what should happen in the system when the use case is performed. These use cases were later used to define the Objects (Java classes) required in the system. At the end of the design phase, the backend consisted of nearly a hundred Java classes and several hundred use cases.

The client management made an important early decision: to develop a Reference Spreadsheet that can be used to validate the thousands of calculations performed by the backend. The Reference Spreadsheet was developed using several inter-linked Microsoft Excel worksheets that implemented the same functionality as the application under test. As described later, the creation of the Reference Spreadsheet proved to be extremely useful for automated validation of the calculations. The medium for automation here was Visual Basic for Applications (VBA).

Finally, the client development team created a data-dictionary that contained detailed information on the range and limits of all the input variables of the application. The data-dictionary proved to be an invaluable test entity that was later used for automatic generation of thousands of semantically correct and constrained test data. In addition, test data that fell outside the bounds of the defined ranges was generated. This test data was used to verify the error and range handling of the application. Because the data-dictionary was in Microsoft Excel, the test-data generation software was written in VBA.

Testing a relatively large Java application under an aggressive schedule requires that the QA/Test teams get involved early in the life cycle creating a sound test plan and then applying an effective test strategy. The following is a description of some factors that helped ensure a successful test life cycle for the application.

Early Life Cycle Testing
Test-plan analysis and design during the initial stages of a software project contribute to the efficiency, rigor, and coverage of the test implementation and to the overall quality of the product. The early life cycle testing approach employed was based on the development model of the project. The development model was evolutionary: the design team incrementally modeled the use cases, then carried out the OO design, and finally constructed and unit-tested the code. At the end of each iteration, the design team handed the use cases, the OO models, and alpha code over to the QA/Test team. Based on these documents, the QA/Test team identified a strategy that would improve the testability of the code (testability in this context is defined as the ability to minimize the effort required to apply the chosen test strategy to the system), evaluated tools that supported test automation, and prepared detailed test requirements. Specifically, the QA/Test team's activities were: development of a use case test plan, development of class-level test requirements, evaluation of Sun Microsystems' GUI test tool JavaStar for system-level GUI testing, and some initial system-level testing of the alpha versions.

A number of benefits were realized by the QA/Test team initiating quality assurance work early in the life cycle and concurrent with the development process:

  • An early understanding of the design structure and functionality of the product, enabling the development of effective test strategies.
  • Increased awareness of high-priority design issues and items, which facilitated early test planning.
  • A more rapid and effective response as the product moved toward completion.

Test Strategy
The test plan defined a multi-pronged approach consisting of the following five activities to test various aspects of the application:

  • Sub-system level testing
  • GUI Integration system testing
  • System level testing
  • Use case testing
  • Range/Error handling testing

The test approach clearly defined exit criteria to determine when to stop testing and when to ship the product. Finally, all test activities focused on maximizing automation wherever possible. What follows is a brief description of these five activities.

Sub-System Level Testing of the Backend
Given the limited testing time and the large number of backend Java classes to test (nearly 100), an approach to unit testing based purely on individual class-level testing was neither feasible nor cost-effective. Moreover, many individual classes served only as support or utility classes in the class framework (especially those at the lower end of the class hierarchy) and did not, therefore, produce application-level results that could be validated against the Reference Spreadsheet. Consequently, the QA/Test team chose collections of classes, or sub-systems, to test at the "unit" level. The choice of sub-systems was guided by the Reference Spreadsheet, which defined 25 distinct sub-systems.

The test approach for sub-system testing was essentially black box in nature, requiring generation of test inputs, development of test drivers, and development of utilities for automated verification of the outputs against the Reference Spreadsheet. Black box testing has one notable drawback: it can leave significant portions of the code unexercised. However, coverage measurement and analysis can identify the portions of the code that black box tests miss. The QA/Test team used Reliable Software Technologies' Deepcover™ as its Java coverage tool to identify areas of inadequate coverage in the code, and created additional test cases as needed to improve the coverage levels achieved.

A test framework was developed to support an automated verification of the backend subsystems. The framework consisted of the following components:

  • Test Data Generation tool
  • Test Drivers
  • Comparator

The following is a brief description of these components:
Test data generation—The QA/Test team developed a test data generation tool that could automatically generate thousands of test cases containing constrained test data for each sub-system of the product. The tool was developed in VBA, taking advantage of the powerful constraint satisfaction algorithms available in Microsoft Excel. To generate test data automatically, the tool used the range and limit information recorded in the data dictionary for each input variable. This range information was available as minimum, maximum, typical minimum, and typical maximum values, and the tool used it to generate test data such as random values within the maximum range, random values within the typical range, inner bounds, and outer bounds. The data dictionary also defined the error messages that should be displayed when the user enters values outside the valid bounds; this error message information was later used for Range/Error testing.
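The generation scheme above can be sketched in Java. The original tool was written in VBA against the Excel data dictionary; the FieldRange structure and the contribution limits below are hypothetical stand-ins:

```java
import java.util.Random;

// Sketch of the data-dictionary-driven generator described above.
// The original tool was VBA/Excel; this Java version is illustrative only.
public class TestDataGenerator {
    static final Random RNG = new Random(42); // fixed seed for reproducible suites

    // Range information as stored in the data dictionary for one input variable.
    static class FieldRange {
        final double min, max, typicalMin, typicalMax;
        FieldRange(double min, double max, double typicalMin, double typicalMax) {
            this.min = min; this.max = max;
            this.typicalMin = typicalMin; this.typicalMax = typicalMax;
        }
    }

    // Random value anywhere in the legal range.
    static double randomInMaxRange(FieldRange f) {
        return f.min + RNG.nextDouble() * (f.max - f.min);
    }

    // Random value within the typical range.
    static double randomInTypicalRange(FieldRange f) {
        return f.typicalMin + RNG.nextDouble() * (f.typicalMax - f.typicalMin);
    }

    // Outer-bound value: just outside the legal range, used for error-handling tests.
    static double outerBound(FieldRange f, boolean belowMin) {
        return belowMin ? f.min - 1.0 : f.max + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary entry: an IRA contribution limited to $0..$2,000.
        FieldRange contribution = new FieldRange(0, 2000, 100, 1500);
        System.out.println(randomInMaxRange(contribution));
        System.out.println(outerBound(contribution, false)); // outside range: should trigger an error message
    }
}
```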

Table 1. Example test case file. (The cell values are not reproduced here; the table listed four test cases, one per row, with columns for an ID string, contribution amounts such as RothIRA4, StartYear and StopYear values, and an ErrorStatus flag.)

Using the data generation tool, the QA/Test team generated 1,500 test cases for each of the 25 different sub-systems, totaling 37,500 test cases. Each test case consisted of a row of test data (see Table 1) used to populate the input variables of the sub-system under test. As shown in the table, the input variables are represented in columns and the test cases in rows.

The last column (ErrorStatus) shows whether the test case represents normal processing or exception processing. In this example, the last test case exercises exception processing because the RothIRA4 contribution ($1 million) exceeds the maximum allowable contribution. Notice the values for StartYear and StopYear: they are represented as random double values between 0 and 1 instead of in regular year format. The test drivers (described next) combine these random numbers with other context-sensitive dates (e.g., year of birth, year of retirement) to generate the actual start and stop years. This was a simple and elegant way to deal with multi-constraint variables without complicating the data generation process. The test data was stored in flat files, one per sub-system, in a fixed-width format that was easy to export to a database or spreadsheet.
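The article does not spell out how the drivers mapped these 0..1 fractions onto years; one plausible scheme is linear interpolation over the legal range, with the stop-year range anchored at the already-chosen start year so the two constraints stay consistent:

```java
// Hypothetical reconstruction of the fraction-to-year mapping described above.
// The actual rule used by the client's drivers is not documented.
public class YearMapper {
    // Map a fraction in [0, 1) onto the closed year range [first, last].
    static int mapToYear(double fraction, int first, int last) {
        return first + (int) Math.floor(fraction * (last - first + 1));
    }

    public static void main(String[] args) {
        int birthYear = 1960, retirementYear = 2025;  // hypothetical context dates
        // StartYear may fall anywhere between adulthood and retirement...
        int start = mapToYear(0.25, birthYear + 18, retirementYear);
        // ...and StopYear is constrained to never precede the chosen StartYear.
        int stop = mapToYear(0.90, start, retirementYear);
        System.out.println(start + ".." + stop);
    }
}
```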

Test drivers—To exercise the backend code using the generated test data, the QA/Test team developed component-level Java drivers for each sub-system. The test drivers read the previously generated test data (see Table 1) from a file, used the set methods of the component to assign test data to the input variables, called the functions that calculate the results, and finally wrote the results to a file. In parallel, spreadsheet drivers were developed in VBA to perform the same process on the Reference Spreadsheet, and the spreadsheet's results were then compared with those of the backend. Sometimes the Reference Spreadsheet itself produced incorrect results; nevertheless, discrepancies provided warnings and prompted further investigation.
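In outline, the read/set/calculate/write cycle of a sub-system driver might look like the following. RetirementPlanner and its methods are hypothetical stand-ins for the unnamed backend classes, and the real drivers read flat test-data files and wrote their results to disk:

```java
import java.util.Locale;

// Sketch of the driver cycle described above: take a test case, assign inputs
// via set methods, run the calculation, record the result for comparison.
public class SubSystemDriver {
    // Hypothetical backend sub-system under test.
    static class RetirementPlanner {
        private double contribution;
        void setContribution(double c) { contribution = c; }
        // Accumulate yearly contributions with compound growth.
        double calculateProjectedBalance(int years, double rate) {
            double balance = 0.0;
            for (int y = 0; y < years; y++) {
                balance = (balance + contribution) * (1.0 + rate);
            }
            return balance;
        }
    }

    public static void main(String[] args) {
        // In the real driver, each line of the generated flat file is one test case.
        double[] contributions = {2000.0, 1500.0, 500.0};
        StringBuilder results = new StringBuilder();
        for (double c : contributions) {
            RetirementPlanner planner = new RetirementPlanner();
            planner.setContribution(c);                          // set methods assign inputs
            double result = planner.calculateProjectedBalance(30, 0.07);
            results.append(String.format(Locale.US, "%.2f\t%.2f%n", c, result));
        }
        System.out.print(results);  // the real driver wrote this to a results file
    }
}
```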

After the sub-system level Java drivers were completed, a suite of Java integration drivers was developed, which incorporated methods in the module-level classes as well as new functionality needed for integration testing. A similar process was followed for the spreadsheet drivers. Overall, 25 Java test drivers and 25 VB drivers were developed, comprising 3,700 and 8,500 lines of test code, respectively. Together, the Java and VB test driver code accounted for 12,200 lines of code, which is almost 50% of the total lines of code of the backend. Studies have often shown that for well-tested applications, test code comprises about 50% of all code written for a project.

Comparator—Because the calculations performed by the backend often produced hundreds of output variables, an automated comparison tool was developed in Visual Basic to examine and compare the backend results with those of the spreadsheet. The comparator discarded unneeded text strings before comparing the output results. A backend-calculated value was deemed correct if it deviated by less than 0.0001% from the spreadsheet calculation. This represents a variation of at most one dollar in a million dollars, which was deemed acceptable when caused by differences in floating-point accuracy between the Java system and Microsoft Excel. At the end of the entire test suite execution, the total number of comparisons between the backend results and the spreadsheet was close to three-quarters of a million (693,122). Nearly 98% of these comparisons passed. The overall scheme is illustrated in Figure 1.
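The comparator's tolerance rule is easy to state in code. A sketch, with the 0.0001% threshold expressed as a relative error (the zero-value handling is an assumption, since the article does not cover it):

```java
// Sketch of the pass/fail rule: a backend value matches the spreadsheet value
// if the relative deviation is below 0.0001% (at most $1 per $1,000,000),
// absorbing floating-point drift between the JVM and Microsoft Excel.
public class ResultComparator {
    static final double REL_TOLERANCE = 0.000001; // 0.0001% as a fraction

    static boolean matches(double backend, double spreadsheet) {
        if (spreadsheet == 0.0) return backend == 0.0; // assumed zero handling
        return Math.abs(backend - spreadsheet) / Math.abs(spreadsheet) < REL_TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(matches(1000000.5, 1000000.0)); // off by $0.50 in $1M: passes
        System.out.println(matches(1000002.0, 1000000.0)); // off by $2.00 in $1M: fails
    }
}
```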


Figure 1. Examining the backend results.

GUI Integration Testing
GUI integration testing verified end-to-end functionality of the application: input data was entered through the GUI and the results obtained were verified against the spreadsheet. GUI integration tests were automated using JavaStar in only one area of the application; other areas were tested manually because of JavaStar's limitations (described in the "Issues With Test Automation Using JavaStar" section) and lack of time.

System Testing
System testing was performed to verify that the application as a whole behaved correctly under real-life conditions. System-level testing was done manually due to limitations with JavaStar, some of which are described later. Most problems found during system-level testing were incorrect error messages, minor functionality failures, "look-and-feel" problems, and usability issues.

Use Case Testing
Use case testing verified specific paths through the application as specified by the use case documentation provided by the development team. The use cases were executed at least once, with all outputs analyzed for correctness and usability. More than 3,200 use case tests were executed manually against the application, and about 90% of these passed. The remaining 10% either were not verifiable or did not behave correctly, and were therefore reported as bugs. The test results were tracked manually during the testing cycle and then cross-referenced in an Excel spreadsheet for further validation and verification.

Range and Error Handling Testing
The approach to range testing focused on testing maximum/minimum range values, invalid characters, and error handling.

  • Minimum and maximum values were identified for each field in the application, as specified by the data dictionary. Test cases ensured that values within the specified range (including the boundary values) were accepted and that all values outside the range produced appropriate error messages.
  • Error messages were also validated against incorrect input: by entering an invalid value, each error message was verified and analyzed for correctness.
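The boundary checks above can be sketched as follows; the validator, the field, and its limits are hypothetical, standing in for the application's data-dictionary-driven input validation:

```java
// Sketch of boundary-value testing: the data dictionary supplies min/max for a
// field, and tests probe both boundaries plus one value just outside each.
public class RangeValidator {
    static final double MIN_CONTRIBUTION = 0.0;     // hypothetical dictionary limits
    static final double MAX_CONTRIBUTION = 2000.0;

    // Returns an error message for out-of-range input, or null if the value is valid.
    static String validateContribution(double value) {
        if (value < MIN_CONTRIBUTION) return "Contribution cannot be negative.";
        if (value > MAX_CONTRIBUTION) return "Contribution exceeds the annual maximum.";
        return null;
    }

    public static void main(String[] args) {
        // Boundary values themselves must be accepted...
        System.out.println(validateContribution(MIN_CONTRIBUTION) == null);
        System.out.println(validateContribution(MAX_CONTRIBUTION) == null);
        // ...and values just outside must produce the dictionary's error message.
        System.out.println(validateContribution(MAX_CONTRIBUTION + 0.01));
    }
}
```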

Table 2 presents a summary of the test results and some interesting statistics on the test cycle.

Table 2. Test summary and statistics.
Total number of sub-system test cases: 37,500
Total number of output comparisons: 693,122
Number of passed comparisons: 679,259 (pass percentage = 98%)
Total lines of test code (Java and VB): 12,200 (50% of total backend code)
Total number of use cases verified: 3,200
Number of passed use cases: 2,880 (pass percentage = 90%)
Total number of versions tested: 30 (9 Alpha and 21 Beta)
Statement coverage obtained for the entire backend code: 80.016% (43 of 152 modules achieved 100% coverage)
Total number of defects found: 162 (52% functional failures, 30% critical defects)


Early Life-Cycle Issues and Developer—Tester Interactions
The QA/Test team had an opportunity early in the design process to review the Rational Rose design diagrams, the data dictionary, and the spreadsheet documentation. The data dictionary and the spreadsheet documentation were particularly useful in developing the test data generation scheme and the test requirements for class-level testing, and were also very helpful in developing the VB drivers. As delivered, however, the data dictionary was neither suitable for test data generation nor in a format amenable to data access from other applications. The QA/Test team modified the data dictionary so that its information could be accessed in a uniform manner.

While this early involvement was clearly helpful in several ways, some areas could still have been improved. One was the lack of one-on-one interaction with the design/development team early in the life cycle. While use cases and object models were provided early, these were not sufficient. Without interaction with the design team to understand the reasoning behind their design, it was sometimes difficult to use the object models and use cases effectively. One-on-one interaction between the QA/Test team and the developers earlier in the life cycle would have provided several advantages:

  • Complex development issues, such as understanding the object models, integration issues, etc., could have been quickly addressed.
  • Certain financial algorithms, limits, maximums, and boundaries could have been better understood and thereby reduced the troubleshooting time of sub-system and Reference Spreadsheet drivers.
  • Better understanding of the intended customers and their needs would also have helped to reduce troubleshooting time.
The early personal level of interaction would have also facilitated understanding of factors such as purpose of software, user requirements, typical environment, business logic, and performance requirements. This in turn could have helped the QA/Test team to create a much more effective and accurate test plan.

GUI—Backend Separation
The separation of the GUI and the backend enabled backend testing without requiring access to the GUI functionality. However, testing the GUI functionality and its communication with the backend was made difficult by the tight coupling between the two. GUI integration testing checks whether the GUI communicates data to and from the backend correctly; if the two systems are tightly integrated, it is difficult to determine whether entered information is being processed and passed on to the backend properly by the GUI. Providing hooks into the GUI and backend for accessing state information would also greatly enhance the value and efficiency of GUI testing.

Issues With the Reference Spreadsheet
The Reference Spreadsheet was implemented by the client and contained the same functionality as the application under test. The Reference Spreadsheet was just another implementation of a complex application and was therefore as prone to faults as the software-under-test: it was not completely accurate and did not always provide the correct answers. This is not surprising: for the Reference Spreadsheet to provide correct answers accurately and completely, it must be at least as complex (and hence as likely to be faulty) as the program under verification.

Nevertheless, the Reference Spreadsheet proved to be very useful for automated verification of results, as it was possible to write scripts to drive the Spreadsheet and obtain results from it. Moreover, the Reference Spreadsheet was the only specification for the backend calculations. Any discrepancies in the results from the backend and spreadsheet provided warnings. Sometimes the Oracle was incorrect, sometimes the backend was incorrect (real bug), and sometimes both were wrong. The analysis of these discrepancies often helped in identifying faults made in the backend and the spreadsheet and at times even caught errors of omission in the code and requirements that were totally overlooked.

Java Standards
There are several standard interfaces designed to be used by Java objects to aid in debugging and testing. Using these interfaces can increase efficiency of testing and debugging with minimal effort.

  • java.awt.Component.setName()

All GUI components that derive from java.awt.Component should use the setName() method to identify instances of the component. This allows GUI testing programs to identify the GUI component quickly, regardless of position on the page. This makes it possible to change the layout of a page without affecting GUI testing scripts.
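For example (the component and the name chosen for it are illustrative; a Swing JButton is used here, which inherits setName() from java.awt.Component):

```java
import javax.swing.JButton;

// Giving each component a stable, meaningful name lets a GUI test tool locate
// it by identifier rather than by screen position or an auto-generated name.
public class NamedComponents {
    public static void main(String[] args) {
        JButton ok = new JButton("OK");
        ok.setName("confirmDialog.okButton"); // stable handle for test scripts
        System.out.println(ok.getName());
    }
}
```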
  • Object.toString()

The purpose of the toString() method, as stated in the Java specifications, is to provide a standard interface for displaying state information. Every class should provide a toString() method that dumps the values of current state variables, possibly also calling the toString() function of the superclass. This will speed both debugging and development of test drivers.
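A sketch of this convention, using hypothetical classes; the subclass includes the superclass's state by calling super.toString():

```java
// Sketch of the state-dumping toString() convention described above.
// Account and RothIra are hypothetical illustrations, not the client's classes.
public class ToStringExample {
    static class Account {
        double balance = 2500.0;
        @Override public String toString() {
            return "Account[balance=" + balance + "]";
        }
    }

    static class RothIra extends Account {
        int startYear = 1999;
        @Override public String toString() {
            // Include superclass state so the full object state is visible.
            return "RothIra[" + super.toString() + ", startYear=" + startYear + "]";
        }
    }

    public static void main(String[] args) {
        System.out.println(new RothIra());
        // prints: RothIra[Account[balance=2500.0], startYear=1999]
    }
}
```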

Issues With Test Automation Using JavaStar
Despite some initial drawbacks, JavaStar was deemed usable enough to support automated testing for use case, system, and integration testing. However, many problems surfaced throughout the test life cycle, causing delays and hindering the testing effort. Because the GUI code-base did not explicitly name the GUI components using the setName() method, JavaStar automatically generated its own names. These names were usually incomprehensible because they had no correlation to the application's names; for example, an "OK" button on the first GUI form was named Nav3Button2l(). This posed a problem for long-term test script comprehension and maintenance. An effort was also made to use JavaStar for automated use case testing. Because use case scenarios involved testing of exceptions, the QA/Test team attempted to synchronize the application's exceptions with JavaStar's exception handling. However, JavaStar would catch the exceptions and terminate the test, because it assumes that if an exception was thrown, the user did not want to continue. Because use case testing is not data driven and does not require much repetition, JavaStar was abandoned for it in the face of these technical difficulties.

Extracting data from some screens of the application was also problematic. In one of the sections of the application, data was painted onto the screen and therefore could not be selected and copied into a results file for later comparison with Java driver generated results. The only solution was to write a traversal procedure and insert it unobtrusively into backend code to gather results. A call to the output function was inserted in the required class files for report generation.

Finally, JavaStar had problems handling warning pop-ups that the application was creating when exceptions were thrown. When encountering such a window, it would throw its own exception and terminate the current test as well as other tests. To handle these warnings, subroutines were written to check whether a warning window should come up in a certain case, at which time "OK" would be clicked on the warning window.

Using the approaches described in this article, the test team reached an average of 80% statement coverage for the entire backend code. Critical modules reached 100% statement coverage and less critical modules reached 95% statement coverage.

Several defects were uncovered and fixed during the test cycle. Java-specific tools for GUI testing (Sun Microsystem's JavaStar) and coverage measurements (RST's Deepcover) simplified the tasks of implementation and execution of tests. The high levels of test productivity achieved through automation and the demonstration of reliability of the product resulted in a highly satisfied client and a high-quality Java product.
