Kamesh Pemmaraju is a senior consultant at Reliable Software Technologies, Sterling, VA. He
can be contacted at
JAVA HAS QUICKLY evolved into more than a simple-minded language used to create cute
animations on the Web. These days, a large number of serious Java developers are building
enterprise-critical applications that are being deployed by large companies around the world.
Applications currently being developed on the Java platform range from traditional spreadsheets
and word processors to accounting, human resources, and financial planning applications. Because
these applications are complex and use rapidly evolving Java technology, companies need to employ
a vigorous quality assurance program to produce a high-quality and reliable product. Quality
assurance and test teams must get involved early in the product development life cycle, creating
a sound test plan and applying an effective test strategy to ensure that these enterprise-critical
applications provide accurate results and are defect-free. Accuracy is critical for users who apply
these results to crucial decisions about their business, their finances, and their enterprise. I
present a case study of how an effective testing strategy, focused on sub-system level automation,
was applied to successfully test a critical real-world Java-based financial application.
The application is being developed by a leading financial services company (hereafter referred to
as the client) and is targeted toward both individual and institutional investors,
particularly those interested in managing their finances and retirement plans.
Reliable Software Technologies (RST) provided the QA/Test services and the test team (hereafter
referred to as the QA/Test team) to the client. Because RST carried out third-party
independent testing of the client application, some details about the development of the product
were missing. However, RST received enough information to perform an effective test cycle on the product.
The application to be tested is a personal financial planning package. In addition to investment
planning, the package features financial planning for college, retirement, and estate. The software
is designed for, and tested on, Windows 95, Windows 98, and Windows NT platforms and both Sun's JDK
1.1.6 and Microsoft's JVM. The application is a very complex, data-driven, and computationally
intensive Java-based application capable of performing several thousand calculations while processing
more than 1000 input variables. The number of calculations and their complexity almost rules out
any manual validation and demands an automated solution to execute tests and an Oracle to
verify the accuracy of the results. The Oracle is not to be confused with the popular database software
of the same name; rather the Oracle is a Reference Spreadsheet that does the same calculations
(for validation purposes) as the application under test. The Reference Spreadsheet is implemented in
Microsoft Excel.
The client design team decided early on to separate the user-interface software and the business
software (hereafter referred to as the backend). This decision was in part due to the client
management's goal to eventually deploy the application on the Web. For future Web implementation
of the application, the design team decided to re-use the backend software (e.g., on the application
server) and re-implement the GUI as a thin-client. A thin-client could be a simple Java applet or
application that handles user input and display while the server software could perform all the
complex calculations on a dedicated application server.
For testing purposes, the decision to separate the backend from the GUI proved to be crucial.
The application performs hundreds of calculations behind the scenes and the user interface only
partially shows what happens inside. Sometimes, very complicated internal processing produces
only a single final result. Clearly, in this situation, testing only through the GUI is not
enough. Separating the GUI from the backend allowed comprehensive automated verification of the
backend calculations at a much higher level of granularity.
During the design phase, the client application designers used Rational Rose to design the backend
software and to develop use cases and Object Models for the application. Use cases are used as a
means of describing the functionality of a system in object-oriented software development. Each
use case represented a specific flow of events in the system. The description of a use case defined
what should happen in the system when the use case is performed. These use cases were later used to
define the Objects (Java classes) required in the system. At the end of the design phase, the backend
consisted of nearly a hundred Java classes and several hundred use cases.
The client management made an important early decision: to develop a Reference Spreadsheet that can be
used to validate the thousands of calculations performed by the backend. The Reference Spreadsheet was
developed using several inter-linked Microsoft Excel worksheets that implemented the same functionality
as the application under test. As described later, the creation of the Reference Spreadsheet proved to
be extremely useful for automated validation of the calculations. The medium for automation here was
Visual Basic for Applications (VBA).
Finally, the client development team created a data-dictionary that contained detailed information
on the range and limits of all the input variables of the application. The data-dictionary proved to
be an invaluable test entity that was later used to automatically generate thousands of semantically
correct, constrained test cases. In addition, test data that fell outside the bounds of the defined
ranges was generated. This test data was used to verify the error and range handling of the application.
Because the data-dictionary was in Microsoft Excel, the test-data generation software was written in VBA.
Testing a relatively large Java application under an aggressive schedule requires that the QA/Test
team get involved early in the life cycle, creating a sound test plan and then applying an effective
test strategy. The following is a description of some factors that helped ensure a successful test life
cycle for the application.
Early Life Cycle Testing
Test plan analysis and design during the initial stages of a software project aided in the success
of the test implementation, efficiency, rigor, coverage, and overall quality of a product. The early
life cycle testing approach employed was based on the development model of the project. The development
model was evolutionary: the design team incrementally modeled the Use Cases, then carried out the OO
design, and finally constructed and unit-tested the code. At the end of each iteration, the design team
handed over the use cases, the OO models, and alpha code to the QA/Test team. Based on these documents,
the QA/Test team identified a strategy that would improve the testability of the code (testability in
this context is defined as the ability to minimize the effort required to apply the chosen test strategy
to the system), evaluated tools that supported test automation, and prepared detailed test requirements.
Specifically, the QA/Test team's activities were: development of a use case test plan, development of
class-level test requirements, evaluation of Sun Microsystems' GUI test tool JavaStar for system-level
GUI testing, and some initial system-level testing of the alpha versions.
A number of benefits were realized by the QA/Test team initiating quality assurance work early in
the life cycle and concurrent with the development process:
- An early understanding of the design structure and functionality of the product and develop effective
- Increased awareness of the design issues to be considered, along with other items that were a high
priority in facilitating early test planning.
- A more rapid and effective response as the product moved to completion.
The test plan defined a multi-pronged approach consisting of the following five activities to test
various aspects of the application:
- Sub-system level testing
- GUI Integration system testing
- System level testing
- Use case testing
- Range/Error handling testing
The test approach clearly defined exit criteria to determine when to stop testing activities and when to
ship the product. Finally, all test activities focused on maximizing automation of tests whenever possible.
What follows is a brief description of these five activities.
Sub-System Level Testing of the Backend
Given the limited testing time and the large number of backend Java classes to test (nearly 100), an
approach toward unit testing based purely on individual class-level testing was neither feasible nor
cost-effective. Moreover, many individual classes served only as support classes or utility classes in
the class framework (especially those at the lower end of the class hierarchy) and did not, therefore,
produce application-level results that could be validated against the Reference Spreadsheet.
Consequently, the QA/Test team chose a collection of class sub-systems to test at the "unit" level.
The choice of sub-systems to test was made in the Reference Spreadsheet, which defined 25 distinct
The test approach for sub-system testing was essentially black box in nature requiring generation
of test inputs, development of test drivers, and development of utilities for automated verification of
the outputs against the Reference Spreadsheet. Unfortunately, black box testing has one notable
drawback: it can fail to exercise significant portions of the code. However, coverage measurement
and analysis can determine the portions of the code unexercised by black box testing. The QA/Test
team used Reliable Software Technologies' Deepcover as its Java coverage tool to identify
areas of inadequate coverage in the code. Additional test cases were created as needed to improve
the coverage levels achieved.
A test framework was developed to support automated verification of the backend subsystems.
The framework consisted of the following components:
- Test Data Generation tool
- Test Drivers
- Comparator
The following is a brief description of these components:
Test data generation: The QA/Test team developed a test data generation tool that could
automatically generate thousands of test cases containing constrained test data for each sub-system
of the product. The tool was developed in VBA, taking advantage of the powerful constraint satisfaction
algorithms available in Microsoft Excel. To automatically generate test data, the tool used the
information on the range and limits of the input variables in each sub-system. The tool obtained this
information from the data-dictionary, which contained the range information for each input variable.
The range information was available as minimum, maximum, typical minimum, and typical maximum values.
The test data generation tool used this range information in the data dictionary to generate test data
such as random values within the maximum range, random values within the typical range, inner bounds,
and outer bounds. The data dictionary also defined error messages that should be displayed when the
user inputs values outside the valid bounds. This error message information in the data-dictionary was
later used for Range/Error testing.
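The generation scheme described above can be sketched in Java. This is only an illustration under stated assumptions: the original tool was written in VBA, and the names `RangeSpec` and `TestValueGenerator`, the `step` parameter, and the reading of "inner bounds" as the limits themselves and "outer bounds" as values just outside them are all hypothetical and do not come from the client's data dictionary.

```java
import java.util.Random;

/** One hypothetical data-dictionary entry: hard and typical limits of an input variable. */
class RangeSpec {
    final String name;
    final double min, max;        // hard limits; values outside should be rejected
    final double typMin, typMax;  // typical limits; values most users would enter

    RangeSpec(String name, double min, double max, double typMin, double typMax) {
        this.name = name;
        this.min = min;
        this.max = max;
        this.typMin = typMin;
        this.typMax = typMax;
    }
}

/** Produces the four kinds of test values mentioned in the text. */
class TestValueGenerator {
    private final Random random;

    TestValueGenerator(long seed) {
        // A fixed seed keeps a generated test suite reproducible.
        this.random = new Random(seed);
    }

    /** Random value anywhere between the hard limits. */
    double randomInMaxRange(RangeSpec r) {
        return r.min + random.nextDouble() * (r.max - r.min);
    }

    /** Random value between the typical limits. */
    double randomInTypicalRange(RangeSpec r) {
        return r.typMin + random.nextDouble() * (r.typMax - r.typMin);
    }

    /** Assumed meaning of "inner bounds": the hard limits themselves. */
    double[] innerBounds(RangeSpec r) {
        return new double[] { r.min, r.max };
    }

    /** Assumed meaning of "outer bounds": one step outside each hard limit. */
    double[] outerBounds(RangeSpec r, double step) {
        return new double[] { r.min - step, r.max + step };
    }
}
```

Values from `outerBounds` would feed the exception-processing test cases, while the other three kinds exercise normal processing.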
Table 1. Example test case file.
Using the data generation tool, the QA/Test team generated 1500 test cases (a few test cases are
defined next) for each of the 25 different sub-systems, totaling 37,500 test cases. Each test case consisted
of a row of test data (see Table 1) used to populate the input variables of the sub-system under test. As
shown in the table, the input variables are represented in columns and the test cases are represented in rows.
The last column (ErrorStatus) shows whether the test case represents normal processing or
exception processing. In this example, the last test case exercises the exception processing because the
RothIRA4 contribution ($1 million) exceeds the maximum allowable contribution. Notice the
values for StartYear and StopYear: these are represented as random double values
between 0 and 1 instead of regular year format. The test drivers (described next) use these random numbers
and other context-sensitive dates (e.g., date of birth, year of retirement, etc.) to generate the actual start
and stop years. This was a simple and elegant way to deal with multi-constraint variables without
complicating the process of data generation. The test data was stored in a flat file, one for each sub-system,
in a fixed-width format so that it was easy to export it to a database or spreadsheet.
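One way a driver might turn such a fraction into a concrete year is linear scaling between two context-dependent limits. The sketch below is an assumption about the mapping, not the team's actual formula; `YearMapper`, `toYear`, and the choice of limits are hypothetical.

```java
/** Hypothetical helper that maps a generated fraction onto a context-dependent year range. */
class YearMapper {
    /**
     * Maps a fraction in [0, 1] onto the inclusive range [earliest, latest].
     * 0 yields the earliest year and 1 yields the latest year.
     */
    static int toYear(double fraction, int earliest, int latest) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        if (latest < earliest) {
            throw new IllegalArgumentException("latest year precedes earliest year");
        }
        return earliest + (int) Math.round(fraction * (latest - earliest));
    }
}
```

A driver could then compute `int start = YearMapper.toYear(startFraction, currentYear, retirementYear)` followed by `int stop = YearMapper.toYear(stopFraction, start, retirementYear)`, which keeps the stop year from preceding the start year without the data generator having to know either limit.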
Test drivers: To exercise the backend code using the generated test data, the
QA/Test team developed component-level Java drivers for each sub-system. The test drivers read in the
previously generated test data (see Table 1) from a file, used the set methods of the component to assign
test data to the input variables, called the functions that calculate the results, and finally wrote the
results to a file. Simultaneously, spreadsheet drivers were developed in VBA to perform this exact same
process on the Reference Spreadsheet. The results of the Reference Spreadsheet were then compared with
those of the backend. Unfortunately, sometimes the Reference Spreadsheet itself produced incorrect results.
Nevertheless, discrepancies provided warnings and prompted further investigation.
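The read, set, calculate, and write cycle of a driver can be sketched as follows. Everything here is illustrative: `SavingsProjection` is a deliberately trivial stand-in for a backend sub-system, not one of the client's classes, and the 12-character column width is an assumed file layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for one backend sub-system. */
class SavingsProjection {
    private double annualContribution;
    private int years;

    void setAnnualContribution(double value) { annualContribution = value; }
    void setYears(int value) { years = value; }

    /** Total contributed over the period; deliberately trivial. */
    double calculateTotal() { return annualContribution * years; }
}

/** Sketch of a sub-system driver: one fixed-width row in, one result line out. */
class SubSystemDriver {
    static final int FIELD_WIDTH = 12; // assumed width of every column

    /** Extracts column number index (0-based) from a fixed-width row. */
    static String field(String row, int index) {
        int start = index * FIELD_WIDTH;
        int end = Math.min(start + FIELD_WIDTH, row.length());
        return row.substring(start, end).trim();
    }

    /** Reads test cases, applies them through the set methods, and writes the results. */
    static void run(BufferedReader in, PrintWriter out) {
        try {
            String row;
            while ((row = in.readLine()) != null) {
                if (row.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                SavingsProjection component = new SavingsProjection();
                component.setAnnualContribution(Double.parseDouble(field(row, 0)));
                component.setYears(Integer.parseInt(field(row, 1)));
                out.println(component.calculateTotal());
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing one result per line, in the same order as the input rows, keeps the output easy to line up against the results produced by the spreadsheet driver for the same test cases.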
After the sub-system level Java drivers were completed, a suite of Java integration drivers was
developed, which incorporated methods in the module-level classes, as well as new functionality needed
for integration testing. A similar process was followed for the spreadsheet drivers. Overall, 25 Java
test drivers and 25 VB drivers were developed comprising 3,700 and 8,500 lines of test code, respectively.
Together, the Java and VB test driver code accounted for 12,200 lines of code, which is almost 50%
of the total lines of code of the backend. Studies have often shown that for well-tested applications,
test code comprises 50% of all code written for a project.
Comparator: Because the calculations performed by the backend often produced hundreds
of output variables, an automated comparison tool was developed in Visual Basic to examine and compare
the backend results with those of the spreadsheet. The comparator tool discarded unneeded text strings
before comparing the output results. A backend-calculated value was deemed correct if it
deviated by less than 0.0001% from the spreadsheet calculation. This represented a variation of, at
most, one dollar in a million dollars, which was deemed acceptable because such variation can be caused by
differences in floating-point accuracy between the Java system and Microsoft Excel. At the end of the entire
test suite execution, the total number of actual comparisons of the backend results and the spreadsheet was
close to 700,000 (693,122). Nearly 98% of these comparisons passed. The
overall scheme is illustrated in Figure 1.
Figure 1. Examining the backend results.
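The acceptance rule used by the comparator is a relative tolerance: 0.0001% is a fraction of 0.000001, or one part in a million. The original tool was written in Visual Basic; the Java sketch below only illustrates the rule. The class name `ResultComparator` and the treatment of a zero reference value are assumptions.

```java
/** Sketch of the comparator's acceptance rule. */
class ResultComparator {
    /** 0.0001% expressed as a fraction: one dollar in a million dollars. */
    static final double RELATIVE_TOLERANCE = 1e-6;

    /** Returns true when the backend value deviates from the reference by less than the tolerance. */
    static boolean matches(double backend, double reference) {
        if (reference == 0.0) {
            // Relative deviation is undefined at zero; assumption: require an exact match.
            return backend == 0.0;
        }
        double relativeDeviation = Math.abs(backend - reference) / Math.abs(reference);
        return relativeDeviation < RELATIVE_TOLERANCE;
    }
}
```

For example, a backend value of 1,000,000.50 against a spreadsheet value of 1,000,000.00 deviates by 0.00005% and passes, while 1,000,002.00 deviates by 0.0002% and is flagged for investigation.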
GUI Integration Testing
GUI integration testing verified end-to-end functionality of the application. The input data was entered
through the GUI and the results obtained were verified against the spreadsheet. GUI integration tests were
automated using JavaStar in only one area of the application. Other areas were tested manually due to the
limitations of JavaStar (these are described in the "Lessons Learned" section) and lack of time.
System Level Testing
System testing was performed to verify that the application as a whole performed correctly when used
in real-life conditions. System-level testing was done manually due to limitations with JavaStar. Some of
the difficulties faced with JavaStar are described later. Most problems found during system-level testing
were incorrect error messages, minor functionality failures, "look-and-feel" problems, and usability issues.
Use Case Testing
Use case testing verified specific paths through the application as specified by use case documentation
provided by the development team. The use cases were executed at least once, with all outputs being analyzed for
correctness and usability. More than 3,200 use case tests were executed manually against the application,
and about 90% of these passed the tests. The remaining 10% of the use case tests were either not
verifiable or did not behave correctly, and were therefore reported as bugs. The test results were tracked
manually during the testing cycle and then cross-referenced in an Excel spreadsheet for further validation
Range and Error Handling Testing
The approach to range testing focused on testing maximum/minimum range values, invalid characters, and
- Minimum and maximum values were identified for each field in the application as specified by the data
dictionary. Test cases ensured that values were within the specified range (including the boundary values)
and all values outside the range produced appropriate error messages.
- Error messages were also validated based on incorrect input. By entering an invalid value, error
messages were verified and analyzed for correctness.
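A boundary check of this kind can be sketched as below. `RangeValidator` is a hypothetical stand-in for the application's field validation, and the `step` used to move just outside the range is an assumed parameter; the expected message would come from the data dictionary.

```java
/** Hypothetical stand-in for the application's validation of one input field. */
class RangeValidator {
    private final double min;
    private final double max;
    private final String errorMessage;

    RangeValidator(double min, double max, String errorMessage) {
        this.min = min;
        this.max = max;
        this.errorMessage = errorMessage;
    }

    /** Returns null when the value is acceptable, otherwise the error message to display. */
    String validate(double value) {
        return (value >= min && value <= max) ? null : errorMessage;
    }
}

/** Sketch of a range test driven by data-dictionary limits. */
class RangeChecks {
    /**
     * Returns true when both limits are accepted and the values one step
     * outside each limit produce exactly the expected error message.
     */
    static boolean boundariesHold(RangeValidator validator, double min, double max,
                                  double step, String expectedMessage) {
        return validator.validate(min) == null
            && validator.validate(max) == null
            && expectedMessage.equals(validator.validate(min - step))
            && expectedMessage.equals(validator.validate(max + step));
    }
}
```

Passing the limits and message from the data dictionary rather than from the code under test is what lets the check catch a field whose implemented range or message has drifted from its specification.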
Table 2 presents a summary of the test results and some interesting statistics on the test cycle.
Table 2. Test summary and statistics.
Total number of sub-system test cases                     37,500
Total number of output comparisons                        693,122
Number of passed comparisons                              679,259 (Pass percentage = 98%)
Total lines of test code (Java and VB)                    12,200 (50% of total backend code)
Total number of use cases verified                        More than 3,200
Number of passed use cases                                2,880 (Pass percentage = 90%)
Total number of versions tested                           30 (9 Alpha and 21 Beta)
Statement coverage obtained for the entire backend code   80.016% (43 modules out of 152 achieved 100% coverage)
Total number of defects found                             162 (52% functional failures, 30% critical defects)
Early Life-Cycle Issues and Developer-Tester Interactions
The QA/Test team had an opportunity early in the design process to review the Rational Rose design
diagrams, the data dictionary, and the spreadsheet documentation. The data dictionary and the spreadsheet
documentation were particularly useful in the development of the test data generation scheme and test
requirements for class level testing. These documents were also very helpful for the development of the
VB drivers. In its original form, however, the data dictionary was neither suitable for test data generation
nor in a format amenable to data access from other applications. The QA/Test team modified the data dictionary
so that it was possible to access the information in a uniform manner.
While this early involvement was clearly helpful in several ways, there are still some areas that could
be improved. One is the lack of one-on-one interaction with the design/development team early in the
life cycle. While use cases and object models were provided early, these were not sufficient. Without
interaction with the design team (to understand the reasoning behind their design), it was
sometimes difficult to effectively use the object models and use cases. One-on-one interaction between
the QA/Test teams and the developers earlier in the life cycle would have provided several advantages:
- Complex development issues, such as understanding the object models, integration issues, etc., could
have been quickly addressed.
- Certain financial algorithms, limits, maximums, and boundaries could have been better understood,
thereby reducing the troubleshooting time of sub-system and Reference Spreadsheet drivers.
- Better understanding of the intended customers and their needs would also have helped to reduce
The early personal level of interaction would have also facilitated understanding of factors such as
purpose of software, user requirements, typical environment, business logic, and performance requirements.
This in turn could have helped the QA/Test team to create a much more effective and accurate test plan.
The separation of the GUI and the backend enabled backend testing without requiring access to
the GUI functionality. However, testing the GUI functionality and its communication with the backend
was made difficult by the tight coupling between the GUI and the backend. GUI integration testing
checks to see whether the GUI communicates data to and from the backend correctly. If these two systems
are tightly integrated, it is difficult to determine whether entered information is being processed, and
passed on to the backend properly by the GUI. Providing hooks into the GUI and backend for accessing state
information would also greatly enhance the value and efficiency of GUI testing.
Issues With the Reference Spreadsheet
The Reference Spreadsheet was implemented by the client and contained the same functionality as the
application under test. The Reference Spreadsheet was just another implementation of a complex application
and was therefore as faulty as the implementation of the software under test: it was not completely accurate
and it did not always provide the correct answers. This is not surprising because if the Reference
Spreadsheet has to provide the correct answers accurately and completely, it must be at least as complex
(and hence likely to be as faulty) as the program under verification.
Nevertheless, the Reference Spreadsheet proved to be very useful for automated verification of results,
as it was possible to write scripts to drive the Spreadsheet and obtain results from it. Moreover, the
Reference Spreadsheet was the only specification for the backend calculations. Any discrepancies in the
results from the backend and spreadsheet provided warnings. Sometimes the Oracle was incorrect, sometimes
the backend was incorrect (real bug), and sometimes both were wrong. The analysis of these discrepancies
often helped in identifying faults made in the backend and the spreadsheet and at times even caught errors
of omission in the code and requirements that were totally overlooked.
There are several standard interfaces designed to be used by Java objects to aid in debugging and
testing. Using these interfaces can increase efficiency of testing and debugging with minimal effort.
All GUI components that derive from java.awt.Component should use the setName()
method to identify instances of the component. This allows GUI testing programs to identify the GUI
component quickly, regardless of position on the page. This makes it possible to change the layout of
a page without affecting GUI testing scripts.
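A minimal sketch of the idea follows. `setName()` and `getName()` are standard methods of `java.awt.Component`; the helper `findByName`, the form built in `buildForm`, and the name "okButton" are hypothetical. Swing components are used here only because they also derive from `java.awt.Component` and can be created without a display.

```java
import java.awt.Component;
import java.awt.Container;
import javax.swing.JButton;
import javax.swing.JPanel;

/** Sketch: locating a component by its explicit name rather than by its position. */
class NamedComponents {
    /** Depth-first search for the first component with the given name; null if none is found. */
    static Component findByName(Container root, String name) {
        for (Component child : root.getComponents()) {
            if (name.equals(child.getName())) {
                return child;
            }
            if (child instanceof Container) {
                Component hit = findByName((Container) child, name);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }

    /** Builds a small hypothetical form whose OK button carries an explicit name. */
    static JPanel buildForm() {
        JPanel form = new JPanel();
        JPanel buttonRow = new JPanel();
        JButton ok = new JButton("OK");
        ok.setName("okButton"); // stable identifier for test scripts
        buttonRow.add(ok);
        form.add(buttonRow);
        return form;
    }
}
```

Because the lookup depends only on the name, moving the button to another panel or reordering the layout leaves a script that calls `findByName(form, "okButton")` unchanged.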
The purpose of the toString() method, as stated in the Java specifications, is to provide a
standard interface for displaying state information. Every class should provide a toString() method
that dumps the values of current state variables, possibly also calling the
toString() method of the superclass. This will speed both debugging and development of test
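A small illustration of this convention, using two hypothetical classes (`Account` and `RetirementAccount` are not taken from the client's backend):

```java
/** Hypothetical base class whose toString() dumps its state variables. */
class Account {
    private final String owner;
    private final double balance;

    Account(String owner, double balance) {
        this.owner = owner;
        this.balance = balance;
    }

    @Override
    public String toString() {
        return "Account[owner=" + owner + ", balance=" + balance + "]";
    }
}

/** Subclass that adds its own state and delegates the rest to the superclass. */
class RetirementAccount extends Account {
    private final int retirementYear;

    RetirementAccount(String owner, double balance, int retirementYear) {
        super(owner, balance);
        this.retirementYear = retirementYear;
    }

    @Override
    public String toString() {
        // Include the inherited state by calling the superclass implementation.
        return "RetirementAccount[retirementYear=" + retirementYear + ", " + super.toString() + "]";
    }
}
```

A test driver can then log an object's full state with a single call, for example when a comparison against the Reference Spreadsheet fails.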
Issues With Test Automation Using JavaStar
Despite some initial drawbacks, JavaStar was deemed usable enough to support automated testing for use
case, system, and integration testing. However, many problems surfaced throughout the test life cycle,
which caused delays and hindered the testing effort. Because the GUI code-base did not explicitly name
the GUI components using the setName() method, JavaStar automatically provided its own names.
These names were usually incomprehensible because they had no correlation to the application names. For
example, an "OK" button on the first GUI form was named Nav3Button2l(). This posed a problem
for long-term test script comprehension and maintenance. An effort was also made to use JavaStar for
automated use case testing. Because use case scenarios involved testing of exceptions, the team attempted
to synchronize the application's exceptions with JavaStar exceptions to facilitate the throwing of
exceptions. However, JavaStar would catch the exceptions and terminate the test because it assumed that
if an exception was thrown, the user did not want to continue. Because use case testing is not data
driven and does not require as much repetition, JavaStar was abandoned because of these technical
Extracting data from some screens of the application was also problematic. In one of the sections of
the application, data was painted onto the screen and therefore could not be selected and copied into a
results file for later comparison with Java driver generated results. The only solution was to write a
traversal procedure and insert it unobtrusively into backend code to gather results. A call to the
output function was inserted in the required class files for report generation.
Finally, JavaStar had problems handling the warning pop-ups that the application created when
exceptions were thrown. When encountering such a window, it would throw its own exception and terminate
the current test as well as other tests. To handle these warnings, subroutines were written to check
whether a warning window should come up in a certain case, at which time "OK" would be clicked on the
Using the approaches described in this article, the test team reached an average of 80% statement
coverage for the entire backend code. Critical modules reached 100% statement coverage and
less critical modules reached 95% statement coverage.
Several defects were uncovered and fixed during the test cycle. Java-specific tools for GUI testing
(Sun Microsystems' JavaStar) and coverage measurements (RST's Deepcover) simplified the tasks of
implementation and execution of tests. The high levels of test productivity achieved through automation
and the demonstration of reliability of the product resulted in a highly satisfied client and a
high-quality Java product.