Load testing: A developer's crystal ball

The reason to load test your client/server application is to detect multi-user interaction errors, right? Wrong! If this is your view of load testing, you are seeing only one dimension of this multifaceted tool. Consider the following.

More than a few client/server systems have been blind-sided by scalability issues. The term "hitting the scalability wall" evolved from the far too common occurrence of a client/server application's inability to accommodate a growing number of users. While proponents of a three-tier model will argue its alternative merits, the truth of the matter is that it is virtually impossible to predict if and/or when any client/server application (two, three or n-tier) will begin to degrade and whether a given configuration can support the desired number of users.

Most discussions on scalability deal with the technological reasons for a degradation. What tends to be ignored is the impact the degradation has on the business. As more and more client/server applications can rightfully be termed mission-critical, the impact of systems that fail to scale becomes more acute. Research conducted by Software Productivity Group, Natick, Mass., in 1995 showed that 31.1% of the Fortune 1000 respondents to our survey were using client/server systems for critical business applications. In 1996, that number had risen to 41.1%. Client/server is growing up -- the applications are more central to the heart of the business -- and the stakes are getting higher.

Some companies have already discovered the one way to determine if their system will scale or fail. These organizations are using automated load testing tool suites that provide a virtual software crystal ball into an application's performance characteristics before the application is deployed. Not only can these organizations graphically see the characteristics of an application as it is being implemented, they also can look into the future and see what an increasing load will do to the applications -- and more importantly, to the business functions and users of that application.

Load testing in the financial services sector

Physically moving a multimillion-dollar-a-day financial trading application in the space of a weekend is not for the faint-hearted. Add the unknowns of a totally new environment -- new terminals, new LAN, new servers, new database -- and the move takes on another dimension of complexity and pressure. Such a move was undertaken recently by a large Wall Street securities-brokerage firm, which ADT was asked not to identify.

In the firm's preparation for the combined physical move/database transition, each major system component was attended to by its team of experts -- database experts, networking experts, operating system experts and other experts. Each subsystem had to be checked, re-checked, and checked again. Yet in this system, as in the vast majority of client/server systems, there was no unifying element that could pull everything together and orchestrate a full-bore system level test.

Real-Time Technology Solutions (RTTS), a New York City-based consulting firm specializing in automated testing software, was initially brought in to develop a test methodology, then implement and deploy SQA's Team Test automated testing system. Bill Hayduk, president of RTTS, said the project began as a search for application errors. As the project advanced, he and the other members of the development team found that load testing offered far more than multi-user error detection: it provided the basis for consistent, repeatable system level testing. It is one thing to run a subsystem test to validate that records can be read from and written to the database. It is another thing to validate transactions and responses when a few users are driving the application from their PCs. It is yet another to systematically emulate an increasing number of users performing a variety of typical operations and graphically watch response times and system resource utilization. Said Hayduk, "We started the project looking for errors. We ended with a framework for an integrated system level test."

Following the close of trading on a Friday afternoon, the system was powered down and moved to its new home. The following Monday morning, a switch was toggled and the system went live. To date, no major errors have been detected.

Orchestrating an environmental test

The subsystems of a client/server system are like an orchestra. If each component is tuned and working well with the other subsystems, the result is harmony. If even one subsystem is off, the result is discord.

Load testing introduces a conductor into the picture -- a conductor that can direct the operation of the entire environment. If an aberration develops and a certain part of the score needs work, test cases that isolate the errant operation can be run again and again until the problem is solved and the system is functioning properly and within specs.

Just as a conductor cannot guarantee that a member of the orchestra will not make some unexpected faux pas, automated testing software cannot guarantee error-free code. However, it can trap -- and has in an increasing number of high-profile installations -- many of the show stoppers that lie in wait in today's complex client/server systems. Equally important, it can also provide the knowledge and confidence that the system will function under a growing user load.

Different approaches

Today's load testing tool suites generally take one of two approaches. Each approach involves the use of test scripts. Scripts can be generated by recording actual user input, programmed using a test tool's scripting language, or generated using a combination -- first recording actual user input and then modifying those scripts with an editor provided with the test tool. Executing a script is essentially (and simply) the same as a user sitting at a PC and interacting with the application.
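The record-then-modify workflow can be pictured with a small sketch. It is illustrative only: the step format, the contents of `recorded_script`, and the `parameterize` and `play` helpers are hypothetical and do not correspond to any vendor's scripting language. The idea shown is that a recording captures one user's literal input, and the editing step swaps those literals for placeholders so the same script can be replayed with different data.

```python
from string import Template

# A hypothetical "recorded" script: the literal steps one user performed.
recorded_script = [
    ("login", "user=jsmith"),
    ("open_account", "account=10045"),
    ("place_order", "symbol=IBM qty=100"),
    ("logout", ""),
]

def parameterize(script, literals):
    """The 'edit' step: replace recorded literal values with $placeholders."""
    edited = []
    for action, args in script:
        for value, name in literals.items():
            args = args.replace(value, "$" + name)
        edited.append((action, args))
    return edited

def play(script, **values):
    """Expand the placeholders for one simulated user and return its steps."""
    return [(action, Template(args).substitute(values)) for action, args in script]

# Turn the recorded login name and account number into placeholders,
# then replay the edited script as a different (made-up) user.
edited_script = parameterize(recorded_script, {"jsmith": "user", "10045": "account"})
steps = play(edited_script, user="mlee", account="20417")
```

In a real tool the replayed steps would be sent to the application; here they are simply returned so the effect of the edit can be inspected.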

Single test per processor

In one approach to load testing, the load is driven by test scripts, generated by recording actual user input, that execute on multiple PCs.
Source: Software Productivity Group

Fig. 1

Multiple tests per processor

Test scripts programmatically drive the application from the PCs essentially in the same manner as multiple users would drive an application in a live production environment.
Source: Software Productivity Group

Fig. 2

Hybrid environment

The above graphic is an example of PCs executing individual scripts and the multithreaded systems executing multiple simulated users.
Source: Software Productivity Group

Fig. 3


In one approach to load testing, the load is driven by these test scripts executing on multiple PCs. (See Fig. 1.) In the other approach, the load testing software drives multiple instances of the tests on a single system. (See Fig. 2.)

The multiple PC approach offers the most realistic approach to system load testing. Test scripts programmatically drive the application from the PCs in essentially the same manner as multiple users would drive an application in a live production environment. While this approach is intuitively sound, the reality of the matter is that it is impracticable to build a test system that reflects a full-blown production configuration. Modern day client/server systems comprise hundreds, and in some cases thousands, of PC clients. Physically amassing even a small percentage of those PCs would be a monumental task. For testing such high-end systems, the simulation approach to load testing is more appropriate. If the application demands more simulated users than can be accommodated on one system, multiple systems -- each running multiple virtual users -- can be added. It is also not unusual to see a combination of the approaches with some PCs executing individual scripts and the multithreaded systems executing multiple simulated users. (See Fig. 3.)
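The simulation approach, in which one machine runs many virtual users, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any commercial tool is built: `fake_server` is a hypothetical stand-in for the application under test, the three-step script is invented, and threads are just one possible way to host simulated users on a single system.

```python
import threading
import time

def fake_server(request):
    """Hypothetical stand-in for the application under test."""
    time.sleep(0.001)  # pretend the server does a little work
    return "ok:" + request

def virtual_user(user_id, script, iterations, results, lock):
    """Replay the script the way one user would, timing every step."""
    for _ in range(iterations):
        for step in script:
            start = time.perf_counter()
            reply = fake_server(f"{step} user={user_id}")
            elapsed = time.perf_counter() - start
            with lock:  # the results list is shared by all virtual users
                results.append((user_id, step, elapsed, reply.startswith("ok")))

def run_load(num_users, script, iterations=2):
    """Drive num_users simulated users from a single machine."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=virtual_user,
                         args=(uid, script, iterations, results, lock))
        for uid in range(num_users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

script = ["login", "query", "logout"]
results = run_load(num_users=5, script=script)
```

Each entry in `results` records which virtual user ran which step, how long it took, and whether the reply looked valid, which is the raw material for the response-time charts described above. Raising `num_users` increases the simulated load without adding hardware.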

Initially, test tool vendors offered just one of the testing capabilities. The single-test-per-PC camp held that performance characteristics could be measured and then extrapolated to ascertain the results of adding more users. The simulated user camp retorted that extrapolation is not reliable when talking about high-end systems of tremendous complexity. The reality is there is a valid case for both approaches. In fact, it is now common for vendors to offer both capabilities.

Some companies provide internally built product suites, while others augment product sets through alliances. The two leaders in the field today are Mercury Interactive Corp., Sunnyvale, Calif., and the McLean, Va.-based Performix unit of Pure Atria, Sunnyvale, Calif. Mercury Interactive offers virtual user testing via its LoadRunner product suite. Performix, acquired by Pure Atria in 1995, focuses on simulated load testing. Pure Atria will add a GUI-based system from technology recently acquired with Integrity QA. Those plans have yet to be disclosed.

Newton Centre, Mass.-based Segue Software Inc. and Burlington, Mass.-based SQA Inc. had initially offered their own PC-based load-testing facilities (in addition to a merger agreement with Rational Software Corp., Santa Clara, Calif.). By the time this article appears in print, Segue Software Inc. will have followed suit with its own simulation facility.

Other vendors offering various load testing facilities include companies such as: the Dallas-based AutoTester unit of Software Recording Corp.; Compuware Corp., Farmington Hills, Mich.; Performance Awareness Corp., Raleigh, N.C.; and Softbridge Inc., Cambridge, Mass.

More than one nugget

Load testing tool suites can offer a veritable gold mine of benefits. The list below identifies only a handful of possibilities. Note that the terms used below may vary from vendor to vendor. The important issue is the range of capabilities these tools offer and the benefits they can bring to the test tool user.

Indeed, determining the focus of a test is one of the challenges facing organizations that opt to use the tools. Every vendor offers classes on how to use their respective toolsets. But the methodology of automated testing -- what to test versus the mechanics of using the tool suite -- is still a relatively new discipline. These tool suites offer so much power and scope that the most appropriate approach is not always clear to an automated testing neophyte.

Multi-user testing: This is the classic error detection functionality. Today's client/server systems have an extraordinary number of moving parts. Not only must each part work correctly on its own, but the parts must also work together in an extremely complex environment involving application, operating system, middleware, networking and database software. Unfortunately, what may work correctly with only a few users can fall apart under the load of many users. By stressing the system with multi-user testing, many of these insidious errors can be uncovered.

Architectural validation: Many organizations are currently in the process of developing computing architectures that will carry them into the next century. It is one thing to talk of strategies and products. It is another thing entirely to put the architecture into action -- to select the products, and then build the first application that can make or break the architecture. One company is well on its way in this process -- Gap Inc., a $4 billion clothing retailer.

Load testing and Gap Inc.

With a chain of more than 1,700 customer-oriented retail stores, Gap Inc. has made its mark in the retail industry. Like many organizations of this size, the company is looking at its computing architecture and beginning the process of developing an architecture that will span its computing environment. According to Phil Wilkerson, director of Technical Architecture, Gap Inc. first had to develop its strategy and then select the components to implement that strategy. Core elements of the company's strategy include a three-tier client/server model, a move to object technology, plus an ability to access corporate data in an MVS-based mainframe environment. For the mainstay system components, Gap selected Sun Microsystems Inc., Mountain View, Calif., for its server engines; Forté from Forté Software Inc., Oakland, Calif., for the application development tool; Tuxedo for the transaction processing monitor; and Informix for the database.

With many of the development components selected, Wilkerson believed there was one more piece to the puzzle -- automated testing software that would objectively manage the system level tests and would prove the architecture could handle the load. According to Wilkerson, "We needed a mechanism to reduce the fear factor -- to control the client/server monster."

Mercury Interactive's automated testing tool suite was selected as that mechanism and put to work on a series of tests. The testing was a success, according to Wilkerson. The Mercury tool suite allowed Gap to test whether "the client/server monster" was functioning properly end-to-end, from the interfaces to network connectivity to the build process to the business process. Validating the architecture was not the only hard payback Gap received from these tools, he said. The retailer said the tools also uncovered a memory allocation problem and a RAID channel problem. As a result of system readjustments, the end-to-end response times became phenomenal -- sub-second response time on a test query into a 40,000-product database. Having this kind of information enables an organization to scale networks and hardware to handle production loads accessing huge amounts of information.

Configuration tuning: Not only are there more discrete pieces in today's systems, but each component alone often involves complex configuration options that can ripple through a system and ultimately affect response time. Manual testing cannot provide the consistent and repetitive testing environment needed to truly stress a system and objectively capture the performance metrics. Load testing, on the other hand, can take configuration optimization from a subjective realm into a world of objective charts and graphs that show the effect of various configuration changes. Another issue here is whether a performance bottleneck is the result of a problem with the client/server model (such as two-tier versus three-tier) or of a hardware configuration problem. At Gap, for example, the hardware caused the problem and a modification to the configuration had a very positive impact on system performance.

System upgrade comparisons: Some of the most interesting aspects of the growing acceptance and maturity of the load testing market are the various ways companies use these tools. One of the most resourceful uses involves a system upgrade "shoot-out."

With the flexibility in and competitiveness between today's hardware systems, choosing from the available options for a system upgrade can be a daunting task. Load testing can provide an objective measure of these options -- essentially using the very application that will run on the upgraded hardware. Not only can a company see what the upgrade will (or will not) do in terms of response time for the existing number of users, the company can increase the number of users to determine how much headroom the new hardware offers before another upgrade is required.
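The headroom question can be framed as a simple ramp: keep adding users until response time crosses an agreed limit. The sketch below assumes a caller-supplied `measure(users)` function that runs a load test and returns a response time in seconds. The `find_headroom` helper, the step size and the `simulated_measure` curve are all hypothetical, and the curve is made up purely so the example runs without real hardware.

```python
def find_headroom(measure, current_users, limit_seconds, step=10, max_users=1000):
    """Raise the user count until response time exceeds the limit.

    measure(users) must return the observed response time in seconds.
    Returns the largest tested user count that met the limit, or None
    if even the current load misses it.
    """
    supported = None
    users = current_users
    while users <= max_users:
        if measure(users) > limit_seconds:
            break
        supported = users
        users += step
    return supported

def simulated_measure(users):
    """Made-up response curve standing in for a real load test run."""
    return 0.2 + 0.00002 * users ** 2

# With 100 users today and a 0.95-second target, how far can the load grow?
supported = find_headroom(simulated_measure, current_users=100, limit_seconds=0.95)
headroom = supported - 100
```

With this invented curve the ramp stops at 190 users, leaving 90 users of headroom. Running the same ramp against each candidate configuration gives the objective comparison a "shoot-out" needs.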

User-acceptance testing: Acceptance is always a relatively sticky issue. Load testing software provides a mechanism for demonstrating to the acceptance team how a system performs under load. Load testing can also provide feedback to a development team and can be used to measure how the team is doing against any performance requirements defined by the system specifications.

System software revision verification: Some of the more interesting times in a development group can occur when it upgrades an application to a new version of a vendor's packaged software. Whether an operating system, middleware or other system software component, a new software version can cause havoc in an application that had been working perfectly. Load testing software can provide a safe pre-deployment venue for discovering multi-user errors and can compare performance characteristics with those from the previous version. With the time, effort and expense involved in deploying client/server systems, the last thing an organization wants to do is distribute an application that is inferior to an earlier version.
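Comparing a new software version against the previous one comes down to running the same load test twice and comparing the numbers. The sketch below is one possible way to do that, not a description of any vendor's reporting. The `summarize` and `compare_runs` helpers, the 10% tolerance and the sample response times are all invented for illustration.

```python
import statistics

def summarize(samples):
    """Median and 90th-percentile (nearest-rank) response time for one run."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) using integers
    return {"median": statistics.median(ordered), "p90": ordered[rank - 1]}

def compare_runs(baseline, candidate, tolerance=0.10):
    """Return the metrics where the candidate is more than `tolerance` slower."""
    old, new = summarize(baseline), summarize(candidate)
    regressions = {}
    for metric in old:
        if new[metric] > old[metric] * (1 + tolerance):
            regressions[metric] = (old[metric], new[metric])
    return regressions

# Invented response times (seconds) from the same test against two versions.
baseline_times = [0.40, 0.42, 0.45, 0.41, 0.43, 0.44, 0.40, 0.46, 0.42, 0.60]
candidate_times = [0.41, 0.43, 0.44, 0.42, 0.45, 0.43, 0.41, 0.90, 0.43, 0.95]

regressions = compare_runs(baseline_times, candidate_times)
```

In this made-up data the typical response barely moves, but the slowest tenth of requests degrades sharply, so only the 90th-percentile metric is flagged. That is the kind of difference a pre-deployment comparison is meant to surface.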

Not a free lunch

There is tremendous potential within load testing tool suites. There is, however, a price to pay financially and, perhaps more importantly, in terms of organizational operations and infrastructure. The purchase price of these tool suites is only the tip of the iceberg.

Staffing: This is often one of the hidden land mines in automated testing, and the majority of companies underestimate the quality of staff needed to develop and manage an automated testing effort.

Companies have come to understand that implementing a client/server system is a non-trivial exercise. Testing such systems is no less complex. Therein lies the surprise for many organizations. Test tool suites are software systems in their own right and require all the resources necessary to care for and feed such a system. The skills present in a manual quality assurance (QA) group do not meet the needs of an automation-based group and must be augmented with software development skillsets.

Many companies that have made the leap to automated testing now recognize this fact of life and have turned to a growing number of third-party testing experts for help with their first project. While this strategy requires an additional up-front expenditure, it can be the most effective way to:

  • acquire intelligence about the methodology of automated testing;
  • establish a formal automated testing program; and
  • test an application.

Infrastructure: This aspect may be the most difficult area for many companies to tackle. In many organizations, automated testing tools fall under the auspices of a QA group. In spite of the emphasis on quality, the QA function is frequently relegated to the bottom of the development food chain. For automated testing tools to be effective, such thinking must change. Testing tools work best when integrated into the development cycle from the start. As development progresses, testing can often identify issues early on, before changes become a major undertaking. To do this, the QA and development groups must work hand-in-hand. With a premium on development groups, this can be a fairly significant barrier.

The Web -- Let the game begin!

Most end-user, client/server systems are currently intra-company; that is, the users are typically employees of that company. As such, any dissatisfaction regarding performance is within the confines of the company, and solutions can be negotiated within the realm of internal politics and corporate memos.

As companies expose their business systems to the public, performance will become a critical factor. And we need not be talking about high-profile business critical applications. Someone just browsing for information will soon become frustrated with a lethargic system. That company, in the brief space of a mouse click, can lose a customer. With serious buy-sell business applications and the Web's ability to even the competitive playing field, the issue is magnified. With such applications, the question is not if you should consider these tools, but when and which ones.