Guarding Against the Network Menace
Minimize the risk of the unknown when deploying Web applications. Discover a three-step process to monitor performance securely in real time from the browser back to the database.
- By Hom Wong
- February 22, 2007
Using live customers as the test bed for applications is generally considered bad practice, but this practice is exactly what happens in the brave new world of Web applications. Traditional development tools and methodologies don't allow development teams to really test and debug Web infrastructure issues before the real users get their hands on the new application. The result isn't pretty.
As reported by Gartner, up to 30 percent of development time can be consumed by debugging production issues. The important operational question is how to shrink this 30 percent time sink so that developers are called in only when the problem really lies in the code, rather than being constantly distracted by infrastructure issues.
Launching a Web application is like hang gliding off the cliffs overlooking the Pacific Ocean near foggy San Francisco. A hang glider pilot jumps from the wind-swept cliff above the roaring ocean, sometimes into a bank of thick fog. This stunt would be a near-death experience without reliable gear complemented by actionable information and the experience required to tackle unforeseen circumstances. Nevertheless, launching new Java applications often resembles this experience.
Most Java developers and application architects diligently adhere to a best-practice development process to ensure the quality of the application. Within this process there are well-defined procedures, supplemented by well-understood tools, to ensure that (at least in the insular environment of the development lab) features are implemented as specified, the application performs to expectation, and it doesn't consume too much memory or computing resources.
To carry forward the hang-gliding metaphor, the real concern is not how the glider holds up in the shop, but how it performs when you jump off the cliff. Similarly, after ensuring that the application passes the rigor of QA tests in the lab, the developer or architect must find out how the application will perform when deployed on the Web infrastructure. The unknowns of the production environment abound:
- Are the servers running the expected release of the operating system and middleware to deliver the required performance?
- What other applications are deployed on these same servers that will conflict with or impact the performance of the new application?
- Are the servers properly load balanced?
- What about the Internet cloud? Is there enough bandwidth available consistently to support the rigor of the application?
- If a content delivery network is used to deliver high bandwidth-consuming objects, then how can you ensure the speedy delivery of these objects?
- If the application infrastructure is linked to the Web infrastructure through an application delivery controller (ADC), will the firewall, SSL off-loading, caching, and compression schemes being implemented affect application performance positively or negatively? Note that ADC technologies, according to Gartner, reside in the data center, are deployed asymmetrically (meaning only on the data center end), and accelerate end-user performance of browser-based and related applications by offering several technologies that work at the network and application layers.
- How will the uncontrollable "last mile" components—the end user's PC, browser, and Internet connection—impact the overall performance and availability of the application?
The list of questions goes on and on. Because of such complexity, there is no way to adequately eliminate all the risk factors associated with deploying a Web application.
Ensure Application Performance
While the risk of deploying Web applications cannot be completely eliminated, this simple three-step approach will greatly enhance the chance of smooth deployment and accelerated ramp-up (to ensure the newly deployed application doesn't share the same fate as Icarus):
- Trace all load-generated (synthetic) and beta user (real) transactions from end to end to identify and fix application speed bumps created by the interaction of the new application with the Web infrastructure and other applications running on the same infrastructure as well as bottlenecks within the Web infrastructure (see Resources).
- If an ADC is available, adjust the application delivery features (caching, compression, and so on) of the ADC that separates the application infrastructure from the Web infrastructure to compensate for the performance issues discovered through step 1.
- Apply the real-user monitoring and performance diagnosis capability to monitor real transactions in real time, giving IT operations and development a common platform to triage and resolve performance problems before they impact user experience (see Figure 1).
Load testing is a common technique used to ensure that the application will be able to perform adequately under the volume of transactions it is expected to handle in production. Developers write load testing scripts that mimic how a user is expected to use the application. These scripts run on computers to simulate the effect of hundreds or thousands of users accessing the application simultaneously.
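The idea of a load testing script can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `simulated_transaction` is a hypothetical stand-in that merely sleeps where a real script would send requests to the application, and the user counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulated_transaction(user_id: int) -> float:
    """Hypothetical stand-in for one scripted user transaction.

    A real script would issue requests against the application here.
    Returns the elapsed time in seconds.
    """
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for request/response work
    return time.perf_counter() - start


def run_load_test(concurrent_users: int, transactions_per_user: int) -> list[float]:
    """Run the scripted transaction from many simulated users at once."""

    def user_session(user_id: int) -> list[float]:
        return [simulated_transaction(user_id) for _ in range(transactions_per_user)]

    # One worker thread per simulated user, all sessions running concurrently.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        sessions = pool.map(user_session, range(concurrent_users))
        return [elapsed for session in sessions for elapsed in session]


if __name__ == "__main__":
    timings = run_load_test(concurrent_users=50, transactions_per_user=4)
    print(f"{len(timings)} transactions, "
          f"mean {mean(timings) * 1000:.1f} ms, max {max(timings) * 1000:.1f} ms")
```

A thread pool is enough for a sketch like this; commercial load testing tools typically spread the simulated users across many machines.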
The limitation of load testing lies in whether the test scripts truly replicate how a real user will use the application, and whether the collection of test scripts stresses all parts of the application. To compensate for this deficient test coverage, beta testers are solicited to inject reality into the test process. Beta users of an application can discover confusing aspects of the user interface and, more importantly to the developer, use the application in unforeseen ways that reveal bugs within the application.
To maximize the effectiveness of load and beta testing, a mechanism should be deployed to track all synthetic or real transactions through the entire production Web and application infrastructure from browser to database. The data collected should be correlated and analyzed to obtain a clear understanding of how much time each component within the Web or application infrastructure takes to process the transactions.
This browser-to-database performance profile can identify bottlenecks within the infrastructure including, for example, whether the deployed servers are capable of handling the load, whether the load balancers are properly programmed, or whether a frequently used or slow-performing method call or SQL query can be tuned or optimized to deliver better overall performance. By mapping how all user transactions flow through the infrastructure or application, the developer can also determine where future performance improvement or cost-cutting efforts can most effectively and efficiently be applied.
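As a rough illustration of such a browser-to-database profile, the sketch below totals the time each component spends on a set of traced transactions and ranks the components by their share. The `Span` record, the component names, and the timings are all hypothetical; a real tracing tool would supply its own data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One timed hop of a transaction (hypothetical record format)."""
    transaction_id: str
    component: str      # e.g. "browser", "network", "app_server", "database"
    elapsed_ms: float


def build_profile(spans):
    """Return {component: (total_ms, share_of_total)}, slowest component first."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.component] += span.elapsed_ms
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {
        name: (ms, ms / grand_total if grand_total else 0.0)
        for name, ms in ranked
    }


if __name__ == "__main__":
    # Made-up timings for two traced transactions.
    sample = [
        Span("t1", "browser", 120), Span("t1", "network", 80),
        Span("t1", "app_server", 150), Span("t1", "database", 450),
        Span("t2", "browser", 100), Span("t2", "network", 60),
        Span("t2", "app_server", 140), Span("t2", "database", 500),
    ]
    for component, (total_ms, share) in build_profile(sample).items():
        print(f"{component:<12}{total_ms:>8.0f} ms  {share:6.1%}")
```

With these sample numbers the database accounts for most of the elapsed time, which is the kind of signal that would point tuning effort at a slow SQL query rather than at the servers or load balancers.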
Critical Network Piece
By tracking load testing and beta transactions, you can also ensure that the ADC is properly configured. The ADC is a critical network appliance that sits between the Web infrastructure and the application infrastructure. Since this single appliance performs load balancing, application delivery acceleration, and network security functions, it is crucial that this appliance be configured appropriately for the application it delivers. For example, a higher compression rate will consume less network bandwidth.
However, for an application that doesn't demand significant network bandwidth, it might not be helpful to configure the ADC to operate at a high compression rate, because compression consumes more computing resources on the ADC and can lower overall performance. By measuring the performance profile of all Web infrastructure components serving the application using the previously mentioned browser-to-database monitoring approach, you can determine the ADC configuration that delivers the best application performance to the end user.
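The compression trade-off can be made concrete with a deliberately simplified model: delivery time is the ADC's compression time plus the transfer time of the compressed payload. The compression ratios and CPU costs below are invented for illustration and do not describe any real ADC; in practice the figures would come from the measured performance profile.

```python
# Hypothetical figures per compression level:
# (fraction of original size remaining, ADC CPU cost in ms per megabyte).
COMPRESSION_LEVELS = {
    "off":  (1.00, 0.0),
    "low":  (0.50, 5.0),
    "high": (0.30, 40.0),
}


def delivery_time_ms(payload_mb: float, bandwidth_mbps: float, level: str) -> float:
    """Compression time on the ADC plus transfer time of the compressed payload."""
    ratio, cpu_ms_per_mb = COMPRESSION_LEVELS[level]
    cpu_ms = payload_mb * cpu_ms_per_mb
    transfer_ms = (payload_mb * ratio * 8.0) / bandwidth_mbps * 1000.0
    return cpu_ms + transfer_ms


def best_level(payload_mb: float, bandwidth_mbps: float) -> str:
    """Pick the compression level with the lowest modeled delivery time."""
    return min(
        COMPRESSION_LEVELS,
        key=lambda level: delivery_time_ms(payload_mb, bandwidth_mbps, level),
    )


if __name__ == "__main__":
    for bandwidth in (2.0, 100.0, 1000.0):
        print(f"{bandwidth:>6.0f} Mbps -> {best_level(1.0, bandwidth)}")
```

Under these assumed numbers, high compression pays off only on the slow link; once bandwidth is plentiful the CPU cost outweighs the bytes saved. The model ignores concurrency and caching, so it shows the shape of the trade-off, not a sizing tool.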
The last crucial element to improve the chance of success is to deploy in production a monitoring tool, similar to the one used in the first step, that measures transactional performance from the user's perspective. With such a tool, IT operations can be alerted proactively to degrading application performance as users experience it. Also, by tracking and tracing an ill-performing transaction in real time from the browser all the way back to the database, IT operations can quickly identify the component, whether hardware or software, that is causing the slowdown.
This combined monitoring-and-diagnosis approach delivers actionable information that allows the appropriate functional expert (the server administrator, network administrator, database administrator, or application developer) to be dispatched quickly to fix the performance problem before it affects the end-user service level.
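One way to picture this monitoring-and-diagnosis flow is the sketch below, which flags a transaction whose total time exceeds a threshold, finds the component that deviates most from its baseline, and names the expert to dispatch. The component names, the owner mapping, the baseline figures, and the threshold are all assumptions chosen for the example.

```python
# Hypothetical mapping from infrastructure component to the functional expert.
OWNERS = {
    "web_server": "server administrator",
    "network": "network administrator",
    "database": "database administrator",
    "application": "application developer",
}


def triage(observed_ms: dict, baseline_ms: dict, threshold_ms: float):
    """Return an alert for a slow transaction, or None if it is within threshold.

    The culprit is taken to be the component whose observed time exceeds its
    baseline by the largest margin (a simplifying assumption).
    """
    total = sum(observed_ms.values())
    if total <= threshold_ms:
        return None
    excess = {
        component: elapsed - baseline_ms.get(component, 0.0)
        for component, elapsed in observed_ms.items()
    }
    culprit = max(excess, key=excess.get)
    return {
        "total_ms": total,
        "component": culprit,
        "excess_ms": excess[culprit],
        "dispatch_to": OWNERS.get(culprit, "triage team"),
    }


if __name__ == "__main__":
    baseline = {"network": 100, "web_server": 150, "application": 300, "database": 400}
    observed = {"network": 110, "web_server": 900, "application": 310, "database": 420}
    print(triage(observed, baseline, threshold_ms=1500))
```

In this made-up case the database is still the largest absolute contributor after the Web server, but the Web server is the component that has drifted furthest from its norm, so the server administrator is the one dispatched.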
This approach is similar to an experienced hang glider pilot internalizing in real time all the environmental factors like wind speed, terrain, lift, temperature gradient, and so on to make the right adjustment to the pilot's center of gravity to turn, slow down, or speed up the craft. As with a hang glider pilot, since IT cannot possibly test and rectify all infrastructure and application issues that impact performance before the fact, the only viable approach is to proactively monitor performance as experienced by users and have the actionable information available in real time to take corrective actions ahead of a crash (see Figure 2).
The challenge of tracking and monitoring all or even a portion of the transactions in a production environment in real time is that the tool has to be extremely lightweight and scalable. A performance management tool defeats its purpose if it slows down, or increases the resource consumption of, the application it is intended to monitor. The rule of thumb for a production-grade monitoring solution is that the tool should consume less than 5 percent of the total capacity of the network or servers that it is monitoring.
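The 5 percent rule of thumb is simple arithmetic, sketched below. The resource figures in the example (CPU cores and link bandwidth) are invented; only the 5 percent budget comes from the text above.

```python
def monitoring_share(monitor_usage: float, total_capacity: float) -> float:
    """Fraction of a resource's total capacity consumed by the monitoring tool."""
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    return monitor_usage / total_capacity


def within_budget(monitor_usage: float, total_capacity: float,
                  budget: float = 0.05) -> bool:
    """True if the tool stays under the budget (5 percent by default)."""
    return monitoring_share(monitor_usage, total_capacity) < budget


if __name__ == "__main__":
    # Hypothetical measurements: 0.5 of 16 CPU cores, 80 of 1000 Mbps.
    print("CPU:", within_budget(0.5, 16))
    print("Network:", within_budget(80, 1000))
```

Both usage and capacity must be expressed in the same unit for the resource being checked, and each monitored resource should be checked separately.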
The benefits of using this three-step approach to deploy and manage Web applications in production are threefold: preserving or improving the end-user experience, which brings higher customer goodwill or improved productivity; lowering the cost of managing Web applications by automatically pinpointing the cause of performance problems instead of organizing a multifunctional triage team to debate and attempt to reproduce them; and establishing the performance profile of all Web infrastructure components for capacity planning and optimization purposes.
Browser Monitoring
The prerequisite to implementing this three-step approach is deploying the right unintrusive tool to monitor real-user experience from the browser's perspective. Because of the complexity of Web applications, especially composite applications (see Resources), all the objects forming the page or transaction dynamically come together at the user's browser. The required objects or data could be served by different servers, disparate data centers or hosting facilities, or even by third parties like content or application delivery networks (C/ADN) or software as a service (SaaS) vendors (see Resources). The assumption made in client-server computing, that the client is enjoying a normal service level whenever the server is operating within its performance threshold, is no longer valid. The only accurate way to measure the service level delivered by the infrastructure to the end user is to measure transactional performance directly at the browser.
Besides direct browser performance measurement, there are other techniques that can provide an approximation of the application performance experienced by the end user. These alternate approaches include:
- Using a sniffer-like appliance to measure and report on the approximate round-trip time of a packet. This approach cannot accurately measure pages composed of frames or pages where part of the application is served by C/ADN or SaaS vendors. Furthermore, the round-trip time data say nothing about what goes wrong within the infrastructure, so the dreaded time- and labor-consuming triage team approach is still needed to replicate and identify the cause of problems.
- Subscribing to a third-party monitoring service where synthetic transactions are initiated from various computers operated by the service provider and the response time is measured and recorded. This approach can effectively provide a baseline for Web site performance. Beyond that, the accuracy of the measurement depends on whether the script used to generate the synthetic transaction adequately reflects how a real user interacts with the application, and whether the service provider's test computer is similarly situated in the Web infrastructure as the real user's computer. Even if you can overlook these inaccuracies, there is still the concern that the monitored data is not actionable as in the case of the sniffer approach.
There are two ways to measure application performance from the browser. The most obvious approach is to download an agent to the end user's computer. The agent can then measure application response time, errors, machine statistics like CPU and memory utilization, and so forth, and report back to a data collection server. Unfortunately, this agent-based approach is not practical for public-facing Web sites and perhaps most intranet deployments because most users are cagey about downloading agents. Deploying an end-user monitoring agent also means that IT has to bear the extra burden of maintaining and supporting the monitoring agent on a large number of distributed desktops and laptops.
The ideal approach is to adopt a mechanism where the monitoring probe is injected dynamically into the payload going to the end user at the beginning of each session. This injection can be done by the Web server or by the ADC appliance. In doing so, the end user doesn't have to be encumbered with any downloads, and IT can easily maintain the probe from a centralized location in the data center without having to touch each end user. Performance information and error conditions are then reported back to the server that injected the probe in the first place, alleviating security concerns.
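A minimal sketch of the injection idea follows, assuming the probe is a script served from the same site. The function name `inject_probe` and the path `/probe.js` are hypothetical, and the sketch rewrites a single page; restricting injection to the first page of a session and collecting the reported measurements are left out.

```python
import re
from html import escape

# Matches the closing head tag regardless of letter case.
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_probe(page: str, probe_src: str) -> str:
    """Insert a script tag for the monitoring probe just before </head>.

    Pages without a closing head tag are returned unchanged.
    """
    match = _HEAD_CLOSE.search(page)
    if match is None:
        return page
    tag = f'<script src="{escape(probe_src, quote=True)}" async></script>'
    return page[:match.start()] + tag + page[match.start():]


if __name__ == "__main__":
    original = "<html><head><title>Orders</title></head><body>...</body></html>"
    # A relative path keeps the probe on the same server that served the page.
    print(inject_probe(original, "/probe.js"))
```

Using a relative path for the probe reflects the point made above: the probe is served by, and reports back to, the same server that delivered the page, so no third party is involved.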
Profiling Performance
A simple three-step process can be used to minimize the risk of the unknown when deploying Web applications, allowing for bottlenecks within the Web infrastructure or speed bumps created by the interaction of the application and the infrastructure to be discovered and eliminated prior to launch. The process starts by tracking and monitoring all real transactions as they are processed by Web and application infrastructure components from browser to database.
The resultant performance profile can be analyzed to pinpoint performance bottlenecks or areas within the application that can be tuned to achieve better performance. After the application is launched, the same technique can be used to monitor all or a portion of the real transactions, allowing performance issues to be discovered proactively and resolved before they affect real-user experience.
The key to executing this three-step approach is the ability to monitor performance at the real user's browser securely and unintrusively. Performance problems from the real user's perspective can then be triaged and resolved in real time by tracking the ill-performing transactions from the user's browser all the way back to the database through the Web and application infrastructure. Only with the agility enabled by real-time, holistic, and actionable information on application performance from end to end can you feel confident about launching and managing a Web application.