Software Engineering Secrets
Top engineers outline techniques for improving the corporate software development process.
- By Prasad Kunchakarra, Andrew Turner
As we are all aware, there are no silver bullets in software engineering. But there are techniques, or secrets, that can improve your chances of developing a successful product. Getting the right architecture is the single most important factor for building faster, cheaper and better solutions. The following is a list of best practices for architecting distributed systems.
Secret 1: Use the appropriate number of tiers. Keep the architecture simple by minimizing the number of tiers and the number of components in each tier. Before adding another tier or component, ask yourself the following question: "Will it add any benefit?" If the answer is no, the component or tier should not be introduced. For example, if you need to develop a simple Web-based application that needs to retrieve a few rows of data from databases with no transactions, a separate server that hosts business logic components could be overkill. A JSP-enabled Web server could easily support such functionality.
On the other hand, if you need to develop an enterprise online accounting solution that has to support user authentication/authorization and access to multiple databases on multiple platforms, the functionality should be distributed across multiple services. The services could include a service for authentication/authorization, a service for supporting database transaction logic, and a separate service for each type of presentation logic.
Distributing the functionality between services makes this solution reusable. The authentication/authorization service can be reused by several applications.
Secret 2: Take advantage of legacy systems. Do not rewrite or duplicate existing application/system functionality. If an existing legacy asset adds value to the solution, the asset can and should be leveraged. Legacy applications can contain large amounts of business logic, but may not have a great deal of documentation that describes all of the business rules precisely and accurately. Rewriting the functionality supported by legacy applications can be a time-consuming, resource-intensive and risk-prone task. Therefore, it is always advisable to use existing legacy applications.
Secret 3: Consider the expected life of the system. Understand and take into consideration the shelf life of the system. If a system is meant to be short-lived, limit the time spent on architectural analysis. If the system has a longer shelf life, an extensive architectural analysis is necessary.
Secret 4: Build services that can be reused. If the system you are building has enterprise-level usability, design the architecture to maximize reuse. If you are developing a service to perform transactions against an enterprise repository, the service can most likely be reused by multiple applications. If you are building a small application whose business value beyond one year is questionable, reuse may not be an important issue.
Secret 5: Make sure you understand the requirements. Thoroughly understand business requirements, and infer key concepts such as business situation and business functions. The technology and architecture being evaluated or selected should be appropriate to the business problem that has to be solved.
For example, a Web-based financial app that serves customers could have different constraints than a Web-based human resources application that serves employees. The high availability and scalability issues associated with the customer-oriented application will lead to different architectural choices compared to an internally facing human resources application.
Secret 6: Use design patterns. Do not try to reinvent the wheel. Follow the concepts of patterns wherever applicable. A patterns-based solution can be easily communicated and has a proven track record. From an architecture point of view, patterns can be applied to identify the number of tiers, and the components within each tier.
Secret 7: Follow the rules. Conform to the organization's enterprise architectural standard, if one is available. Developing, supporting and maintaining applications using multiple and competing architectures is considered expensive and resource-intensive. By sticking to a single architecture and/or platform, you will garner more support from the operations or infrastructure groups, as well as other developers. If a security flaw is found in a Netscape or IE browser, how many people will scramble to analyze and fix the problem compared to a problem with Joe's browser?
Design best practices
It is now time to learn the secrets of the design phase. While this phase is often skipped in favor of starting to write code, that is not how the pros develop systems.
Secret 1: Optimize business rules. The first step in design is to verify the business rules (requirements). Business rules should be consistent with realistic expectations for the number of concurrent users and the transaction response time. Based on the existing infrastructure, chosen architecture, design and system constraints, and time and resources, you should negotiate for realistic and pragmatic business rules that can actually be implemented.
Secret 2: Optimize data design. Carefully analyze the entities and relationships, and design the database schema to meet performance goals. In this phase, determine the primary and foreign key indexes. Initially, you should normalize the database to avoid duplicate data storage. After the normalization, you may need to denormalize for performance reasons.
Secret 3: Proactively tune the database and SQL during development. If the response times for conducting the functionality are critical, performance tuning becomes an important issue. It has been observed that proactive tuning of systems during design and development is exponentially cheaper than reactive tuning of production systems. Therefore, an up-front investment in time and resources for tuning performance is highly recommended.
Secret 4: Optimize application design. Based on the requirements and chosen architecture, try to apply design pattern concepts to identify and design software components. Identifying and implementing the right pattern is an important element of the entire design process. Use object-oriented design concepts to design objects, and the Unified Modeling Language (UML) to represent components, objects, relationships and interactions. If you are developing an application that interacts with databases, also take secrets five through seven below into consideration.
Secret 5: Let the database (not the code) do the work. Application developers can perform some of the calculation within the database system (using database functions and advanced SQL queries) or they can retrieve data and perform calculations within the applications. In the majority of situations that need simple calculations (such as min, max, average, ordering and retrieving the top 10 values), the database can perform the calculations more efficiently than the applications. Optimize database queries by extensively using SQL optimization techniques for complex queries that deal with large amounts of data. The optimization of algorithms within an application takes longer and is more error-prone than the optimization of SQL queries.
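As a minimal sketch of this secret, the following Python snippet uses the standard-library sqlite3 module as a stand-in for the production database. The quotes table and its contents are hypothetical. The point is only that MIN, MAX, AVG, ordering and top-N selection are expressed in SQL, so the application receives finished answers rather than raw rows.

```python
import sqlite3

def summarize_prices(conn):
    """Return (min, max, avg) of price, computed by the database."""
    return conn.execute(
        "SELECT MIN(price), MAX(price), AVG(price) FROM quotes"
    ).fetchone()

def top_quotes(conn, limit):
    """Return the `limit` highest-priced symbols, ordered by the database."""
    rows = conn.execute(
        "SELECT symbol FROM quotes ORDER BY price DESC LIMIT ?", (limit,)
    ).fetchall()
    return [symbol for (symbol,) in rows]

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?)",
    [("AAA", 10.0), ("BBB", 40.0), ("CCC", 25.0), ("DDD", 5.0)],
)

low, high, mean = summarize_prices(conn)   # one row comes back, not four
best_two = top_quotes(conn, 2)             # only the rows that are needed
```

The alternative, fetching every row and looping over it in application code, moves more data across the connection and duplicates logic the database already implements.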
Secret 6: Optimize the design of database schema with indexes. Work with the database administrator and information architect to further optimize database schema by introducing indexes. If you are querying a table with a "where" clause against a column that is neither a primary nor a foreign key, the column would be a good candidate for creating an index. Indexes allow faster retrievals, but slower updates and inserts. Therefore, make sure the data is neither over- nor under-indexed.
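One way to check whether an index is actually used is to ask the database for its query plan before and after creating it. The sketch below does this with SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module. The holdings table, its columns and the index name are hypothetical, and other databases expose their plans through different commands.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE holdings (id INTEGER PRIMARY KEY, portfolio_id INTEGER, "
    "sector TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO holdings (portfolio_id, sector, amount) VALUES (?, ?, ?)",
    [(i % 50, "sector-%d" % (i % 10), float(i)) for i in range(1000)],
)

# The WHERE clause filters on a column that is neither a primary nor a foreign key.
lookup = "SELECT * FROM holdings WHERE sector = ?"
plan_before = query_plan(conn, lookup, ("sector-3",))   # full table scan

conn.execute("CREATE INDEX idx_holdings_sector ON holdings (sector)")
plan_after = query_plan(conn, lookup, ("sector-3",))    # plan names the index
```

Printing plan_before and plan_after shows the change from a scan to an index search. Because every index also has to be maintained on each insert and update, it is worth confirming this way that a new index is being used.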
Secret 7: Precalculate result sets using materialized views. Queries to a large database involving joins, aggregations such as SUM, or both are very expensive operations in terms of time and processing power. One way to improve query performance is to pre-calculate expensive join and aggregation operations on the database prior to execution time, and to then store these results in the database. Oracle supports this concept through materialized views, a generic database object that summarizes and pre-computes the results. In general, materialized views are suitable in distributed decision support and data warehousing applications.
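The idea can be illustrated without Oracle. SQLite has no materialized views, so the sketch below emulates one with an ordinary summary table that a refresh function rebuilds after each data load. This is an emulation of the concept, not Oracle's feature. The orders, customers and region_summary tables are hypothetical.

```python
import sqlite3

# The expensive join-and-aggregate query whose result we want precomputed.
SUMMARY_SQL = """
    SELECT c.region, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
"""

def refresh_region_summary(conn):
    """Recompute the stored results; run this right after each data load."""
    with conn:  # commits the DELETE and INSERT together, rolls back on error
        conn.execute("DELETE FROM region_summary")
        conn.execute("INSERT INTO region_summary " + SUMMARY_SQL)

def region_totals(conn):
    """Read the precomputed totals without touching the base tables."""
    return dict(conn.execute("SELECT region, total_amount FROM region_summary"))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE region_summary (region TEXT PRIMARY KEY, total_amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

refresh_region_summary(conn)
before_load = region_totals(conn)      # East 150.0, West 30.0

conn.execute("INSERT INTO orders VALUES (2, 70.0)")  # new data arrives
stale = region_totals(conn)            # unchanged until the next refresh

refresh_region_summary(conn)
after_refresh = region_totals(conn)    # West is now 100.0
```

The trade-off is visible in the stale value: reads are cheap because the join has already been done, but the stored results are only as fresh as the last refresh.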
Secret 8: Model the solution. Use an appropriate amount of modeling (at a minimum, use class diagrams and sequence diagrams). For OO systems, use UML with Rational or Together/J. Class diagrams should at least show public methods with arguments and return values. Sequences should be completed for all nominal use cases and any others that cause the flow to change significantly. How much time will this take? If the component analysis and basic business objects are already documented, it should not take more than a couple of days for a simple system or a couple of weeks for a medium-sized system. We find that doing the sequence diagrams brings some important issues to light. For example, what is the key value passed from the client to the server to retrieve a record? What if the record cannot be found in the database; will an exception be thrown? Is the client developer aware that code must be checked for this exception? These types of realizations often occur during modeling.
Real-world example: Portfolio Spotlight Project
The Portfolio Spotlight application is a non-trivial Web-based app that allows users to input financial portfolios and holdings, and then see the financial makeup graphically. Users can also return to the Web site to see how the market has affected their portfolio.
Here is a list of characteristics associated with the Portfolio Spotlight Application:
- Web-based user-to-business financial application
- High availability
- Perform intensive financial calculations and display the results within short response times as required by Web-based apps
- A tight schedule of one month for development
- No legacy components involved
With these business requirements in mind, the following architecture decisions were made. Because the system needs to perform intensive calculations within a very short response time, the actual business logic components are hosted in Java-/CORBA-based servers. To keep maintenance costs to a minimum, separation of presentation components from business logic components is an obvious choice. The Model-View-Controller pattern was selected to distribute the responsibilities across the components.
The components hosted by the Java/CORBA servers are expected to serve the Model functionalities. ColdFusion tag-based pages (CFM) handle presentation logic. A Java-based servlet hosted by the ColdFusion App Server serves as a controller component, mediating the requests and responses between a Java/CORBA server and the ColdFusion-based CFM pages. Because the framework and expertise were readily available in-house, ColdFusion-based technology was chosen to develop the presentation components. In addition, its rapid application development capabilities could be fully harnessed.
Analyzing the portfolio
Portfolio Spotlight creates and maintains portfolios and performs analysis to identify the diversification of portfolios. The portfolio was analyzed with respect to Sector Analysis, Asset Class Analysis and Overlap Analysis.
Since the number of holdings to be calculated can affect response time, it is important to work with the business user to optimize the response times and number of holdings. Expectations of the customer and the average number of holdings per customer aid the decision process.
At a higher level, the sector analysis and asset class analysis involved simple multiplication of two columns from two different tables and summing the resulting values for each category of sector or asset. The multiplication was performed while retrieving the rows themselves, instead of retrieving the values, storing them in hash tables and then multiplying them in the application program. Iterating through result sets in the application program is always expensive. Therefore, it is prudent to perform as many calculations as possible using SQL while retrieving the data.
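The article does not show the query itself, so the following is only a sketch of the technique described: multiplying two columns from two tables and summing per category inside a single SQL statement. It uses Python's sqlite3 module in place of the project's Oracle database. The holdings and securities tables and their columns are assumptions made for illustration.

```python
import sqlite3

# Multiply shares (from holdings) by price (from securities) while the rows
# are being retrieved, and total the products per sector in the database.
SECTOR_VALUE_SQL = """
    SELECT s.sector, SUM(h.shares * s.price) AS sector_value
    FROM holdings h
    JOIN securities s ON s.symbol = h.symbol
    WHERE h.portfolio_id = ?
    GROUP BY s.sector
    ORDER BY s.sector
"""

def sector_breakdown(conn, portfolio_id):
    """Return [(sector, value), ...] for one portfolio."""
    return conn.execute(SECTOR_VALUE_SQL, (portfolio_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE securities (symbol TEXT PRIMARY KEY, sector TEXT, price REAL);
    CREATE TABLE holdings (portfolio_id INTEGER, symbol TEXT, shares INTEGER);
    INSERT INTO securities VALUES
        ('AAA', 'Tech', 10.0), ('BBB', 'Tech', 20.0), ('CCC', 'Energy', 5.0);
    INSERT INTO holdings VALUES
        (1, 'AAA', 3), (1, 'BBB', 1), (1, 'CCC', 4), (2, 'CCC', 10);
""")

breakdown = sector_breakdown(conn, 1)   # [('Energy', 20.0), ('Tech', 50.0)]
```

The application gets one row per sector and never builds hash tables or iterates over individual holdings.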
For a given portfolio that includes more than one holding, the overlap analysis identifies all of the overlapping stocks between the holdings by taking into account the underlying composition of the holdings. The holding can be a stock, mutual fund or annuity. The underlying composition of a mutual fund indicates the stocks in which the mutual fund has invested. The analysis included the identification of stocks that are common to mutual funds or annuities of a portfolio. This involved a comparison of hundreds of rows of data spanning multiple tables. At the beginning of the project, it was decided that optimization of the overlap analysis from all aspects is very important to the success of the project.
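To make the overlap idea concrete, here is one possible formulation, again sketched with Python's sqlite3 module. The project's real schema and queries are not given in the article, so the portfolio_holdings and holding_composition tables are hypothetical. A stock counts as overlapping when it appears in the underlying composition of more than one holding in the same portfolio.

```python
import sqlite3

OVERLAP_SQL = """
    SELECT c.stock_symbol, COUNT(DISTINCT c.holding_id) AS holding_count
    FROM portfolio_holdings ph
    JOIN holding_composition c ON c.holding_id = ph.holding_id
    WHERE ph.portfolio_id = ?
    GROUP BY c.stock_symbol
    HAVING COUNT(DISTINCT c.holding_id) > 1
    ORDER BY c.stock_symbol
"""

def overlapping_stocks(conn, portfolio_id):
    """Return [(stock, number of holdings containing it), ...]."""
    return conn.execute(OVERLAP_SQL, (portfolio_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio_holdings (portfolio_id INTEGER, holding_id TEXT);
    CREATE TABLE holding_composition (holding_id TEXT, stock_symbol TEXT);
    INSERT INTO portfolio_holdings VALUES
        (1, 'FUND_A'), (1, 'FUND_B'), (1, 'ANNUITY_C');
    INSERT INTO holding_composition VALUES
        ('FUND_A', 'AAA'), ('FUND_A', 'BBB'),
        ('FUND_B', 'BBB'), ('FUND_B', 'CCC'),
        ('ANNUITY_C', 'BBB'), ('ANNUITY_C', 'CCC'), ('ANNUITY_C', 'DDD');
""")

overlaps = overlapping_stocks(conn, 1)   # [('BBB', 3), ('CCC', 2)]
```

A query of this shape touches every composition row for every holding, which is why it benefits from the prototyping and tuning effort the article describes.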
To manage the risks associated with the analysis, we used these best practices:
- Developing an early prototype to perform overlap analysis. We developed several SQL queries using multiple approaches. Response times associated with each query were then measured, and the best-performing query was selected.
- Stress testing the prototype to identify response times under different numbers of concurrent users. We found that response times exceeded the threshold limit even with the optimized SQL.
- Exploring options for tuning the database schema. After careful analysis, we created additional indexes. Response times were still found to be well above the accepted limit.
The next level of tuning included creating materialized views that pre-calculate the results from joins. Data is refreshed once a day. The scripts that created these materialized views ran immediately after the fresh data was incorporated into the tables. Queries ran against materialized views instead of multiple tables.
Modeling and code inspection
During the modeling phase, we discovered that primary keys needed to be created in the mid-tier server. The DBA needed to create some sequence tables in the Oracle database. We decided this was the best way to handle globally unique identifiers because we wanted to be able to add clones of our mid-tier server on demand. In addition, we finalized which exceptions could be thrown by each of the mid-tier server methods (for example, StockNotFoundException and MustSupplySharesOrDollarsException).
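Agreeing on exceptions up front amounts to publishing a contract that client developers can code against. The Python sketch below borrows the two exception names from the project. The function, its parameters and the exact conditions under which each exception is raised are assumptions made for illustration.

```python
class StockNotFoundException(Exception):
    """Raised when a symbol is not in the price table."""

class MustSupplySharesOrDollarsException(Exception):
    """Raised when a holding is submitted with neither shares nor dollars."""

def shares_for_holding(prices, symbol, shares=None, dollars=None):
    """Return the share count for a holding entered as shares or as dollars.

    Hypothetical mid-tier method; `prices` maps symbol -> price per share.
    """
    if shares is None and dollars is None:
        raise MustSupplySharesOrDollarsException(symbol)
    if symbol not in prices:
        raise StockNotFoundException(symbol)
    if shares is not None:
        return float(shares)
    return dollars / prices[symbol]

# Client code knows exactly which failures it must handle.
prices = {"AAA": 10.0}
try:
    shares_for_holding(prices, "ZZZ", shares=5)
    message = "saved"
except StockNotFoundException as exc:
    message = "unknown stock: %s" % exc
```

Because the exceptions are part of the documented interface, the question "does the client check for this?" can be answered during modeling rather than in system testing.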
During an early code inspection of the server component, we found that database resources were not freed in a finally clause. This could have been difficult to detect before stress testing or production. Often, this type of problem is easy to catch in a peer inspection, but difficult to find and correct later. We also found the error logging to be light, so we came up with a simple standard: writing the date, error description, appropriate variables and stack trace in the log file for each severe problem.
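Both findings translate directly into a small pattern. The project's server code was Java, so the following is a Python analogue rather than the authors' code. It uses the standard sqlite3 and logging modules, and the quotes table, logger name and function are hypothetical. The finally clause releases the connection on every path, and the log call records the date, a description, the relevant variable and the stack trace.

```python
import logging
import sqlite3

# asctime puts the date on every log line.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("portfolio.server")  # hypothetical logger name

def fetch_price(connect, symbol):
    """Look up one price; `connect` is any zero-argument connection factory."""
    conn = connect()
    try:
        row = conn.execute(
            "SELECT price FROM quotes WHERE symbol = ?", (symbol,)
        ).fetchone()
        return None if row is None else row[0]
    except sqlite3.Error:
        # log.exception records the message, the relevant variable and the
        # stack trace; the formatter above adds the date.
        log.exception("price lookup failed: symbol=%r", symbol)
        raise
    finally:
        # Runs on success, on "not found" and on error alike.
        conn.close()
```

Without the finally clause, each failed lookup would leave a connection open, which is the kind of leak that only shows up once a stress test exhausts the available connections.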
Using jTest, we also found that some methods didn't check for bad inputs. The jTest output showed exactly what class, method and line number had the problem, as well as sample invocations that cause the identified trouble. Some cases of inefficient appending to String variables were flagged, so we switched to using a StringBuffer for that purpose.
Finally, we found a few issues with how the client component dealt with CORBA connections. There was a problem with the bind() call leaving some unfreed resources, and some unexpected coding was required on the client side to minimize rebinds(). We also found two places where the server was not releasing database resources. This problem had been detected in the coding phase, but a small percentage managed to sneak into system testing. With 25 simulated users doing transactions over and over again for an hour, it was not long before all transactions failed with "unable to get database connection." We then knew what to look for and by that afternoon the testing was cooking again.
Coding phase best practices
Now that the design effort has been completed, it is time to focus on coding. This is the phase most programmers can't wait to jump into. The following best practices will help you minimize the time you spend doing on-call support and maintenance.
Secret 1: Have code inspections early in the game. Peer inspections ... Wait a minute, why are we talking about peer inspections? Why not list some of the coding secrets? Listing coding secrets is impossible; there are just too many things to write about: database connection pooling, JavaDoc, efficient coding, using exceptions, coding style, maintainability and so on. Also, the rules keep changing: the important secrets for JSP writing are different from those for Servlets, and EJB coding secrets are different from CORBA coding secrets. What we can do is utilize a group of developers to verify that the code includes as many secrets as possible.
Peer inspections should be performed while there is time to affect the product under review. In other words, a code inspection must be performed while it still makes sense to change the code. It is unbelievable how many times we have shown up at a code inspection to find the code is about to be moved into production. Peer code inspections need to be performed as soon as a class is finished being coded. For a junior developer, or a developer new to the underlying technologies or application domain, there may be as many as three code inspections. Inspections should be conducted by peers, other coders within the group and, hopefully, by the group's best coder. The meeting should be spent listing all agreed-upon problems with the code and briefly discussing superior techniques. Deep discussion or issues that cannot be resolved rapidly should be taken off-line. We recommend distributing the product for review a day before the meeting, as neatly printed source code with line and page numbers. Also include some information from the previous phase (for example, architecture diagrams for a design inspection or models for a code inspection).
Unit testing best practices
Secret 1: Use automated tools to aid in unit testing. An automated unit-testing tool provides a rapid list of defects that might otherwise be found during a code inspection or unit testing. The types of issues found by a unit-testing tool can include unused variables, resources not closed, methods too complex, failure to check for faulty inputs, and the inefficient use of Strings. A tool such as Parasoft's jTest allows developers to add their own rules. For example, we wrote a custom rule to verify that ResultSets were properly closed; this is a common problem that might otherwise not be detected until production. By default jTest produces lots of output with long lists of definite or potential problems. However, each rule or group of rules can be turned off to produce an efficient listing of defects.
Each method should be executed to verify its nominal path, and each error path should work correctly. In some cases, there may be multiple nominal paths to try. Verify that the outputs are correct, or the database has been modified as expected. For server code, we recommend putting a main method in each class and executing methods outside of the client tier.
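The "main method in each class" recommendation has a close equivalent in Python: a self-test that runs when the module is executed directly, with no client tier involved. The sketch below uses a hypothetical Portfolio class and exercises one nominal path and one error path.

```python
class Portfolio:
    """Hypothetical server-side class under test."""

    def __init__(self):
        self._shares = {}

    def add(self, symbol, shares):
        if shares <= 0:
            raise ValueError("shares must be positive")
        self._shares[symbol] = self._shares.get(symbol, 0) + shares

    def shares_of(self, symbol):
        return self._shares.get(symbol, 0)

def self_test():
    """Exercise the nominal path and the error path."""
    portfolio = Portfolio()

    # Nominal path: repeated adds accumulate.
    portfolio.add("AAA", 5)
    portfolio.add("AAA", 2)
    assert portfolio.shares_of("AAA") == 7
    assert portfolio.shares_of("ZZZ") == 0

    # Error path: bad input is rejected and state is left unchanged.
    try:
        portfolio.add("AAA", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive shares")
    assert portfolio.shares_of("AAA") == 7
    return "ok"

if __name__ == "__main__":
    # Running the file directly plays the role of a per-class main method.
    print(self_test())
```

Keeping the harness next to the class means the server developer can verify both paths before any client code exists.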
Stress testing best practices
Secret 1: Use automated tools for stress testing. With Web-based applications, it is imperative to verify that the application will work properly when many users are hitting it. Tools such as Segue show the turnaround time for x concurrent users or x users hitting repeatedly over a period of time. This type of testing flushes out issues with SQL efficiency, failure to free resources, and general coding efficiency. It is important to have a feel for how many users the system should support within a given response time. Also, Segue allows the tester to verify a Web-based app using different browsers and Internet connections.
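Commercial tools do far more than this, but the core measurement can be sketched in a few lines of standard-library Python: run the same transaction from several concurrent simulated users and record how long each call takes. The stress function and the stand-in transaction are hypothetical. In a real test the transaction would issue a request to the application.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def stress(transaction, users, requests_per_user):
    """Call `transaction` from `users` threads; return every latency in seconds."""
    def one_user(user_index):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            transaction()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    # Flatten the per-user lists into one list of measurements.
    return [latency for user in per_user for latency in user]

def fake_transaction():
    time.sleep(0.001)  # stand-in for a real request to the application

latencies = stress(fake_transaction, users=25, requests_per_user=4)
worst = max(latencies)
average = sum(latencies) / len(latencies)
```

Comparing worst and average against the agreed response-time threshold at several user counts gives an early indication of where the system stops keeping up.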
Best practices for any phase
Secret 1: Deal with issues as soon as possible. As soon as a problem is found, it should be dealt with. If you realize during the architecture phase that you have never seen a JSP client talk to a CORBA server, do a simple prototype right away. If, during the design phase, you realize that nobody has ever created 1,000 test logins for stress testing before, you should think about how that will be completed. Do not wait until the "proper" development phase to prototype a solution.
Secret 2: Keep a written issue log. Keep a list with the issue, responsible party and resolution. Review the list once or twice a week to ensure that issues will be resolved in a timely fashion. When an issue comes up that requires the business sponsor or project lead to make a decision, you should provide them with the issue, choices and a recommendation. Note if any of the solutions will cause a delay and how long it will take. To ensure that issues do not become lost or put on the back burner, add the following statement to your correspondence: "If a response is not received within 24 hours, we will proceed with Option A."
Secret 3: Help each other. If an individual member of the team is stuck on a problem, get the entire development team into the loop. Sometimes, a senior mid-tier developer may know more about some obscure Oracle feature than a new DBA. It is no consolation to the GUI developer if the server component or the database was responsible for the project's failure. By helping each other we all learn more and become more valuable as developers.