Loitering Objects and Java Framework Design
- By Leonard Slipp
- February 5, 2001
One of the key objectives early in the design of Java was to create an environment that eliminated the memory management problems (such as buffer overruns, de-referencing invalid pointers, and memory leaks) that have long plagued C/C++ development. The desire to achieve this goal contributed to the decision to remove all forms of explicit memory address manipulation from the Java language, as well as eliminate any direct means of reclaiming the memory allocated to an object. The emphasis on memory safety extended to the design of the bytecode instruction set and the runtime operation of the JVM, with its checks on array bounds and reference casts and its built-in garbage collection mechanism.1 Collectively, these features have enhanced both the productivity of Java programmers and the integrity of their applications.
Although much of the memory management burden inherent to C/C++ development is no longer a concern to the Java programmer, the responsibility of managing a finite memory resource has not been eliminated. Ill-designed Java applications can exhibit the same macro-level symptom (a continually growing process size) as traditional memory leaks. This undesirable memory growth is a consequence of unintentionally retaining references to objects that have outlived their usefulness to the application.2,3 These loitering objects degrade an application's performance by needlessly holding memory obtained from the operating system and increasing the activity of the memory management subsystem within the JVM. In the most extreme case, loitering objects can cause the application to terminate abnormally with a java.lang.OutOfMemoryError.
Reference Management
Effective memory management begins with a clear understanding of the assumptions built into, and the limitations of, Java's memory management policies. The memory allocated to an object is eligible for reclamation by the garbage collector when that object is no longer reachable by the running Java program. This determination of reachability begins at a foundational set of object references within the running JVM called the root set. While the members of the root set are drawn from several areas in the running JVM,1 the two principal sources of root set elements within a Java program are:
- The static reference variables within class definitions
- The reference variables within the method frames of each Java thread stack
The first source is relatively easy to appreciate, but the second requires elaboration. Each Java thread of execution within a running JVM has a thread stack. When a thread of execution enters a method, the arguments to that method's invocation, as well as that method's local variables, are stored within a new method frame that is pushed on that thread's stack. For the duration of time the method frame is on the thread stack, the reference variables defined within it are members of the root set. When the thread of execution exits the method, the method frame is popped off the thread stack and the reference variables defined within it are withdrawn from the root set. An important but often under-appreciated aspect of the root set is that its contents change dynamically. The static reference variables within class definitions are relatively permanent members of the root set, while the reference variables local to a method in execution are often transitory members.
The elements of the root set directly reference objects within the heap of the running JVM. Those objects in turn typically contain reference variables that refer to further objects, which are indirectly reachable from a member of the root set. As long as a path exists either directly or indirectly from a member of the root set to an object within the heap, that object is reachable by the running Java program. Because those objects are accessible through some reference variable within the Java program, they could, from the JVM's point of view, be required in a future path of execution and so must be retained through the garbage collection cycle. The garbage collector only reclaims the memory of objects that are not reachable by the running Java program.
With this insight into the relationship between reference variables and garbage collection, we can make the following observations. The first is based on the fact that reference variables within a Java program can be defined in three forms:
- Class-based: Reference variables within a class definition that have a static attribute associated with them.
- Instance-based: Reference variables within a class definition that do not have a static attribute associated with them.
- Method-based: Reference variables defined within the scope of a method.
Assignments made to method-based reference variables within methods of short execution time will not be responsible for the formation of loitering objects because those reference variables are transitory members of the root set. However, assignments made to either class-based or instance-based reference variables, or to method-based variables within methods of long execution time, can—if the assignments are not well managed—lead to the formation of loitering objects.
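The distinction among the three forms can be sketched in a few lines of Java. The example is illustrative only: ReportCache, RECENT, render(), and release() are hypothetical names, not part of any framework discussed here. It follows a buffer that starts life in a method-based variable and is then retained through an instance-based and a class-based reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class used only to illustrate the three forms of reference variable.
class ReportCache {
    // Class-based: a static reference, a relatively permanent member of the root set.
    static final List<byte[]> RECENT = new ArrayList<>();

    // Instance-based: reachable for as long as this ReportCache instance is reachable.
    private byte[] lastReport;

    byte[] render(int size) {
        // Method-based: 'scratch' leaves the root set when render() returns.
        byte[] scratch = new byte[size];
        lastReport = scratch;   // now also retained by the instance
        RECENT.add(scratch);    // now also retained by the class until it is removed
        return scratch;
    }

    // Dropping both references is what makes the buffers eligible for collection.
    void release() {
        lastReport = null;
        RECENT.clear();
    }

    boolean holdsReport() {
        return lastReport != null;
    }
}

class ReferenceFormsDemo {
    public static void main(String[] args) {
        ReportCache cache = new ReportCache();
        for (int i = 0; i < 3; i++) {
            cache.render(1024);
        }
        System.out.println("retained buffers: " + ReportCache.RECENT.size());
        cache.release();
        System.out.println("after release: " + ReportCache.RECENT.size());
    }
}
```

Until release() is called, every buffer added to RECENT remains reachable from the root set, whether or not the application still needs it.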
A second observation is that, from the point of view of the Java programmer, instantiating an object within Java's runtime environment is an explicit action. The programmer must invoke a constructor with the new operator, or invoke the newInstance() method of a java.lang.Class object. Eliminating an object from Java's runtime environment is an implicit action that occurs once that object is no longer reachable by the running program (thereby making the object eligible for garbage collection). Unfortunately, Java programmers occasionally overlook the steps necessary to ensure that implicit actions are taken.
The key to effective memory management in Java is effective reference management. Java programmers must be conscious of the references that are established to and removed from objects within their application. Although reference management is relatively straightforward when one person explicitly designs and implements a piece of code, it becomes increasingly complex when that software is integrated with other developers' code. When a programmer passes an object reference to an API developed by someone else, such as a colleague who developed a class library or the vendor of a Java-based application framework, what further unintended references to that object are established? Once established, how does the Java programmer find and eliminate those references to ensure that loitering objects are not formed? If these questions are left unanswered, effective reference management—and effective Java memory management—becomes much more difficult.
Java-Based Class Libraries and Application Frameworks
Object-oriented development environments enhance productivity by facilitating software reuse. While the most common form of reuse is the class library, the highest form is that of the application framework. An application framework embodies a generic design comprising a set of cooperating classes that can be adapted to a variety of specific problems within a given domain.4 As a software development model, application frameworks provide many advantages,5 including less code to design and implement, improved consistency, reduced maintenance overhead, improved integration and interoperability, proliferation of expertise, and orderly program evolution.
One of the critical aspects of Java's success has been the development of Java-based application frameworks such as JFC/Swing, Java Servlets, and EJBs. Indeed, it is rare to find Java development that is not undertaken in the context of an application framework. The framework-based development model is clearly an integral aspect of Java technology. However, without careful consideration of reference management issues by the framework vendor, the problem of loitering object formation can become a difficult one for the framework client to manage.
Framework-based development is a unique form of collaborative partnership that in turn requires a collaborative approach to memory and reference management. Although the responsibility for memory management is shared between the framework vendor and the framework client, the framework vendor takes a leadership role through:
- Design of the framework
- Implementation of its infrastructure
- Documentation provided to the framework client
Together, these three aspects determine the degree to which framework clients can effectively manage references in their applications.
Class Library and Application Framework Design
The development of a successful class library or application framework requires a deep study of the problem domain, and a careful analysis of the roles and responsibilities for the various objects within the environment. While there are many issues that must be carefully weighed by the architectural team, there are two design activities that form the basis for effective reference management:
- Defining object life cycles: Defining the life cycle of an object within the runtime environment entails explicitly stating its point of creation, the duration of its usefulness, and the point at which it should be eliminated from the runtime environment. The reference management value of explicitly defining the life cycle of an object is that it provides clear points at which the framework vendor can confirm that the Java-based implementation adheres to the life cycle defined in the specification. An object that exists beyond its intended lifetime is, by definition, a loitering object.
- Defining object interrelationships: Objects in the runtime environment establish various associations with one another as they collaborate to accomplish their goals. These relationships can take different forms, such as composition (a has-a relationship) or association (a uses-a relationship). Defining the nature and duration of the various object interrelationships is critical to reference management because the Java-based implementation of those relationships is based on assignment to reference variables.
As the architectural team defines the roles and responsibilities of various objects within the runtime environment, there is a natural partitioning of these objects into two groups:
- Framework infrastructure objects: These objects provide the environment for objects defined by the framework client.
- Specialized client objects: These objects contain the specialized logic for a specific client's needs. To interact with the framework infrastructure, these classes typically implement a Java-based interface or extend an abstract class defined by the framework vendor.
Separating the two groups is the framework boundary—the class and interface definitions that are visible to each object group. From the framework client's perspective, the framework boundary is the set of framework infrastructure objects visible to it. From the framework infrastructure's perspective, the framework boundary is the set of framework client objects visible to it.
Defining the framework boundary provides a valuable "separation of responsibility" for effective reference management. The management of object references that lie solely within the framework client or solely within the framework infrastructure is relatively straightforward for the respective author to control. However, when an object reference crosses the framework boundary and is retained by the other side, loitering objects can form.
It is a familiar axiom of software development that the fastest and most error-free code in an application is that which by design is not written. By the same logic, through careful, iterative refinement of the methods comprising the framework boundary, and minimizing the references (through method arguments and return values) that have to traverse it, the framework vendor can constrain the points at which reference management needs to be enforced. Of the points of reference transition that do remain, the three most common requirements for a reference specification are:
- Transfer of object ownership
- Establishment/revocation of associations among objects
- Retrieval of object-based attributes
Object Ownership. In addition to defining the life cycle of objects in the context of various use case scenarios, the design team must clearly define the issue of object ownership. The object owner will be either the framework infrastructure or the framework client. The owner is responsible for implementing the logic to ensure that, subject to the correct use of the framework, the object is eliminated from the runtime environment at the end of its designated life cycle.
As noted earlier, creating an object within the Java runtime environment is an explicit act, while its elimination is an implicit one. Once responsibility has been clearly assigned for ensuring that an object is no longer reachable by the running Java program at the end of its designed lifetime, reference management becomes much easier for both the framework design team and the framework client to control.
As a general guideline, it is desirable to have the same party responsible for both the creation and elimination of an object from the runtime environment. There is no ambiguity of "ownership" of the object or of who is responsible for its life cycle; that party simply looks after the objects that it created. In the case of JFC/Swing, Java Servlets, and EJBs, the underlying framework is responsible for the life cycle of objects of the specific classes developed by the client programmer.
While the guideline just outlined is valuable, it is not always appropriate. In some designs, one party is responsible for creating an object while the other is responsible for ensuring its elimination from the environment. In such cases, it is critical that the transfer of ownership be explicit, especially for the framework client. One technique to emphasize this transfer is to adopt a naming convention for the methods that are responsible for the ownership transition. Taligent's C++-based CommonPoint Framework established the following naming conventions6 for methods that allocated, managed, or accepted responsibility for storage:
- Methods that make a new object, that the caller must delete, begin with Create...
- Methods that copy an object, where the caller must delete the copy, begin with Copy...
- Methods that abandon an object and pass deletion responsibility to the caller begin with Orphan...
- Methods that accept an object the caller has allocated and take responsibility for eventually deleting it begin with Adopt...
In cases of ownership transition, the party that creates the object but subsequently relinquishes ownership must ensure that it does not needlessly retain a reference to the object after the method invocation responsible for the transfer.
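Carried over to Java, where "deleting" means dropping the last reference, the Adopt.../Orphan... convention might look like the following sketch. Job, JobRegistry, adoptJob(), and orphanJob() are hypothetical names chosen for illustration; they are not taken from CommonPoint or from any Java framework.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-defined object whose ownership changes hands.
class Job {
    final String id;

    Job(String id) {
        this.id = id;
    }
}

// Hypothetical framework-side registry. The method names signal which
// party owns the object after the call returns.
class JobRegistry {
    private final Map<String, Job> owned = new HashMap<>();

    // Adopt: the registry takes over ownership and retains a reference.
    void adoptJob(Job job) {
        owned.put(job.id, job);
    }

    // Orphan: the registry gives up ownership, drops its own reference,
    // and hands the object back to the caller (null if it was not owned).
    Job orphanJob(String id) {
        return owned.remove(id);
    }

    int ownedCount() {
        return owned.size();
    }
}

class OwnershipDemo {
    public static void main(String[] args) {
        JobRegistry registry = new JobRegistry();
        // The caller creates the object but keeps no reference after the transfer.
        registry.adoptJob(new Job("nightly-build"));

        Job back = registry.orphanJob("nightly-build");
        System.out.println(back.id + " returned; registry owns " + registry.ownedCount());
    }
}
```

The important detail is in orphanJob(): the registry removes its own entry as part of the hand-off, so no reference lingers on the side that gave up ownership.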
Object Associations. As objects collaborate within the runtime environment, they establish associations with one another (in which an object reference is typically retained). Take, for example, the Observer pattern.7 The object assuming the Observer role must register with the Subject object before it can receive notifications of the Subject's state changes. The association remains in place until notifications are no longer required and the Observer is deregistered from the Subject.
A critical aspect of defining associations that cross the framework boundary is that the framework design team provides methods to facilitate both the establishment and revocation of the association. This notion of symmetry must exist for every relationship that can be established. For every add() action, there must be a remove(); for every register() action, there must be an unregister(). Without the ability to revoke an association that has been established, the framework client will not have a means of managing the underlying references that are established to the objects he/she owns.
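A minimal sketch of this symmetry, using the Observer pattern, is shown here. PriceSubject and PriceObserver are hypothetical names; the point is only that register() has a matching unregister() that removes the retained reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer role implemented by the framework client.
interface PriceObserver {
    void priceChanged(double newPrice);
}

// Hypothetical subject on the framework side of the boundary.
class PriceSubject {
    private final List<PriceObserver> observers = new ArrayList<>();

    // Establishes the association: the subject retains a reference to the observer.
    void register(PriceObserver observer) {
        observers.add(observer);
    }

    // Revokes the association: the subject's reference to the observer is dropped.
    void unregister(PriceObserver observer) {
        observers.remove(observer);
    }

    void setPrice(double price) {
        // Iterate over a copy so an observer may unregister itself while being notified.
        for (PriceObserver observer : new ArrayList<>(observers)) {
            observer.priceChanged(price);
        }
    }

    int observerCount() {
        return observers.size();
    }
}

class ObserverSymmetryDemo {
    public static void main(String[] args) {
        PriceSubject subject = new PriceSubject();
        PriceObserver logger = p -> System.out.println("price is now " + p);

        subject.register(logger);
        subject.setPrice(10.0);      // logger is notified

        subject.unregister(logger);
        subject.setPrice(12.0);      // nobody is notified, nothing is retained
        System.out.println("observers left: " + subject.observerCount());
    }
}
```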
Object-Based Attributes. There is often the requirement that an object on one side of the framework boundary will need to retrieve an object-based attribute from an object on the other side. For example, a framework client object may need to retrieve the collection of children managed by a parent framework infrastructure object. Without careful consideration of the use of the reference returned to and possibly retained by the caller, a loitering object may unintentionally form.
Generally, it is undesirable to return a reference to an internal object-based attribute to the caller. Once an element of an object's internal state is exposed, it could be modified and thereby undermine the integrity of the object and the application. Nonetheless, a design team may elect to expose the internal state of the object but inform clients (through the method's documentation) that they should not retain a reference to the object. A more defensive measure is to make a copy (typically through a clone() method) of the object-based attribute and return the copy to the caller. The caller has sole ownership of the returned object, with the responsibility of eliminating the reference that retains the object within the runtime environment. This approach protects the internal state of the object supplying the attribute, but introduces the cost of additional object creation.
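The defensive-copy approach can be sketched as follows. ParentNode is a hypothetical class, and the copy is made with the ArrayList copy constructor rather than clone(), which is a substitution made here for brevity. Note that the copy is shallow: the list is new, but the child objects themselves are shared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical infrastructure object that manages a collection of children.
class ParentNode {
    private final List<ParentNode> children = new ArrayList<>();

    void addChild(ParentNode child) {
        children.add(child);
    }

    // Returns a copy of the internal list. The caller owns the returned list
    // and may modify or discard it without affecting this object's state.
    List<ParentNode> getChildren() {
        return new ArrayList<>(children);
    }

    int childCount() {
        return children.size();
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        ParentNode parent = new ParentNode();
        parent.addChild(new ParentNode());
        parent.addChild(new ParentNode());

        List<ParentNode> snapshot = parent.getChildren();
        snapshot.clear(); // affects only the caller's copy

        System.out.println("children still managed by parent: " + parent.childCount());
    }
}
```

Each call to getChildren() allocates a new list, which is the additional object-creation cost described above.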
An alternative approach is to require that the caller first allocate an instance of the object-based attribute being requested and pass it into the method invocation. The object supplying the information simply fills in the elements of the object and returns it. This approach has the advantage of keeping object ownership strictly on one side of the framework boundary, and it also addresses a common performance problem in Java applications: excessive allocation of short-lived objects. The JFC/Swing library takes this approach with the following method:
public Dimension java.awt.Component.getSize( Dimension rv )
If the method is invoked with a null argument, getSize() will instantiate a Dimension object and return the new object. Alternatively, the caller can preallocate a Dimension object and pass it as an argument to the method. getSize() will simply update the individual fields of the Dimension object passed to it and return that object.
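The same contract can be reproduced in a framework's own classes. The sketch below deliberately avoids AWT and uses two hypothetical types, Size and Box, to show the caller-allocated return value pattern that getSize( Dimension rv ) follows.

```java
// Hypothetical value object standing in for java.awt.Dimension.
class Size {
    int width;
    int height;
}

// Hypothetical infrastructure object that reports its size.
class Box {
    private final int width;
    private final int height;

    Box(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // If rv is null, a new Size is allocated; otherwise the caller's
    // object is filled in and returned, so no allocation takes place.
    Size getSize(Size rv) {
        if (rv == null) {
            rv = new Size();
        }
        rv.width = width;
        rv.height = height;
        return rv;
    }
}

class ReturnValueDemo {
    public static void main(String[] args) {
        Box[] boxes = { new Box(3, 4), new Box(10, 2), new Box(7, 7) };

        // One Size object, owned by the caller, is reused for every query.
        Size scratch = new Size();
        int totalArea = 0;
        for (Box box : boxes) {
            box.getSize(scratch);
            totalArea += scratch.width * scratch.height;
        }
        System.out.println("total area: " + totalArea);
    }
}
```

Because the caller supplies the object, the infrastructure side never holds a reference that the client must later track down.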
Class Library and Application Framework Implementation
Once the design of the class library/application framework has been established and reviewed (typically through a peer-review process), the Java-based implementation begins. In most cases, loitering objects arise from simple coding oversights or omissions, such as the failure to nullify a class-based or instance-based reference variable or neglecting to remove an object from an internal data structure.
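A familiar instance of such an oversight is an array-backed stack that decrements its size on pop() but leaves the popped object in the array. The sketch below, built around a hypothetical SimpleStack class, shows the one-line fix of nullifying the vacated slot.

```java
import java.util.Arrays;

// Hypothetical array-backed stack used to illustrate an obsolete reference.
class SimpleStack {
    private Object[] elements = new Object[4];
    private int size;

    void push(Object item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = item;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object top = elements[--size];
        // Without this line the array would keep the popped object reachable
        // for as long as the stack itself is reachable.
        elements[size] = null;
        return top;
    }

    // Provided only so the demo can inspect the backing array.
    boolean retains(Object item) {
        for (Object element : elements) {
            if (element == item) {
                return true;
            }
        }
        return false;
    }
}

class StackDemo {
    public static void main(String[] args) {
        SimpleStack stack = new SimpleStack();
        Object payload = new Object();
        stack.push(payload);
        stack.pop();
        System.out.println("stack still retains payload: " + stack.retains(payload));
    }
}
```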
Defining the life cycles and interrelationships of objects as part of the framework design is invaluable in defining a testing strategy that validates that the implementation adheres to its specification. The best resource the framework vendor can have is an extensive test suite that applies the framework in a variety of scenarios representative of its use by the framework client. By using a heap analysis tool such as JProbe or Jinsight to examine the JVM heap contents, framework vendors can assure themselves that, subject to correct use of the framework, loitering objects will not occur.
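A test suite can complement a heap analysis tool with a rough programmatic probe. The sketch below uses java.lang.ref.WeakReference and System.gc(); LoiterCheck and REGISTRY are hypothetical names. It rests on an important assumption: System.gc() is only a request, so a result of false means "not collected yet" and is not proof of a loitering object. The reverse is dependable, however: an object that is still strongly reachable will never be reported as collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper; not a substitute for a heap analysis tool.
class LoiterCheck {
    // Requests garbage collection up to 'attempts' times and reports whether
    // the object behind the weak reference has been reclaimed.
    static boolean collectedAfterGc(WeakReference<?> probe, int attempts) {
        for (int i = 0; i < attempts && probe.get() != null; i++) {
            System.gc(); // a hint to the JVM, not a guarantee
        }
        return probe.get() == null;
    }
}

class LoiterCheckDemo {
    // A class-based reference standing in for a forgotten framework registry.
    static final List<Object> REGISTRY = new ArrayList<>();

    public static void main(String[] args) {
        Object session = new Object();
        REGISTRY.add(session);
        WeakReference<Object> probe = new WeakReference<>(session);
        session = null; // the method-based reference is gone; the registry's is not

        System.out.println("collected while registered: "
                + LoiterCheck.collectedAfterGc(probe, 5));

        REGISTRY.clear(); // remove the last strong reference
        System.out.println("collected after removal: "
                + LoiterCheck.collectedAfterGc(probe, 5));
    }
}
```

The first check always reports false because the registry keeps the object strongly reachable. The second usually reports true on common JVMs, but that outcome depends on the collector honoring the request.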
Class Library and Application Framework Documentation
The framework vendor must provide clear documentation on the correct use of their product. With respect to promoting effective reference management, this entails the following:
- Programmer's manual: To use the application framework correctly, the framework client must have a firm understanding of the conceptual model that the environment is built on. An integral part of this effort is understanding the life cycle of the objects that are visible to the client programmer and their role in the collaborative management of object references.
- Examples: Complementing the programmer's manual, there should be an extensive set of well-commented examples that demonstrate how to use the framework and how to manage the references that cross the framework boundary.
- Javadoc-based API documentation: The method-based API of the framework boundary is typically covered in detail by the javadoc-based documentation. One technique that is highly valuable in promoting effective reference management is to define for each reference-based parameter whether a further reference to the object that parameter represents is retained by the underlying framework. If so, include a note describing the symmetric method that the framework client should later invoke to eliminate the additional references that were established to the object.
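The javadoc technique in the last item might look like the following sketch. TaskScheduler and SimpleScheduler are hypothetical, and the RETAINED / NOT RETAINED wording is one possible convention rather than an established standard. Each reference-based parameter states whether the framework keeps a reference, and if so, which symmetric method releases it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical framework-boundary interface; all names are illustrative. */
interface TaskScheduler {
    /**
     * Queues a task for later execution.
     *
     * @param task the task to queue. RETAINED: the scheduler keeps a reference
     *             to this object until it is passed to {@link #cancel(Runnable)}.
     */
    void schedule(Runnable task);

    /**
     * Removes a previously queued task and releases the reference that
     * {@link #schedule(Runnable)} established.
     *
     * @param task the task to remove. NOT RETAINED.
     * @return true if the task was queued and has been removed
     */
    boolean cancel(Runnable task);

    /**
     * Runs a task immediately on the calling thread.
     *
     * @param task the task to run. NOT RETAINED: no reference survives the call.
     */
    void runNow(Runnable task);
}

// Minimal implementation that honors the documented contract.
class SimpleScheduler implements TaskScheduler {
    private final Set<Runnable> queued = new LinkedHashSet<>();

    @Override
    public void schedule(Runnable task) {
        queued.add(task);
    }

    @Override
    public boolean cancel(Runnable task) {
        return queued.remove(task);
    }

    @Override
    public void runNow(Runnable task) {
        task.run();
    }

    int queuedCount() {
        return queued.size();
    }
}
```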
Conclusion
Effective memory management in Java relies on effective reference management by the Java programmer. Reference management can be difficult in the context of framework-based development, but with careful framework design, implementation, and documentation, the loitering object problem can be successfully addressed.
Acknowledgment
I would like to thank several of my colleagues in the development group for their valuable feedback on the initial draft of this article.
References
1. Venners, B., Inside the Java 2 Virtual Machine, McGraw–Hill, 1999.
2. Henry, E. and E. Lycklama, "How Do You Plug Java Memory Leaks?," Dr. Dobb's Journal, Vol. 25, No. 2, Feb. 2000, pp. 115–119, http://www.ddj.com/articles/2000/0002/0002l/0002l.htm.
3. Nylund, J., "Memory Leaks in Java Programs," Java Report, Vol. 4, No. 11, Nov. 1999, pp. 22–30, http://www.javareport.com/archive/9911/html/from_pages/ftp_feature.shtml.
4. Cotter, S. with M. Potel, Inside Taligent Technology, Addison–Wesley, 1995.
5. Taligent Inc., The Power of Frameworks: For Windows and OS/2, Addison–Wesley, 1995.
6. Taligent Inc., Taligent's Guide to Designing Programs: Well-Mannered Object-Oriented Design in C++, Addison–Wesley, 1994.
7. Gamma, E. et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison–Wesley, 1995.