Understanding EJB argument passing

TODAY'S TIP REGARDS a performance characteristic that is particularly important to certain Enterprise JavaBean (EJB) applications, although it has applicability to other types of Java applications as well. It revolves around the way in which EJBs (and, more generally, any distributed object system) pass arguments to each other; I'll discuss how this task is accomplished and when you might want to change the manner in which it is done.

First, some terminology and a brief review. For the most part, arguments that are passed between two Java methods are passed by value. If you are given the code

int i = 3;
after invoking someMethod(), we know i will have the value 3, no matter what happens inside the method invocation. i is passed by value. If you are given the code

Point p = new Point(3, 3);
after invoking someOtherMethod(), we know p will reference the same object it referenced before the method invocation—the memory location it points to is the same. But the contents of that memory may have changed. If the method calls its parameter q and executes the statement q.x = 4, then when the method returns, p will refer to the point with coordinates (4, 3). In this case, p is passed by reference.

I've had semantic arguments with developers who insist that because p points to the same memory location all along, it too is passed by value. If the method executes the statement q = new Point(4, 3), then when the method returns, p will refer to the original point, which will still have the coordinates (3, 3). No matter what you choose to call it, this is the way Java works.

The point is that the contents of an object can change when it is passed between methods in Java. The exception to this rule occurs with RMI and other distributed programming technology. In that case, the two methods in question are running on two different virtual machines. To get the object from one method to another, the object is serialized by the client and deserialized by the server. A side effect of this deserialization is that a copy of the object has been made, and no matter what changes are made to the contents of that object in the server, the contents of that object on the client will not change. In this case, the objects are said to be passed by value (though in those semantic wars, some prefer the term "passed by copy").

This difference in programming semantics takes a little getting used to, but it has a rationale. Clearly, a copy of the object has to be made in the server's virtual machine as it cannot share physical memory with the client's virtual machine. The standard Java object semantics could have been preserved if the copy of the object was sent back to the client and somehow overlaid in memory on top of the original object. Such a trick would have been complex and would often have had a severe impact on performance. If the object hasn't changed, it's a waste of time to copy it back. It's far better for developers to be in control of any data that needs to be sent back to the client (and to send it, for example, in the return object).

This brings us to EJBs: a method invocation on an EJB is an RMI call. It is a requirement of the EJB specification that parameters passed during the method call be passed by value so that the EJB receives a copy of any object parameters (and the caller receives a copy of the return object, if applicable). Thus, your EJB client can assume that the contents of an object passed to an EJB server are not changed by the server.

As it turns out, EJBs that call each other are often deployed within the same virtual machine. Think of a session bean that needs to use multiple entity beans—all of the beans are often deployed within the same virtual machine. In this case, the object serialization is not strictly necessary. The client and server both have reference to the original object. But if the server (the entity bean) changes the contents of the parameter object, the client (the session bean) will see those changes. Later, if for scaling (or other) reasons the entity beans are moved onto a different machine, the session beans will no longer see any changes to the content of the parameter object. It would be untenable to have different semantics depending on how the application is deployed; that is why all parameters between EJBs are always passed by value, even when they are within the same virtual machine.

Most application servers are aware of the performance penalty paid by making an unnecessary copy of the object passed between EJBs within the same virtual machine and have an option that can be set to have these objects passed by reference (thereby avoiding the copy). If you turn on that option for an arbitrary EJB application, it is possible that the application will break. If the application is written to assume pass-by-value semantics, then the unexpected change in data contents could have a detrimental effect.

On the other hand, if you write your EJBs so that they make no assumption about these semantics, then you're ahead of the game. You can gain the performance benefit from passing by reference when the EJBs are deployed within the same virtual machine, and you can still safely deploy them on multiple machines. What this means is that the EJB must never modify any object it receives as a parameter. If it returns an object, it must return a new instance of the object so that it is not inadvertently shared.

The feasibility of this will depend, of course, on your application. I've seen EJB applications where enabling pass-by-reference semantics produced a 25% benefit, so if you have an application that makes a large number of EJB calls within the same virtual machine, a little forethought in application design might be to your benefit.

About the Author

Scott Oaks is a Systems Engineer for Sun Microsystems, where he focuses on practical applications of Java Technology. He is the co-author, with Henry Wong, of Java Threads. He can be contacted at [email protected].