Using object serialization to improve the server

 
This column is designed to give readers solutions to questions with a "how-to" bent. If you'd like to submit a question, send it to [email protected] or visit http://www.sigs.com/java/ask.html. In each issue, we'll answer the most common questions about Java programming.

Scott Oaks is a Systems Engineer for Sun Microsystems, where he focuses on practical applications of Java technology. He is the co-author, with Henry Wong, of Java Threads.

IN MY LAST column ("Back to basics: network servers in Java," Java Report, Vol. 3, No. 10) we explored the use of a basic TCP server and discussed how such a server could be used to filter or proxy data from another server. This allowed applet clients to get data from a server that they normally might not be able to access, as well as allowing them to receive data that had been specially parsed or altered in some manner.

We're going to use that same framework again this month to explore another question: How can we improve our server to handle a service that might be offered only intermittently?

There are several motivations for exploring this topic. To begin, we're assuming that our server holds some data that is interesting to the client: for example, it could cache data that clients are likely to request multiple times. Now what happens if our server terminates? We'd like to be able to recover that client-specific data without having to regenerate, recalculate, or otherwise re-obtain that data. As an example, consider the class shown in Listing 1.

In the class in Listing 1, we assume that there's only one interesting result every 24 hours, and we can just cache that result. If we already have a result that is less than 24 hours old, we use it; if not, we use our internal private method getResult0() to obtain the desired result.
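For readers who want something concrete to compile against, a class with the behavior described might look like the following minimal sketch. The field names, the use of System.currentTimeMillis(), and the body of getResult0() are all assumptions made for illustration and should not be read as the listing itself.

```java
import java.io.Serializable;

// Hypothetical stand-in for the ClientData class described in the text.
class ClientData implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    private String result;     // cached result, null until first computed
    private long computedAt;   // time (ms since the epoch) the result was obtained

    // Return the cached result if it is less than 24 hours old, else refresh it.
    public synchronized String getResult() {
        long now = System.currentTimeMillis();
        if (result == null || now - computedAt >= DAY_MILLIS) {
            result = getResult0();
            computedAt = now;
        }
        return result;
    }

    // Placeholder for the expensive work; the real computation is not specified.
    private String getResult0() {
        return "result computed at " + System.currentTimeMillis();
    }
}
```

Both fields are ordinary non-transient instance fields, so default serialization saves the cached result together with its timestamp.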

Now say that we want to use this class in a server, and we want to be able to retain the cached result if our server goes down. There are numerous circumstances under which the server might go down: if it's a CORBA server, it can be brought up and down by the CORBA infrastructure. If it's an RMI server running under the new facilities available in JDK 1.2, the server might take itself down when it is idle and depend on the RMI daemon to restart it when it is needed. And no matter what type of server it is, if the machine crashes or needs to be rebooted, the server will go down.

In Java, the key to recovering this data between server instantiations is object serialization. The server can periodically serialize instances of the ClientData object: every time the object is changed, every few minutes, or whenever else makes sense. This is why we made the ClientData class implement the Serializable interface.

Now, when a server needs a ClientData object, it can either obtain a previously serialized object or create a new one. The server often won't know for certain whether a saved object exists, so it must be prepared for both cases. If the server expects to find the serialized object (if it exists) in the file named by the File object f, then it could use this code to obtain the ClientData object:

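// Requires: import java.io.*;
ClientData cData = null;
FileInputStream fis = null;
try {
	fis = new FileInputStream(f);
	ObjectInputStream ois =
			new ObjectInputStream(fis);
	cData =
		(ClientData) ois.readObject();
} catch (Exception e) {
	// No saved object, or one that can't be read: start fresh.
	cData = new ClientData();
} finally {
	// Closing fis releases the file on every path.
	if (fis != null) {
		try { fis.close(); }
		catch (IOException e) {}
	}
}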

Now, when it needs a result, the server merely calls the getResult() method on the cData object.

Similarly, the server is responsible for periodically executing code to save the state of the cData object:

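FileOutputStream fos = null;
try {
	fos = new FileOutputStream(f);
	ObjectOutputStream oos =
		new ObjectOutputStream(fos);
	oos.writeObject(cData);
	oos.flush();
} catch (IOException e) {
	// Report the failure; the file may be incomplete.
	System.err.println("Can't save " + f + ": " + e);
} finally {
	if (fos != null) {
		try { fos.close(); } catch (IOException e) {}
	}
}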

In our example, the server should execute this code each time it has called the getResult() method on the cData object. If the ClientData class had a different interface—and in particular, if it had an interface that allowed us to determine if its internal state had changed—then we could use that information instead. But that's an uncommon interface for an object to have.

It's not particularly inefficient to write the cData object out each time it's used, even if its internal state has not changed. Nonetheless, if you're worried about the extraneous I/O, you can wrap the ObjectOutputStream around a ByteArrayOutputStream instead of a FileOutputStream and in that way obtain the raw bytes of the serialized object. Then you can call the getResult() method, repeat the process to get a new set of raw bytes from the cData object, compare the new and old bytes, and only write the bytes out if they have changed.
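The byte-comparison idea can be sketched as follows. This variant remembers the bytes from the last write instead of serializing once before and once after each getResult() call. That is a small substitution, but the comparison is the same. The class name ChangeAwareSaver and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical helper: writes an object to disk only when its serialized form changes.
class ChangeAwareSaver {
    private byte[] lastBytes;   // serialized form as of the last write, or null

    // Serialize obj into memory and return the raw bytes.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Returns true if the file was written, false if the bytes were unchanged.
    synchronized boolean saveIfChanged(Serializable obj, File f) throws IOException {
        byte[] current = toBytes(obj);
        if (Arrays.equals(current, lastBytes)) {
            return false;               // same state as last time: skip the disk I/O
        }
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write(current);         // these bytes are a complete serialization stream
        }
        lastBytes = current;
        return true;
    }
}
```

The bytes written to the file are exactly what an ObjectOutputStream wrapped around a FileOutputStream would have produced, so the loading code does not need to change.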

We could build a server around this code, but there are a few remaining issues. The first is how we define the file in which the serialized data will be held, and the second is how we synchronize access to client data objects in a multi-threaded server. These two issues are inextricably linked, because the mechanism that we use to save the data will affect how we implement the necessary synchronization.

The issue of synchronization is particularly important because the TCP server framework that we'll eventually use runs each client connection in a separate thread. Other server environments—RMI and CORBA servers, for example—also tend to run separate clients in separate threads. We must ensure that our files will work correctly in these environments.

Let's say that we decide to name each file after the hostname of the client that our server is servicing. This does not remove the need for synchronization: a single host can create multiple connections to our server. What we really want is a one-to-one mapping between client hostnames, ClientData objects, and files in which the ClientData objects will be saved.

The result of that mapping is that we will need to synchronize requests on some object that is common in this mapping. We can't synchronize directly on the ClientData object because that object won't exist before we read it from the file (or create it if there is no saved version of it). We'll eventually need to synchronize on the File object that is associated with each client hostname.

We need to make sure that each thread in the server uses the same File object for a given hostname. It would not be sufficient for a thread in the server to create a File object based on the client host and then lock that File object: another thread in the server could do the same thing, and we'd end up with two separate File objects that reference the same underlying file. Locking one would not block a thread that locks the other. We'll solve that problem by keeping a global cache of File objects in the server. File objects can be obtained with this code:

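// Requires: import java.io.File; import java.util.Hashtable;
// Maps each client hostname to its one shared File object.
static Hashtable files;
static {
	files = new Hashtable();
}
...
static synchronized File
					getFile(String host) {
	File f;
	f = (File) files.get(host);
	if (f == null) {
		// First request from this host: create and remember its File.
		f = new File("clientData" +
				File.separator + host);
		files.put(host, f);
	}
	return f;
}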

By making all access to the files hashtable static and synchronized, we ensure that there is only one File object for every hostname. Now we can synchronize on that object to ensure that multiple clients from the same host do not interfere with each other:
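On a current JDK, the same one-File-per-hostname guarantee can also be obtained from ConcurrentHashMap.computeIfAbsent. The sketch below is a substitution for the synchronized Hashtable method, not part of the design described in this column, and the class name FileRegistry is hypothetical.

```java
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry that hands out one shared File instance per hostname.
class FileRegistry {
    private static final ConcurrentMap<String, File> FILES =
            new ConcurrentHashMap<>();

    static File getFile(String host) {
        // computeIfAbsent creates the File at most once per key, even when
        // several threads ask for the same host at the same moment.
        return FILES.computeIfAbsent(host, h -> new File("clientData", h));
    }
}
```

Two calls with the same hostname return the identical object, which is what matters for locking: two File objects built separately from the same path are equal according to equals(), but they are different monitors.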

private void process(
			File f, OutputStream os)
			throws IOException {
	ClientData cData = null;
	String answer;
	// One thread at a time per client file.
	synchronized(f) {
		// ... define cData as above ...
		answer = cData.getResult();
		os.write(answer.getBytes());
		// ... save cData to f as above ...
	}
}

And that's all there is to it. Our code creates a new cData object each time it runs, either by reading it from the serialized file or by instantiating one. Because the whole sequence runs while holding the lock on the File object, only one cData object per client host is ever active at a time: a second connection from the same host must wait until the first has finished writing its cData object out to the persistent store.
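Put together, the load, use, and save steps might look like the following self-contained sketch. DemoClientData is a hypothetical stand-in whose state changes on every call, which makes the effect of persistence easy to see. The class and method names are assumptions, and the try-with-resources syntax requires a JDK much newer than the one this column was written for.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical stand-in for ClientData; its state changes on every call.
class DemoClientData implements Serializable {
    private static final long serialVersionUID = 1L;
    private int uses;
    String getResult() { return "result #" + (++uses); }
}

class Processor {
    // Load, use, and save the client's data while holding the lock on its File.
    static void process(File f, OutputStream os) throws IOException {
        synchronized (f) {
            DemoClientData cData;
            try (FileInputStream fis = new FileInputStream(f);
                 ObjectInputStream ois = new ObjectInputStream(fis)) {
                cData = (DemoClientData) ois.readObject();
            } catch (IOException | ClassNotFoundException | ClassCastException e) {
                cData = new DemoClientData();   // nothing usable on disk: start fresh
            }
            os.write(cData.getResult().getBytes());
            try (FileOutputStream fos = new FileOutputStream(f);
                 ObjectOutputStream oos = new ObjectOutputStream(fos)) {
                oos.writeObject(cData);
            }
        }
    }
}
```

Calling process() twice with the same File writes "result #1" and then "result #2", even though each call builds its own DemoClientData object. The lock only helps if every caller passes the shared File instance for that host.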

This simple technique is very useful in an environment where the server does not run continuously. Note that there is a slight chance that the server will go down in the middle of writing out the serialized data, and that the resulting data might get written out incompletely. In practice, this is not a problem: When the server attempts to read the saved data, it will get an error because the deserialization will fail; a new instance of cData will then be created.

A slightly better alternative might be to save the object to a new file, and when the file has been completely written, rename the new file to the desired name. That would mean we could always recover some state, even if the state is slightly out-of-date. There are circumstances in which that is not the desired behavior, especially in the case of a server performing caching where the state must match data that is available on the backend: in that case, we're better off with either the most recent state or no state.
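A minimal sketch of the write-then-rename approach follows. The ".tmp" suffix and the class name SafeSaver are hypothetical, and java.nio.file.Files.move is used here in place of the older File.renameTo because it reports failures with an exception. Whether the rename is truly atomic depends on the platform and filesystem.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper that never leaves a half-written file under the real name.
class SafeSaver {
    // Write obj to a temporary sibling file, then rename it over the target.
    static void save(Serializable obj, File target) throws IOException {
        File tmp = new File(target.getPath() + ".tmp");   // assumed naming scheme
        try (FileOutputStream fos = new FileOutputStream(tmp);
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(obj);
        }
        // The previous version of target stays intact until this call succeeds.
        Files.move(tmp.toPath(), target.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The caller should still hold the lock on the client's File object while calling save(), since two threads writing the same temporary file would interfere with each other.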

There's a new feature in JDK 1.2 that makes this all particularly attractive when used in conjunction with RMI. This new feature allows RMI servers to be started automatically by a daemon process (the RMI daemon, or rmid). When a request for the RMI server comes into a machine, rmid will automatically start the server if it is not running. If the server later exits (for example, if it shuts itself down or encounters a fatal error), then rmid will start it again when a new request comes in.

You can even arrange for the data file (or any other object) to be passed to the constructor of your RMI server by using a new class called MarshalledObject. This class is really a wrapper for anything that you want to use to initialize the server's state. In our example, that doesn't really help us, because we have a different object (the File object) for every client that contacts us. If we had a single File object that held the state for everything in the server, however, we could wrap the File object in a MarshalledObject, and that File object would be passed to the constructor of the RMI server.

Using the MarshalledObject, we'd also have to write saveState() and restoreState() methods; these methods do nothing more than serialize and deserialize the data as we did above. Of course, these methods may be arbitrarily complex in the algorithms that they use to save and restore data.
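The single-state-file arrangement might be sketched as follows. Only java.rmi.MarshalledObject is a real RMI class here. StatefulServer, its map of state, and its put() and get() methods are hypothetical, and the activation-specific parts of a real server (its superclass, the other constructor arguments, and registration with rmid) are omitted.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.rmi.MarshalledObject;
import java.util.HashMap;

// Hypothetical server that keeps all of its state in one file.
class StatefulServer {
    private final File stateFile;
    private HashMap<String, String> state = new HashMap<>();

    // The MarshalledObject carries the File that names the saved state.
    StatefulServer(MarshalledObject<File> data)
            throws IOException, ClassNotFoundException {
        stateFile = data.get();   // unwrap the File
        restoreState();
    }

    synchronized void put(String key, String value) { state.put(key, value); }
    synchronized String get(String key) { return state.get(key); }

    // Serialize the whole state map to the state file.
    synchronized void saveState() throws IOException {
        try (FileOutputStream fos = new FileOutputStream(stateFile);
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(state);
        }
    }

    // Read the state map back, or start empty if nothing usable was saved.
    @SuppressWarnings("unchecked")
    final synchronized void restoreState() {
        try (FileInputStream fis = new FileInputStream(stateFile);
             ObjectInputStream ois = new ObjectInputStream(fis)) {
            state = (HashMap<String, String>) ois.readObject();
        } catch (IOException | ClassNotFoundException | ClassCastException e) {
            state = new HashMap<>();
        }
    }
}
```

A caller would construct it with new StatefulServer(new MarshalledObject<File>(f)), and a second instance built from the same File after saveState() sees the values stored by the first.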

Whether you use RMI or not, however, we've shown a useful technique for any server that needs to maintain state across instantiations. The ease of object serialization makes this all quite simple.
