IN MY LAST column ("Back to basics: network servers in Java,"
Java Report, Vol. 3, No. 10) we explored the use of a basic TCP server and discussed how
such a server could be used to filter or proxy data from another server. This allowed applet clients
to get data from a server that they normally might not be able to access, and to receive data that
had been specially parsed or altered in some manner.
We're going to use that same framework again this month to explore another question: How can we
improve our server to handle a service that might be offered only intermittently?
There are several motivations for exploring this topic. To begin, we're assuming that our server
holds some data that is interesting to the client: for example, it could cache data that clients are
likely to request multiple times. Now what happens if our server terminates? We'd like to be able to
recover that client-specific data without having to regenerate, recalculate, or otherwise re-obtain
that data. As an example, consider the class shown in
Listing 1.
In the class in Listing 1, we assume
that there's only one interesting result every 24 hours, and we can just cache that result. If we already
have a result that is less than 24 hours old, we use it; if not, we use our internal private method
getResult0() to obtain the desired result.
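Listing 1 is not reproduced here, but assuming it matches the description above, a minimal sketch of such a class might look like the following (the field names and the body of getResult0() are invented for illustration):

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical reconstruction of the ClientData class of Listing 1:
// it caches a single result and recomputes it at most once per 24 hours.
class ClientData implements Serializable {
    private String result;       // the cached result
    private long lastCalculated; // when the result was last obtained

    public synchronized String getResult() {
        long now = System.currentTimeMillis();
        // Reuse the cached result if it is less than 24 hours old
        if (result == null || now - lastCalculated > 24 * 60 * 60 * 1000L) {
            result = getResult0();
            lastCalculated = now;
        }
        return result;
    }

    // Obtain a fresh result; the real work is elided in this sketch
    private String getResult0() {
        return "result obtained at " + new Date();
    }
}
```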
Now say that we want to use this class in a server, and we want to be able to retain the cached result
if our server goes down. There are numerous circumstances under which that might happen: if it's a
CORBA server, it can be brought up and down by the CORBA infrastructure; if it's an RMI server running
under the new facilities available in Java 1.2, it might take itself down when it is idle and depend on
the RMI daemon to restart it when it is needed. And no matter what type of server it is, if the machine
crashes or needs to be rebooted, the server will go down.
In Java, the key to recovering this data between server instantiations is object serialization.
The server can periodically serialize instances of the ClientData object: every time the object
is changed, every few minutes, or whenever else makes sense. This is why we made the ClientData class
implement the Serializable interface.
Now, when a server needs a ClientData object, it can either obtain a previously serialized object or
create a new one. Of course, a server often won't know for certain whether such a saved object exists,
so it must be prepared to handle either case. If the server expects to find the serialized object
(if it exists) in the File object f, then it could use this code to obtain the ClientData object:
ClientData cData = null;
try {
    FileInputStream fis = new FileInputStream(f);
    ObjectInputStream ois = new ObjectInputStream(fis);
    cData = (ClientData) ois.readObject();
    ois.close();   // closing the object stream also closes fis
} catch (Exception e) {
    cData = new ClientData();
}
Now, when it needs a result, the server merely calls the getResult() method on the
cData object.
Similarly, the server is responsible for periodically executing code to save the state of the
cData object:
try {
    FileOutputStream fos = new FileOutputStream(f);
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    oos.writeObject(cData);
    oos.close();
} catch (Exception e) {
    // ignored: a failed save merely means slightly stale state on restart
}
In our example, the server should execute this code each time it has called the
getResult() method on the cData object. If the ClientData class had a
different interface—and in particular, if it had an interface that allowed us to determine
if its internal state had changed—then we could use that information instead. But that's an
uncommon interface for an object to have.
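Such an interface is easy to imagine, though. A hypothetical dirty-flag variant of ClientData might look like this (the interface and all its method names are invented for this sketch):

```java
// Hypothetical interface for objects that can report whether their
// internal state has changed since the last save
interface Saveable {
    boolean isDirty();   // has the state changed since clearDirty()?
    void clearDirty();   // called by the server after a successful save
}

// A data holder that sets the flag only when it actually recalculates
// its cached value (a stand-in for the 24-hour check in ClientData)
class TrackedData implements Saveable {
    private boolean dirty;
    private String result;

    public String getResult() {
        if (result == null) {
            result = "fresh result";
            dirty = true;     // state changed; a save is now worthwhile
        }
        return result;
    }

    public boolean isDirty() { return dirty; }
    public void clearDirty() { dirty = false; }
}
```

The server would then call writeObject() only when isDirty() returns true, clearing the flag after each save.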
It's not particularly inefficient to write the cData object out each time it's used, even
if its internal state has not changed. Nonetheless, if you're worried about the extraneous I/O, you
can wrap the ObjectOutputStream around a ByteArrayOutputStream instead of a
FileOutputStream and in that way obtain the raw bytes of the serialized object. Then you can
call the getResult() method, repeat the process to get a new set of raw bytes from the
cData object, compare the new and old bytes, and only write the bytes out if they have changed.
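That comparison might be sketched as follows; the SaveHelper class and its method names are invented for this example:

```java
import java.io.*;
import java.util.Arrays;

class SaveHelper {
    // Serialize an object to a byte array so that successive
    // snapshots can be compared cheaply in memory
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(o);
        oos.close();
        return baos.toByteArray();
    }

    // Write the new snapshot to f only if it differs from the old one;
    // returns the new bytes so the caller can keep them for next time
    static byte[] saveIfChanged(byte[] oldBytes, Object o, File f)
            throws IOException {
        byte[] newBytes = serialize(o);
        if (!Arrays.equals(oldBytes, newBytes)) {
            FileOutputStream fos = new FileOutputStream(f);
            fos.write(newBytes);    // the bytes are already a valid stream
            fos.close();
        }
        return newBytes;
    }
}
```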
We could build a server around this code, but there are a few remaining issues. The first is how
we define the file in which the serialized data will be held; the second is how we synchronize
access to client data objects in a multi-threaded server. These two issues are inextricably linked, because
the mechanism that we use to save the data will affect how we implement the necessary synchronization.
The issue of synchronization is particularly important because the TCP server framework that we'll
eventually use runs each client connection in a separate thread. Other server environments—RMI and CORBA
servers, for example—also tend to run separate clients in separate threads. We must ensure that our
files will work correctly in these environments.
Let's say that we decide to name each file after the hostname of the client that our server is servicing.
This does not remove the need for synchronization: a single host can create multiple connections to our
server. What we really want is a one-to-one mapping between client hostnames, ClientData objects, and
the files in which the ClientData objects will be saved.
The result of that mapping is that we will need to synchronize requests on some object that is common
in this mapping. We can't synchronize directly on the ClientData object because that object won't
exist before we read it from the file (or create it if there is no saved version of it). We'll eventually
need to synchronize on the File object that is associated with each client hostname.
We need to make sure that each thread in the server uses the same File object for common
hostnames. It would not be sufficient for a thread in the server to create a File object based on the
client host and then lock that file object: another thread in the server could perform the same activity,
and we'd end up with two separate File objects that reference the same underlying file. We'll
solve that problem by keeping a global cache of File objects in the server. File objects
can be obtained with this code:
static Hashtable files = new Hashtable();
...
static synchronized File getFile(String host) {
    File f = (File) files.get(host);
    if (f == null) {
        f = new File("clientData" + File.separator + host);
        files.put(host, f);
    }
    return f;
}
By routing all access to the files hashtable through a static synchronized method, we ensure that there is
only one File object for each hostname. Now we can synchronize on that object to ensure that multiple
clients from the same host do not interfere with each other:
private void process(File f, OutputStream os) throws IOException {
    ClientData cData = null;
    String answer;
    synchronized (f) {
        // ... define cData as above ...
        answer = cData.getResult();
        os.write(answer.getBytes());
        // ... save cData to f as above ...
    }
}
And that's all there is to it. Even though our code creates a new cData object each time it runs
(either by reading the cData object from the serialized file or by instantiating one), synchronizing
on the File object ensures that there will only ever be one active cData object per client host;
a second client attempting to retrieve a cData object must wait for the first client to finish
writing it out to the persistent store.
This simple technique is very useful in an environment where the server does not run continuously.
Note that there is a slight chance that the server will go down in the middle of writing out the
serialized data, leaving an incomplete file on disk. In practice, this is not a problem: when the
server attempts to read the saved data, the deserialization will fail, and a new instance of
cData will be created instead.
A slightly better alternative might be to save the object to a new file, and when the file has been
completely written, rename the new file to the desired name. That would mean we could always recover
some state, even if the state is slightly out-of-date. There are circumstances in which that is not the
desired behavior, especially in the case of a server performing caching where the state must match data
that is available on the backend: in that case, we're better off with either the most recent state or no
state.
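Under that scheme, the save code would write to a scratch file first; this sketch uses a ".tmp" suffix, which is just one possible convention (in 1.2, File.renameTo() is the available rename mechanism, though its behavior when the target exists varies by platform):

```java
import java.io.*;

class AtomicSave {
    // Serialize cData to a temporary file, then rename it over f, so
    // that f always holds a complete (if possibly stale) snapshot
    static void save(Serializable cData, File f) throws IOException {
        File tmp = new File(f.getPath() + ".tmp");
        FileOutputStream fos = new FileOutputStream(tmp);
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        oos.writeObject(cData);
        oos.close();
        // Only after the write completes does the new state replace the old
        if (!tmp.renameTo(f))
            throw new IOException("rename failed: " + tmp);
    }
}
```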
There's a new feature in 1.2 that makes this all particularly attractive when used in conjunction with
RMI. This new feature allows RMI servers to be started automatically by a daemon process (the RMI daemon,
or rmid). When a request for the RMI server comes into a machine, rmid will automatically start the server
if it is not running. If the server later exits—e.g., if it shuts itself down, or if it encounters a
fatal error—then rmid will start it again when a new request comes in.
You can even arrange for the data file (or any other object) to be passed to the constructor of
your RMI server by using a new class called a MarshalledObject. This class is really a wrapper
class for anything that you want to use to initialize the state. In our example, that doesn't really help
us, because we have a different object (the File object) for every client that contacts us. If we
had a single File object that held the state for everything in the server, however, we could wrap
the File object in a MarshalledObject, and that File object would be passed to
the constructor of the RMI server.
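The wrapping itself is a one-liner. Assuming a single file named state.ser held all of the server's state, the setup might include something like this (MarshalledObject is a real 1.2 class; the surrounding activation registration and the filename are omitted or invented here):

```java
import java.io.File;
import java.rmi.MarshalledObject;

class InitData {
    public static void main(String[] args) throws Exception {
        // Wrap the File that names our saved state; rmid would hand
        // this MarshalledObject to the activatable server's constructor
        File stateFile = new File("state.ser");
        MarshalledObject data = new MarshalledObject(stateFile);

        // In the server's constructor, we unwrap it again:
        File recovered = (File) data.get();
        System.out.println(recovered.getPath());   // prints "state.ser"
    }
}
```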
Using the MarshalledObject, we'd also have to write saveState() and
restoreState() methods; these methods do nothing more than serialize and deserialize the
data as we did above. Of course, these methods may be arbitrarily complex in the algorithms that
they use to save and restore data.
Whether or not you use RMI, however, this is a useful technique for any server that needs to maintain
state across instantiations. The ease of object serialization makes it all quite simple.