Persistency Made Easy

If you are developing a distributed application, using checkpointing (where an application makes snapshots of its state for later analysis or rollback), or if you simply need to store or retrieve complex data, you have likely experienced the difficulty of converting your memory-based data image to a portable, non-memory-based form and reconstructing it afterward. Such two-way data transformation is one way to achieve data persistency, the quality that extends the life span of data beyond a single running session. A persistent environment encapsulates the related mechanisms in a separate software layer and provides the conversion functionality seamlessly. Such an environment lets the programmer stay focused on the application and remain within the familiar C++ framework and notation.

The persistent environment discussed in this article greatly assists in building and manipulating persistent objects without tying the user to proprietary commercial object databases. Its techniques and implementation do not rely on assistance from the data classes themselves. As a result, the environment offers:

  • generality and portability of implementation,
  • modularity (different parts of the environment can be modified/optimized individually),
  • classes that need little or no awareness of their persistent nature,
  • minimal and well-defined requirements for integration of a class into the persistent environment, and
  • easy integration of the C++ standard and third-party libraries.
These results make this approach an attractive alternative to traditional techniques.1 A recent C++ Report article2 demonstrated another approach. However, the design presented here is far less intrusive and restrictive, works with regular pointers and references, and is C++ standard library-friendly.

POINTERS AND REFERENCES NEED ATTENTION

A well-known obstacle to implementing persistency stems from the way data associations are implemented in C and C++: it is commonplace to implement them using pointers and/or references. References are essentially another (tamed and civilized) form of pointers, and I will usually not mention them separately. The whole discussion applies equally to pointers and references.

A widespread programming technique is to embed a pointer (or reference) to auxiliary data in the body of an object:


struct Employee
{ ...
  char*         name;
  Position& position;
};
Among other things, an Employee instance has two embedded pointers to auxiliary data storage, that is, the addresses of additionally allocated blocks of memory. Therefore, for persistency purposes, at least three pieces of information must be stored and later reconstructed: the employee's name, the employee's position, and the Employee instance itself. While handling a character array (char* name) may be straightforward, the Employee instance itself requires special attention.

The standard memory allocation mechanism is not deterministic with respect to the placement of allocated memory blocks. In other words, the standard allocator does not guarantee that, during object reconstruction, a memory block for, say, name will be allocated at the same address at which it was originally allocated. If the memory location of the name storage changes, the value of the name pointer stored inside the Employee instance becomes invalid. Therefore, attempts to store and retrieve pointers directly (as we often do with integers or character strings) are most likely suicidal.

One solution is to save, then reconstruct the original memory layout. In other words, when restored, all data in memory are positioned exactly where the data were originally. In this case, the original location of the name storage in memory is preserved. Therefore, the value of the name pointer embedded in the Employee instance remains valid.
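To make the pointer problem concrete, here is a minimal sketch, not the article's code. A fixed global buffer named arena stands in for an anchored allocation arena. Record, build(), snapshot(), restore_in_place(), stored_name(), and points_into() are hypothetical helpers. The record's pointer refers to a string in the same buffer. Copying the raw bytes back to the same address leaves that pointer valid, while any copy placed elsewhere still points into the original buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <new>
#include <vector>

// Hypothetical record whose field points at another block in the same arena.
struct Record
{
    const char* name;
};

// Stand-in for an allocation arena anchored at a fixed address.
alignas(std::max_align_t) unsigned char arena[128];

// Place a Record at offset 0 and its name string at offset 64.
Record* build(const char* text)
{
    assert(std::strlen(text) < 64);
    char* name = reinterpret_cast<char*>(arena + 64);
    std::strcpy(name, text);
    return ::new (static_cast<void*>(arena)) Record{name};
}

// "Pack": copy the arena byte for byte; the embedded pointer is copied verbatim.
std::vector<unsigned char> snapshot()
{
    return std::vector<unsigned char>(arena, arena + sizeof arena);
}

// "Unpack" at the SAME address, so the copied pointer value is valid again.
void restore_in_place(const std::vector<unsigned char>& image)
{
    std::memcpy(arena, image.data(), image.size());
}

// Read the raw pointer value stored in an image's Record::name field.
const char* stored_name(const std::vector<unsigned char>& image)
{
    const char* p = nullptr;
    std::memcpy(&p, image.data() + offsetof(Record, name), sizeof p);
    return p;
}

// True if p lies within [base, base + size); std::less orders any pointers.
bool points_into(const void* p, const void* base, std::size_t size)
{
    std::less<const void*> lt;
    const void* end = static_cast<const unsigned char*>(base) + size;
    return !lt(p, base) && lt(p, end);
}
```

The last assertion in the test reproduces the failure mode described above. A copy of the image placed at any other address (here, the vector itself) still carries a pointer into the original arena.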

A memory allocator with the described deterministic allocation behavior is the foundation of the persistent environment design. The functionality is confined to the raw-memory management layer. Encapsulating it at this low level dramatically reduces the requirements imposed on persistent data and eases their maintenance.

C++ provides a powerful and flexible memory management mechanism. Therefore, an implementation of a custom memory allocator with the described behavior is far easier to accomplish than it may sound. The Allocator class discussed later in the article encapsulates that functionality.

VIRTUAL CLASSES NEED ATTENTION TOO

Another obstacle to achieving object persistency is specific to C++. It is associated with the runtime polymorphism3 of classes containing virtual functions. The traditional implementation of this feature relies on the virtual function table (vtbl).4 A vtbl pointer is embedded in every instance of a virtual class. Those pointers act as unique identifiers that associate an object with its class and, therefore, determine the object's behavior. Although C++ compilers are not mandated to employ virtual tables, I have yet to find one that does not.

The discussed design does not rely on the particular implementation of an object's runtime behavior (RTB). It uses the standard technique based on placement operator new().5 The environment supplies every persistent class with a unique function responsible for the restoration of a class's RTB. For example, a Foo class will be accompanied by:


void* _MakeFoo(void* addr)
{                            
  return ::new(addr)   
    Foo(Persistent::Constructor());
}
The function constructs a Foo object (restores vtbl pointer) out of a persistent memory block restored by Allocator (details discussed later). The function is part of the Foo-related information stored in the persistent Schema database. The class-function association is created and the Schema database is built by a special-purpose preprocessor.
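The restore-function idea can be sketched in isolation. In this hypothetical example, RestoreTag plays the role of Persistent::Constructor and make_circle() plays the role of _MakeFoo(). The storage is zero-filled, so it holds no usable vtbl pointer until make_circle() runs placement new over it. Afterward, virtual dispatch works. The sketch checks only dispatch; it does not address the preservation of data members.

```cpp
#include <cassert>
#include <new>
#include <string>

// Empty tag type playing the role of Persistent::Constructor.
struct RestoreTag {};

// Hypothetical polymorphic hierarchy.
struct Shape
{
    virtual ~Shape() {}
    virtual std::string kind() const = 0;
protected:
    Shape() {}
    explicit Shape(RestoreTag) {}
};

struct Circle : Shape
{
    Circle() {}
    explicit Circle(RestoreTag tag) : Shape(tag) {}  // initializes no data
    std::string kind() const override { return "Circle"; }
};

// Counterpart of _MakeFoo(): re-establish a Circle's runtime behavior
// (in practice, its vtbl pointer) in storage that already holds its bytes.
void* make_circle(void* addr)
{
    assert(addr != nullptr);
    return ::new (addr) Circle(RestoreTag());
}
```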

FOCUS ON USE AND USERS

One of the strong points of this implementation is its convenience. To the user, the whole persistent environment is represented by a single Persistent class. The environment does not affect or limit the functionality of persistent objects: there is no special interface for working with them and no specific restriction on their use. The only place where objects reveal their persistent nature is their initial construction:


Persistent domain(domain_size);
...
const Key   key = 123;
const Moo* type = CLASS_TYPE(Moo);

Moo* moo = new(domain, type, key) Moo(args);

The verbose operator new() reflects a distinct property of all persistent objects—they exist within a special persistent memory domain. The persistent new() serves as a bridge to the domain and simply transfers memory allocation requests to the domain:


template<class T>
inline
void*
operator new(size_t size,
             Persistent& domain,
             const T* type,
             Key key =Persistent::NoKey())
{
  return domain.allocate(
    Schema::schema(type), key, size);
}
A newly created object is immediately accessible via the pointer returned by the persistent new(). However, in contrast with initial construction (which is explicit), object reconstruction is automatic. The complete data layout within a persistent domain is restored at once, when the domain is reconstructed from a previously packed memory image. Within the reconstructed domain, an object is accessible through a unique key if that key was supplied during the object's initial construction:


// Reconstruct previously stored domain.
Persistent domain(packed_buffer);

// Get access to reconstructed data.
Moo*     moo = (Moo*)  domain.find(ACCESS_KEY);
char* string = (char*) domain.find(STRING_KEY);
The need for the type in the persistent new() is not immediately obvious. It is used to provide access to the information associated with a corresponding class in the Schema database. The information is stored with the object and used during the reconstruction phase to restore RTB of the object. Nonvirtual data such as C-style structures, character arrays, and so forth do not exhibit runtime polymorphism and do not have embedded vtbl pointers. Therefore, they do not require a type during creation:


char* c0 = new(domain, SOME_KEY) char[25];
Bar* bar = new(domain, BAR_KEY) Bar;
For persistency purposes these objects are considered typeless and use a special Persistent::NoType() type to employ the same memory allocation mechanism. The typeless persistent new() is simply a convenience routine:


inline
void*
operator new(size_t size,
             Persistent& domain,
             Key key =Persistent::NoKey())
{
  return operator new(
    size, domain, Persistent::NoType(), key);
}
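The following sketch imitates the user-facing side with a toy Domain, which is a bump allocator plus an index of blocks. It is an assumption-laden stand-in, not the article's Persistent class, and Domain, class_id(), Moo, and NoKey are all hypothetical. Besides the typed and typeless operator new overloads described above, it declares an array form. That form is needed because a new-expression such as new(domain, SOME_KEY) char[25] calls operator new[] rather than operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

typedef long Key;
const Key NoKey = -1;  // hypothetical stand-in for Persistent::NoKey()

// Hypothetical class-id lookup; id 0 means "typeless", like _schema[0].
template<class T> int class_id(const T*) { return 0; }

struct Moo
{
    explicit Moo(int v) : value(v) {}
    int value;
};
template<> int class_id(const Moo*) { return 2; }

// Toy fixed-size domain: a bump allocator plus an index of allocated blocks.
class Domain
{
public:
    explicit Domain(std::size_t size)
        : _base(new unsigned char[size]), _size(size), _used(0) {}

    void* allocate(int id, Key key, std::size_t size)
    {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (size + align - 1) / align * align;
        if (_used + rounded > _size) throw std::bad_alloc();  // exhausted
        void* addr = _base.get() + _used;
        _used += rounded;
        _blocks.push_back(Block{id, key, addr});
        return addr;
    }

    // Keyed lookup, like Persistent::find(Key).
    void* find(Key key) const
    {
        if (key == NoKey) return nullptr;
        for (const Block& b : _blocks)
            if (b.key == key) return b.addr;
        return nullptr;
    }

    // Class id recorded for a block, or -1 if the block is unknown.
    int id_of(const void* addr) const
    {
        for (const Block& b : _blocks)
            if (b.addr == addr) return b.id;
        return -1;
    }

private:
    struct Block { int id; Key key; void* addr; };
    std::unique_ptr<unsigned char[]> _base;
    std::size_t _size;
    std::size_t _used;
    std::vector<Block> _blocks;
};

// Typed persistent new: only the static type of 'type' matters, never its value.
template<class T>
void* operator new(std::size_t size, Domain& domain, const T* type, Key key = NoKey)
{
    return domain.allocate(class_id(type), key, size);
}

// Typeless forms (class id 0); array new-expressions need operator new[].
void* operator new(std::size_t size, Domain& domain, Key key = NoKey)
{
    return domain.allocate(0, key, size);
}

void* operator new[](std::size_t size, Domain& domain, Key key = NoKey)
{
    return domain.allocate(0, key, size);
}
```

The typed overload deduces T from the pointer's static type. A null pointer such as CLASS_TYPE(Moo) therefore carries all the information it needs, and its value is never read.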
The typeless new() should be used sparingly for instances of nonvirtual classes. If requirements change and a class later becomes virtual, the typeless new() will have to be replaced with the typed new() for all persistent instances of the class, which could be tedious on a larger scale. Therefore, a sensible decision might be to restrict the typeless new() to non-class-based data.

Persistent memory allocations for instances of virtual classes do require the type, which is specified with a simple macro:


#define CLASS_TYPE(type) ((const type*) 0)
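However, the macro is often skipped in practice, because the instance being created can itself name the type: the persistent new() needs only moo's type, not its value. Be aware that this idiom formally reads the still-uninitialized moo, which the C++ standard treats as undefined behavior, so CLASS_TYPE(Moo) remains the strictly portable spelling. The shortcut looks like this: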


Moo* moo = new(domain, moo, key) Moo(arguments);
Often the interface to the persistent new() can be simplified further. The requirement to allocate a unique access key for every block of persistent memory appears particularly tedious and error prone. Just imagine a system with hundreds, even thousands, of objects and auxiliary data, each requiring such a key! Fortunately, using an access key is the exception rather than the rule. If no key is provided, the Persistent::NoKey() value is used by default. Consequently, the next time the persistent domain is reconstructed, the objects created with the default value will not be accessible through the Persistent::find(Key) function. That is not much of a problem, though, because the majority of data are inherently interconnected: one object points to some other data, and so forth. Those data and their associations form a data tree. Therefore, an access key is needed only for the root object of such a tree.


class Foo
{ ...
  char*           _name;
  Moo*             _moo; 
  Array _children; 

  const char* name() const { return _name; }
};
A Foo instance employs additional persistent memory blocks. Those blocks are accessible indirectly through the Foo instance (for example, foo.name()). Therefore, only the Foo instance needs an access key.

Obviously, a persistent domain must exist before any persistent objects can be created in it:


Persistent domain(domain_size);
This line creates a new, empty persistent domain of size domain_size. This static allocation of a memory pool (the domain) differs from standard dynamic memory allocation on the heap: the user has to estimate in advance how much memory (domain_size) the persistent domain may occupy. Once the domain is exhausted, further persistent memory allocation requests in it will fail.

However, applications are not limited to one persistent domain. Several domains can be opened simultaneously and manipulated concurrently. Objects in those domains can have cross-references. Interdomain references are valid as long as those domains are stored and reconstructed together. The creation of an additional domain may be automated and initiated when the current domain is exhausted. Obviously, the multiple domains approach requires all interdependent domains to be initialized together before they are used.6

Alternatively, one huge domain, similar to the local memory pool, may be created. There is no penalty for doing so: the size of a packed domain depends not on the total size of the domain in memory but on the amount of persistent memory actually used.

To make a snapshot of the current state of a persistent domain,


int packed_size = domain.pack(buffer);
will pack the domain into the provided buffer. Persistent::pack(char*&) returns the size of the packed domain in bytes and advances buffer to the end of the packed domain, so that another domain can be packed right after it. The in-memory image of the domain being packed is not affected by taking pack() snapshots. Currently, it is the user's responsibility to ensure that the buffer is large enough.

Once packed into a linear buffer, a persistent domain can be stored to disk or travel over the network. Then,


Persistent domain(packed_buffer);
will re-create the original memory image of the domain and reconstruct data layout in the domain. Data initially created with access keys are immediately accessible via


Moo*     moo = (Moo*)  domain.find(ACCESS_KEY);
char* string = (char*) domain.find(STRING_KEY);
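The pack() contract (return the packed size, advance the buffer) can be sketched with a hypothetical ToyDomain. Its packed header layout (capacity, then bytes used) is an assumption of this sketch. Unlike the real environment, it copies bytes only and does not record or reuse an anchor address. It also shows why a huge domain costs nothing when packed: only the used bytes are written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy domain: 'capacity' bytes reserved, 'used' bytes occupied.
class ToyDomain
{
public:
    explicit ToyDomain(std::size_t capacity) : _bytes(capacity), _used(0) {}

    // Rebuild from a packed image produced by pack().
    explicit ToyDomain(const char* packed)
    {
        std::size_t header[2];
        std::memcpy(header, packed, sizeof header);
        _bytes.assign(header[0], 0);
        _used = header[1];
        std::memcpy(_bytes.data(), packed + sizeof header, _used);
    }

    // Write header + used bytes, advance 'buf' past them, return the size.
    std::size_t pack(char*& buf) const
    {
        std::size_t header[2] = { _bytes.size(), _used };
        std::memcpy(buf, header, sizeof header);
        std::memcpy(buf + sizeof header, _bytes.data(), _used);
        std::size_t n = sizeof header + _used;
        buf += n;  // the next domain can be packed right after this one
        return n;
    }

    // Bump-allocate n bytes (no alignment handling in this toy).
    char* allocate(std::size_t n)
    {
        assert(_used + n <= _bytes.size());
        char* p = _bytes.data() + _used;
        _used += n;
        return p;
    }

    const char* data() const { return _bytes.data(); }
    std::size_t capacity() const { return _bytes.size(); }

private:
    std::vector<char> _bytes;
    std::size_t _used;
};
```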
DESIGN AND IMPLEMENTATION

The discussed implementation consists of three major, functionally distinct classes (see Fig. 1).

Figure 1. Combined object and interaction diagram of the persistent environment.

  • The Allocator class implements deterministic memory allocation.
  • The Schema class accommodates information about persistent classes.
  • The Persistent class glues the pieces together and provides a single-class user interface.
Clear functional separation dramatically simplifies the implementation and, importantly, the maintenance of the environment.

THE ALLOCATOR CLASS

The following conceptual Allocator implementation uses a best-fit algorithm with lazy coalescing to allocate memory blocks from a memory allocation arena (a chunk of contiguous memory). The Allocator public interface is minimal and almost conventional:


class Allocator
{
  public:

 ~Allocator ();
  Allocator (Size, void* =0);

  void*  allocate (Size, void* =0);
  void deallocate (void*);

  ...
};
The only deviation accommodates the deterministic allocation requirement: the Allocator(Size, void*) constructor accepts, as its second argument, an address in the application's memory address space to which the arena is to be anchored. In other words, Allocator allows the starting memory allocation address to be specified. The environment uses this functionality primarily to reconstruct an allocation arena at the same memory address at which it was initially created. Normally, the user need not provide this argument.

The current Allocator implementation uses only one allocation arena and requires the size of the arena as the first argument. A more sophisticated implementation could automatically request a new arena, or enlarge the existing one, when the currently used memory pool is exhausted.

allocate(Size, Address =0) has been extended to meet the same deterministic allocation requirement. The function accepts an optional second argument to specify the address in the arena where a requested memory block should be located. Again, the functionality is not normally visible to the application programmer.
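A possible shape for Allocator's two extensions is sketched below: a constructor that anchors the arena on given storage, and an allocate() that accepts an optional target address. ArenaAllocator is hypothetical. For brevity, it uses a first-fit search instead of the best-fit algorithm with lazy coalescing described above, and it keeps its block map outside the arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <memory>

class ArenaAllocator
{
public:
    // Arena at an address of the allocator's choosing.
    explicit ArenaAllocator(std::size_t size)
        : _owned(new unsigned char[size]), _base(_owned.get()), _size(size) {}

    // Arena anchored on caller-supplied storage (the "anchor address").
    ArenaAllocator(std::size_t size, void* anchor)
        : _base(static_cast<unsigned char*>(anchor)), _size(size) {}

    // Allocate 'size' bytes; if 'at' is given, only at exactly that address.
    void* allocate(std::size_t size, void* at = nullptr)
    {
        size = round_up(size);
        if (at)
        {
            std::uintptr_t a = reinterpret_cast<std::uintptr_t>(at);
            std::uintptr_t b = reinterpret_cast<std::uintptr_t>(_base);
            if (a < b || a - b + size > _size) return nullptr;
            return reserve(a - b, size);
        }
        std::size_t cursor = 0;                      // first-fit scan of gaps
        for (const auto& block : _blocks)
        {
            if (block.first - cursor >= size) return reserve(cursor, size);
            cursor = block.first + block.second;
        }
        return _size - cursor >= size ? reserve(cursor, size) : nullptr;
    }

    void deallocate(void* p)
    {
        _blocks.erase(static_cast<std::size_t>(static_cast<unsigned char*>(p) - _base));
    }

private:
    static std::size_t round_up(std::size_t n)
    {
        const std::size_t align = alignof(std::max_align_t);
        return (n + align - 1) / align * align;
    }

    // Record [offset, offset + size) if it overlaps no existing block.
    void* reserve(std::size_t offset, std::size_t size)
    {
        auto next = _blocks.lower_bound(offset);
        if (next != _blocks.end() && next->first < offset + size) return nullptr;
        if (next != _blocks.begin())
        {
            auto prev = std::prev(next);
            if (prev->first + prev->second > offset) return nullptr;
        }
        _blocks[offset] = size;
        return _base + offset;
    }

    std::unique_ptr<unsigned char[]> _owned;    // empty when anchored
    unsigned char* _base;
    std::size_t _size;
    std::map<std::size_t, std::size_t> _blocks; // offset -> size
};
```

Requesting a specific address either reproduces the original block or fails outright. That is the property reconstruction depends on.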

That additional Allocator functionality is sufficient to realize the following algorithm for data transition from memory to secondary storage, and back to memory:

  • A memory allocation arena is initially created and anchored to an address in local memory address space (Allocator(arena_size)).
  • An application creates a persistent object (allocates and initializes a memory block within the arena using allocate(block_size)).
  • The persistent domain is converted to a non-memory-based form. The anchor address and the size of the memory allocation arena (anchor_address, arena_size), the address and the size of the allocated memory block (block_address, block_size), and the content of the block are stored.
  • Reconstruction of the persistent domain's memory image begins. The previously stored information (the anchor address and size of the allocation arena, the address and size of the allocated memory block, and the content of the block) is retrieved.
  • Arena reconstruction. An arena of the original size anchored to the originally allocated anchor address in local memory address space (Allocator(arena_size, anchor_address)) is created.
  • Data reconstruction. A memory block of the specified size is allocated within the arena at the specified address (allocate(block_size, block_address)). The content is copied to the block.
During the described two-phase data transformation, a block's attributes (size, memory location, content) have been successfully preserved and restored. If the block had a pointer to another memory block allocated and stored within the same allocation arena, the pointer would be safely restored together with the pointed-to data.
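The two-phase algorithm above can be sketched with hypothetical DomainImage and BlockImage records. Here, the arena is an ordinary array, and "anchoring" is reduced to checking that reconstruction uses the same address. A real implementation would obtain that address from its memory pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// One stored block: where it lived and what it contained.
struct BlockImage
{
    std::uintptr_t address;
    std::vector<unsigned char> content;
};

// The non-memory form of a domain (hypothetical layout for this sketch).
struct DomainImage
{
    std::uintptr_t anchor;
    std::size_t arena_size;
    std::vector<BlockImage> blocks;
};

// Phase 1: record the arena's anchor and size plus each allocated block.
DomainImage store(void* anchor, std::size_t arena_size,
                  const std::vector<std::pair<void*, std::size_t>>& blocks)
{
    DomainImage image{reinterpret_cast<std::uintptr_t>(anchor), arena_size, {}};
    for (const auto& b : blocks)
    {
        const unsigned char* p = static_cast<const unsigned char*>(b.first);
        image.blocks.push_back({reinterpret_cast<std::uintptr_t>(b.first),
                                std::vector<unsigned char>(p, p + b.second)});
    }
    return image;
}

// Phase 2: the arena must sit at the original anchor; each block is then
// re-created at its original address. Returns false if anchoring fails.
bool reconstruct(const DomainImage& image, void* arena, std::size_t arena_size)
{
    if (reinterpret_cast<std::uintptr_t>(arena) != image.anchor ||
        arena_size < image.arena_size)
        return false;
    for (const BlockImage& b : image.blocks)
    {
        assert(b.address >= image.anchor &&
               b.address + b.content.size() <= image.anchor + image.arena_size);
        if (!b.content.empty())
            std::memcpy(reinterpret_cast<void*>(b.address),
                        b.content.data(), b.content.size());
    }
    return true;
}
```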

Allocator uses the MemoryPool support class to acquire an anchored memory allocation arena. This particular MemoryPool implementation employs, but is not limited to, the UNIX shared memory mechanism. On systems lacking shared memory support, a similar mechanism, memory mapping, may be used instead. Alternatively, conventional memory allocation mechanisms may be adapted to support the functionality.
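As an illustration of the memory-mapping alternative, the hypothetical MemoryPool below passes the anchor to POSIX mmap() as a hint. Without MAP_FIXED, the kernel may place the mapping elsewhere. In that case, the sketch releases the mapping and reports failure rather than silently using a different address.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

class MemoryPool
{
public:
    // Map 'size' bytes, preferably at 'anchor'; address() is null on failure.
    explicit MemoryPool(std::size_t size, void* anchor = nullptr)
        : _addr(nullptr), _size(size)
    {
        assert(size > 0);
        void* p = mmap(anchor, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return;
        if (anchor && p != anchor)   // hint not honored: refuse to drift
        {
            munmap(p, size);
            return;
        }
        _addr = p;
    }

    ~MemoryPool() { if (_addr) munmap(_addr, _size); }

    MemoryPool(const MemoryPool&) = delete;
    MemoryPool& operator=(const MemoryPool&) = delete;

    void* address() const { return _addr; }
    std::size_t size() const { return _size; }

private:
    void* _addr;
    std::size_t _size;
};
```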

THE SCHEMA CLASS

The Schema class plays an equally important role during object reconstruction: it provides the infrastructure needed to restore the RTB of instances of virtual classes. The rest of this section therefore does not apply to nonvirtual data.

Before an instance of a virtual class can be used within the persistent environment, the class has to be registered in the Schema database. The registration process creates a Schema instance and a set of supporting functions for the class. Those functions associate a unique numeric identifier with the class. That class identifier will be packed together with instances of the class. Later, during object reconstruction, that identifier will be used to associate class information with the data being unpacked.


struct Schema
{
  typedef const char cchar;

  // Get registration instance using class type, class id or class name.

  template<class T>
  static const Schema& schema (const T*);
  static const Schema& schema (ClassID);
  static const Schema& schema (cchar*
                               class_name);

  static Size size (); // Get table size.
  void*    restore (void*) const;
  ClassID class_id () const { return   _id; }
  cchar*      name () const { return _name; }

  typedef void* (*Restore)(void*);

  ClassID                   _id;
  cchar*                  _name;

  Restore              _restore;
  static const Schema _schema[]; // Reg. table
};

Every registration Schema instance consists of:

  • a unique numerical identifier assigned to the corresponding class (_id),
  • the name of the class (_name), and
  • a pointer to the function responsible for the class's RTB reconstruction (_restore).
Those Schema instances form a registration table similar to:


const Schema Schema::_schema[] =
{
  { 0, "default", 0        },
  { 1, "Foo",     _MakeFoo },
  { 2, "Moo",     _MakeMoo },
};
The first entry in the table (_schema[0]) is predefined and reserved for nonvirtual persistent data. It corresponds to the Persistent::NoType() type discussed earlier. Consequently, the entry has no RTB reconstruction function (_restore equals zero). Every other entry encapsulates all the necessary information for a particular class. For example, the Foo class is assigned the unique identifier 1, and its RTB reconstruction function is _MakeFoo().

To be precise, class name (_name) entries are not needed for persistency purposes. However, class names are handy to have, and they come as a bonus of generating the registration table automatically:


cout << Schema::schema(object).name();
Class information is collected and shaped into the registration table by a separate preprocessor called the schema generator. This fairly traditional technique is similar to the ones used by commercial object databases such as ObjectStore. The schema generator produces a schema.c file similar to:


// This schema.c file is part of libschema.so 
// library and auto-generated. Do not hand edit.

#include 
#include "search.h"

REGISTER_CLASS(Foo, 1)
REGISTER_CLASS(Moo, 2)

const Schema Schema::_schema[] =
{
  { 0, "default", 0        },
  { 1, "Foo",     _MakeFoo },
  { 2, "Moo",     _MakeMoo },
};

Size
Schema::size()
{
  // Return the size of the table.
  return sizeof(_schema) / sizeof(_schema[0]);
}
The schema generator builds and recompiles the file every time a new class is added or an existing class is modified. Due to its dynamic nature, the file is compiled into a separate shared (dynamic) library libschema.so.

For every registered class the schema generator provides two functions conveniently expressed using the following macro:


#define REGISTER_CLASS(class_name, class_id)  \
  Address                                     \
  _Make ## class_name(Address addr)           \
  {                                           \
    return ::new(addr)                        \
      class_name(Persistent::Constructor());  \
  }                                           \
  template<>                                  \
  const Schema&                               \
  Schema::schema(const class_name*)           \
  {                                           \
    return Schema::schema(class_id);          \
  }
For class Foo, the macro expands into:


Address
_MakeFoo(Address addr)
{
  return ::new(addr)
   Foo(Persistent::Constructor());
}

template<>
const Schema&
Schema::schema(const Foo*)
{
  return Schema::schema(1);
}
Schema::schema(const Foo*) reserves the _schema[1] entry in the registration table for the Foo class. _MakeFoo() is the _restore member of the _schema[1] instance. The function employs the placement operator new and the Foo(Persistent::Constructor) constructor to restore RTB after a Foo object (pointed to by addr) was restored in memory by Allocator. The constructor is discussed in more detail later in the article.

The Schema provides the infrastructure necessary to implement the following RTB restoration algorithm:

  • Restore an object in memory (using Allocator).
  • Extract the class identifier accompanying the object.
  • Retrieve the Schema instance associated with the class identifier (Schema::schema(class_id)).
  • Invoke the RTB restoration function (Schema::restore(addr)).
The Schema::restore(Address) is merely a public interface to the restore function:


inline
Address
Schema::restore(Address addr) const
{
  return _restore ? _restore(addr) : addr;
}
Obviously, for nonvirtual data Schema::restore() simply returns the provided address.
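The four restoration steps can be sketched with a hand-written registration table. Everything here is hypothetical. Foo and Moo stand in for registered classes. MakeFoo() and MakeMoo() stand in for the generated _MakeFoo()-style functions; they are renamed because identifiers beginning with an underscore followed by an uppercase letter are reserved in C++. The "restored" block is a zero-filled buffer rather than one produced by Allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

typedef void* Address;
typedef int   ClassID;

struct Constructor {};  // stand-in for Persistent::Constructor

// Hypothetical registered classes.
struct Foo
{
    Foo() {}
    explicit Foo(Constructor) {}
    virtual ~Foo() {}
    virtual ClassID class_id() const { return 1; }
};

struct Moo : Foo
{
    Moo() {}
    explicit Moo(Constructor c) : Foo(c) {}
    ClassID class_id() const override { return 2; }
};

// Generated-style restore functions: re-establish RTB with placement new.
Address MakeFoo(Address addr) { return ::new (addr) Foo(Constructor()); }
Address MakeMoo(Address addr) { return ::new (addr) Moo(Constructor()); }

struct Schema
{
    typedef Address (*Restore)(Address);

    static const Schema& schema(ClassID id);
    static std::size_t size();

    // Entry 0 (nonvirtual data) has no restore function: addr is returned as is.
    Address restore(Address addr) const { return _restore ? _restore(addr) : addr; }

    ClassID     _id;
    const char* _name;
    Restore     _restore;

    static const Schema _schema[];  // registration table
};

const Schema Schema::_schema[] =
{
    { 0, "default", 0       },
    { 1, "Foo",     MakeFoo },
    { 2, "Moo",     MakeMoo },
};

std::size_t Schema::size() { return sizeof(_schema) / sizeof(_schema[0]); }

const Schema& Schema::schema(ClassID id)
{
    assert(id >= 0 && static_cast<std::size_t>(id) < size());
    return _schema[id];
}
```

In a real reconstruction, the class identifier would come from the block's stored metadata rather than a literal.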

Just a few more words to explain the purpose of the search.h appearing in the schema.c example. The file is a list of declaration files that the schema generator scans while building the registration table. Consequently, the search.h includes all the class declarations necessary for successful compilation of the libschema.so library. Following our simple example, search.h will be:


#include 
#include 
This file is created and maintained manually. Automation of it might be an alternative, but at the time of writing the idea was rejected due to limited computer resources.

INTERFACE TO THE PERSISTENT ENVIRONMENT

The major parts of the user interface to the persistent environment have already been briefly covered in the Focus on Use and Users section. Those two distinct parts are:

  • management of a persistent domain and
  • management of persistent objects in the domain.
The management of persistent objects is minimal and conventional. It consists of the persistent operator new() discussed earlier and a replacement global operator delete(), extended to handle persistent memory:


void operator delete(void* addr)
{
  if (!addr) return;

  // Scan through used domains looking for
  // the domain the addr belongs to.

  Bag::iterator bi =   
   Persistent::domains.begin();

  for (; bi != Persistent::domains.end(); ++bi)
  {
    Persistent* domain = *bi;
    Address  low = domain->address();
    Address high = (char*) low + 
                   domain->size();

    if (low <= addr && addr < high)
    {
      domain->deallocate(addr);
      return;
    }
  }
  free(addr); // addr points at an ordinary memory.
}
An application can employ several persistent domains. Therefore, the function tries to determine which domain the memory block specified by addr belongs to. If no domain claims the block, it is assumed to have been allocated in ordinary (transient) memory. For some applications, the universal operator delete() might be somewhat inflexible and/or inefficient; such applications may call Persistent::deallocate(void*) directly instead.
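The ownership test inside operator delete() can be isolated as shown below. DomainRange, domains, owns(), and release() are hypothetical. release() is an ordinary function, so the sketch does not replace the global operator delete(). The lower bound is inclusive, so a block starting at the domain's first byte is still recognized. std::less provides a total order for pointers that may belong to unrelated objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical domain: an address range plus a deallocation hook.
struct DomainRange
{
    DomainRange(const void* l, std::size_t s) : low(l), size(s), released(0)
    {
        assert(l != nullptr);
    }
    void deallocate(void*) { ++released; }  // counts returned blocks (demo only)

    const void* low;
    std::size_t size;
    int released;
};

// Registry of open domains (cf. Persistent::domains).
std::vector<DomainRange*> domains;

// True if addr lies in [low, low + size).
bool owns(const DomainRange& d, const void* addr)
{
    std::less<const void*> lt;
    const void* high = static_cast<const char*>(d.low) + d.size;
    return !lt(addr, d.low) && lt(addr, high);
}

// Route a block to the domain that owns it, else treat it as heap memory.
void release(void* addr)
{
    if (!addr) return;
    for (DomainRange* d : domains)
        if (owns(*d, addr)) { d->deallocate(addr); return; }
    std::free(addr);
}
```

A replacement operator delete() that calls free() is correct only if the matching operator new() obtains ordinary memory with malloc().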


class Persistent
{
  public:

  class Constructor 
  {
    // An empty class to provide a unique
    // signature for persistent constructors.
  };

  ~Persistent ();
  Persistent (Size, Address =Allocator::Any);
  Persistent (cchar* packed_buf);

  Size    pack (char*& buf);
  Address find (Key) const;

  // Family of functions returning information
  // about allocated memory blocks:
  // Get size of a memory block,
  // Get class id of a memory block,
  // Get user-provided access key of a mem. block,
  // Check if block is persistent or transient.

  static Size         size (const Address);
  static ClassID      id  (const Address);
  static Key           key (const Address);
  static bool is_persistent (const Address);

  static Key   NoKey ();    // Default access key.
  static cvoid* NoType (); // Default class type.
    ...
};
The family of informational functions (size(), id(), key(), is_persistent()) accepts an address of a persistent memory block and returns information associated with the block.

VIRTUAL CLASS REQUIREMENTS AND FOO(PERSISTENT::CONSTRUCTOR)


class Foo
{  ...
  virtual ~Foo ();
           Foo (Persistent::Constructor) {}
};
The Foo(Persistent::Constructor) constructor is the only coding requirement for integrating a virtual class Foo into the persistent environment. The schema generator registers only classes that declare a constructor taking Persistent::Constructor. Moreover, that constructor is solely responsible for proper object initialization during reconstruction. Paired with placement new(), it restores the RTB of instances of virtual classes after their memory image has been reconstructed by Allocator:


Address _MakeFoo(Address addr)
{
  // Restore RTB (usually vtbl pointer) of the Foo object pointed by addr.

  return ::new(addr) 
   Foo(Persistent::Constructor());
}
However, a constructor's purpose is more than RTB restoration: constructors (including persistent constructors) initialize an object's content. The C++ Standard (section 8.5, paragraph 9) states that if no initializer is specified for an object of non-POD class type, the object is default-initialized. The same holds for such members when they are omitted from a constructor's member initializer list. Objects of primitive types (int, char*, Foo*, and so on), by contrast, are not default-initialized.

In the following Bad example, when Bad::Bad(Persistent::Constructor) is called, the _foo member will be default-initialized (using Foo::Foo()), and its previously restored content will likely be overwritten.


class Bad
{  ...
  Bad(Persistent::Constructor) {}

  int  _int; // OK
  Foo* _ptr; // OK
  Foo  _foo; // Problem.
};
In addition, _foo is embedded in Bad and therefore does not occupy an individual memory block. Consequently, _foo will be missed during automatic RTB restoration, which is performed only for individual memory blocks. Notice that _int does not suffer the same fate: it is of a primitive type and is left untouched. Therefore, an aggregating class has to ensure that it reconstructs the RTB of aggregated objects without overwriting their content:


class Good
{  ...
  Good(Persistent::Constructor c) : _foo(c) {}

  Foo _foo;
};
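The difference between Bad and Good can be observed with a counter. In this hypothetical sketch, Tag stands in for Persistent::Constructor. Foo's default constructor increments default_ctor_calls, which represents restored content being overwritten.

```cpp
#include <cassert>

struct Tag {};  // stand-in for Persistent::Constructor

// Hypothetical counter so the sketch can observe which constructor ran.
static int default_ctor_calls = 0;

struct Foo
{
    Foo() : value(0) { ++default_ctor_calls; }  // would overwrite content
    explicit Foo(Tag) {}                        // leaves members untouched
    virtual ~Foo() {}
    int value;
};

struct Bad
{
    explicit Bad(Tag) {}                 // _foo is default-initialized via Foo()
    Foo _foo;
};

struct Good
{
    explicit Good(Tag t) : _foo(t) {}    // forward the tag to the member
    Foo _foo;
};
```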
The task of proper initialization of embedded instances may be tedious for complex classes. A solution2 is to implement default constructors so that they do nothing during persistent object reconstruction:


Foo::Foo()
{
  if (Persistent::restoration_in_progress()) return;

  ... // Nonpersistent initialization stuff.
}
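A possible implementation of the flag-based approach is sketched below. restoration_in_progress(), RestorationScope, and the initializations counter are hypothetical. The flag is a plain global, so the sketch is neither thread-safe nor reentrant.

```cpp
#include <cassert>

// Hypothetical global flag queried by default constructors.
static bool g_restoring = false;
static int initializations = 0;

bool restoration_in_progress() { return g_restoring; }

// Raises the flag for the duration of a reconstruction (not reentrant).
struct RestorationScope
{
    RestorationScope()  { g_restoring = true; }
    ~RestorationScope() { g_restoring = false; }
};

struct Foo
{
    Foo()
    {
        if (restoration_in_progress()) return;  // keep restored content
        ++initializations;                      // normal initialization
        value = 7;
    }
    int value;
};
```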
However, the easiest solution might be to avoid embedded objects altogether and use the pointer-based style automatically covered by the environment:


class Easy
{  ...
  Easy(Persistent::Constructor) {}

  Foo* _foo;
  Zoo* _zoo;
};
COOPERATIVE USAGE OF PERSISTENT AND TRANSIENT MEMORY

The persistent constructor allows customized object initialization during reconstruction and optimized persistent memory usage. For example, to minimize the amount of data stored to a disk, a persistent Foo class might employ nonpersistent memory for internal data that do not need to be persistent. These data are recreated in local memory every time a Foo instance is reconstructed.


class Foo
{  ...
  Moo*     _moo;
  char* _buffer;
};

Foo::Foo(Persistent::Constructor)
{  ...
  // The data are not persistent and are re-created for every session.
  _moo    = new Moo;
  _buffer = new char[1024];
}
Mixing persistent and transient memory requires discipline. Obviously, only persistent memory blocks will be ultimately stored and then reconstructed. Therefore, attention has to be paid to pointers to transient data embedded in persistent objects (_moo and _buffer). Those pointers need to be reassigned during object reconstruction in the persistent constructor.

Another point to remember is that persistent constructors are invoked while the persistent domain is being reconstructed. Other objects in the domain may not have had their RTB restored yet, so accessing or using them from a persistent constructor may lead to unpredictable results.

As the persistent constructor is primarily required for the restoration of an object's RTB, the constructor and the registration in the Schema database are optional for nonvirtual classes. Nonvirtual classes need the constructor and registration in the Schema database only to customize object reconstruction.

It is possible to create persistent or nonpersistent instances of the same class using, respectively, persistent or conventional new(). However, if instances of a class dynamically allocate additional memory for internal purposes, those instances often need to know if they are being instantiated as persistent or transient so that they use persistent or transient memory allocation mechanisms respectively:


Employee::Employee()
{
  if (Persistent::is_persistent(this))
  {
    // Use persistent allocation.
  }
  else
  {
    // Use transient allocation.
  }
}
Checks are necessary to ensure that persistent and transient memory blocks do not get indiscriminately mixed. Doing so will likely lead to dangling pointers and/or persistent memory leakage. (A persistent memory leak occurs when persistent memory gets reconstructed without its transient owner. The persistent memory becomes inaccessible and effectively lost.)

However, when carefully implemented, this approach may substantially optimize or minimize persistent memory usage by storing data only and not a data "holder" (an object managing access to the data). For example, the following Employee class is transient. It serves as a public user interface to internally managed persistent data.


Employee::Employee(Key name_key, 
                   Key surname_key)
{
  _name    = (char*) domain.find(name_key);
  _surname = (char*) domain.find(surname_key);

  if (!_name)
  {
    _name = new(domain, name_key) char[SOME_SIZE];
  }
  ...
}
The optimization techniques of blending persistent and transient memory allow the building of flexible and economical persistent models.

THE PERSISTENT ENVIRONMENT AND THIRD-PARTY LIBRARIES

To be used in the persistent environment, a third-party class must have a modular, allocator-based memory management implementation and must be nonvirtual. The first requirement ensures that the persistent memory allocation mechanism can easily replace the default memory management of the class. The second follows from the fact that virtual classes are required to have a persistent constructor, which third-party classes are unlikely to provide. It is possible to modify the declaration of a third-party class and add an inline persistent constructor, but such a modification may be difficult to maintain. Many classes of the C++ standard library are well suited for integration into the persistent environment. The following example demonstrates a typical custom allocator-based technique for employing std::string:


#include <string>

// Actual implementation of PersistentAllocator was modified
// in order to be tested with the string class coming with gcc-2.95.2,
// which does not conform to the standard.

template<class T>
class PersistentAllocator
{  ...
  pointer allocate(
    size_type num,
    const_pointer =0)
  {
    // Typeless, keyless block from the current persistent domain.
    return static_cast<pointer>(_domain->allocate(
      Schema::schema(Persistent::NoType()),
      Persistent::NoKey(), num * sizeof(T)));
  }
  void deallocate(pointer addr, size_type /* num */)
  {
    _domain->deallocate(addr);
  }
  ...
  static Persistent* _domain; // Domain used by PersistentAllocator<T>.
};

template<class T>
Persistent* PersistentAllocator<T>::_domain = 0;

typedef std::basic_string<char, std::char_traits<char>,
  PersistentAllocator<char> > String;

// Class is not virtual to avoid default-initialization of the content
// as described in "Virtual Class Requirements" section.

class Employee
{
  public:

  Employee(const char* name, const char* position)
  : _name(name), _position(position) {}

  private:

  String _name;
  String _position;
};

Employee* e = new(domain, KEY)
  Employee("John", "Worker");
...
Employee* e = (Employee*) domain.find(KEY);
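With a standard library that follows C++11 or later, far less allocator boilerplate is needed than in the gcc-2.95.2-era listing. The sketch below is hypothetical: Arena, g_domain, and ArenaAllocator stand in for a persistent domain and PersistentAllocator, and deallocation is a no-op. The test string is long enough to exceed common small-string buffers, so its characters land in the arena.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed arena standing in for a persistent domain.
struct Arena
{
    alignas(std::max_align_t) unsigned char bytes[4096];
    std::size_t used;
};

// The "current domain" all allocators draw from (cf. PersistentAllocator::_domain).
Arena* g_domain = nullptr;

// Minimal C++11 allocator: value_type, allocate, deallocate, a converting
// constructor for rebinding, and equality. Deallocation is a no-op here.
template<class T>
struct ArenaAllocator
{
    typedef T value_type;

    ArenaAllocator() {}
    template<class U> ArenaAllocator(const ArenaAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        assert(g_domain != nullptr);
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t bytes = (n * sizeof(T) + align - 1) / align * align;
        if (g_domain->used + bytes > sizeof g_domain->bytes) throw std::bad_alloc();
        void* p = g_domain->bytes + g_domain->used;
        g_domain->used += bytes;
        return static_cast<T*>(p);
    }

    void deallocate(T*, std::size_t) {}
};

template<class T, class U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }

template<class T, class U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> > String;
```

Because every ArenaAllocator draws from the same global arena, all instances compare equal. This lets the string library move and swap buffers between strings freely.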
CONCLUSION

The persistence techniques discussed are reasonably simple. However, the underlying concept has a wide variety of applications: from single-process data storage or retrieval to complex multimachine distributed systems, and from new persistent class development to legacy code upgrade to third-party library support.

Acknowledgments Many thanks to Brian Button, Herb Sutter, and James Grenning for their opinions, comments, advice, and patience in reviewing the manuscript.

References

  1. Lopez, L. "Persistent Lists Using ISAM Files," C/C++ User's Journal, Aug. 1997.
  2. Hesse, J. "EZSave for C++," C++ Report 12(2): 21–26, 40–41, Feb. 2000.
  3. Stroustrup, B. The Design and Evolution of C++, Addison-Wesley, Reading, MA, 1994, Section 13.6.1.
  4. Stroustrup, B. The Design and Evolution of C++, Section 2.5.5.
  5. Stroustrup, B. The Design and Evolution of C++, Section 10.4.11.
  6. Meyers, S. Effective C++: 50 Specific Ways to Improve Your Programs and Design, Addison-Wesley, Reading, MA, 1998, Item 47.