In-Depth

Versatile Querying With db4o

This open source, object database provides you with a collection of APIs that adapt to diverse querying requirements.

In many database-backed applications, data is not simply fetched and analyzed for the purpose of report generations. Rather, data informs and controls the application. Data is queried and fetched in real time, and the paths down which the application flies in order to access that data can change from one application execution instance to the next. Or, they may change from instant to instant during the execution of a single application.

These shifting circumstances demand different query capabilities of the database engine. Sometimes, the query itself is simple, but speed is paramount. Other times, the object structure is elaborate, so the query language's expressiveness is of primary importance. Still other times, the application requires not so much querying as object navigation. That is, the logic within the application is best served if persistent objects can be traversed in the same manner that the application would navigate those objects, were they entirely in memory.

As a consequence, varying query needs argues for a database engine with querying capabilities that are simultaneously efficient and flexible — an API (or, rather, a collection of APIs) adaptable to diverse querying requirements. An open source object database that comes close to fulfilling these informal mandates is db4object's db4o.

One of db4o's most attractive features is its balance of simplicity and power. On the one hand, its API is so easily mastered and readily grasped that even a novice developer can create a full-fledged database application in a single sitting. On the other hand, that same API provides lower-level calls into the database engine that admit hardcore developers into the engine's plumbing in order to tweak and tune db4o's equipment for optimum performance.

db4o's characteristics are best illustrated — rather than merely discussed — so this article demonstrates db4o through example.

db4o Fundamentals
Though the ultimate goal is a demonstration of db4o's query API, you nonetheless must have objects in the database in order to have something to query. So, you'll begin as any newcomer would: You have defined classes, and you want to persist objects of those classes. The classes in this example model a system that tracks tests performed for a QA project. Your system consists of two classes. The first is TestSuite:

public class TestSuite {	
  private String name;  // Test Suite name
  private String description;
  private String configuration;
  private char overallScore;
  private ArrayList <TestCase> cases;
  private long dateExec;
  ... <remainder of TestSuite definition> ...
}  

A TestSuite is primarily a container for TestCases, (a test case being a single executable test — related test cases are gathered into a suite). A TestSuite carries additional, global data members, and the purpose of each is apparent: configuration records the specifics of the system being tested; overallScore is a summary score ('P' for pass, 'F' for fail, 'B' for blocked, and so on) for the entire suite; and dateExec is a millisecond field that identifies the date and time that the test suite was executed. The cases ArrayList carries the individual cases, modeled by the TestCase class:

public class TestCase {
  private String name;
  private String comment;
  private char status;
  private long duration;
  private float result;
  ... <remainder of TestCase definition> ...
}

Each test case has a name, a free-form comment field, a status (pass or fail), a duration and a result (for associating arbitrary data with the test — throughput in bytes-per-second, for example).

To avoid getting bogged down in describing detailed uses of these classes, pretend that you have executed all the test cases of a particular test suite and stored the results in a TestSuite object (along with the associated TestCases in the cases ArrayList). Now, you want to persist that object. That is, you want to create a database, open it, store the TestSuite object (which you will name testsuite) and close the database:

  // Create the database
  new File("testsuites.YAP").delete();
  ObjectContainer db = Db4o.openFile("testsuites.YAP");
  // Store the TestSuite object
  db.set(testsuite);
  // Close the database
  db.close();

Dust your hands off; you're done. (Of course, to keep things simple, you left out the details of instantiating the TestSuite object and its component TestCases.)

As a brief detour before you proceed to querying, stop for a moment and reflect upon what is happening in the above code. In particular, consider what you don't see happening — things that db4o did without having to be told.

First, you didn't have to tell db4o anything about the structure of the TestSuite class; db4o discovered it without your help, using Java's reflection capabilities.

Second, you did not have to advise db4o concerning the cases ArrayList. Not only did you not have to tell db4o about the size of the ArrayList, but you also didn't have to tell db4o about its contents. Just as db4o discovers all it needs to know about TestSuite objects, it discovers all it needs to know about TestCase objects (within the ArrayList).

The result is that, if you regard testsuite as the 'root' object of an arbitrarily large and complex object tree, db4o finds and stores the whole tree without assistance from you. So, storing the base object testsuite also stores the cases ArrayList. This process is referred to as 'persistence by reachability.'

Finally, you did not have to request that db4o guard your call to the set() method with a transaction. Any call that modifies an ObjectContainer (the db4o object that models the database) automatically starts a transaction, unless a transaction is already active. Furthermore, the call to close() terminates the transaction, so the above code is equivalent to this:

  db.startTransaction();
  db.set(testsuite);
  db.commitTransaction();
  db.close();

where startTransaction() and commitTransaction() are fictional methods made up to illustrate this point. db4o does have explicit calls to commit or abort a transaction, but they've been excluded from the original code to make the simplicity apparent.

Querying I: QBE
With objects stored in the database, you can perform querying and retrieval. db4o provides three querying APIs: one simple, one elegant and one complex. Each has its strengths and each is applicable to different query situations. From db4o's perspective, which API you chose is irrelevant because they are all compatible.

You'll begin with the simple API called query by example (QBE).

To employ QBE, you construct a 'template' object of your query target, and pass that to the ObjectContainer's query() method. In essence, you are telling db4o to "go get all the objects that look like this one."

Suppose one of your test suites is named Network Throughput, and you want to retrieve all executions of this test so that you can determine the percentage of test failures (based on the TestSuite's overallScore). Using QBE, Listing 1 shows the code for this function.

In Listing 1, testTemplate is the QBE template object. Note that only its name field holds an actual value; all other object members are either null or zero. Null or zero fields do not participate in QBE queries; hence, the call to db.get() returns all TestSuite objects in the database whose name field matches 'Network Throughput.' The matching objects are returned in the result ObjectSet, through which the code iterates, retrieving objects and calculating the results.

QBE's advantage over the other APIs is its simplicity. There is no separate query language to master. In addition, QBE is typesafe. You cannot create a query analogous to the SQL code:

SELECT TestSuite.overallScore FROM TestSuite
  WHERE TestSuite.name = 200.0

Put another way, since the query is crafted in Java code, the compiler won't let you put a floating-point value into a string field, or vice-versa.

QBE's only apparent disadvantage is that it can only execute 'equals to' queries. In addition, QBE uses the null value to identify string or object reference members that do not participate in the query, and the value zero to indicate non-participating numeric fields. So, for example, you could not contrive a QBE query to fetch all TestCase objects whose result field equals 0.0f.

A more capable query mechanism is required for more elaborate queries.

Querying II: Native Query
db4o's Native Query system is arguably the most flexible query mechanism imaginable. Rather than building queries with a database query language, you construct a Native Query with plain old Java. Native Queries relies on two mechanisms to accomplish: one is a class, Predicate; the other is an interface, QueryComparator. The class includes an overrideable callback method that specifies how objects are chosen from the database (the body of the query). The interface declares a method that designates the sort order of the results.

Suppose you want to locate all the test suites that are executed in a given week, that have an overall score of 'failed,' but for which more than half of the associated test cases are scored as 'passed.' Expressed as a Native Query, this query is straightforward.

First, you extend db4o's Predicate class (see Listing 2).

Notice the use of Java's generic syntax in the class definition. This tells db4o to fetch only TestSuite objects. When the query is executed, TestSuite objects are passed to the match() method (the callback method to which you referred), which returns true for any object that meets the query criteria and false for any object that doesn't meet it.

The code in the match() method first determines if the candidate object is within the one-week range. If so, the code cycles through the member test case objects, calculating the total number of passed cases. If the result is less than 50 percent, the suite is rejected. Otherwise, it passes.

Listing 3 displays the code to perform the query.

Think of Native Queries as the objects of the target class pulled one-by-one from the database and passed to the match() method. Only those for which match() returns true are placed in the results ObjectSet.

What about sorting? If we want to arrange the results in ascending date order, we implement the QueryComparator interface as follows:

public class NativeQuerySort implements QueryComparator<TestSuite>{
  public int compare(TestSuite t1, TestSuite t2)
  {
    if (t1.getDateExec() < t2.getDateExec()) return -1;
    if (t1.getDateExec() > t2.getDateExec()) return 1;
    return 0;
  }
}

The function of the compare() method is clear. Objects that succeed in the query are passed to the compare() method in pairs. The method returns a value less than, equal to or greater than zero if the first object is to be sorted before, equal to or after the second, respectively. To actually impose the sorting on the results, you instantiate the NativeQuerySort object and modify the call to query():

    . . .
// Instantiate the sort class
nqsClass = new NativeQuerySort();
     . . .
ObjectSet results = db.query(nqqClass, nqsClass);
     . . .

The rest of the code remains the same.

You may be skeptical that a Native Query is simply a programmatic trick — that it is no faster than code that fetches all the TestSuite objects and excludes those that don't meet the criteria.

Not quite. Native Queries can be optimized. All you have to do is place a pair of jar files — db4o-xxx-nqopt.jar (where xxx is the db4o version) and bloat.jar — in the CLASSPATH. At query execution time, code in these libraries will optimize constructs (in the match() method) such as primitive comparisons, arithmetic and boolean expressions, and simple object member access. The list of supported optimizations is constantly growing.

Querying III: S.O.D.A.
One of db4o's unique strengths is its layered API arrangement. You can choose either to control db4o from a high level — granting the database engine considerable latitude in how it decides to carry out its operations — or you can access db4o in a more direct fashion. The latter choice places a greater burden on the programmer, who must direct the engine's inner workings more carefully. But the reward is a faster and more capable database.

db4o's S.O.D.A. (Simple Object Data Access) query mechanism is a perfect example of this API layering. S.O.D.A. is actually db4o's internal query system — QBE and Native Queries are translated into S.O.D.A. However, applications can call S.O.D.A. directly.

Suppose you want to locate all of the 'Network Throughput' test suites that have at least one test case whose result — a measurement of bytes per second — is less than a specified value (for example, 100). Listing 4 shows a S.O.D.A. query for this request.

The diagram shown in Figure 1 helps to explain the code in Listing 4. The code builds a query graph that guides the underlying engine. The descend() method creates a branch on this graph that steps downward into the object structure. Each descent builds a node in the tree, to which you can attach a constraint (using the constrain() method). To borrow SQL terminology, constraints specify the 'WHERE' portion of the query. Multiple constraints can be joined with the assistance of conjunction '(and)' or disjunction '(or)' methods. You used and() in the above query to connect the constraints.

As with the other query methods, the results are returned in an ObjectSet, through which you iterate to retrieve the fetched objects.

Note that, while S.O.D.A. is a low-level access method, it is not without intelligent, default behavior. Here's the code that accesses the result field of the cases member:

query.descend("cases").descend("result"). ...

You did not have to tell S.O.D.A. that 'cases' is a collection. So, when the query executes, it invisibly examines the results field of all items in the cases ArrayList, and correctly returns test suites for which any test case meets the search criteria.

A Database for All Seasons
The examples outlined in this article demonstrate the evolution of db4o's querying APIs. However, it is not an organic, random evolution — an API exists because of its survival skills. db4o's querying APIs support easy, efficient and powerful object database querying.

db4o's varied APIs enable you — from within a single database engine — to bring to bear the specific query mechanism appropriate to your task at hand. A QBE query is ideal for quickly fetching a node from within a network of objects, then navigating that network through ordinary object references. A Native Query offers flexibility: If you can write Java, you can write a Native Query. Finally, S.O.D.A. provides the machinery for fast and complex queries that draw their speed from the db4o engine.

Think of db4o's query APIs as a kit of carefully selected tools. You don't have to wrestle your application code in order to accommodate your programs to a one-size-fits-all query architecture and its associated language. Instead, it is a database that db4o gives you what you need.