NoSQL Toolbox Helps Pick the Right Database
A start-up founded by academic researchers has created the NoSQL Toolbox, a method for choosing the best database for a given job.
The tool comes from German company Baqend GmbH, which provides a namesake Back-end-as-a-Service (BaaS) platform for creating data-driven Web sites and mobile apps. The company was formed by database researchers at the University of Hamburg, including CEO Felix Gessert, who announced its NoSQL Toolbox in a blog post yesterday.
Titled "NoSQL Databases: a Survey and Decision Guidance," the post explains the background of the tool, which helps companies develop a decision tree for mapping requirements to NoSQL database systems. The tool is designed to unravel the complexity of various competing systems and map their capabilities to real-world needs. Gessert cited that complexity in explaining the origins of the NoSQL Toolbox.
"The heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context," Gessert said. In other (non-researchese) words: The sheer number of NoSQL databases available makes it hard to pick the right one for a job.
"Instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases," Gessert said. "This NoSQL Toolbox allows us to derive a simple decision tree to help practitioners and researchers filter potential system candidates based on central application requirements."
The post examines the NoSQL universe in detail, discussing data models (key-value, wide-column, document and graph) and the system classes that follow from the CAP theorem, which concerns consistency, availability and partition tolerance. It then explains how the NoSQL Toolbox maps database techniques -- sharding, replication, storage management and query processing -- to functional properties (joins, sorting and so on) and non-functional properties (scalability, elasticity and so on).
The toolbox can be used to build a decision tree. In the example illustrated in the post, the tree identifies CouchDB and MongoDB as the best choices to back Web sites (with SimpleDB also in the running), and Hadoop, Spark and Parallel DWH as the best choices for Big Data, with Cassandra, HBase, Riak and MongoDB also offered as viable choices in that category.
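To make the outcome of that example concrete, the two branches described above could be sketched as a simple lookup in Python. This is only an illustration: the names RECOMMENDATIONS and recommend are hypothetical, the real decision tree in the post considers more requirements than a single use-case label, and only the two use cases mentioned here are covered.

```python
# Hypothetical sketch of the two branches described in the article.
# The actual decision tree in the post branches on more requirements;
# this only records the outcomes reported for two use cases.
RECOMMENDATIONS = {
    "web site": {
        "best": ["CouchDB", "MongoDB"],
        "also viable": ["SimpleDB"],
    },
    "big data": {
        "best": ["Hadoop", "Spark", "Parallel DWH"],
        "also viable": ["Cassandra", "HBase", "Riak", "MongoDB"],
    },
}


def recommend(use_case: str) -> dict[str, list[str]]:
    """Return the systems the example tree suggests for a use case."""
    key = use_case.strip().lower()
    if key not in RECOMMENDATIONS:
        # The article gives no guidance for other use cases, so fail loudly
        # rather than guess.
        raise KeyError(f"no recommendation recorded for {use_case!r}")
    return RECOMMENDATIONS[key]


if __name__ == "__main__":
    for case in ("Web site", "Big Data"):
        result = recommend(case)
        print(case, "->", ", ".join(result["best"]))
```

A lookup table is used here rather than nested conditionals because the article reports only the leaves of the tree, not the questions asked along the way.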
"Naturally, this view on the problem space is not complete, but it vaguely points towards a solution for a particular data management problem," Gessert said.
The post also winnows down some of the research into easily digestible graphics that provide comparisons of the MongoDB, Redis, HBase, Riak, Cassandra and MySQL databases. The graphics make it easy to discern, for example, that while all of those databases employ caching, only MongoDB provides in-memory storage, and only MySQL leverages shared-disk and transaction protocol techniques.
Gessert also provides some ready-made guidance based on the research.
He said: "If the data volume exceeds the limits of a single machine, the choice of the right system depends on the prevalent query pattern: When complex queries have to be optimized for latency, as for example in social networking applications, MongoDB is very attractive, because it facilitates expressive ad-hoc queries. HBase and Cassandra are also useful in such a scenario, but excel at throughput-optimized Big Data analytics, when combined with Hadoop."
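The guidance in that quote amounts to a small rule: check whether the data outgrows one machine, then branch on the dominant query pattern. A minimal Python sketch of that rule might look like the following. The function name candidates and the labels "latency" and "throughput" are assumptions made for illustration, and the quote says nothing about data that fits on a single machine, so that case is deliberately left empty.

```python
# Hypothetical encoding of the quoted guidance. Function and label names
# are illustrative and do not come from the NoSQL Toolbox itself.
def candidates(exceeds_single_machine: bool, query_pattern: str) -> list[str]:
    """Return candidate systems, most attractive first."""
    if not exceeds_single_machine:
        # The quoted guidance does not address this case.
        return []
    if query_pattern == "latency":
        # Complex, latency-sensitive queries (e.g. social networking):
        # MongoDB is singled out; HBase and Cassandra are "also useful".
        return ["MongoDB", "HBase", "Cassandra"]
    if query_pattern == "throughput":
        # Throughput-optimized Big Data analytics, combined with Hadoop.
        return ["HBase + Hadoop", "Cassandra + Hadoop"]
    raise ValueError(f"unknown query pattern: {query_pattern!r}")


if __name__ == "__main__":
    print(candidates(True, "latency"))
    print(candidates(True, "throughput"))
```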
"In summary," Gessert concluded, "we are convinced that the proposed top-down model is an effective decision support to filter the vast amount of NoSQL database systems based on central requirements. The NoSQL Toolbox furthermore provides a mapping from functional and non-functional requirements to common implementation techniques to categorize the constantly evolving NoSQL space."
In other words: The NoSQL Toolbox can help organizations choose the best database for a given job.
David Ramel is an editor and writer for Converge360.