Apache Cassandra Release 0.6 Brings Hadoop Support

The Apache Software Foundation (ASF) has released version 0.6 of the Apache Cassandra open-source distributed database.
Three major new features are touted in Cassandra 0.6. The first, and most notable, is the ability to run Apache Hadoop queries directly against data in the distributed Cassandra store.

The second new feature is architectural. The addition of an integrated row cache means you no longer need a separate caching layer. Third, Cassandra write processes have been optimized, providing a claimed 30 percent increase in speed.

Cassandra is a "NoSQL" non-relational data store. It is designed around a structured key-value store and uses an eventual consistency model. The architecture is decentralized and fault tolerant: all cluster nodes are identical, so any node can be replaced without incurring downtime. In addition, reads and writes are highly tunable to suit the performance and consistency needs of the deployment.
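To make the idea of tunable reads and writes concrete, here is a minimal sketch of the general quorum-overlap rule used by replicated stores of this kind: with N replicas per key, a read that waits for R replicas is guaranteed to see the latest write acknowledged by W replicas whenever R + W > N. The function names and the sample settings are hypothetical and chosen only for illustration; this is plain arithmetic, not Cassandra's client API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when any R read replicas must overlap any W write replicas."""
    return r + w > n


# Hypothetical settings for a key replicated on N = 3 nodes.
N = 3
settings = {
    "fast (R=1, W=1)": (1, 1),                    # lowest latency, eventual consistency
    "quorum (R=2, W=2)": (quorum(N), quorum(N)),  # majority reads and writes
    "write-all (R=1, W=3)": (1, N),               # cheap reads, expensive writes
}

for name, (r, w) in settings.items():
    print(name, "overlap guaranteed:", reads_see_latest_write(N, r, w))
```

Lower values of R and W trade consistency for speed and availability; higher values do the reverse, which is the tuning decision a deployment has to make.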

This is the first Cassandra release since it became a top-level ASF project. Cassandra was promoted from the ASF Incubator program in February 2010. The Cassandra project began at Facebook, where it was built to power the Inbox Search feature. It became an open-source Google Code project in 2008, and moved into the ASF Incubator in 2009.

While still a relatively young project, Cassandra has already seen use by a number of high-profile organizations, including Cisco, Digg, Facebook, IBM, Rackspace, Reddit, SimpleGeo and Twitter. According to the project web site, a production cluster with "over 100 TB of data in over 150 machines" is already in use.

Hadoop, another open-source ASF top-level project, provides a framework for running applications in a parallel, highly distributed manner on large commodity computing clusters, using the MapReduce programming model developed by Google. Hadoop also provides a highly distributed file system.
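For readers unfamiliar with the MapReduce model, the following is a minimal single-process sketch of its three steps (map, group by key, reduce) applied to a word count. It is only an illustration of the programming model: the function names are hypothetical, it does not use Hadoop's API, and a real Hadoop job would distribute each phase across the machines in a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))
```

Because each map call sees only one line and each reduce call sees only one key, the work can in principle be split across many machines, which is what makes the model suit large clusters.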

About the Author

Terrence Dorsey is a technical writer, editor and content strategist specializing in technology and software development. Over the last 25-plus years he has worked on developer-focused projects at ESPN, The Code Project, and Microsoft. Read his blog or follow @tpdorsey on Twitter.