Apache HBase 1.0 Release Features API Reorg

The Apache Software Foundation (ASF) signed off on version 1.0 of HBase, the "Hadoop database," after seven years of development.

Running on top of the Hadoop Distributed File System (HDFS), Apache HBase 1.0 is designed specifically for Big Data analytics, providing real-time, random access to tables that can hold billions of rows and millions of columns. (See the Hacker News comment thread for a discussion about real-world use cases requiring millions of columns.)

Although HBase is only now reaching v1.0 status, the ASF noted that it is already in large-scale use at Web giants such as Facebook Inc., Yahoo Inc. and Pinterest, as well as at many other large organizations and even some smaller firms. "Chances are when using your computer or mobile device you interact with a system built with HBase many times daily without ever knowing it," said Andrew Purtell of the project's management committee in a statement yesterday.

Developers at those companies will now need to adjust their code, because a major API restructuring was among some 1,500 changes made since v0.98. Those changes also include many performance improvements and mark the "start of a new era," Project Release Manager Enis Söztutar said in a blog post yesterday.

"HBase's client-level API has evolved over the years," Söztutar said. "To simplify the semantics and to support and make it extensible and easier to use in the future, we revisited the API before 1.0. To that end, 1.0.0 introduces new APIs, and deprecates some of the commonly used client-side APIs (HTableInterface, HTable and HBaseAdmin).

"We advise you to update your application to use the new style of APIs, since deprecated APIs will be removed in the future 2.x series of releases," Söztutar continued.

He said the v1.0 release has three primary goals: to lay a stable foundation for future 1.x releases; to stabilize running an HBase cluster and its clients; and to make versioning and compatibility dimensions explicit.

Calling the release "a major milestone," project exec Michael Stack praised the "army of contributors" and pointed to more things to come.

"There is no rest for the wickedly talented set of contributors who made HBase 1.0," Stack said. "HBase 2.0 is already taking form in our master branch. Users can look forward to new orders of read/write and node count scaling and this time around they won't have to wait seven years on it shipping; HBase 2.0 will be out later this year."

Söztutar went into more detail. "Read replicas phase 2, per column family flush, procedure v2, SSD for WAL or column family data, [and so on] are some of the upcoming features in the pipeline," he said.

About the Author

David Ramel is an editor and writer for Converge360.