Cloudera to Donate Impala and Kudu Big Data Projects to Apache

Cloudera Inc. yesterday proposed to donate its homegrown Impala and Kudu projects to the Apache Software Foundation (ASF) to foster further community development of the technologies used for Big Data analytics.

Although Impala and Kudu are already open source projects, hosted on GitHub under the Apache license, they can benefit from ASF stewardship to accelerate growth and expand the diversity of their developer communities, said Cloudera, one of the leading distributors of Apache Hadoop-based enterprise software solutions.

The three-year-old Impala is a popular Hadoop ecosystem component, described by Cloudera as a high-performance C++ and Java SQL query engine for working with data stored in Hadoop-based clusters, while the newer Kudu -- currently in beta -- is a distributed columnar storage engine for working with Hadoop.

"The architecture of Kudu, Impala and Hadoop sets the foundation for the modern analytic database architecture," company execs Marcel Kornacker and Justin Erickson said in a blog post yesterday. "Hadoop and Kudu enable all your data to be flexibly used across all Hadoop SQL and non-SQL processing frameworks, as Kudu can now handle fast-changing data that can't be easily managed in HDFS. Impala continues to be the leading analytic query engine that is uniquely positioned to enable interactive BI and SQL analytics for this platform."

Cloudera has published official proposals to donate both Impala and Kudu to the ASF.

"We believe that the ASF is the right venue to foster an open source community around Impala's development," Cloudera said in its Impala ASF incubator proposal. "We expect that Impala will benefit from more productive collaboration with related Apache projects, and under the auspices of the ASF will attract talented contributors who will push Impala's development forward at pace."

Most Impala code committers are Cloudera employees, the company said, though it has accepted contributions from other organizations and individual developers, something it hopes to see more of.

"The project has received some contributions from developers outside of Cloudera, from individuals belonging to organizations such as Intel and Google, from hobbyists and from students using Impala to advance their understanding of distributed databases," Cloudera said in its proposal. "The project attracted an active user community as well. We hope to continue to encourage contributions from these developers and community members and grow them into committers after they have had time to continue their contributions."

Cloudera expanded on that idea in a statement yesterday. "Impala has been Apache-licensed since its public launch and has since become an open source standard in the Hadoop ecosystem," Cloudera said. "Since opening up Impala for community contributions earlier this year, there has been increasing development activity, with Google developing integrations between Impala and BigTable, as well as contributions from Arcadia Data, Intel and others. By donating the project to the ASF, this diverse community can further drive the vision of Impala from its well-established foundation."

Cloudera has similar hopes for the much less mature Kudu, which was released as a beta in September and has seen contributions from Xiaomi, Intel and Dropbox. Also, Dremio has worked on integrating Kudu with Apache Drill and has explored using Kudu in a production use case, Cloudera said.

"For the first time, the community has both an interactive query engine with Impala and an updateable storage engine with Kudu -- enabling fast analytic use cases on data as it changes," Cloudera said. "In a short period since its release, Kudu has experienced widespread interest within the open source community. Through its application to join the ASF, Kudu will continue to benefit from the broader development community and the collaboration of these projects will dramatically expand the use cases they can serve as more companies look to develop real-time analytic applications."

About the Author

David Ramel is an editor and writer for Converge360.