Databricks Unveils Community Edition for Learning Spark, Security Framework

Databricks Inc., the company founded by the creators of the open source Apache Spark Big Data analytics project and a commercial champion of the technology, introduced a free community edition of its Spark-based data platform along with a new security framework.

At the company's ongoing Spark Summit 2016 conference, the Databricks Community Edition (DCE) was moved to general availability following a four-month beta program that attracted more than 8,000 users.

It provides an IDE, training materials and sample application notebooks, free to use by data analysts, data scientists and developers interested in learning about Spark. The increasingly popular technology has recently seen massive investments by major industry players such as IBM and Microsoft, along with many others.

Databricks last month previewed the upcoming "shiny new toy" -- Apache Spark 2.0 -- noted for new "structured streaming" enhancements and Project Tungsten, which "focuses on substantially improving the efficiency of memory and CPU for Spark applications, to push performance closer to the limits of modern hardware."

A Databricks Visualization Example (source: Databricks)

The company sees the free DCE as a means to attract even more developers to the technology, which has been described as one of the most active open source projects under development.

"By making DCE generally available, we are looking to fuel the growth of the community by introducing Apache Spark to first-time users," company exec Ion Stoica said in a blog post yesterday. "Finally, by training a new generation of data scientists and engineers, we hope to mitigate the ever growing scarcity of data specialists."

DCE users get access to a 6 GB micro-cluster, a cluster manager and notebook environments that can be used to prototype simple applications, along with the previously mentioned learning resources and associated Massive Open Online Courses (MOOCs).

"This year we've seen explosive growth for the Apache Spark project and all signs indicate the pace will only accelerate as the community expands even more," said Matei Zaharia, cofounder and CTO at Databricks. "Databricks Community Edition has created an ideal environment for learning Apache Spark. Developers of all backgrounds can now use Databricks Community Edition to learn Spark and mitigate the acute Spark skills gap."

Also announced today at the Spark Summit was the advancement of a new security initiative, the Databricks Enterprise Security (DBES) framework.

"DBES combines encryption, integrated identity management, role-based access control, data governance and compliance standards to secure Apache Spark workloads in an end-to-end security framework," the company said in a news release today.

Claiming to be the only company providing comprehensive enterprise security on top of Spark implementations, Databricks announced the completion of the first phase of the framework.

"DBES builds upon the extensive Databricks access management and encryption functionalities that already exist," company exec Dave Wang said in a blog post today. "With the completion of DBES Phase One today, enterprises gain the ability to control access to Apache Spark clusters on an individual basis, manage user identity with a SAML 2.0 compatible identity management provider service, and end-to-end auditability."

In other news from the Spark Summit in San Francisco, IBM announced a cloud-based Spark development environment, while Microsoft, MapR Technologies and several other companies announced new Spark-based products.

About the Author

David Ramel is an editor and writer for Converge360.