Cloudera's 'Self-Service Tool for Data Scientists' Generally Available Today

Newly public company Cloudera Inc. today announced its "self-service tool for data scientists" is now generally available after a preview program.

The company in March announced a private beta for Cloudera Data Science Workbench, a Web-based tool that allows for Big Data analytics -- with a focus on machine learning (ML) -- using the Python, R and Scala programming languages in customizable project environments that can leverage different libraries and frameworks as needed.

The company said security and compliance functionality comes with support for Hadoop authentication, authorization, encryption and governance. Operating on-premises or in the cloud, it lets data scientists use popular Big Data tools Apache Spark and Apache Impala to directly access data in those secure Hadoop clusters.

"We are entering the golden age of machine learning and it's all about the data. However, data scientists continue to struggle to build and test new analytics projects as fast as they would like, particularly in large scale environments," said exec Charles Zedlewski in a news release.

Using the Workbench with Python
[Click on image for larger view.] Using the Workbench with Python (source: Cloudera)

"The Data Science Workbench is a self-service tool that accelerates the ability to build, scale and deploy machine learning solutions using the most powerful technologies. This means that data scientists now have the freedom to share, collaborate and manage their data in a way that best suits them and their enterprise, resulting in an easier and faster path to production."

According to the product's Web site, data scientists can use the workbench to set up analytics pipelines as they wish, using functionality such as built-in scheduling, monitoring and e-mail alerting. Such customized pipelines allow for the development and prototyping of new ML projects before they go into to production, Cloudera said.

In citing the tool's ability to integrate with various deep learning frameworks, the company has focused on Intel's BigDL, a library for deep learning using Spark that was open sourced by Intel.

"The benefits of BigDL integration into Data Science Workbench include the ability to leverage deep learning libraries and tactics on CPU architecture without any additional hardware considerations or separate environments," the company said." The combination provides a convenient way to create Spark data science pipelines natively and integrate them with deep learning library (BigDL) and other Spark/Hadoop components on the Cloudera Data Science Workbench." More on the BigDL integration can be found in the blog post, "BigDL on CDH and Cloudera Data Science Workbench," published Friday.

About the Author

David Ramel is an editor and writer for Converge360.