Cloudera Web Tool Does Self-Service Data Science

Cloudera Inc. introduced a new Web-based tool for enterprise self-service data science, seeking to solve technical and organizational problems that can stifle machine learning and advanced analytics projects.

Cloudera Data Science Workbench, now in a private beta program, stems from technology the company obtained in last year's acquisition of startup

The Web-based tool is aimed at enterprises that want to foster machine learning (ML) and Big Data analytics innovations without depending on data scientists with hard-to-find combinations of skill sets that include software development and business domain expertise. Such skill sets are so valued that the position of data scientist headed Glassdoor's recent report on the "50 Best Jobs in America."

While such people are rare and pricey, organizations can instead enable other employees already doing related work, such as statisticians, quantitative researchers, actuaries and analysts, Cloudera indicated.

[Figure: Using the Cloudera Data Science Workbench (source: Cloudera)]

Rather than working on full-scale projects that use the Apache Hadoop or Apache Spark frameworks with programming languages such as Java or Scala, this group of people usually works on smaller desktop-based projects using more localized tools and programming languages such as Python and R.

The new tool seeks to address issues that keep such "real-world data scientists" from making bigger enterprise impacts with their projects, providing collaboration, scale, compliance, security and more. Specifically, Cloudera said, the Data Science Workbench lets data scientists:

  • Use R, Python or Scala on the cluster from a Web browser, with no desktop footprint.
  • Install any library or framework within isolated project environments.
  • Directly access data in secure clusters with Spark and Apache Impala.
  • Share insights with their team for reproducible, collaborative research.
  • Automate and monitor data pipelines using built-in job scheduling.

"Built using container technology, Cloudera Data Science Workbench offers data science teams per-project isolation and reproducibility, in addition to easier collaboration," Cloudera said in a blog post. "It supports full authentication and access controls against data in the cluster, including complete, zero-effort Kerberos integration. Add it to an existing cluster, and it just works."

Cloudera promised more information on the beta program in the coming weeks.

About the Author

David Ramel is an editor and writer for Converge360.