Databricks Spark Platform Gets Deep Learning Boost

Databricks Inc. said it's wedding Big Data with deep learning in the latest update to its Apache Spark-based platform.

The new support for deep learning -- a variant of machine learning -- means data developers and data scientists can use the platform to more easily create deep learning models, leveraging GPU computing power and new integration with various related code libraries.

Deep learning, according to Wikipedia, "is a branch of machine learning based on a set of algorithms that attempt to model high level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations."

It can be used for projects ranging from recognizing images and supplying relevant captions to machine log analysis and risk detection. It's but one segment of new-age software development related to artificial intelligence, cognitive computing and machine learning.
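To make the Wikipedia definition a bit more concrete, here is a minimal sketch of "multiple processing layers, composed of multiple linear and non-linear transformations." It is plain Python for illustration only, not Databricks or TensorFlow code. The function names and the hand-picked weights are assumptions; a real model would learn its weights from data.

```python
def linear(weights, bias, inputs):
    """Linear transformation: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Non-linear transformation: negative values become zero."""
    return [max(0.0, v) for v in values]


def forward(layers, inputs):
    """Pass the inputs through each (weights, bias) layer in turn."""
    activations = inputs
    for weights, bias in layers:
        activations = relu(linear(weights, bias, activations))
    return activations


# Two stacked layers with hypothetical, hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [-0.5]),                   # layer 2: 2 units -> 1 output
]

output = forward(layers, [2.0, 1.0])
print(output)
```

Each layer feeds its result to the next, which is what makes the graph "deep." Production networks stack many more layers with far more units per layer, which is where GPU computing power comes in.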

"Data scientists looking to combine deep learning with Big Data -- whether it's recognizing handwriting, translating speech between languages, or distinguishing between malignant and benign tumors -- can now utilize Databricks for every stage of their workflow, from data wrangling to model tuning," the company said in a news release last week. "Databricks is the first to integrate these diverse workloads in a fast, secure, and easy-to-use Apache Spark platform in the cloud."

Databricks Deep Learning (source: Databricks)

This isn't Databricks' first foray into that technology space, as the company previously provided a library to accommodate Google's TensorFlow deep learning framework on Spark, a data-processing technology created by the company's founders. [Editor's Note: This article has been revised to remove a reference to a DreamSpark program from Databricks that was actually an April Fools' joke. We regret the error.]

Furthering that effort, the company added GPU support for TensorFrames, which are TensorFlow wrappers that run on Spark DataFrames (distributed data collections organized by named columns).
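The general idea of running a numeric computation over a column-organized, partitioned data set can be sketched without Spark or TensorFlow. The snippet below is an assumption-laden stand-in, not the TensorFrames API: `map_blocks`, `add_three` and the in-memory `partitions` list are hypothetical names, with plain Python lists playing the role of Spark partitions and a simple function playing the role of a TensorFlow graph.

```python
def map_blocks(partitions, column, new_column, fn):
    """Apply fn to one named column, one whole block (partition) at a time."""
    result = []
    for block in partitions:
        values = [row[column] for row in block]
        outputs = fn(values)  # fn receives the block's column as one vector
        result.append([dict(row, **{new_column: out})
                       for row, out in zip(block, outputs)])
    return result


def add_three(values):
    """Hypothetical stand-in for a TensorFlow computation."""
    return [v + 3.0 for v in values]


# Rows organized by a named column "x", split across two partitions.
partitions = [
    [{"x": 1.0}, {"x": 2.0}],
    [{"x": 3.0}],
]

scored = map_blocks(partitions, "x", "z", add_three)
print(scored)
```

Handing the computation a whole block at a time, rather than one row at a time, is the kind of batching that lets numeric work be pushed to accelerators such as GPUs.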

"With Spark deployments tuned for GPUs, plus pre-installed libraries and examples, Databricks offers a simple way to leverage GPUs to power image processing, text analysis, and other machine learning tasks," the company said in a blog post. "Users will benefit from 10x speedups in Deep Learning, automated configuration of GPU machines, and smooth integration with Spark clusters. The feature is available by request, and will be generally available within weeks."

The company cited a recent survey of Spark users indicating that machine learning was a key growth area for the framework, with a 38 percent increase in usage last year.

Databricks said its new enhancements can lead to applications such as:

  • More timely and accurate cancer detection for healthcare providers: To read and interpret pathology images with higher accuracy than humans.
  • Faster drug discovery for pharma: To predict therapeutic uses of drugs at earlier stages to speed up the development and sales pipelines.
  • More capable artificial intelligence, such as language translation: To translate spoken speech with computers at an accuracy that rivals human performance.

"Today's dynamic data teams are applying a broad range of analytic tools to more data, but requiring insights and faster ROI," the company quoted analyst Tony Baer at Ovum as saying. "With the Databricks' platform, they can easily utilize the latest innovations, whether it's Spark Streaming or deep learning, enabling them to build and deploy sophisticated business applications, in a simpler and faster way."

About the Author

David Ramel is an editor and writer for Converge360.