Google Offers Web-Based Data Analytics with Open Source Tool

Google yesterday unveiled a Web-based interactive tool for developers to explore and analyze data stored on the company's cloud platform.

Google Cloud Datalab, a beta service announced at an event in Paris, is based on open source technology. It's designed to help developers explore, transform, visualize and process data stored on BigQuery, Compute Engine and Google Cloud Storage. BigQuery is a managed data warehouse for Big Data analytics; Compute Engine is the company's Infrastructure-as-a-Service (IaaS) offering; and Cloud Storage provides an object storage service.

The new tool, which runs as an application on the company's App Engine service, is built with open source Jupyter technology, which uses the "notebook" format for interactive data science and scientific computing using many different languages. Jupyter evolved from the IPython project, which uses notebooks as "an interactive computational environment" in which developers can integrate code execution with rich text and media, mathematics and plots. The notebook approach also facilitates collaboration and sharing of development projects.
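
As a rough illustration of the notebook format described above, the following Python sketch builds a minimal notebook document by hand using only the standard library. It assumes the version 4 Jupyter notebook layout (a JSON object with a list of cells); the helper names `make_notebook`, `markdown_cell` and `code_cell`, along with the cell contents, are hypothetical and are not taken from Cloud Datalab.

```python
import json


def markdown_cell(text):
    """Return a rich-text (Markdown) cell as a plain dict."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Return a code cell that has not been executed yet (no outputs)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def make_notebook(cells):
    """Wrap cells in a minimal notebook structure (assumed nbformat 4)."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


# Hypothetical content: one explanatory cell followed by one code cell.
notebook = make_notebook([
    markdown_cell("# Language correlation\nNotes on the analysis go here."),
    code_cell("total = sum([1, 2, 3])\nprint(total)"),
])

# Serialize to text; sorted keys keep the output stable between saves.
notebook_json = json.dumps(notebook, indent=1, sort_keys=True)
print(notebook_json)
```

Because the result is ordinary JSON text, a notebook saved this way (conventionally with an `.ipynb` extension) can be committed and compared like any other source file.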

"Cloud Datalab combines the power of Google BigQuery and Google Cloud Storage with familiar data science ecosystems built around IPython, removing the need for complex integration between products," Google said on the new tool's Web site.

[Figure: Using Datalab for Programming Language Correlation (source: Google)]

Along with Python, developers can use SQL for their interactive queries, or construct BigQuery User Defined Functions with JavaScript. Google said that because the tool uses the notebook format, it can leverage Git-based source control, while also providing the option to sync up with non-Google source code services such as GitHub and Bitbucket.

"Cloud Datalab removes common barriers of getting started," Google said in a blog post yesterday. "Instead it provides a ready-to-use, fully setup, secure, multi-user environment integrated with source control for developers and data scientists. With Cloud Datalab, you can focus on your data analysis tasks immediately."

Developers wishing to customize the tool can fork the GitHub project or submit pull requests, Google said. The project is packaged as a ready-to-go Docker container with Jupyter/IPython and several Python libraries. The beta tool itself is free to use, though that might change, but developers must pay for the other Google services they use with it.

About the Author

David Ramel is an editor and writer for Converge360.