Google Adds to Big Data Connection Options
Google Inc. yesterday introduced two new connectors that let Hadoop access data stored in the company's cloud platform.
Released as previews, the BigQuery and Cloud Datastore connectors give developers more options for working with Hadoop on the Google cloud. They join the existing Cloud Storage connector for Hadoop.
BigQuery provides SQL-like queries against Big Data stores, while Cloud Datastore is a managed service offering NoSQL-style queries. Cloud Storage is a RESTful Infrastructure as a Service (IaaS) storage service that can hold massive amounts of data, in objects of up to 5 TiB apiece, in buckets with unique keys. The Cloud Storage connector lets users run MapReduce jobs against data using the Cloud Storage file system. Until now, Google cloud developers working with Hadoop clusters could choose between the Cloud Storage file system and the Hadoop Distributed File System (HDFS). While HDFS is Apache Hadoop's default file system, Google recommended Cloud Storage as the default file system for Hadoop clusters on its platform.
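As a rough sketch of what that recommendation looks like in practice, pointing a Hadoop cluster at Cloud Storage as its default file system comes down to a couple of entries in core-site.xml. The bucket name below is a placeholder, and the exact property names are assumptions based on the connector's gs:// URI scheme rather than details confirmed in Google's announcement:

```xml
<!-- core-site.xml: use Google Cloud Storage instead of HDFS as the
     default file system. "my-bucket" is a hypothetical bucket name;
     the fs.gs.impl class name is an assumption about the connector. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>gs://my-bucket</value>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
</configuration>
```

With a configuration along these lines, existing MapReduce jobs can read and write gs:// paths the same way they would hdfs:// paths, which is what lets the same data serve multiple clusters without duplication.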
"These three connectors allow you to directly access data stored in Google Cloud Platform's storage services from Hadoop and other Big Data open source software that use Hadoop IO abstractions," Google said in a blog post. "As a result, your valuable data is available simultaneously to multiple Big Data clusters and other services, without duplications. This should dramatically simplify the operational model for your Big Data processing on Google Cloud Platform."
Google is providing MapReduce code samples for developers who want to get started: one using the BigQuery connector, one using the Datastore connector, and one that reads data with the Datastore connector and publishes results with the BigQuery connector.
David Ramel is the editor of Visual Studio Magazine.