News

Cloudera and MongoDB Enter Big Data Partnership

Cloudera Inc. and MongoDB Inc. yesterday announced a strategic partnership to integrate their Big Data technologies.

Cloudera is one of the main players in the Big Data space, with a popular Hadoop distribution and accompanying bundled packages for enterprises. MongoDB is billed as the most widely used NoSQL database and is often used in Big Data analytics.

Details of the partnership were scant, though the companies said their lofty goal was to "transform how organizations approach Big Data."

One immediate result of the partnership was the certification of the MongoDB Connector for Hadoop for use with Cloudera Enterprise 5, Cloudera's latest package, which bundles the Cloudera Distribution Including Apache Hadoop (CDH) with technical support and a subscription to Cloudera Manager, a Hadoop administration tool.

"Certifying MongoDB's Connector for Hadoop on Cloudera Enterprise 5 was our first step to help our joint customers," Kelly Stirman, director of product at MongoDB, told this site. "Resources are being invested even further to match our companies' joint vision, including building a tighter integration of the next version of the MongoDB Connector for Hadoop."

The companies said no further news on the partnership would be forthcoming until the June 24 kick-off of the MongoDB World conference in New York.

The MongoDB Connector for Hadoop plug-in lets developers more easily use MongoDB as an input source or output destination for processing Big Data with the open source Hadoop framework. The connector obviates the need to use custom code or cumbersome import/export scripts to move data around. It takes advantage of multi-core parallelism, has full integration with the Hadoop and Java Virtual Machine (JVM) ecosystems, is compatible with Amazon Elastic MapReduce, and can use local filesystems, Hadoop Distributed File System (HDFS), or S3 to read and write backup files, MongoDB said in a Webinar last year.

"The connector presents MongoDB as a Hadoop-compatible file system allowing a MapReduce job to read from MongoDB directly without first copying it to HDFS, thereby eliminating the need to move terabytes of data across the network," MongoDB said last year when the connector was updated. "MapReduce jobs can pass queries as filters, so avoiding the need to scan entire collections, and can also take advantage of MongoDB's rich indexing capabilities including geospatial, text-search, array, compound and sparse indexes."
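To make the "queries as filters" idea concrete, here is a minimal sketch of how a job might be pointed at MongoDB for both input and output, with a query restricting which documents reach the mappers. The property names (`mongo.input.uri`, `mongo.output.uri`, `mongo.input.query`) are given as recalled from the open source connector project and should be checked against its documentation; the helper function, the `demo` database and its collections are hypothetical and used only for illustration.

```python
import json


def connector_job_properties(input_uri, output_uri, query=None):
    """Build Hadoop job properties for a MongoDB-backed MapReduce job.

    Hypothetical helper: it only assembles key/value pairs and does not
    talk to MongoDB or Hadoop. The property names are assumptions based
    on the connector's documented configuration keys.
    """
    props = {
        "mongo.input.uri": input_uri,    # collection the job reads from
        "mongo.output.uri": output_uri,  # collection the job writes to
    }
    if query is not None:
        # The filter is passed as a JSON document, so only matching
        # documents are handed to the job instead of the whole collection.
        props["mongo.input.query"] = json.dumps(query, sort_keys=True)
    return props


# Illustrative values only: a local server and made-up collections.
props = connector_job_properties(
    "mongodb://localhost:27017/demo.events",
    "mongodb://localhost:27017/demo.event_counts",
    query={"type": "purchase"},
)

# Print the properties in the -D form accepted by Hadoop's generic options.
for key in sorted(props):
    print(f"-D {key}='{props[key]}'")
```

Because the filter travels with the job configuration, the database can apply it (and any suitable index) before documents are handed to Hadoop, which is the saving the quote describes.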

Beyond the connector, the companies hinted at more to come regarding the collaboration. "More than a simple technology integration, the partnership brings the two companies' leadership to bear in enabling enterprises to fundamentally rethink how data can be shared and put to work across the enterprise," they said in a statement. "The combination of Cloudera Enterprise and MongoDB will enable customers to easily develop, operate and manage Big Data infrastructure that powers modern applications."

Yesterday's announcement was the latest in a string of MongoDB-related news. Earlier this month, the company released a major upgrade of its database, and last week Microsoft announced that new high-memory MongoDB instances were available on its Microsoft Azure cloud platform.

About the Author

David Ramel is an editor and writer for Converge360.