Pivotal Releases Hadoop Distribution Based on Open Data Platform

Pivotal Software Inc. released its first Apache Hadoop distribution based on the Open Data Platform, a controversial new industry consortium designed to further standardize Big Data initiatives.

The new Pivotal HD distribution based on Hadoop is part of the company's enhanced Pivotal Big Data Suite, announced at this week's EMC World conference in Las Vegas (Pivotal originated as an EMC/VMware "spin-out" joint venture in 2013).

This edition of Pivotal HD is the first one to be based on the Open Data Platform (ODP), an industry group Pivotal formed with Hortonworks Inc., IBM and other partners to promote and advance "the state of Apache Hadoop and Big Data technologies for the enterprise."

Despite its stated goals and open aspirations, the consortium has been heavily criticized, notably by Hortonworks competitors Cloudera Inc. and MapR Technologies Inc. Critics claim the group impinges upon the mission of the official Hadoop steward, the Apache Software Foundation, and MapR characterized it as "misdefined" and "vendor-biased."

Nevertheless, Pivotal has gone on to base its products on the "ODP core," leveraging Apache Hadoop 2.6 technologies.

Among these is an updated version of Apache Spark, the hot open source project that provides a speedy execution engine improving upon the original MapReduce component in the Hadoop ecosystem.

Numerous other Apache components were also updated, including: Ranger and Knox for improved security; Pig and Hive for scripting and querying; HBase for a non-relational database option; ZooKeeper and Oozie for coordination and orchestration; Ambari (in addition to Nagios and Ganglia) for monitoring; and Tez for data processing.

"Pivotal Big Data Suite is designed to provide customers with better stability, management, security, monitoring and data processing capabilities in the Hadoop stack," the company said in a statement. "This allows enterprises to off-load more business-critical workloads to Hadoop, to store and process large volumes of data at lower costs and in a way that is compliant with policies and regulations."

Pivotal also announced several other improvements to its Big Data Suite. For example, it features the updated Pivotal Greenplum Database, an open source offering based on the PostgreSQL database that now provides up to 100x performance improvements.

Those performance improvements are boosted by the new Pivotal Query Optimizer for Big Data analytics. Pivotal described it as "the most advanced cost-based query optimizer for Big Data."

"Pivotal Query Optimizer has been proven to deliver significant performance boosts to Pivotal HAWQ, the world's most advanced enterprise SQL on Hadoop engine, and to Pivotal Greenplum Database," the company said.

The updated Pivotal Big Data Suite is available now.

About the Author

David Ramel is an editor and writer for Converge360.