Cloudera, Hortonworks Update Big Data Platforms

Big Data heavyweights Cloudera and Hortonworks each upgraded their products last week.

Cloudera announced that its Cloudera Enterprise 5 data management platform was generally available, while Hortonworks announced its Hortonworks Data Platform 2.1 was available as a technical preview.

The two companies, along with MapR, have been battling for supremacy in the exploding Big Data market, sometimes trading barbs. They take different approaches, with Hortonworks concentrating on improving core Hadoop products and Cloudera focusing on providing a suite of products constituting an "enterprise data hub."

In fact, Cloudera positioned the Enterprise 5 release as "the evolution of the platform from a mere Apache Hadoop distribution into an enterprise data hub." The new offering -- a combination of the company's Cloudera Distribution Including Apache Hadoop (CDH) 5.0 and its Cloudera Manager 5.0 tool for managing end-to-end CDH clusters -- includes a host of new components and enhancements.

New components in the release include recent versions of Apache Spark, Apache Crunch, Parquet, Kite SDK and Apache Avro. The release is based on the Apache Hadoop 2.3.0 distribution and includes Hadoop Distributed File System (HDFS) caching and an NFS gateway. Other technologies in the new platform include YARN and MapReduce, Apache HBase, Impala, Apache Hive and many more.

"Analytic platforms are assuming multiple personalities," said Tony Baer, a principal analyst at Ovum quoted by Cloudera. "Cloudera Enterprise 5 is clearly in line with this trend as it supports a wide range of analytic styles venturing well beyond MapReduce to interactive SQL, search, and in-memory computing. Cloudera Enterprise 5 is also extending Hadoop's footprint with regard to security, data management, and governance. For instance, the new data lineage capabilities are the first building blocks that could eventually lead to a fully audited experience."

Cloudera said some 100 partners have invested resources to certify on the new platform before its release, including companies such as HP, IBM, Oracle and Dell. Cloudera Enterprise is available as a 60-day trial download.

Hortonworks, meanwhile, said its Hortonworks Data Platform (HDP) 2.1 incorporates "the very latest innovations from the Hadoop community in an integrated, tested and completely open enterprise data platform." The company said new enhancements span all aspects of its Enterprise Hadoop, including data management, data access, integration, governance, security and operations.

One notable component is the Stinger Initiative, a project designed to boost the performance of Apache Hive and SQL querying in Hadoop. "Hadoop users and developers now have native interactive SQL query at petabyte scale in Apache Hive," the company said.

Data governance improvements cited by Hortonworks come with the open source Apache Falcon project, designed to provide a reliable, simple and repeatable framework for managing data flow in Hadoop.

On the security front, Apache Knox provides perimeter security through one point of control for authentication and access for clusters. Other security measures ensure secure operations across multiple layers through features such as access control lists for HDFS and grant and revoke functions for Apache Hive, Hortonworks said.

The company also listed other improvements such as stream processing via Apache Storm, searching through Hadoop data with Apache Solr and advanced operations available through Apache Ambari.

Hortonworks said a single virtual machine (VM) download of HDP 2.1 is available now, while complete versions of the release for Linux and Windows will be available later this month.

About the Author

David Ramel is an editor and writer for Converge360.