Upgrades of enterprise Hadoop-based distributions from two of the top three vendors -- Cloudera Inc. and Hortonworks Inc. -- dominated Big Data news this week, while the third, MapR Technologies Inc., added the MapR-DB NoSQL database to its offering.
Cloudera Enterprise 5.2 features better security, new cloud capabilities, the addition of Impala 2.0 as the analytic database on Hadoop and more partner technology integration, the company said.
The enterprise distribution comprises four of the company's products: Cloudera Distribution Including Apache Hadoop (CDH) 5.2, Cloudera Manager 5.2, Cloudera Director 1.0, and Cloudera Navigator 2.1.
CDH 5.2 comprises the company's Apache Hadoop distribution and related projects. The company claims this distribution is the only one on the market to feature unified batch processing, interactive SQL queries, interactive search and role-based access controls. The distro features improvements to Hadoop, support for a new OS -- Ubuntu 14.04 (Trusty) -- and feature upgrades to a slew of related Apache Software Foundation projects, including HBase, Hive, Impala, Spark and many more.
"This release reflects our continuing investments in Cloudera Enterprise's main focus areas, including security, integration with the partner ecosystem, and support for the latest innovations in the open source platform (including Impala 2.0, its most significant release yet, and Apache Hive 0.13.1)," said Cloudera's Justin Kestelyn in a blog post. "It also includes a new product, Cloudera Director, that streamlines deployment and management of enterprise-grade Hadoop clusters in cloud environments; new component releases for building real-time applications; and new support for significant partner technologies like EMC Isilon."
The company said Cloudera Enterprise 5.2 is now generally available with its enterprise offerings.
- Hortonworks, which differentiates its Hortonworks Data Platform (HDP) 2.2 from competitors as being a completely open source platform sans any proprietary extensions, said the new release features Hadoop YARN -- an improvement on the original MapReduce component of the Hadoop ecosystem -- as its architectural center. Key areas of improvement include governance, security and operations.
"HDP 2.2 includes the most recent innovations that have been developed within Hadoop and its related ecosystem of projects," the company said in a statement. "In all, HDP 2.2 comprises more than 100 new and advanced features that integrate with YARN and allow organizations to simultaneously utilize batch, interactive and real-time methods to interact with a single set of data stored within Hadoop."
Some of the new functionalities listed by the company include new and improved YARN-ready engines; enterprise SQL at Hadoop scale, featuring Stinger.next; Apache Argus for better centralized security administration and enforcement of policies; widespread management and monitoring improvements; and business continuity enhancements.
A tech preview of HDP 2.2 is now available, said Hortonworks, which also highlighted partnerships with HP, Microsoft, Red Hat, SAP and Teradata.
- MapR announced its MapR-DB database management system was added to its free MapR Community Edition, now available for download. It's also available in a mission-critical-grade Database Edition. The company described MapR-DB as "an enterprise-grade, high-performance, in-Hadoop NoSQL" database that provides real-time operational analytics.
"Operational analytics entails analyzing live, operational data immediately to deliver outputs in real time," according to MapR. "With MapR-DB and Hadoop on the same cluster, you add immediacy to the analysis of your live, operational data, without needing to copy data across separate clusters. This converges the real-time data access of NoSQL databases with the large-scale parallel processing of Hadoop. A few examples of operational analytics use cases include customer service optimization, real-time ad targeting, real-time personalization, and logistics route optimization."
MapR said operational analytics use cases also include fraud detection and prevention; real-time product recommendations; and user authentication. These require a balanced combination of database reads and writes in analytics operations, the company said. At the same time, it said enterprises are looking for real-time transactional or streaming applications for use with Internet of Things (IoT) input, sensor data and analytics logs. Usually, MapR said, separate technologies address the database operations and analytics jobs, but its platform simplifies that complicated architecture and handles both requirements.
"MapR-DB is an in-Hadoop database that integrates natively with Hadoop," the company quoted Forrester Research analyst Noel Yuhanna as saying. "It supports automatic sharding and re-balancing of the cluster to support broader scale. MapR customers have deployed all types of workloads including transactional, analytical, predictive analytics, and mixed. Since MapR provides NoSQL key-value integration with Hadoop, MapR customers often deploy a mixed workload in a single cluster."
Myriad other announcements this week -- the Strata + Hadoop World conference being held in New York ends today -- include: