Big Data Product Watch 10/17/14: Big Three Make Big Moves

Upgrades of enterprise Hadoop-based distributions from two of the top three vendors -- Cloudera Inc. and Hortonworks Inc. -- dominated Big Data news this week, while the third, MapR Technologies Inc., added the MapR-DB NoSQL database to its offering.

  • Cloudera Enterprise 5.2 features better security, new cloud capabilities, the addition of Impala 2.0 as the analytic database on Hadoop and more partner technology integration, the company said.

    The enterprise distribution comprises four of the company's products: Cloudera Distribution Including Apache Hadoop (CDH) 5.2, Cloudera Manager 5.2, Cloudera Director 1.0, and Cloudera Navigator 2.1.

    CDH 5.2 comprises the company's Apache Hadoop distribution and related projects. The company claims this distribution is the only one on the market to feature unified batch processing, interactive SQL queries, interactive search and role-based access controls. The distro features improvements to Hadoop, support for a new OS -- Ubuntu 14.04 (Trusty) -- and feature upgrades to a slew of related Apache Software Foundation projects, including HBase, Hive, Impala, Spark and many more.

    CDH 5.2 features more SQL functionality and compatibility with Impala 2.0.
    (source: Cloudera Inc.)

    "This release reflects our continuing investments in Cloudera Enterprise's main focus areas, including security, integration with the partner ecosystem, and support for the latest innovations in the open source platform (including Impala 2.0, its most significant release yet, and Apache Hive 0.13.1)," said Cloudera's Justin Kestelyn in a blog post. "It also includes a new product, Cloudera Director, that streamlines deployment and management of enterprise-grade Hadoop clusters in cloud environments; new component releases for building real-time applications; and new support for significant partner technologies like EMC Isilon."

    The company said Cloudera Enterprise 5.2 is now generally available with its enterprise offerings.

  • Hortonworks, which differentiates its Hortonworks Data Platform (HDP) 2.2 from competitors as being a completely open source platform sans any proprietary extensions, said the new release features Hadoop YARN -- an improvement on the original MapReduce component of the Hadoop ecosystem -- as its architectural center. Key areas of improvement include governance, security and operations.

    "HDP 2.2 includes the most recent innovations that have been developed within Hadoop and its related ecosystem of projects," the company said in a statement. "In all, HDP 2.2 comprises more than 100 new and advanced features that integrate with YARN and allow organizations to simultaneously utilize batch, interactive and real-time methods to interact with a single set of data stored within Hadoop."

    HDP 2.2 includes more than 100 new features and closes thousands of issues.
    (source: Hortonworks Inc.)

    New features listed by the company include new and improved YARN-ready engines; enterprise SQL at Hadoop scale; Apache Argus for better centralized security administration and policy enforcement; widespread management and monitoring improvements; and business continuity enhancements.

    A tech preview of HDP 2.2 is available now, said Hortonworks, which also highlighted partnerships with HP, Microsoft, Red Hat, SAP and Teradata.

  • MapR announced that its MapR-DB database management system has been added to its free MapR Community Edition, now available for download. It's also available in a mission-critical-grade Database Edition. The company described MapR-DB as "an enterprise-grade, high-performance, in-Hadoop NoSQL" database that provides real-time operational analytics.

    "Operational analytics entails analyzing live, operational data immediately to deliver outputs in real time," according to MapR. "With MapR-DB and Hadoop on the same cluster, you add immediacy to the analysis of your live, operational data, without needing to copy data across separate clusters. This converges the real-time data access of NoSQL databases with the large-scale parallel processing of Hadoop. A few examples of operational analytics use cases include customer service optimization, real-time ad targeting, real-time personalization, and logistics route optimization."

    MapR-DB reportedly eliminates unnecessary Java virtual machines and the corresponding overhead.
    (source: MapR Technologies Inc.)

    MapR said operational analytics use cases also include fraud detection and prevention, real-time product recommendations, and user authentication. These require a balanced combination of database reads and writes in analytics operations, the company said. At the same time, it said enterprises are looking for real-time transactional or streaming applications for use with Internet of Things (IoT) input, sensor data and analytics logs. Usually, MapR said, separate technologies address the database operations and analytics jobs, but its platform simplifies that complicated architecture and handles both requirements.

    "MapR-DB is an in-Hadoop database that integrates natively with Hadoop," the company quoted Forrester Research analyst Noel Yuhanna as saying. "It supports automatic sharding and re-balancing of the cluster to support broader scale. MapR customers have deployed all types of workloads including transactional, analytical, predictive analytics, and mixed. Since MapR provides NoSQL key-value integration with Hadoop, MapR customers often deploy a mixed workload in a single cluster."

Myriad other announcements this week -- the Strata + Hadoop World conference being held in New York ends today -- include:

  • Actian Corp. launched the Actian Analytics Platform – Express Hadoop SQL Edition, described as "a free community version of the industry's first end-to-end analytics platform running 100 percent inside of Hadoop."
  • Predixion Software, which develops cloud-based predictive analytics software, released the latest version of its platform, Predixion Insight 4.0. "The new release expedites the deployment of predictive analytics directly to the point of front-line decisions, and expands predictive capabilities across a wider variety of production environments, such as applications, databases, data stores, real-time engines, devices and machines," the company said.
  • Pentaho Corp. announced new capabilities in automated data modeling and publishing to aid enterprises in implementing a Streamlined Data Refinery architecture. "This popular design pattern orchestrates the process of preparing large, diverse, blended data sets for queries on-demand using Hadoop," the company said.
  • MongoDB Inc. enhanced its MongoDB Management Service (MMS) to make it easier to run the NoSQL database system. "MMS is now centered around the experience of deploying and managing MongoDB on the infrastructure of your choice," the company said. "You can now deploy a cluster through MMS and then monitor your deployment."
  • Rackspace Inc. released the OnMetal Cloud Big Data Platform "so customers can now deploy bare-metal instances of Apache Hadoop with Spark in just three clicks."

About the Author

David Ramel is an editor and writer for Converge360.