New MapR Distribution Aims at Real-Time Big Data Analysis

MapR Technologies Inc. today released a new version of its Hadoop-based distribution with expanded real-time Big Data analytics capabilities.

The MapR Distribution for Apache Hadoop 4.0.1 is now generally available with several updated components, including a new version of Apache Drill, a SQL-on-Hadoop open source project headed by the company. It also includes new versions of other open source components such as Apache Spark, which is a data analytics cluster computing framework added to the package in April, and Apache HBase, a non-relational distributed database.

Along with other enhancements, these new components expand the distribution's real-time analytics functionality, providing better interactive query and stream processing capabilities, the company said.

MapR is commonly referred to as one of the top three enterprise Hadoop distributors, along with Cloudera Inc. and Hortonworks Inc. While those competitors focus on more pure-play open source technologies bundled with enterprise services and support, MapR has been described as taking a more proprietary approach in improving components of the Hadoop ecosystem for its distribution, which comprises more than a dozen Hadoop community projects.

While active in many open source projects, MapR has taken the lead on development of Drill, an Apache incubator project released in beta last week as version 0.5.0. Drill features a SQL query engine for working with large-scale data sets of varying types, including files, NoSQL databases and more complex data formats such as JSON and Parquet.

The MapR platform. (source: MapR Technologies Inc.)

"The vision and innovation that the Apache Drill community has brought to the marketplace heralds a new era of data exploration," said MapR CEO John Schroeder in a statement. "The agility to directly query self-describing data and the flexibility to process complex data types push the envelope in Big Data analysis and insight. We are extremely excited by the potential of Drill to transform data-driven companies."

Drill is an open source effort inspired by Google's Dremel system, which Google offers as a RESTful Web service named Google BigQuery for interactive analysis of large data sets. The updated Drill version joins four other SQL-on-Hadoop technologies bundled with MapR's distribution: Apache Hive, SparkSQL, Cloudera's Impala and "certified integration" with HP-Vertica.

On the NoSQL side, the distribution includes HBase and the company's own MapR-DB database. Other components include the machine-learning and graph libraries Mahout, MLlib and GraphX. The distribution also offers a choice of batch processing frameworks. These include Spark and the updated MapReduce 2.x, which is based on the greatly enhanced YARN technology that addressed widespread criticism of the original MapReduce. The original MapReduce is still available as an option.

"Hadoop is generating increasing interest for use in business-critical applications that rely upon low latency and consistency," the company quoted RedMonk analyst Donnie Berkholz as saying. "This latest release from MapR targets these needs with improved support for real-time processing through the incorporation of new tools as well as new releases of existing ones."

MapR said its distribution includes unique features such as backward compatibility; heterogeneous processing of older and newer MapReduce applications on the same set of nodes; advanced multi-tenancy capabilities that can isolate and protect data on specific nodes; fine-grained resource management; comprehensive wire-level security that now encompasses YARN apps; a no-NameNode architecture; snapshots for point-in-time consistency; and more.

"Organizations can now maximize business impact and minimize risk by running operational applications along with real-time analytics on Hadoop," said MapR exec Tomer Shiran in a statement. "MapR continues to lead the market by integrating the latest open source components into a mission critical, multi-tenant platform with self-healing high availability, disaster recovery, and data protection capabilities."

The MapR distribution comes in three editions, ranging from the standard M3 edition to the M5 enterprise edition and the M7 enterprise database edition for Hadoop.

About the Author

David Ramel is an editor and writer for Converge360.