MapR 5.0 Heads Hadoop Summit News

Like the elephant in the room that crashed the party, MapR Technologies Inc. unveiled version 5.0 of its Apache Hadoop-based Big Data analytics distribution amid a barrage of company announcements at the ongoing Hadoop Summit, hosted by its archrival Hortonworks Inc. in conjunction with Yahoo.

"The latest MapR release auto synchronizes storage, database and search indices to support complex, real-time applications to increase revenue, reduce operational costs and mitigate risk," the company said in one of four news releases issued since Monday. "MapR 5.0 also includes comprehensive security auditing, Apache Drill support, and the latest Hadoop 2.7 and YARN features."

Immediacy is a continuing theme in the product's positioning. The company said the MapR Distribution including Apache Hadoop version 5.0 (MapR 5.0) enables new types of "real-time" applications and "as-it-happens" business, in which analytics insights are acted upon in a short "data-to-action" cycle.

The company also emphasized that its new offering was designed for conducting Big Data analytics on a single platform, noting that it has found companies are more often deploying multiple applications -- with some customers tallying more than 50 -- on a single cluster.

How MapR Fits In (source: MapR Technologies Inc.)

In contrast to Hortonworks, which touts its distribution as 100 percent open source Apache Hadoop-based software, MapR adds proprietary components and functionality to its enterprise-grade platform, such as the MapR File System (MapR-FS), which replaces the core Apache Hadoop component, the Hadoop Distributed File System (HDFS).

Another such proprietary component is MapR-DB, an in-Hadoop NoSQL database that has been enhanced in MapR 5.0. The company said the new distribution "extends the MapR real-time, reliable data transport framework, used in the MapR-DB Table Replication capability, to deliver and synchronize data in real time to external compute engines. The first supported external compute engine is Elasticsearch to enable synchronized full-text search indexes automatically without writing custom code."

MapR also highlighted new security and governance capabilities in addition to its existing authentication and authorization functionality. For example, MapR 5.0 allows for the comprehensive auditing of all data access actions in JSON-formatted log files, which facilitates reporting, validation and fast analysis with the new Apache Drill support. The Drill project also provides Drill Views, which restrict access to file data to authorized users only.
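To give a rough sense of why JSON-formatted audit logs lend themselves to quick reporting, here is a minimal Python sketch that parses line-delimited JSON records and summarizes them. It is only an illustration: the field names (`user`, `operation`, `path`, `status`), the sample records and the helper functions are hypothetical and are not taken from MapR's actual audit schema, and MapR describes doing this kind of analysis with Apache Drill rather than with a script like this one.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line. The field names
# ("user", "operation", "path", "status") are illustrative assumptions,
# not the documented MapR audit log schema.
SAMPLE_LOG = """\
{"user": "alice", "operation": "READ", "path": "/data/sales.csv", "status": 0}
{"user": "bob", "operation": "WRITE", "path": "/data/sales.csv", "status": 0}
{"user": "alice", "operation": "READ", "path": "/data/hr.csv", "status": 13}
"""


def parse_audit_lines(text):
    """Yield one dict per non-empty line, skipping lines that are not valid JSON."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A malformed line is ignored rather than aborting the report.
            continue


def count_by(records, field):
    """Count records grouped by the value of one field."""
    return Counter(r.get(field, "<missing>") for r in records)


records = list(parse_audit_lines(SAMPLE_LOG))
ops_per_user = count_by(records, "user")

# In this sketch a nonzero "status" is assumed to mean the access failed.
failed = [r for r in records if r.get("status") != 0]

print(ops_per_user)
print(failed)
```

Because each record is a self-describing JSON object, grouping by user or filtering on a status code takes only a few lines, which is the same property that lets a schema-free SQL engine query such files directly.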

The San Jose, Calif.-based MapR just a few weeks ago announced the general availability of Apache Drill 1.0 and its inclusion in its Hadoop distribution. MapR is heading the development of the open source Drill project, which is a low-latency query engine based on ANSI SQL standards that facilitates self-service, interactive analytics at Big Data scales.

By adding support for Hadoop 2.7 and YARN 2.7, the company said its distribution gives users even more new capabilities, such as performing YARN "application rolling upgrades" in addition to the existing MapR "platform-level rolling upgrades."

"Designed as a large-scale batch data analysis system, Hadoop is not often associated with operational analytics or transaction processing," the company quoted IDC analyst Carl W. Olofson as saying. "Hadoop adds tremendous value for decision management at the strategic and operational levels, but still is emerging as a framework for making tactical decisions 'in the moment.' With Hadoop innovations, such as those in MapR 5.0, happening every day, enterprises should consider using Hadoop as a 'Decision Data Platform' that functions as a single platform for handling both live operational data and real-time analytics."

MapR said the new distribution will be available in 30 days.

In other MapR news, the company today announced it was working with Microsoft to offer its distribution on the Microsoft Azure cloud. It also announced new auto-provisioning templates to speed up cluster deployment. Finally, it announced that a number of partners have embraced the new distribution to extend their own offerings, including Centrify, Dataguise, Datameer, HP Security Voltage, Informatica, Protegrity, Syncsort, Talend, Teradata, Waterline Data and Zaloni.

MapR was also scheduled to conduct seven presentations at the three-day summit in San Jose, which ends tomorrow.

About the Author

David Ramel is an editor and writer for Converge360.