MapR Adds Streaming for New Converged Big Data Platform
By David Ramel
December 8, 2015
MapR Technologies Inc. today announced the upcoming addition of an event streaming system to its Apache Hadoop-based Big Data distribution, improving real-time analytics capabilities in a rebranded, integrated and converged solution.
That solution, now called the MapR Converged Data Platform, is designed to serve as a one-stop shop for Big Data analytics. It integrates a file system, NoSQL database, stream processing and analytics into one package that can work with both data-in-motion and data-at-rest.
"As a result, MapR enables developers to create new, innovative applications that reduce data duplication and movement, lower the cost of integration and maintenance associated with multiple platforms, and accelerate business results," the company said.
The key addition to the new platform, MapR Streams, is a global publish-subscribe event streaming system tailored for Big Data analytics. In a publish-subscribe system, data "publishers" -- or producers -- feed data into the system to be consumed by data "subscribers" for analysis, with no need for a direct one-to-one connection. Once ingested, the data is categorized into "topics" that interested subscribers can consume. Publishers and subscribers therefore don't have to know about one another, and each can produce or consume data at its own rate.
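To make the publish-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not the MapR Streams or Kafka API; the names `TopicLog`, `Subscriber`, `publish`, `read` and `poll` are hypothetical and chosen only for illustration. It assumes each topic is a simple append-only list and that each subscriber tracks its own read position, which is one common way to let consumers read at different rates.

```python
from collections import defaultdict


class TopicLog:
    """Toy broker (hypothetical): each topic is an append-only list."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset in that topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset, max_messages=10):
        """Return up to max_messages from a topic, starting at offset."""
        log = self._topics.get(topic, [])
        return log[offset:offset + max_messages]


class Subscriber:
    """Consumer that remembers its own position in a single topic."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # next message this subscriber will read

    def poll(self, max_messages=10):
        """Fetch the next batch and advance this subscriber's offset."""
        batch = self.broker.read(self.topic, self.offset, max_messages)
        self.offset += len(batch)
        return batch


def demo():
    broker = TopicLog()
    # The publisher only names a topic; it never references a subscriber.
    for i in range(5):
        broker.publish("clicks", f"click-{i}")
    broker.publish("logs", "disk-warning")

    # Two independent subscribers read the same topic at different rates.
    fast = Subscriber(broker, "clicks")
    slow = Subscriber(broker, "clicks")
    return fast.poll(max_messages=5), slow.poll(max_messages=2), slow
```

Because messages stay in the topic after being read, the slower subscriber can catch up later by calling `poll` again, without affecting the faster one.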
MapR Streams is much like the Apache Kafka project, described as a "high-throughput, distributed, publish-subscribe messaging system." In fact, MapR Streams uses the same publish-subscribe API as the open source Kafka. MapR said its solution differs from Kafka because the company adds enterprise features such as global replication, security, multi-tenancy, high availability and disaster recovery supplied by the MapR Converged Data Platform.
The San Jose, Calif., company claims its converged platform is unique in the industry, being the first and only such solution that allows developers to:
- Easily build scalable, continuous high-throughput streams across thousands of locations with millions of topics and billions of messages.
- Unite analytics, transaction, and stream processing to reduce data duplication, latency and cluster sprawl while using existing open source projects like Spark Streaming, Apache Storm, Apache Flink, and Apache Apex.
- Enable reliable message delivery with auto-failover and order consistency.
- Ensure cross-site replication to build global real-time applications.
- Provide unlimited persistence of all messages in a stream.
The streaming technology lets developers apply the platform to a variety of use cases across many industries, MapR said. The streaming data collected for analysis can come from almost any source, including Web applications, system and machine logs, social media sites, and connected sensors and devices. That data can in turn be used in a variety of ways. In financial services applications, for example, it can drive real-time fraud detection and real-time push notifications for mobile apps. Advertisers can use it for real-time user targeting according to customer segments and user preferences. And telecommunications companies can use it to optimize advertising for audio and video content based on users' interests.
"MapR is at the forefront of designing solutions for data-centric businesses as they operate today and provides the best Big Data platform with a core architecture in place to successfully address modern data challenges," said MapR customer Michael Brown at comScore. "Our system analyzes over 65 billion new events a day, and MapR Streams is built to ingest and process these events in real time, opening the doors to a new level of product offerings for our customers."
Having released MapR 5.0 in June, the company will add MapR Streams to its MapR 5.1 distribution, scheduled for release early next year. It will also be part of the company's free offering, MapR Converged Community Edition. "We will also release a virtual machine sandbox with MapR Streams along with tutorials, sample code, and video demos to make getting started easy," the company said.
David Ramel is an editor and writer for Converge360.