Hortonworks DataFlow Taps More Apache Tech for Streaming Analytics

Hortonworks Inc. today announced that its Big Data streaming analytics tool, DataFlow, has reached version 2.0, integrating several more technology projects under the Apache Software Foundation umbrella.

Hortonworks DataFlow (HDF) version 2.0 is described as "an integrated system for dataflow management and streaming analytics to quickly collect, curate, analyze and deliver insights in real-time, on-premises or in the cloud."

Streaming analytics is a growing segment of the Apache Hadoop-based ecosystem, as enterprises move on from the original batch-processing focus to glean more real-time business insights from data emanating from the growing Internet of Things (IoT) and other sources.

"We have seen significant HDF adoption across our customer base," said Hortonworks exec Jamie Engesser in a statement today. "Customers clearly want an integrated real-time solution for streaming data and the new functionality of HDF 2.0 accelerates business value from data-in-motion for customers."

Citing the integration of several other Apache projects in the updated DataFlow, Hortonworks listed the following highlights of the new release:

  • Next-generation user experience: New graphical user experience and integration of Apache NiFi, Apache Kafka and Apache Storm into Apache Ambari for accelerated deployment and real-time operations.
  • Enterprise readiness: Integration with Apache Ranger for centralized and comprehensive security policy management of streaming dataflows across groups in an enterprise.
  • IoT at the edge: Apache MiNiFi, a new lightweight edition of Apache NiFi providing data collection at the edge with enterprise-grade security and management at scale.

To get a better understanding of exactly what those Apache projects bring to the Big Data analytics table, here are brief descriptions of each, from the sources themselves and from Hortonworks:

  • Apache NiFi is for dynamic, configurable data pipelines, through which all sources, systems and destinations communicate.
  • The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.
  • Apache Kafka is for high-throughput distributed messaging with publish-subscribe semantics, operating at speed on Big Data volumes and adapting to differing rates of data creation and delivery. It's commonly used for ingesting data into analytics frameworks.
  • Apache Storm is for real-time streaming analytics to create immediate insights at massive scale, with the current release described as 6-10X faster than any previous Storm release.
  • Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
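
To make the publish-subscribe idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka API: the `ToyBroker` class, its `publish` and `poll` methods, and the topic and group names are all hypothetical, chosen only to show how independent consumers can read the same stream at differing rates.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub log (hypothetical; not the Kafka API)."""

    def __init__(self):
        # topic name -> append-only list of messages
        self.topics = defaultdict(list)
        # (consumer group, topic) -> index of the next message to read
        self.offsets = defaultdict(int)

    def publish(self, topic, message):
        # Producers only append; they never wait for consumers.
        self.topics[topic].append(message)

    def poll(self, group, topic, max_messages=10):
        # Each group keeps its own position, so a slow reader
        # does not hold back a fast one.
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()
for reading in (21.5, 22.0, 22.4):
    broker.publish("sensor-temps", reading)

# A fast consumer drains everything available in one call.
fast = broker.poll("dashboard", "sensor-temps")

# A slow consumer takes one message per call and resumes where it left off.
slow = broker.poll("archiver", "sensor-temps", max_messages=1)
slow_next = broker.poll("archiver", "sensor-temps", max_messages=1)
```

Because each group tracks its own read position, the "dashboard" and "archiver" consumers both see every reading, just on their own schedules.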

Hortonworks detailed the release in a blog post today, "Three Things To Know About HDF 2.0," written by Haimo Liu and Kanishk Mahajan.

That post highlighted the following streaming analytics features of HDF 2.0:

  • Storm windowing and state management.
  • Improved Storm topology debugging including Dynamic Worker Profiling, Topology Event Inspector, Dynamic Log Levels and Distributed Log Search.
  • Improved Kafka SASL and Kafka Automated Replica Leader Election.
  • Improved Storm scalability with Pacemaker Daemon, Resource Aware Scheduling and Improved Nimbus HA.
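
For readers unfamiliar with windowing, the general idea can be sketched in a few lines of plain Python. This is only a conceptual illustration of fixed (tumbling) windows with per-window state, under the assumption that events arrive as (timestamp, key) pairs; it does not use Storm, and the function name `tumbling_window_counts` and the sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count keys per fixed, non-overlapping time window.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    # The nested dict is the "state" carried across events.
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Floor the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample stream of (seconds, event type) pairs.
events = [
    (0, "login"), (3, "click"), (9, "click"),
    (10, "login"), (14, "click"), (21, "click"),
]
counts = tumbling_window_counts(events, window_seconds=10)
```

With 10-second windows, the six sample events fall into three windows starting at 0, 10 and 20 seconds, each with its own counts. A streaming engine applies the same grouping continuously and has to keep that per-window state reliable across failures.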

Along with the Apache project integration ("an integrated ecosystem of Apache NiFi, Kafka and Storm"), the other two of the "three things to know" are the aforementioned enterprise readiness via Ambari and Ranger and "extending the reach towards the edge, with support for Apache MiNiFi."

"Our business depends on actionable insights from combining data-at-rest with data-in-motion at scale," the company quoted exec Mike Bishop at customer Prescient as saying. "Prescient pulls information from 49,000+ sources to determine which physical, health and environmental threat factors are most relevant to the business continuity and personal safety of specific travelers. Connected Data Platforms allow us to create value for our customers and are powering our real-time business of keeping travelers safe."

About the Author

David Ramel is an editor and writer for Converge360.