Spark Joins Storm in StreamAnalytix Big Data Tool

Impetus Technologies Inc. today said it has added Apache Spark support to its StreamAnalytix tool, complementing existing functionality provided by the Apache Storm project for distributed, real-time streaming analytics.

StreamAnalytix was originally built around Storm, a free and open source distributed real-time computation system. In version 2.0, due by the end of next month, the platform has been reworked so that its features are available on other streaming engines, starting with Spark. Impetus did this by adding an abstraction layer above the underlying open source engines. The goal is to "future proof" the tool by letting developers adopt evolving technology options such as the wildly popular Spark, which is said to be the most active open source Big Data project.

"Among stream processing engines, Spark Streaming is gaining popularity, while Apache Storm has been in production deployments for many years and is a robust, proven, widely used option," Impetus said in a statement today. "StreamAnalytix 2.0 builds on its existing visual integrated development and application-monitoring environment to provide abstraction over multiple streaming engines. It can also accommodate newer engines as they gain market acceptance. This approach allows developers and data analysts to use drag-and-drop operators to create real-time analytics applications by choosing the most optimal engine for each use case."

StreamAnalytix at a Glance (source: Impetus Technologies)

Impetus executive Anand Venugopal elaborated on those use cases, which the company derived from its work with enterprise customers.

"Specifically, some required the low-latency, event-level processing that only Apache Storm could address," Venugopal said. "In other cases, the micro-batch architecture of Apache Spark Streaming with its leverage of Spark SQL and MLlib for machine learning was the best fit. We are excited about offering a platform that now simplifies those tradeoffs by incorporating both technologies under one easy and uniform user experience."

Spark Streaming and other real-time analytics tools are gaining prominence as vast amounts of data become available from the growing Internet of Things (IoT): the myriad devices, sensors and other connected "things" that provide instant telemetry. Tools such as Storm and Spark Streaming go beyond the original batch-processing capabilities of the Apache Hadoop ecosystem, analyzing constantly changing data streams as the data arrives rather than after it has been collected and stored.

Impetus said that, to handle this new wave of data, enterprises previously had two suboptimal options: use commercial products that might be expensive or proprietary, or resort to "do-it-yourself" development using raw open source code. The company said its solution provides the best of both worlds.

In addition to Spark Streaming support, Impetus listed the following enhancements to StreamAnalytix 2.0:
  • Ability to interconnect subsystems, which individually use different streaming engines.
  • Embedded complex event processing engine enhanced for high-availability support.
  • Built-in operators for predictive models including inline model-test feature.
  • Additional support for industry-standard messaging and data systems, including Amazon Kinesis and Amazon Simple Storage Service (S3), Apache ActiveMQ, IBM MQ and TIBCO.
  • Enhanced self-service, real-time dashboarding with editable widgets for various chart types.
  • Multi-tenancy controls with the ability to restrict resources for specific tenants and pipelines.
  • Ability to create multiple versions of real-time pipelines and choose the active version.
  • Rich array of real-time data processing functions for string, time, date, numeric and other data types.
  • Code-free enrichment and blending of streaming data with static data with lookups and MVEL expressions.
  • Extensibility of stream-processing operators and libraries with user-defined functions.

"Along with the early success of customers deploying real-time analytics solutions based on streaming data, we are seeing many new proprietary and open-source based, real-time streaming solutions hit the market," the company quoted Les Yeamans, founder of, as saying. "A solution like Impetus' StreamAnalytix 2.0, which is architected to provide a level of abstraction that allows for deployment of multiple streaming engines depending on the use-case requirements, affords customers a new level of 'best-of-breed' flexibility in their real-time architecture."

About the Author

David Ramel is an editor and writer for Converge360.