Open Source B.I. Pentaho Integrates with Storm and YARN

Open source business intelligence and analytics company Pentaho Corporation last week announced native support of two Apache projects -- Storm and YARN -- on the company's data integration platform, Pentaho Data Integration (PDI).

The native support for the two projects aims to let developers run big data analytics in real time. YARN, the next-gen Apache Hadoop data processing framework, allows Hadoop to be used as a flexible, multi-purpose data processing and analytics platform (free from the limitations of MapReduce). Storm, an Apache incubation project, is an open-source, distributed, real-time computation system for fast processing of large data streams. So-called "Storm-YARN" lets Storm applications draw on the computational resources of a Hadoop cluster.

"This integration gets developers beyond the delays of batch processing," said Donna Prlich, Pentaho's senior director of product and solutions marketing.

The ability to analyze large volumes of data from disparate sources in real time is fast becoming an essential enterprise capability, Prlich told ADTmag. The advent of the Internet of Things -- for example, triggering a relevant offer on the mobile device of a shopper standing in a checkout line, or monitoring sensor data from an HVAC system to optimize a building's temperature -- as well as "the relentless evolution of the big data ecosystem," is adding to the pressure on enterprises.

The PDI integration came out of Pentaho Labs, a kind of internal incubator that the company launched last year. The brainchild of company founders Richard Daley and James Dixon, the Labs are staffed with industry experts and their own data scientist. The first fruit of that project was the Adaptive Big Data Layer, which is designed to insulate Pentaho customers from dependencies on the different Hadoop distributions as they plug into popular big data stores.

Pentaho appears to be on the right track with this integration, says 451 Research analyst Matt Aslett. YARN is driving growing interest in Hadoop not just as a platform for batch-based MapReduce, he points out, but also as one for "rapid data ingestion and analysis," especially when it's used with Apache Storm.

"Native support of Storm and YARN from companies like Pentaho will encourage users to innovate and drive greater value from Hadoop," he said in a statement.

Pentaho made the announcement last week at the annual Strata Conference in Santa Clara, Calif. The theme of this year's event was "Making Data Work."

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.