WatersWorks

Cascading: Open Source Java App Framework for Big Data

Enterprise demand for Big Data and the analytics software around it has sparked intense interest in Apache Hadoop, the open source framework for running applications on large clusters of commodity hardware, and has produced something of a flood of tools for developers working with it. But as an applications market emerges in this space, the next Big Thing for Big Data is likely to be app-oriented middleware.

That's an insight Tony Baer, principal analyst at Ovum, shared with me when we talked recently about Continuuity's Reactor 2.0 release, which the Java toolmaker billed as the first scale-out application server for Apache Hadoop.

"It is inevitable that applications will be developed that run against Big Data," Baer said, "and as that occurs, it will be necessary to have an application layer that allows developers with Java and other languages to develop apps that run against it."

Baer's prediction makes perfect sense, and it's one reason Java jocks might want to keep an eye on Concurrent, the company behind the open source Cascading project. Cascading is a Java application development framework for rich data analytics and data management apps running across "a variety of computing environments," with an emphasis on Hadoop and API-compatible distributions.

"Big Data is moving to the next phase of maturity and it's all about the applications," the company says on its Web site. "The applications process the data and extract the value at scale and we believe that there must be a simple, reliable and consistent way to build, deploy, run and manage these data driven applications."

Great minds.

Concurrent characterizes Cascading as "a rich Java API for defining complex data flows and creating sophisticated data oriented frameworks," and it claims more than 110,000 user downloads a month. Its published user list includes Twitter, eBay, Square and Etsy, among others.
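
For readers who haven't worked with a data flow API, here is a rough sketch of the general idea in plain Java: records come in from a source, pass through a chain of transformation steps, and land in a result. This is not the Cascading API; it uses only the standard java.util.stream library, and the class name WordCountFlow and its run method are hypothetical names made up for this illustration. A word count stands in as the example job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is NOT the Cascading API.
// It mimics the source -> transform -> aggregate shape of a data flow
// using the standard Java streams library.
class WordCountFlow {

    // Takes lines of text (the "source") and returns word counts (the "sink").
    static Map<String, Long> run(List<String> lines) {
        return lines.stream()
                // Step 1: split each line into lowercase word tokens.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // Step 2: drop empty tokens produced by leading punctuation.
                .filter(word -> !word.isEmpty())
                // Step 3: group identical words and count them, sorted by word.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "Big Data is about the applications",
                "big data apps");
        run(source).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

The sketch runs in a single JVM on an in-memory list. The pitch for a framework like Cascading is that a flow described at this level can be planned and executed across a Hadoop cluster instead.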

The San Francisco-based company recently announced Cascading 2.5 with new support for Hadoop 2 and YARN, Hadoop's next-generation resource management and data processing layer (sometimes called MapReduce 2.0).

Chris Wensel, Concurrent's founder and CTO, has argued that developing and building applications on Hadoop has proven to be difficult, despite the framework's rapid enterprise adoption. "With Hadoop 2, the community has addressed many concerns, paving a clearer path for enterprise users," he said in a statement. "At Concurrent, we're dedicated to forging a simpler path to mass Hadoop adoption by delivering a framework for building powerful and reliable data-oriented applications supporting data driven business models -- quickly and easily. Our support for Hadoop 2 was an easy decision, as we continue to be an integral part of the Hadoop and Big Data ecosystem, providing solutions that simplify application development and management for the enterprise."

As a Java-based framework, Cascading fits naturally with other JVM-based languages, including Scala, Clojure, JRuby, Jython and Groovy, and the Cascading community has created scripting and query languages for many of them. The company's extensions page offers a growing list of user-contributed code.

Cascading 2.5 is publicly available and freely licensable under the Apache 2.0 License Agreement.

Posted by John K. Waters on December 4, 2013