Open Source Project 'Weave' Aids Java Devs Using YARN

Continuuity today announced the public availability of Weave, its next-generation Apache Hadoop data processing framework.

Weave is a framework designed to ease the process of writing distributed applications by providing developers with a set of interfaces that allow them to manage resources, nodes and jobs within those apps through an abstraction layer built on YARN.

"If I'm an application developer, if I have an idea for a new algorithm or visualization, or I want to build a content targeting system or advertising analytics, and I want to build it on top of Hadoop, I've got like six months of learning to do just to stand it up and understand how it works," said Jonathan Gray, CTO and co-founder of Continuuity. "So there's a big gap between the kind of APIs that are exposed out of the infrastructure of this very distributed system and what a traditional Java developer who has been building J2EE applications is familiar with. That's the gap we're trying to fill with Weave."

YARN (sometimes called MapReduce 2.0) made a splash last year when the Hadoop community promoted it to the status of "sub-project" of the Apache Hadoop Top Level Project. The promotion moved YARN to the same level as Hadoop Common, the Hadoop Distributed File System, and MapReduce.

"We knew that we were going to have to take Hadoop beyond MapReduce," Arun C. Murthy, co-founder of Hortonworks and a major YARN contributor, told ADTmag last year when the promotion was announced. "I expect to see MPI, graph-processing, simple services, all co-existing with MapReduce applications in a Hadoop YARN cluster. You can even run MapReduce now as an application for YARN."

In a statement accompanying the Weave announcement Murthy said, "As we developed YARN and decided to make it generic to support a wide set of use cases, we understood that there would be the need for something like Weave…. We are thrilled that Continuuity has decided to help make YARN even easier to use, and that they have shared their efforts with the community."

Gray said his company is currently working with Hortonworks, Cloudera, and others in the Hadoop community to determine the best way to integrate Weave into the Apache ecosystem. Although the company initially implemented Weave on YARN, Gray said it could eventually be used with other distributed resource managers.

Weave is designed to make developing Java-based YARN apps as simple as running threads on a local Java Virtual Machine (JVM), Gray explained. The framework gives Java developers using YARN greater usability and functionality, which he said will broaden the set of applications and patterns that YARN can support. It comes with standard support for application lifecycle management, communication between containers and the Application Master, and handling of application-level errors. Among its other features are a simplified API for specifying, running, and managing apps built on YARN; a generic Application Master; log and metrics aggregation for applications; simplified archive management; and a discovery service.
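To make Gray's threads analogy concrete, here is a minimal, purely local Java sketch of that programming model. Every name in it (DistributedTask, LocalRunner, LocalController, HeartbeatTask) is hypothetical and is not Weave's actual API. Each "container" is simply a local thread, so nothing here talks to YARN. The sketch only shows the shape of the idea: the developer writes a Runnable-style task, asks a runner for N instances, and gets back a controller that handles the lifecycle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical task interface: a Runnable that can be asked to stop.
// It stands in for the unit of work a YARN framework would run in a container.
interface DistributedTask extends Runnable {
    void stop();
}

// Hypothetical handle for a running application (lifecycle management).
final class LocalController {
    private final List<DistributedTask> tasks;
    private final List<Thread> threads;

    LocalController(List<DistributedTask> tasks, List<Thread> threads) {
        this.tasks = tasks;
        this.threads = threads;
    }

    // Signals every instance to stop, then waits for each thread to exit.
    void stopAndWait() throws InterruptedException {
        for (DistributedTask t : tasks) t.stop();
        for (Thread th : threads) th.join();
    }
}

// Hypothetical runner: each "container" is just a local thread in this sketch.
final class LocalRunner {
    LocalController start(Supplier<DistributedTask> factory, int instances) {
        List<DistributedTask> tasks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            DistributedTask task = factory.get();
            Thread th = new Thread(task, "container-" + i);
            tasks.add(task);
            threads.add(th);
            th.start();
        }
        return new LocalController(tasks, threads);
    }
}

// Example task: records that it started, then idles until told to stop.
final class HeartbeatTask implements DistributedTask {
    static final AtomicInteger STARTED = new AtomicInteger();
    private volatile boolean running = true;

    @Override
    public void run() {
        STARTED.incrementAndGet();
        while (running) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    @Override
    public void stop() {
        running = false;
    }
}

public class Main {
    public static void main(String[] args) throws InterruptedException {
        LocalController controller = new LocalRunner().start(HeartbeatTask::new, 3);
        Thread.sleep(100);           // let the "application" run briefly
        controller.stopAndWait();    // lifecycle: stop all instances and wait
        System.out.println("instances started: " + HeartbeatTask.STARTED.get());
    }
}
```

Running Main prints `instances started: 3`. In a real YARN-backed framework, `start` would negotiate containers with the ResourceManager through an Application Master rather than calling `new Thread`. That negotiation, along with log collection and error handling, is the plumbing Weave aims to hide.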

Weave is available now on GitHub under the Apache 2.0 License.

Gray will be a featured speaker on the panel "Apache HBase Futures" at HBaseCon in San Francisco on Thursday. Continuuity's Andreas Neumann and Alex Baranau will also be speaking in a session titled "High-Throughput, Transactional Stream Processing on Apache HBase."

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.