LinkedIn Contributes Build Tool for Hadoop

LinkedIn Corp. is joining the parade of major companies donating their homegrown Big Data tools to the open source community, offering up a plugin for the Gradle build system that targets Hadoop application development.

The business-oriented social network said its in-house LinkedIn Gradle Plugin for Apache Hadoop, now available on GitHub, helps rein in the dozens of individual Hadoop jobs, spread across multiple frameworks, that the company often found itself managing when developing data-driven applications.

"A couple of years ago, LinkedIn adopted Gradle as our primary build system," engineer Alex Bain wrote in a blog post last week. "With Gradle, developers can easily extend the build system by defining their own plugins. We developed the Hadoop plugin to help our Hadoop application developers more effectively build, test and deploy Hadoop applications. The plugin includes the Hadoop DSL, a domain-specific language for specifying jobs and workflows for Hadoop workflow managers like Azkaban and Apache Oozie."

Written in the Groovy programming language, the accompanying embedded DSL provides syntax for declaring jobs and workflows for Hadoop workflow managers.

"Since it's an embedded Groovy DSL, you can use Groovy (or Java) anywhere throughout the DSL," Bain said. "Using the DSL shields you from some of the painful details of creating Azkaban or Oozie workflow files. The DSL is statically compiled into job and workflow files at build time. Since it's statically compiled, it can be statically checked. The static checker will catch a number of common problems with your workflow files at build time, rather than running your Hadoop workflow only to have it error out hours later."

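To illustrate the kind of build-time checking Bain describes, here is a minimal sketch in Python. It is not the plugin's code or its DSL syntax: the `check_workflow` function, the job names and the two checks shown (undefined dependencies and dependency cycles) are assumptions chosen for illustration. The idea is simply that a workflow declared as data can be validated before anything is submitted to a cluster.

```python
from typing import Dict, List


def check_workflow(jobs: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in a workflow definition.

    `jobs` maps each job name to the names of the jobs it depends on.
    This is a hypothetical stand-in for a static checker, not the
    LinkedIn plugin's actual implementation.
    """
    problems: List[str] = []

    # Check 1: every dependency must refer to a job that is defined.
    for name, deps in jobs.items():
        for dep in deps:
            if dep not in jobs:
                problems.append(f"job '{name}' depends on undefined job '{dep}'")

    # Check 2: the dependency graph must not contain a cycle.
    state: Dict[str, int] = {}  # absent = unvisited, 1 = in progress, 2 = done

    def visit(name: str) -> bool:
        """Depth-first search; True if a cycle is reachable from `name`."""
        if state.get(name) == 1:
            return True
        if state.get(name) == 2:
            return False
        state[name] = 1
        for dep in jobs[name]:
            # Undefined dependencies were already reported by check 1.
            if dep in jobs and visit(dep):
                return True
        state[name] = 2
        return False

    for name in jobs:
        if visit(name):
            problems.append(f"dependency cycle reachable from job '{name}'")
            break

    return problems


if __name__ == "__main__":
    workflow = {
        "extract": [],
        "transform": ["extract"],
        "load": ["transform", "cleanup"],  # "cleanup" is never defined
    }
    for problem in check_workflow(workflow):
        print(problem)
```

Run against the sample workflow, the check reports the undefined `cleanup` dependency immediately, which is the type of mistake that would otherwise surface only after the earlier jobs had already run.
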
Bain said the company welcomes contributions to the project, whether in the form of pull requests, bug reports, documentation improvements or any other kind of idea or feedback. Documentation and examples are available with the project on GitHub.

"The Hadoop Plugin and Hadoop DSL have been embraced as the standard way to develop Hadoop workflows at LinkedIn," Bain said. "If you are writing Hadoop jobs using Gradle as your build system, you should definitely consider using the Hadoop Plugin. It will save you time and energy in developing your Hadoop workflows."

About the Author

David Ramel is an editor and writer for Converge360.