Spring for Apache Hadoop Goes Live

SpringSource on Tuesday announced the general availability release of Spring for Apache Hadoop, which integrates the Hadoop framework for data-intensive distributed computing with the Spring Java/J2EE application development framework.

Developer Costin Leau announced the 1.0 release on the SpringSource Community blog almost exactly one year after he announced the first milestone release. The combination of the two frameworks, he wrote, "provides a consistent programming and configuration model across a wide range of Hadoop ecosystem projects: rather than dictating what to use, the framework embraces and enhances your technology stack, staying true to the core Spring principles."

Spring is one of the most popular Java app frameworks on the market today. It's an open-source, layered Java/J2EE framework based on code published in SpringSource founder Rod Johnson's book Expert One-on-One J2EE Design and Development (Wrox Press, October 2002). Although SpringSource has been a Java-focused operation, the framework has been ported to .NET.

Hadoop, also open-source, is a Java-based platform for the distributed processing of large data sets across clusters of computers using a simple programming model. Combining the two frameworks allows enterprise Java developers to build applications that scale from one server to thousands, and to deliver high availability through the software, rather than hardware.
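To make the "simple programming model" concrete: Hadoop's model is MapReduce, in which a map step emits key/value pairs and a reduce step combines all values that share a key. The following is a minimal single-machine sketch of that idea in plain Java, using word counting as the example. It does not use the Hadoop or Spring for Apache Hadoop APIs, and the class and method names (WordCountSketch, map, reduce, run) are illustrative assumptions rather than anything from either framework.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-JVM imitation of the map, shuffle and reduce
// phases. A real Hadoop job distributes these phases across a cluster.
public class WordCountSketch {

    // Map phase: turn one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum every count emitted for a single key.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Driver: run map on each line, group values by key (the "shuffle"),
    // then reduce each group. TreeMap keeps the keys sorted.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spring for hadoop", "hadoop for java");
        System.out.println(run(lines));
    }
}
```

Because each map call looks at one line in isolation and each reduce call looks at one key in isolation, the work can be split across many machines, which is what lets the same program scale from one server to thousands.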

The combined frameworks also support comprehensive HDFS data access through such Java Virtual Machine (JVM) scripting languages as Groovy, JRuby, Jython and Rhino. HDFS (Hadoop Distributed File System) is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

The list of Spring for Apache Hadoop capabilities also includes: declarative configuration support for HBase; dedicated Spring Batch support for developing workflow solutions that incorporate HDFS operations and "all types of Hadoop jobs"; support for use with Spring Integration, "that provides easy access to a wide range of existing systems using an extensible event-driven pipes and filters architecture"; Hadoop configuration options and a templating mechanism for client connections to Hadoop; and declarative and programmatic support for Hadoop Tools, including FsShell and DistCp.

SpringSource has provided sample applications (already compiled and ready for download) that cover a variety of scenarios, Leau said. The samples "complement the comprehensive user documentation," which includes a section on how to get started with Spring for Apache Hadoop using Amazon's Elastic MapReduce service.

Spring for Apache Hadoop is provided out of the box in the distro from the Greenplum big-data analytics group, recently spun off from parent company EMC to the Pivotal Initiative. And Leau said it is being tested daily against various Hadoop 1.x distros, including vanilla Apache Hadoop, Cloudera CDH3 and CDH4.

"We want to make sure [it] works reliably no matter your Hadoop environment," he said.

Spring for Apache Hadoop has been released under the open source Apache 2.0 license. It's available now as a free download.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.