WatersWorks

Spring Hadoop Fits Neatly Under the Spring Data Umbrella 

VMware's recently announced integration of its Spring Framework with Apache Hadoop is aimed at making life easier for enterprise Java developers who want to use the popular open-source platform for data-intensive distributed computing. The new Spring Hadoop is a lightweight framework that combines the capabilities of the Spring Framework with those of Hadoop, which lets developers build applications that scale from one server to thousands and deliver high availability through software rather than hardware.

Hadoop is a Java-based, open-source platform for the distributed processing of large data sets across clusters of computers using a simple programming model. By integrating it with the Spring Java/J2EE application development framework, VMware has created a project that fits neatly under the Spring Data umbrella. The open-source Spring Data project comprises a group of sub-projects that aim to make it easier to develop apps using newer data access technologies, such as non-relational databases, cloud-based data services and MapReduce frameworks like Hadoop.

In addition to Spring Hadoop, the list of Spring Data sub-projects includes, among others, Spring Data JPA, which simplifies the development of Java Persistence API-based data access layers, as well as projects supporting VMware's GemFire distributed data management platform, the Redis advanced key-value store and the MongoDB document-oriented database.

The new framework also supports comprehensive HDFS data access through such Java Virtual Machine (JVM) scripting options as Groovy, JRuby, Jython and Rhino (JavaScript). HDFS, the Hadoop Distributed File System, is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

The list of Spring Hadoop capabilities also includes declarative configuration support for HBase; dedicated Spring Batch support for developing workflow solutions that incorporate HDFS operations and "all types of Hadoop jobs"; support for use with Spring Integration, "that provides easy access to a wide range of existing systems using an extensible event-driven pipes and filters architecture"; Hadoop configuration options and a templating mechanism for client connections to Hadoop; and declarative and programmatic support for Hadoop tools, including FsShell and DistCp.

Developer Costin Leau announced the integration on the SpringSource Community blog. "…Spring Hadoop stays true to the Spring philosophy offering a simplified programming model and addresses 'accidental complexity' caused by the infrastructure," he wrote. "Spring Hadoop, provides a powerful tool in the developer arsenal for dealing with big data volumes."

VMware has released Spring Hadoop under the open source Apache 2.0 license. It's available now as a free download.

Posted by John K. Waters on March 13, 2012