Spark Poised To Break from Hadoop, Move to Cloud, Survey Says

The popular Apache Spark project is poised to break from the Hadoop ecosystem as an independent data processing tool, and it may shift from on-premises installations to the cloud, according to new research.

That research is a new survey of nearly 7,000 techies and managers in the Big Data space, conducted by The Taneja Group Inc. for Cloudera Inc., a major provider of a Hadoop/Spark-based data processing distribution. The new "Apache Spark Market Survey," like other such polls, shows Spark is still red-hot and enjoying enterprise growth that's mainly hindered by a persistent bugaboo: a lack of trained workers skilled in the technology.

"We found that across the broad range of industries, company sizes and Big Data maturities represented in the survey, over one-half (54 percent) of respondents are already actively using Spark," Taneja said in an executive summary of the survey report. "Spark is proving invaluable as 64 percent of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing -- 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon."

The Spark momentum is so great that the technology -- originally positioned as a replacement for MapReduce with added real-time capabilities and in-memory processing -- could break free from the reins of the Hadoop universe and become its own independent tool.

Top Spark Use Cases (source: The Taneja Group)

"Apache Spark has quickly grown into one of the major Big Data ecosystem projects and shows no signs of slowing down," Taneja said. "In fact, even though Spark is well connected within the broader Hadoop ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become the center of its own emerging market category."

The one thing that might impede such a move -- and which continues to hinder Spark and Big Data adoption in general -- is the perennial Big Data skills shortage, cited again and again in such surveys, year after year.

"The biggest challenge with Spark, similar to what has been previously noted across the broader Big Data solutions space, is still reported by 6 out of 10 active users to be the Big Data skills/training gap within their organizations," Taneja said.

The research firm also expounded on the possible shift to cloud-based implementations. "Interestingly, while on-premise Spark deployments dominate today (more than 50 percent), there is a strong interest in transitioning many of those to cloud deployments going forward," Taneja said. "Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23 percent today to 36 percent, along with a corresponding increase in using Spark SaaS, from 3 percent to 9 percent."

Other survey findings include:

  • After the skills shortage, other major barriers to Spark adoption are complexity in learning/integrating Spark, cited by more than one-third of respondents, and difficulty consuming relevant training in a variety of formats (online, in-person, conference or tradeshow).
  • Top use cases for Spark include data processing/ETL (cited by 55 percent of respondents), real-time processing (more than 40 percent), data science (more than 30 percent) and machine learning (more than 30 percent).
  • 71 percent of respondents are employing Spark for data science.

"Overall, it's clear that Spark has gained broad familiarity within the big data world and built significant momentum around adoption and deployment," Taneja concluded. "The data highlights widespread current user success with Spark, validation of its reliability and usefulness to those who are considering adoption, and a growing set of use cases to which Spark can be successfully applied.

"Other Big Data solutions can offer some similar and overlapping capabilities (there is always something new just around the corner), but we believe that Spark, having already captured significant mindshare and proven real-world value, will continue to successfully expand on its own vortex of focus and energy for at least the next few years."

About the Author

David Ramel is an editor and writer for Converge360.