Apache Hadoop Community Promotes YARN -- But Don't Call it MapReduce 2 -- ADTmag

WatersWorks

By John K. Waters

The Hadoop community recently promoted YARN -- the next-gen Hadoop data processing framework -- to the status of "sub-project" of the Apache Hadoop Top Level Project. The promotion puts YARN on the same level as Hadoop Common, the Hadoop Distributed File System, and MapReduce. It had been part of the MapReduce project; the promotion means it'll now get the spotlight and developer attention its proponents believe it deserves.

"We now have consent from the community to separate YARN from MapReduce," says Arun C. Murthy. "Which is as it should be. YARN is not another generation of MapReduce, and I really don't like the 'MapReduce 2.0' label. This is a different paradigm. This is much more general and much more interesting."

Murthy ought to know: he has been a full-time contributor to the Hadoop MapReduce project since it got off the ground at Yahoo in early 2006. Back then, he and fellow Yahoo software engineer Owen O'Malley set a world data-sorting record (http://sortbenchmark.org/) using MapReduce: a terabyte in 60 seconds. Today, Murthy is a member of the Apache Hadoop Project Management Committee and a co-founder of Hortonworks, one of the chief providers of commercial support and services for Hadoop.

And he's been working on YARN full-time for about two and a half years.

"We knew that we were going to have to take Hadoop beyond MapReduce," Murthy says. "The programming model -- the MapReduce algorithm -- was limited. It can't support the very wide variety of use-cases we're now seeing for Hadoop. YARN turns Hadoop into a generic resource-management-and-distributed-application framework that lets you implement multiple customized apps. I expect to see MPI, graph-processing, simple services, all co-existing with MapReduce applications in a Hadoop YARN cluster. You can even run MapReduce now as an application for YARN."

Hadoop, of course, is the open-source framework for running applications on large clusters built from commodity hardware (let's just say it: Big Data). I sometimes forget that Hadoop is actually a combination of two technologies: an implementation of Google's MapReduce model and the Hadoop Distributed File System (HDFS). MapReduce is a programming model for processing large data sets that supports parallel computation on so-called unreliable clusters. HDFS is the storage component, designed to scale to petabytes and to run on top of the file systems of the underlying operating systems.
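For readers who haven't seen the MapReduce model up close, here is a minimal single-process sketch of the idea in Python, using the classic word-count example. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would spread the map and reduce calls across many machines and handle failures along the way.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here we just loop.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Each key can be reduced independently of every other key.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop runs MapReduce", "YARN runs MapReduce too"]
    print(word_count(sample))
```

The point of the sketch is the shape of the computation: every job has to be expressed as a map step followed by a reduce step, which is the constraint Murthy describes as limiting.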

What Murthy and others are hoping to do is redefine Hadoop from "HDFS-plus-MapReduce" to "HDFS-plus-YARN."

"The users can now look at Hadoop as a much more general-purpose system," Murthy says. "And from a developer perspective, we've opened up Hadoop itself to the point where now anyone can implement their own applications without having to worry about the nitty-gritty details of how you manage resources in a cluster and what you do for fault tolerance. [Promoting it] will also help us get more users and more developers to build an ecosystem around YARN. I guarantee you that next year at this time, we will be looking at four or five ways of doing real-time processing on Hadoop."

And I had to ask: What does YARN stand for?

"We were sitting around at lunch one day, trying to come up with the most inane names for our product," Murthy confessed to me. "The result was 'Yet Another Resource Negotiator -- YARN.' I know: it's a really bad name."

But really promising technology.

Hortonworks is in the process of publishing a still-unfolding series of blog posts by Murthy and Hortonworks' product marketing director Jim Walker on the subject of YARN and its implications for Hadoop. And there's a new collaboration mailing list for those who want to get involved in the project.

Posted by John K. Waters on August 15, 2012
