After a Big Data Scaling 'Crisis,' LinkedIn Open Sources Scale Testing Tool -- ADTmag

By David Ramel
February 9, 2018

LinkedIn engineers have open sourced a homegrown Big Data scale testing tool following a "crisis" they experienced after adding 500 machines to a Hadoop Distributed File System (HDFS) cluster, resulting in disastrous slowdowns.

The company attributed the problem to applying changes from the Apache Hadoop open source community, whose releases, even official ones, aren't always subjected to performance and scale testing.

Called Dynamometer, the tool addresses that testing void, allowing for performance and scale testing without the traditional cost and hassle of spinning up huge systems and clusters to ensure real-world functionality doesn't suffer.

And suffer it did, when LinkedIn added 500 machines to its HDFS cluster in March 2015. That caused operations performed against the team's primary HDFS cluster to take up to two orders of magnitude longer than normal -- if they didn't time out completely.

"By the time the issue was detected, it was non-trivial to remove the new machines, as they had already accumulated a significant amount of data that would need to be copied off before they could be removed from service," the team said in a blog post yesterday. "Instead, we worked quickly to solve the specific issue causing the performance regression ... and subsequently began to make plans for how to protect ourselves from similar mishaps in the future."

That protection is now being shared with the open source community via Dynamometer, "a framework that allows us to realistically emulate the performance characteristics of an HDFS cluster with thousands of nodes using less than 5 percent of the hardware needed in production."

While expressing surprise that open source Big Data releases aren't tested for scaling functionality -- which, after all, is the core tenet of Big Data to begin with -- LinkedIn said the testing deficit makes sense considering:

Scale testing is expensive -- the only way to ensure that something will run on a multi-thousand node cluster is to run it on a cluster with thousands of nodes.
HDFS is maintained by a distributed developer community, and many developers do not have access to large clusters.
Developers are aware that Apache community releases (as opposed to distributions like CDH and HDP) are not typically run in production.

The code for Dynamometer is on GitHub, where the project's three main components were described:

Infrastructure: This is the YARN application which starts a Dyno-HDFS cluster.
Workload: This is the MapReduce job which replays audit logs.
Block Generator: This is a MapReduce job used to generate input files for each Dyno-DN; its execution is a prerequisite step to running the infrastructure application.

The site says: "Dynamometer is a tool to performance test Hadoop's HDFS NameNode. The intent is to provide a real-world environment by initializing the NameNode against a production file system image and replaying a production workload collected via e.g. the NameNode's audit logs. This allows for replaying a workload which is not only similar in characteristic to that experienced in production, but actually identical."
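To make the audit-log replay idea concrete, here is a minimal Python sketch. It is not Dynamometer's code, which runs the replay as a MapReduce job. It only illustrates the principle of reading recorded NameNode operations and keeping their original relative timing. The sample lines assume the common HDFS audit layout (timestamp, log level, logger name, then tab-separated key=value fields), and the names `parse_audit_line` and `replay_schedule` are hypothetical.

```python
import re
from datetime import datetime

# Sample audit lines in an ASSUMED layout: timestamp, level, logger name,
# then tab-separated key=value fields. Real logs may differ by version/config.
SAMPLE_LOG = "\n".join([
    "2015-03-01 12:00:00,000 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\tsrc=/data/a\tdst=null\tperm=null",
    "2015-03-01 12:00:00,500 INFO FSNamesystem.audit: allowed=true\tugi=bob (auth:SIMPLE)\tip=/10.0.0.2\tcmd=listStatus\tsrc=/data\tdst=null\tperm=null",
    "2015-03-01 12:00:02,000 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=delete\tsrc=/tmp/b\tdst=null\tperm=null",
])

LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ \S+ (.*)$")


def parse_audit_line(line):
    """Parse one audit line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    fields = dict(kv.split("=", 1) for kv in match.group(2).split("\t") if "=" in kv)
    return {
        "time": timestamp,
        "user": fields.get("ugi", "").split(" ")[0],  # drop the "(auth:...)" suffix
        "cmd": fields.get("cmd"),
        "src": fields.get("src"),
    }


def replay_schedule(ops, speedup=1.0):
    """Return (offset_seconds, op) pairs relative to the first operation.

    A speedup above 1.0 compresses the gaps between operations, which is one
    simple way to apply more load than the recorded workload did.
    """
    ordered = sorted(ops, key=lambda op: op["time"])
    if not ordered:
        return []
    start = ordered[0]["time"]
    return [((op["time"] - start).total_seconds() / speedup, op) for op in ordered]


parsed = [op for op in map(parse_audit_line, SAMPLE_LOG.splitlines()) if op]
for offset, op in replay_schedule(parsed, speedup=2.0):
    # A real replayer would issue the operation against the test NameNode here.
    print(f"+{offset:.2f}s {op['user']} {op['cmd']} {op['src']}")
```

Because the schedule is derived from the recorded timestamps rather than from a synthetic model, the replayed sequence contains the same operations in the same order as the original log, which is the property the project description emphasizes.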

LinkedIn said it has already benefited from the tool in several ways, such as tuning the configuration for an upgrade from Hadoop 2.3 to Hadoop 2.6. The tool tipped off engineers to adjust JVM heap size and garbage collection tuning parameters, "potentially avoiding a disaster upon upgrading."

About the Author

David Ramel is an editor and writer at Converge 360.
