News

After a Big Data Scaling 'Crisis,' LinkedIn Open Sources Scale Testing Tool

LinkedIn engineers have open sourced a homegrown Big Data scale testing tool following a "crisis" they experienced after adding 500 machines to a Hadoop Distributed File System (HDFS) cluster, resulting in disastrous slowdowns.

The company attributed the problem to changes picked up from the Apache Hadoop open source community, where even official releases aren't always subjected to performance and scale testing.

Called Dynamometer, the tool addresses that testing void, allowing performance and scale testing without the cost and hassle traditionally associated with spinning up huge clusters just to ensure real-world functionality doesn't suffer.

And suffer it did, when LinkedIn added 500 machines to its HDFS cluster in March 2015. That caused operations performed against the team's primary HDFS cluster to take up to two orders of magnitude longer than normal -- if they didn't time out completely.

"By the time the issue was detected, it was non-trivial to remove the new machines, as they had already accumulated a significant amount of data that would need to be copied off before they could be removed from service," the team said in a blog post yesterday. "Instead, we worked quickly to solve the specific issue causing the performance regression ... and subsequently began to make plans for how to protect ourselves from similar mishaps in the future."

That protection is now being shared with the open source community via Dynamometer, "a framework that allows us to realistically emulate the performance characteristics of an HDFS cluster with thousands of nodes using less than 5 percent of the hardware needed in production."

While expressing surprise that open source Big Data releases aren't tested at scale -- scalability, after all, being a core tenet of Big Data in the first place -- LinkedIn said the testing deficit makes sense considering:

  • Scale testing is expensive -- the only way to ensure that something will run on a multi-thousand node cluster is to run it on a cluster with thousands of nodes.
  • HDFS is maintained by a distributed developer community, and many developers do not have access to large clusters.
  • Developers are aware that Apache community releases (as opposed to distributions like CDH and HDP) are not typically run in production.

The code for Dynamometer is on GitHub, where the project's three main components are described:

  • Infrastructure: This is the YARN application which starts a Dyno-HDFS cluster.
  • Workload: This is the MapReduce job which replays audit logs.
  • Block Generator: This is a MapReduce job used to generate input files for each Dyno-DN; its execution is a prerequisite step to running the infrastructure application (see the sketch after this list).
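In essence, the block generator decides which blocks each simulated DataNode (Dyno-DN) should report to the NameNode under test. As a rough, hypothetical illustration of that idea -- not Dynamometer's actual code, which does this as a MapReduce job and accounts for replication -- a round-robin split of block IDs into one listing file per Dyno-DN might look like this:

    // Hypothetical illustration only; names and structure are not Dynamometer's.
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class BlockListSketch {
        public static void main(String[] args) throws IOException {
            Path blockIdFile = Paths.get(args[0]);           // one block ID per line
            int numDataNodes = Integer.parseInt(args[1]);    // number of simulated Dyno-DNs
            Path outputDir = Paths.get(args[2]);             // where to write per-DN listings
            Files.createDirectories(outputDir);

            // One listing file per simulated DataNode.
            PrintWriter[] writers = new PrintWriter[numDataNodes];
            for (int i = 0; i < numDataNodes; i++) {
                writers[i] = new PrintWriter(Files.newBufferedWriter(
                        outputDir.resolve("dyno-dn-" + i + ".blocks")));
            }

            // Assign blocks round-robin so each Dyno-DN knows which blocks to
            // report to the NameNode under test (one replica per block here, for
            // simplicity; real HDFS blocks have multiple replicas).
            List<String> blockIds = Files.readAllLines(blockIdFile);
            for (int i = 0; i < blockIds.size(); i++) {
                writers[i % numDataNodes].println(blockIds.get(i));
            }
            for (PrintWriter w : writers) {
                w.close();
            }
        }
    }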

The site says: "Dynamometer is a tool to performance test Hadoop's HDFS NameNode. The intent is to provide a real-world environment by initializing the NameNode against a production file system image and replaying a production workload collected via e.g. the NameNode's audit logs. This allows for replaying a workload which is not only similar in characteristic to that experienced in production, but actually identical."
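For a concrete sense of what "replaying" an audit log means, the sketch below is a hypothetical, single-threaded illustration in Java using Hadoop's standard FileSystem API -- not Dynamometer's actual workload driver, which runs as the MapReduce job described above. It parses the tab-separated key=value fields of HDFS audit-log lines and re-issues a few common operations against a target cluster:

    // Hypothetical illustration only; this is not Dynamometer's workload driver.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AuditReplaySketch {
        public static void main(String[] args) throws Exception {
            // args[0]: audit log file; args[1]: target (Dyno-HDFS) URI, e.g. hdfs://namenode:9000
            FileSystem fs = FileSystem.get(new URI(args[1]), new Configuration());
            try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String cmd = field(line, "cmd");
                    String src = field(line, "src");
                    if (cmd == null || src == null) {
                        continue;
                    }
                    Path path = new Path(src);
                    try {
                        // Re-issue a few common read-path operations; a full replayer
                        // would cover writes as well and reproduce the original timing.
                        switch (cmd) {
                            case "open":        fs.open(path).close();  break;
                            case "getfileinfo": fs.getFileStatus(path); break;
                            case "listStatus":  fs.listStatus(path);    break;
                            case "mkdirs":      fs.mkdirs(path);        break;
                            default:            break;  // skip other commands in this sketch
                        }
                    } catch (IOException e) {
                        // In a sketch, just skip paths that don't exist in the replayed image.
                    }
                }
            }
        }

        // Audit-log lines are tab-separated key=value pairs, e.g. "cmd=open<TAB>src=/user/foo".
        private static String field(String line, String key) {
            for (String token : line.split("\t")) {
                if (token.startsWith(key + "=")) {
                    String value = token.substring(key.length() + 1);
                    return "null".equals(value) ? null : value;
                }
            }
            return null;
        }
    }

Pointing a replayer like this at a Dyno-HDFS NameNode that was initialized from a production fsimage is what lets the test workload match production not just in shape but operation for operation.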

LinkedIn has already enjoyed multiple benefits from the tool, it said, such as tuning the configuration for an upgrade from Hadoop 2.3 to Hadoop 2.6. The tool tipped off engineers to adjust JVM heap size and garbage collection tuning parameters, "potentially avoiding a disaster upon upgrading."

About the Author

David Ramel is an editor and writer for Converge360.