
Netflix Hollow: Java Tools for Managing In-Memory Datasets

Netflix has announced a new Java library and toolkit for managing small- to medium-sized in-memory datasets that are disseminated from a single organization to multiple users for read-only access. Dubbed Hollow, the open source offering replaces the Zeno in-memory data distribution framework the company announced in 2013.

Hollow provides an alternative to traditional strategies for distributing, over a network, larger sets of data not easily defined as "Big Data" -- product and document metadata, for example. Instead of storing the data in a central repository or serializing it in JSON or XML (or implementing a hybrid of both), and then distributing it via network edge servers, Hollow caches an entire read-only dataset in memory. The approach addresses core performance issues for the globally distributed streaming media service, and it also offers to "liberate" large datasets in other contexts.
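To make the idea of a read-only, in-memory dataset concrete, here is a minimal conceptual sketch in plain Java. It does not use Hollow or reflect its API or internal encoding; the class name InMemoryCatalog and its methods are hypothetical, chosen only to show lookups being answered from a locally held snapshot that is replaced as a whole when a new version arrives.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Conceptual sketch only: hypothetical names, not Hollow's API. */
final class InMemoryCatalog {
    // Holds the current immutable snapshot; readers never see a half-updated dataset.
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Map.of());

    /** Replaces the whole dataset with an unmodifiable copy of the new version. */
    void load(Map<String, String> newVersion) {
        snapshot.set(Map.copyOf(newVersion));
    }

    /** Answers a lookup from local memory, with no network call. */
    Optional<String> titleOf(String id) {
        return Optional.ofNullable(snapshot.get().get(id));
    }

    /** Number of records in the snapshot currently being served. */
    int size() {
        return snapshot.get().size();
    }
}
```

A consumer would call load whenever a new version of the dataset is received and titleOf on every request. Because the snapshot is swapped in a single step, request threads need no locking.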

"Hollow shifts the scale in terms of appropriate dataset sizes for an in-memory solution," Netflix senior software engineer Drew Koszewnik explained on the company's tech blog. "Datasets for which such liberation may never previously have been considered can be candidates for Hollow."

Hollow may be appropriate, for example, for datasets which, if represented with JSON or XML, might require more than 100GB, Koszewnik said.

Netflix has been using Hollow internally for about two years, evolving its capabilities so that it now provides in-memory caching of datasets of up to several gigabytes. It was built "with servers busily serving requests at or near maximum capacity in mind," the company said. Hollow can also automatically generate a custom API based on a specific data model, and it can automatically calculate how a dataset changes over time.
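As a rough illustration of what calculating the changes in a dataset involves, the following plain-Java sketch compares two versions of a keyed dataset and reports which records were added, removed, or modified. It is an assumption-laden simplification: the class SnapshotDelta and its method between are hypothetical names, and nothing here is Hollow's own API or delta format.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Conceptual sketch only: this is not Hollow's API or its delta encoding. */
final class SnapshotDelta {
    final Set<String> added = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();
    final Set<String> modified = new TreeSet<>();

    /** Compares two versions of a keyed dataset and records what changed. */
    static SnapshotDelta between(Map<String, String> previous, Map<String, String> current) {
        SnapshotDelta delta = new SnapshotDelta();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String key = entry.getKey();
            if (!previous.containsKey(key)) {
                delta.added.add(key);       // present only in the new version
            } else if (!Objects.equals(previous.get(key), entry.getValue())) {
                delta.modified.add(key);    // present in both, value differs
            }
        }
        for (String key : previous.keySet()) {
            if (!current.containsKey(key)) {
                delta.removed.add(key);     // present only in the old version
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<String, String> v1 = Map.of("movie-1", "Title A", "movie-2", "Title B");
        Map<String, String> v2 = Map.of("movie-2", "Title B (Remastered)", "movie-3", "Title C");
        SnapshotDelta delta = between(v1, v2);
        // prints: added=[movie-3] removed=[movie-1] modified=[movie-2]
        System.out.println("added=" + delta.added
                + " removed=" + delta.removed
                + " modified=" + delta.modified);
    }
}
```

In a scheme like this, shipping only the delta rather than the full dataset keeps each update small when most records are unchanged between versions.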

"Hollow has been enormously beneficial at Netflix," Koszewnik said. "We've seen server startup times and heap footprints decrease across the board in the face of ever-increasing metadata needs. Due to targeted data modeling efforts identified through detailed heap footprint analysis made possible by Hollow, we will be able to continue these performance improvements." The company also reported "huge productivity gain" around the dissemination of its product catalog. "This is due in part to the tooling that Hollow provides, and in part due to architectural choices which would not have been possible without it," he said.

Netflix has been providing open source tooling and resources like Hollow via its Open Source Software Center for some time now. Its offerings range from Big Data tools and services to build and delivery tools, runtime services, and security applications that run on the NetflixOSS platform. The OSS Center focuses on technology "providing immersive experiences across all internet-connected screens."

The Hollow code is available now on GitHub, and documentation is available on the company's Web site.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].