Facebook Open Sources Presto Distributed SQL Query Engine for Big Data -- ADTmag

Facebook Open Sources Presto Distributed SQL Query Engine for Big Data

By David Ramel
November 7, 2013

Facebook yesterday open sourced Presto, its distributed SQL query engine built to improve Big Data analytics beyond existing solutions such as Hadoop MapReduce and Hive.

The improvements are highlighted by low-latency queries and interactivity, Facebook's Martin Traverso said in a post yesterday. About a year ago, he explained, a team started to tackle the problem of improving analytics on Facebook's 300-petabyte data warehouse, one of the largest in the world. All that information is stored in Hadoop clusters using the Hadoop Distributed File System (HDFS), but existing Big Data query engines didn't provide enough performance.

"Hadoop MapReduce and Hive are designed for large-scale, reliable computation, and are optimized for overall system throughput," Traverso said. "But as our warehouse grew to petabyte scale and our needs evolved, it became clear that we needed an interactive system optimized for low query latency."

Presto has provided up to 10x improvement in CPU efficiency and latency over Hive and MapReduce for most queries, Traverso said. Key to the performance boost are in-memory processing and improved scheduling that eschews the MapReduce system of running tasks consecutively.

The architecture is shown below:

**[Click on image for larger view.]** *The Presto Architecture (Source: Facebook)*

Also, it supports a large part of the ANSI SQL standard and can access several types of disparate data sources. "A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization," claims the Presto site. Those multiple sources include traditional RDBMS systems, Hadoop-related stores such as Apache Hive and Apache HBase, the Facebook-developed Scribe data aggregation server and proprietary systems such as Facebook's News Feed.

Facebook said Presto is used by more than 1,000 employees who each day rack up some 30,000 queries that combined process more than a petabyte of data. It's in use--at least in pilot testing programs--by companies such as Airbnb and Dropbox.

Traverso noted that the biggest restrictions of the system are a limit on the size of joined tables, inability to write data output back to tables and the "cardinality of unique keys/groups." He said the company is working on addressing those restrictions and also improving performance further by developing a new data format optimized to accelerate queries. Another feature in the works is a high-performance HBase connector.

Implemented in Java, Presto is available on GitHub. It runs only on Linux or Mac OS X and requires 64-bit Java 7, Maven 3 and Python 2.4+.

About the Author

David Ramel is an editor and writer at Converge 360.

Featured

AppTrends

Email Address*Country*

Please type the letters/numbers you see above.

Upcoming Training Events

0 AM

Live! 360 2-Day Hands-On Seminar: AI-Powered .NET Development with Claude & Claude Code
July 9-10, 2026

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training with CoPilot: 4-Day Hands-On Experience
July 14-17, 2026

Visual Studio Live! @ Microsoft HQ
July 27-31, 2026

Visual Studio Live! @ San Diego
September 14-18, 2026

The AI Pivot
September 25, 2026

Live! 360 6-Week Training & Certification Course: Mastering the Microsoft AI Framework: Building Enterprise-Ready AI Agents with Microsoft Foundry
October 6–November 10, 2026

VSLive! 6-Week Training & Certification Course: Blazor Developer Accelerator: Hands-On Skills for Real-World .NET Teams
October 7 – November 11, 2026

Live! 360 Orlando
November 15-20, 2026

Artificial Intelligence Live! Orlando
November 15-20, 2026

AI Enterprise Architecture Live! Orlando
November 15-20, 2026

Cybersecurity & Ransomware Live! Orlando
November 15-20, 2026

Data Platform Live! Orlando
November 15-20, 2026

Visual Studio Live! Orlando
November 15-20, 2026

Live! 360 2-Day Hands-On Seminar: AI-Powered .NET Development with Claude & Claude Code
December 8-9, 2026

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training with CoPilot: 4-Day Hands-On Experience
December 15-18, 2026

Visual Studio Live! Las Vegas
March 22-26, 2027

Visual Studio Live! @ Microsoft HQ
August 2-6, 2027

Free White Papers

More Tech Library