Open Sourced Greenplum Data Warehouse Now on GitHub -- ADTmag

Open Sourced Greenplum Data Warehouse Now on GitHub

By David Ramel
October 28, 2015

Pivotal Software Inc. followed through on its February promise to open source core components of its Big Data platform, placing the Greenplum data warehouse software on GitHub with an Apache 2 license.

"Today, Pivotal unveils the first massively parallel processing (MPP) data warehouse to open source," exec Gavin Sherry said in a blog post today. "Built over the past 10 years, with nearly 2 million lines of code, Pivotal has released the last of its core data solutions, Greenplum Database, under the Apache Software 2.0 license."

"The Greenplum Database is an advanced, fully featured, open source data warehouse," says the site on the popular source code repository. "It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward Big Data analytics, Greenplum Database is powered by the world's most advanced cost-based query optimizer delivering high analytical query performance on large data volumes."

Greenplum was originally based on the open source PostgreSQL object-relational database, the project's main Web site explains, with features added on such as MPP architecture, petabyte-scale loading, a query optimizer, polymorphic data storage and execution, and advanced machine learning via Apache MADLib technology.

**[Click on image for larger view.]** Pivotal's Move to Open Source *(source: Pivotal screen capture)*

"Setting the Greenplum Database apart, both architecturally and functionally, from any other open source data processing system such as Apache Hadoop, MySQL or even PostgreSQL, is Greenplum Database usage of MPP to execute complex SQL analytics on very large data sets at speeds multiple times faster than any other solution tested," Sherry said. "Powered by a next-generation query optimization technology that has never been available commercially outside of Pivotal, Greenplum Database comes packed with data management quality features, upgrade and expansion capabilities also not found in open source today."

Early this year Pivotal announced plans to open source Greenplum and other technology underlying its Big Data Suite as part of a move toward openness and interoperability that also saw the formation of the Open Data Platform to "promote Big Data technologies based on open source software from the Apache Hadoop ecosystem and optimize testing among and across the ecosystem's vendors."

"Our detailed plans are still being finalized, but we plan to begin release and incubation of Pivotal GemFire, Pivotal Hawq, and Pivotal Greenplum Database in a quarterly cadence," exec Michael Cucchi said in a February blog post. "We're closing in now on the structure of ownership of GemFire, Greenplum Database, and Hawq code to the most appropriate entity for working with the Big Data community."

That most appropriate entity turned out to be the Apache Software Foundation, overseeing code placed on GitHub, where Greenplum joins the GemFire NoSQL database and Hawq SQL-on-Hadoop project, under an Apache Software 2.0 license. Pivotal claimed that open sourcing the data warehousing technology would improve it, and yesterday reiterated its embrace of contributions to the project, which today sported 23,790 commits, one branch and 14 contributors.

"We believe that the release of such a proven and widely adopted data warehouse to open source will have substantial ripple effects throughout the industry," Sherry said. "Most importantly, with the barrier of entry to large-scale, real-time data analytics lowered, more companies are now equipped to tackle Big Data challenges. As a result, we expect to see greater success in large-scale, Big Data analytics across every industry."

Pivotal said the move indicates a "massive change" in the industry, with commercial vendors seeing open source as the best delivery mechanism and transformative driver of software development.

"This is better for vendors as well, given that so many customers simply will not consider software solutions outside of open source," Sherry said. "With major software vendors like Pivotal responding to market demands to open source all of their cloud and data products inside of 10 months -- that's nearly 10 million lines of code -- commercial vendors worldwide will have to accept that the days of a closed source, vendor-lock in, legacy models no longer suffice in today's modern era of computing."

About the Author

David Ramel is an editor and writer at Converge 360.

Featured

AppTrends

Email Address*Country*

Please type the letters/numbers you see above.

Upcoming Training Events

0 AM

Live! 360 2-Day Hands-On Seminar: AI-Powered .NET Development with Claude & Claude Code
July 9-10, 2026

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training with CoPilot: 4-Day Hands-On Experience
July 14-17, 2026

Visual Studio Live! @ Microsoft HQ
July 27-31, 2026

Visual Studio Live! @ San Diego
September 14-18, 2026

The AI Pivot
September 25, 2026

Live! 360 6-Week Training & Certification Course: Mastering the Microsoft AI Framework: Building Enterprise-Ready AI Agents with Microsoft Foundry
October 6–November 10, 2026

VSLive! 6-Week Training & Certification Course: Blazor Developer Accelerator: Hands-On Skills for Real-World .NET Teams
October 7 – November 11, 2026

Live! 360 Orlando
November 15-20, 2026

Artificial Intelligence Live! Orlando
November 15-20, 2026

AI Enterprise Architecture Live! Orlando
November 15-20, 2026

Cybersecurity & Ransomware Live! Orlando
November 15-20, 2026

Data Platform Live! Orlando
November 15-20, 2026

Visual Studio Live! Orlando
November 15-20, 2026

Live! 360 2-Day Hands-On Seminar: AI-Powered .NET Development with Claude & Claude Code
December 8-9, 2026

VSLive! 4-Day Hands-On Training Seminar: Immersive .NET Full Stack Training with CoPilot: 4-Day Hands-On Experience
December 15-18, 2026

Visual Studio Live! Las Vegas
March 22-26, 2027

Visual Studio Live! @ Microsoft HQ
August 2-6, 2027

Free White Papers

More Tech Library