MapR To Integrate Drill and Spark Big Data Projects -- ADTmag

By David Ramel
October 14, 2014

Enterprise Hadoop distribution vendor MapR Technologies Inc. is seeking to integrate the open source Apache Drill and Apache Spark projects used for Big Data analytics in the Hadoop ecosystem.

In addition to its MapR Distribution for Apache Hadoop, the company has been leading development of the Drill project, a low-latency SQL query engine for Hadoop and NoSQL that it says provides real-time, self-service exploration of data residing in multiple data sources.

Now MapR is seeking to integrate that technology with Spark, an in-memory cluster computing framework for data analytics that it says offers greater speed, easier programming and real-time processing. Spark is often described as an upgrade to the MapReduce technology that was a mainstay of early Hadoop systems but was widely criticized for its limitations as the ecosystem evolved.

Spark is an increasingly popular project whose development has primarily been stewarded by Databricks Inc., though Hortonworks Inc. -- a primary competitor of MapR -- recently announced it was committing more developer resources to the project in advance of including it in Hortonworks' own Hadoop-based platform. MapR added Spark to its distribution in April. Databricks last week announced that Spark broke the record for large-scale sorting.

"The MapR initiative to integrate Apache Drill with Apache Spark's high-performance, in-memory data processing will provide a powerful combination," MapR quoted analyst John Webster at Evaluator Group as saying in its announcement yesterday. "MapR support for the complete Spark stack provides Drill users the ability to create advanced data pipelines that leverage Drill's data agility and Spark's batch processing capabilities."

A key feature of Drill is the ability to immediately query complex data residing in native formats, even if that data is nested, isn't described by schemas, or uses schemas that evolve rapidly. "Because SQL queries can run directly on various file formats, live data can be explored as it is coming in, versus spending weeks preparing and managing schemas and setting up ETL tasks," MapR said. "Additionally, Apache Drill supports ANSI SQL so users can easily leverage their SQL skills and existing investments in business intelligence (BI) tools."

Databricks, meanwhile, praised the MapR initiative to integrate the two technologies, just as it welcomed increased development efforts by Hortonworks. "As the driving force behind Spark, Databricks is pleased to see continued and expanded innovation around Spark to help users derive value from big data faster," said Ion Stoica, CEO of Databricks. "We are looking forward to MapR integrating Drill with Spark to enable enterprises to expand processing options and unlock deeper insights from their data faster."

About the Author

David Ramel is an editor and writer at Converge 360.
