MapR To Integrate Drill and Spark Big Data Projects -- ADTmag

By David Ramel
October 14, 2014

Enterprise Hadoop distribution vendor MapR Technologies Inc. is seeking to integrate the open source Apache Drill and Apache Spark projects used for Big Data analytics in the Hadoop ecosystem.

In addition to its MapR Distribution for Apache Hadoop, the company has been leading development efforts on the Drill project, a low-latency SQL query engine for Hadoop and NoSQL data stores that it says provides real-time, self-service exploration of data residing in multiple data sources.

Now MapR is seeking to integrate that technology with Spark, an in-memory cluster computing framework for data analytics that it says provides advantages in speed, ease of programming and real-time processing. Spark is often described as an upgrade to the MapReduce technology that was a mainstay of early Hadoop systems but was widely criticized for its limitations as the ecosystem evolved.

Spark is an increasingly popular project whose development has primarily been stewarded by Databricks Inc., though Hortonworks Inc. -- a primary competitor of MapR -- recently announced it was committing more developer resources to the project in advance of including it in Hortonworks' own Hadoop-based platform. MapR added Spark to its distribution in April. Databricks last week announced that Spark broke the record for large-scale sorting.

"The MapR initiative to integrate Apache Drill with Apache Spark's high-performance, in-memory data processing will provide a powerful combination," MapR quoted analyst John Webster at Evaluator Group as saying in its announcement yesterday. "MapR support for the complete Spark stack provides Drill users the ability to create advanced data pipelines that leverage Drill's data agility and Spark's batch processing capabilities."

A key feature of Drill is the ability to immediately run queries across complex data residing in native formats, even if that data is nested, lacks a schema or uses schemas that evolve rapidly. "Because SQL queries can run directly on various file formats, live data can be explored as it is coming in, versus spending weeks preparing and managing schemas and setting up ETL tasks," MapR said. "Additionally, Apache Drill supports ANSI SQL so users can easily leverage their SQL skills and existing investments in business intelligence (BI) tools."
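To make the idea of querying schema-free, nested data more concrete, here is a minimal Python sketch of the general "schema-on-read" approach. It is an illustration only, not Drill's API or implementation: the sample records and the helper names `get_path` and `total_by` are hypothetical. It shows a grouped sum computed directly over raw JSON lines whose fields vary from record to record, with no schema declared up front.

```python
import json

# Hypothetical sample data: newline-delimited JSON whose shape changes between rows.
RAW = """\
{"user": {"id": 1, "geo": {"country": "US"}}, "amount": 20.0}
{"user": {"id": 2}, "amount": 5.5, "coupon": "FALL14"}
{"user": {"id": 3, "geo": {"country": "US"}}, "amount": 12.5}
"""


def get_path(record, path, default=None):
    """Walk a dotted path through nested dicts; return default if any key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def total_by(records, group_path, value_path):
    """Roughly: SELECT group_path, SUM(value_path) FROM records GROUP BY group_path."""
    totals = {}
    for record in records:
        group = get_path(record, group_path)  # None when the nested field is absent
        totals[group] = totals.get(group, 0.0) + get_path(record, value_path, 0.0)
    return totals


# Parse each line as it arrives; no table definition or ETL step comes first.
records = [json.loads(line) for line in RAW.splitlines() if line.strip()]
totals = total_by(records, "user.geo.country", "amount")
print(totals)
```

Records that lack the nested `geo` field are grouped under `None` rather than rejected, which is the kind of tolerance for missing or changing fields the article describes. A real engine such as Drill would express the same query in SQL and run it across a cluster.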

Databricks, meanwhile, praised the MapR initiative to integrate the two technologies, just as it welcomed increased development efforts by Hortonworks. "As the driving force behind Spark, Databricks is pleased to see continued and expanded innovation around Spark to help users derive value from big data faster," said Ion Stoica, CEO of Databricks. "We are looking forward to MapR integrating Drill with Spark to enable enterprises to expand processing options and unlock deeper insights from their data faster."

About the Author

David Ramel is an editor and writer at Converge 360.
