Spark Gets R Language API -- ADTmag

By David Ramel
June 12, 2015

A new API for the R programming language -- a favorite of data scientists doing Big Data analytics -- heads the list of updates in the new open source Apache Spark 1.4, commercial steward Databricks Inc. announced yesterday.

Spark has been described as perhaps the hottest open source project under development (Apache lists some 230 contributors to version 1.4). The data processing framework was originally designed to provide live, interactive queries and iterative algorithms, improving upon MapReduce, a core component of the Apache Hadoop ecosystem. It has proved so popular that some observers see it as rivaling Hadoop itself.

With the R support, delivered through an API named SparkR, Spark users get more functionality from a language whose specialized statistical capabilities are instrumental to Big Data analytics -- so much so that Big Data player Microsoft acquired R language specialist Revolution Analytics earlier this year. SparkR has been described by its creators as a language binding that facilitates seamless integration of R and Spark with a lightweight front end, allowing R-based programs to scale in a distributed environment.

"Spark 1.4 introduces SparkR, an R API for Spark and Spark's first new language API since PySpark was added in 2012," Databricks exec Patrick Wendell said in a blog post yesterday. "SparkR is based on Spark's parallel DataFrame abstraction. Users can create SparkR DataFrames from 'local' R data frames, or from any Spark data source such as Hive, HDFS, Parquet or JSON. SparkR DataFrames support all Spark DataFrame operations including aggregation, filtering, grouping, summary statistics, and other analytical functions."

Spark DataFrames organize distributed data into named columns, the company said, somewhat like a relational database table. They're similar to the "data frames" in R and Python, but with added capabilities. This functionality lets developers use SQL query mix-ins and convert query results to DataFrames and vice versa. Wendell said that by leveraging Spark's parallel query engine, multiple machines and multiple cores can be put to work, allowing scale-out not possible with standalone R projects.
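
The scale-out idea described above can be illustrated with a toy sketch in plain Python. This is not the SparkR or Spark DataFrame API: the data, the column names ("dept", "sales") and the helper functions are all hypothetical, and threads stand in for the multiple machines and cores a real cluster would use. It only shows the general pattern of grouping and aggregating rows with named columns partition by partition, then merging the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each partition is a list of rows with named columns,
# standing in for slices of a distributed data set.
partitions = [
    [{"dept": "a", "sales": 10}, {"dept": "b", "sales": 5}],
    [{"dept": "a", "sales": 7}, {"dept": "c", "sales": 3}],
    [{"dept": "b", "sales": 8}],
]


def partial_sum(rows, key, value):
    """Group one partition by `key` and sum `value` (the per-worker step)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def merge(partials):
    """Combine the per-partition totals into one final result."""
    merged = defaultdict(int)
    for part in partials:
        for group, total in part.items():
            merged[group] += total
    return dict(merged)


def group_sum(parts, key, value, workers=3):
    """Aggregate every partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda rows: partial_sum(rows, key, value), parts))
    return merge(partials)


result = group_sum(partitions, "dept", "sales")
print(sorted(result.items()))
```

Because each partition is summarized independently, adding more workers (or, in a real cluster, more machines) lets the per-partition step run in parallel, which a single-process R session cannot do on its own.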

SparkR was announced as a developer preview project by UC Berkeley's AMPLab in January 2014. AMPLab originally developed Spark, and Databricks was formed by AMPLab researchers.

In addition to the R support, Databricks said Spark 1.4 features improvements in window functions, a stable machine learning (ML) pipeline API, and improved visualization and monitoring capabilities across the Spark stack. More information is available in the release notes, and Spark 1.4 is available for download.

About the Author

David Ramel is an editor and writer at Converge 360.
