IBM Announces Spark Development Environment in the Cloud -- ADTmag

By David Ramel
June 7, 2016

IBM, which last year announced a huge investment in Apache Spark technology as part of a mission to transform it into a kind of "analytics OS," today took that investment a step further by announcing a Spark development environment housed on its IBM Cloud Bluemix platform.

Described as the "first cloud-based development environment for near real-time, high performance analytics," the IBM Data Science Experience, now in preview, will provide some 250 curated data sets, a variety of open source tools and a collaborative workspace for data scientists. IBM said the workspace is aimed at "making it easier to rapidly develop applications that are infused with intelligence" when data scientists work with developers.

"Today we are excited to announce the IBM Data Science Experience, an environment that has everything a data scientist needs to be successful," says a project blog post published last week. "IBM Data Science Experience is an interactive, collaborative, cloud-based environment where data scientists can use multiple tools to activate their insights. Data scientists can use the best of open source, tap into IBM's unique features, grow their capabilities, and share their successes."

A year ago, IBM hopped on the Spark bandwagon in a big way -- promising to put more than 3,500 researchers and developers to work on related projects at labs around the world -- while calling it "potentially the most significant open source project of the next decade." The company said today's announcement is building on that "$300 million investment in developing Apache Spark as a type of 'analytics operating system.'"

"IBM's Data Science Experience is the killer enterprise app for Apache Spark, and gives data scientists new opportunities to deliver insight-driven models to developers, and opens the door for unprecedented innovation from the open source community," said IBM executive Bob Picciano.

The IBM Data Science Experience *(source: IBM)*

The project site invites data scientists to get started by enrolling in a course or by starting a project, either from a provided sample or from scratch, using tools such as RStudio, Jupyter Notebooks, Python, R and Scala. Most site links, however, now just present a message that says: "IBM Data Science Experience is in limited preview. We will be in touch shortly with new features and functionality." Interested data scientists can sign up to be added to the waitlist.

A key component of the project -- when it becomes operational -- will be R, the open source language and environment for statistical computing, used through the flagship RStudio IDE.

In addition to the project's open source capabilities, IBM said it's also adding new features and APIs, such as:

Sparkling.Data: Cleaning and preparing data for analysis are the tasks that data scientists typically spend the majority of their time on. We created a library that helps you discover the different file types and returns a data frame loaded with data (by default) from the file type that occurs the most. You can use it to infer the schema, discover data types, profile data sets, view range and distribution, reveal and fix bad data and much more.
Prescriptive Analytics: The Decision Optimization CPLEX Modeling library (DOcplex) contains modeling packages such as Mathematical Programming and Constraint Programming.
Shiny: Data scientists typically create visualizations to share their analysis with others. We include Shiny in the IBM Data Science Experience to allow you to create interactive analytic Web applications without coding any HTML, CSS, or JavaScript -- only R.
Data Connections: From the Notebook interface, you can set up data connections to Bluemix data services like Cloudant or dashDB or to on-premises or external services.
Schedule Jobs: From the Notebook interface, you can schedule jobs to run periodically.
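
To make the Sparkling.Data description above more concrete, here is a minimal sketch of the general idea in plain Python: find the file type that occurs most often in a folder, load those files, and guess a simple schema. This is not the Sparkling.Data API, which the announcement does not document. The function names (`dominant_extension`, `infer_type`, `load_dominant_csv`) are hypothetical, and the sketch assumes the dominant type is CSV and uses only the standard library rather than Spark data frames.

```python
import csv
from collections import Counter
from pathlib import Path


def dominant_extension(folder):
    """Return the most common file extension in `folder` (e.g. '.csv').

    Returns None if the folder holds no files with an extension.
    Hypothetical helper; not part of any IBM library.
    """
    counts = Counter(
        p.suffix.lower()
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def infer_type(values):
    """Guess a column type ('int', 'float' or 'str') from string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    # Try the narrowest type first and fall back to wider ones.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def load_dominant_csv(folder):
    """Load all CSV rows from `folder` when CSV is the dominant file type.

    Returns (rows, schema): rows is a list of dicts, schema maps each
    column name to its inferred type.
    """
    ext = dominant_extension(folder)
    if ext != ".csv":
        raise ValueError(f"this sketch only handles CSV, found {ext!r}")
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ext:
            with path.open(newline="") as f:
                rows.extend(csv.DictReader(f))
    columns = list(rows[0].keys()) if rows else []
    schema = {c: infer_type([r.get(c) or "" for r in rows]) for c in columns}
    return rows, schema
```

Calling `load_dominant_csv("some_folder")` on a folder that is mostly CSV files returns the combined rows along with a column-to-type mapping. A Spark-based library would presumably return a distributed data frame instead, but the announcement gives no details beyond the description quoted above.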

Upon signing up and opening an account, data scientists will be provided with a deployed Spark-as-a-Service instance for analyzing data and 5 GB of object storage to store that data.

"Just as IBM played a critical role in the development of computer science, we can see many similarities today," said Picciano. "Computer science went mainstream with the introduction of the PC. With Data Science, the major roadblock is having access to large data sets and having the ability to work with so much data. With today's announcement, clients can have both."

Today's news came during the second day of the Spark Summit conference in San Francisco, where several other companies have issued Spark-based product announcements.

About the Author

David Ramel is an editor and writer at Converge 360.
