IBM, Microsoft Lead New Spark-Based Offerings

With the Strata + Hadoop World conference underway, IBM, Microsoft and other companies are announcing new solutions based on the popular open source Big Data analytics framework, Apache Spark. Here's a roundup of this week's news.

  • IBM announced native Spark data processing on mainframes. "IBM z/OS Platform for Apache Spark enables Spark, an open-source analytics framework, to run natively on the z/OS mainframe operating system," the company said in a statement today. "The new offering, available now, enables data scientists to analyze data in place on the system of origin, without the need to extract, transform and load (ETL), by breaking the tie between the analytics library and underlying file system."

    The new solution comes after IBM last June announced a massive developer investment in the wildly popular Spark project, calling it "potentially the most significant open source project of the next decade." Since that announcement, IBM has revamped many of its data products to incorporate Spark technology and released new Spark-based solutions, of which IBM z/OS Platform for Apache Spark is the latest.

    "As businesses of all sizes transform into real-time digital organizations, they must be able to get a clear picture of all their enterprise data without the excessive time and risk of ETL," said exec Rod Smith in a statement today. "With Apache Spark enabled natively on IBM platforms -- now including z Systems -- customers can perform analytics alongside the transactional systems that house key data, while drawing contextual insights from other data sources, enabling them to engage with customers and generate revenue in real time."

    While mainframes might not receive much attention in the new era of cloud-first computing, enterprise mobility and Big Data, IBM said its z Systems still handle critical data transactions for many of the world's banking, insurance, retail and transport companies. The z Systems mainframes feature "the industry's fastest commercial microprocessor and the ability to perform in-transaction analytics, scoring predictive models within a transaction in 2 milliseconds or less," IBM said. "Organizations can now leverage these capabilities, applying advanced in-memory analytics through Spark without moving data off the mainframe, saving time and money and limiting risk."

    The company said data scientists and developers can leverage existing knowledge of programming languages such as Scala, Python, R and SQL to lessen the time needed to glean actionable insights from Big Data. Developers can use standard Spark APIs to work with data without having to move or copy the data first, IBM said, by directly accessing traditional z/OS data sources such as VSAM or SMF with SparkSQL. "This is creating new opportunities for data scientists and developers to apply advanced analytics to the system's rich data sets for real-time insights," the company said.

  • Microsoft announced an updated Spark for Azure HDInsight offering, taking advantage of enhancements to the open source framework in its latest edition, Apache Spark 1.6. HDInsight is Microsoft's managed Apache Hadoop, Spark, R, HBase and Storm cloud service.

    "Spark is one of the most popular Big Data projects, known for its ability to handle large-scale data applications in memory, batch and interactive queries, real-time streaming, machine learning, and graph processing with the same common execution model," company exec Joseph Sirosh said in a blog post today.

    Microsoft said Spark 1.6 brings critical performance improvements such as a 10x speedup for streaming state management, along with automatic memory management and new machine learning (ML) algorithms and functionality.

    "With Spark for Azure HDInsight, we offer customers more value with an enterprise-ready Spark solution that's fully managed, and a choice of compelling and interactive experiences with different BI tools and popular notebooks such as Jupyter (iPython)," Sirosh said. "This makes it easier for business analysts and data scientists to find new insights over Big Data."

    Microsoft also announced R Server for HDInsight, bringing the popular Big Data programming language to its managed cloud service, along with the general availability tomorrow of the Azure Data Catalog, described as "an enterprise metadata catalog and portal for the self-service discovery of data sources."

  • Tamr Inc., a Big Data analytics specialist with a self-described "human-guided, machine-driven approach to enterprise data preparation," announced its platform is now compatible with Spark.

    "Tamr is currently working with customers including GE, Toyota Motors Europe, GlaxoSmithKline and others to unify their data for making better decisions with better data," the company said in a statement yesterday. "Spark's in-memory architecture is ideal for scalable machine learning and greatly complements Tamr's human-guided, machine-driven approach to enterprise data preparation."

    "Our customers are typically operating with hundreds of data sources and have dozens of consumers for that data," added exec Nidhi Aggarwal. "Adding Spark compatibility to Tamr ensures that we can continue to scale with our customers, differentiating us from data preparation tools designed for individual users working with a handful of data sources."

    Tamr also announced a new partnership with cognitive data science company DataRPM.

  • Platfora Inc. today announced the general availability of Platfora 5.2, its Big Data Discovery platform built natively on Spark and Apache Hadoop. The company said its offering helps business users quickly and visually interact with all of their data at scales ranging up to petabytes, letting them find new opportunities more easily and manage risk.

    "The new release democratizes Big Data across an organization, moving it beyond IT and early adopters by enabling business users to explore Big Data and discover new insights through their favorite business intelligence (BI) tool," the company said.

    That's done via native integration with Tableau and Lens-Accelerated SQL accessible through any SQL client, said the company, which added that it gives users the option to run analytics directly on a Hadoop cluster using YARN.

    "Big Data discovery will help advance the analytics maturity of the organization, will start training some of the future data scientists, can provide the first batch of insights that may raise awareness to new opportunities and may provide enough return on investment to justify the business case for Big Data analytics," the company quoted Gartner analyst Joao Tapadinhas as saying. "It is the missing link that will make Big Data go mainstream."

  • Speaking of Tableau, that visual analytics specialist has entered into a partnership with Altiscale Inc., which provides Big Data-as-a-Service (BDaaS).

    Altiscale said the partnership will bring the "visual agility" of Tableau's software to its customers.

    "Enterprises are increasingly embracing Apache Hadoop and Apache Spark as the foundation for their Big Data strategy, storing and analyzing massive volumes of structured and unstructured data," Altiscale said in a statement today. "The newly launched Altiscale Insight Cloud delivers a secure, scalable interactive analytics platform in the cloud based on Hadoop, Spark and Hive, enabling exceptionally quick time to business value and transforming data into insight."

    Altiscale said the new pact with Tableau will help business analysts, IT pros and data scientists access, analyze and visualize the tremendous amounts of data available in Hadoop.

    "Altiscale shares our mission to help people see and understand their data," said Tableau exec Dan Kogan. "Partnerships with leading Hadoop and Spark providers such as Altiscale help us to bring rich visual analytics to anyone within the enterprise looking to derive value from data."

The Strata + Hadoop World conference runs through Thursday.

About the Author

David Ramel is an editor and writer for Converge360.