Big Data Product Watch 9/30/16: Apache Spark 2.0, Microservices, HDInsight, More -- ADTmag

By David Ramel
September 30, 2016

With an industry conference just concluded in New York, here's a roundup of this week's Big Data news, featuring new products and services from Cloudera, MapR, Hortonworks, Pentaho, Cask, Zoomdata, BlueTalon, Alation, Splice Machine and ODPi.

Cloudera Inc., fresh off teaming up with CenturyLink to provide Big Data-as-a-Service (BDaaS), announced a beta version of its Apache Spark 2.0 distribution, along with a 1.0 release of Apache Kudu, a technology it originally developed.

Spark, of course, is the popular data processing engine that has largely supplanted MapReduce in the Apache Hadoop ecosystem, enabling fast, in-memory processing and real-time streaming analytics. The open source project hit version 2.0 in July, and Cloudera said the release provides:
- Better performance and enhanced usability with the new Dataset API.
- Structured Streaming, for better performance and easier ingestion of structured data such as time series, tabular and Internet of Things (IoT) data.
- Compile-time type safety for user-defined functions, for improved reliability in mission-critical applications.
- Persistence of machine learning models and pipelines, plus newly supported machine learning libraries, for tackling new data sets and analytic applications.

Kudu, a high-performance columnar store for Hadoop that allows for fast analytics on fast (rapidly changing) data, hit 1.0 earlier this month. Cloudera said it provides:

- A simplified architecture that enables very fast batch and stream processing.
- Fault tolerance and scalability to hundreds of nodes.
- A columnar structure that enables analysis of the latest data, for real-time use cases such as time series data, machine data analytics and online reporting.

Pentaho announced increased integration for Spark in its namesake Big Data platform, along with other integration enhancements.
The Hitachi Group company said its Spark integration lowers the skills barrier for adopting the technology, because data pros can work with Spark through the company's Pentaho Data Integration (PDI) tool using the SQL query language.

The company also announced expanded metadata injection capabilities; expanded Hadoop data security integrations; Apache Kafka support; and enhanced support for popular Hadoop file formats, enabling the output of files in Avro and Parquet formats in PDI.

"These Big Data integration enhancements help IT teams deliver value from Big Data projects faster with existing resources, by eliminating the need for manual coding, providing tighter security and supporting more of the Big Data technology ecosystem," the company said.

MapR Technologies Inc. announced support for event-driven microservices on its Converged Data Platform. The company said such microservices can provide "continuous analytics, automated actions and rapid response to better impact business as it happens."
The platform's microservices support relies on underlying capabilities such as:
- Comprehensive monitoring of cluster-wide operations and resource usage in a single pane of glass view.
- Microservices-specific volumes for application versioning, simplifying the development lifecycle and production deployment.
- Microservices for A/B and multivariate testing, enabling rapid machine learning model development and optimization.

"Microservices applications and other converged application development is simplified on the MapR Platform," the company said. "Developers have the freedom to combine file, database, document and streaming analytics functionality. With a single line of code, developers can easily persist complex data types with JSON in MapR-DB, so they can focus on developing innovative features. Customizable dashboards deliver full visibility of cluster hardware and software operations, utilization and service logs. The Exchange, part of the MapR Converge Community, provides a public forum for sharing best practices around microservices, dashboards and code snippets."

Cask Data Inc. announced that its Cask Data App Platform (CDAP) has been certified for the Microsoft Azure cloud service and is integrated with Microsoft Azure HDInsight, a cloud-based Hadoop distribution.

"The 100 percent open source ... CDAP accelerates time to value from Hadoop through standardized APIs, pre-built templates and visual interfaces," the company said. "Furthermore, it increases efficiencies through reusable and portable components. CDAP also includes two built-in self-service extensions, which integrate with the rest of the Azure and Microsoft stack."

The two extensions are Cask Hydrator, which connects to Azure Storage, SQL Server and other cloud services and provides code-free, drag-and-drop pipelines for data ingestion and processing; and Cask Tracker, which enables self-service discovery of data ingested with Hydrator.

Earlier this month, the company announced that CDAP 4 is coming soon, featuring a "Big Data app store" called Cask Market that offers pre-built Hadoop solutions, reusable templates and production-ready data pipelines, including S3 to Azure Storage, SQL Server to HBase for HDInsight and more.

Splice Machine Inc. announced native PL/SQL support on its namesake platform. "PL/SQL support dramatically reduces the time and cost for companies to offload their Big Data workloads from Oracle databases. It is available immediately through the Splice Machine Enterprise Edition," the company said.

Zoomdata Inc. announced that, through a partnership with analytics company Teradata, its visual analytics platform for Big Data now supports Teradata Database (including its Amazon Web Services version) and the Cloudera and Hortonworks editions of Teradata Appliance for Hadoop. "The partnership enables customers to leverage a distributed Teradata environment with a single, unified visual analytics front end," the company said.

Hortonworks Inc. announced updates to the Microsoft Azure HDInsight Hadoop Cloud solution, powered by Hortonworks Data Platform 2.5 and the Spark 2.0 platform. "As part of its collaboration with Microsoft, these updates will allow customers to achieve big data query speeds that approach data warehousing performance, provide a highly secure Hadoop solution in the cloud, and allow for an easier experience for administrators to spin up third party ISV applications," the company said.

BlueTalon Inc., which specializes in data-centric security, announced the availability of BlueTalon Test Drive, designed to let organizations try out its data-aware security solution with consulting support from a company expert. "The BlueTalon Test Drive provides an opportunity to easily trial BlueTalon security for one month and evaluate how data-aware access controls can simplify security and compliance on big data technology such as Hadoop," the company said.

Alation Inc. announced that version 4.0 of its Alation Data Catalog, due in the fourth quarter, will include Alation Connect, described as a new connectivity layer that can catalog queries from compute engines such as Presto, Spark SQL and IBM Watson DataWorks. "Alation Connect uses machine learning algorithms to automatically catalog queries executed through popular compute engines and track patterns in how joins, filters and query logic are used by analysts to interpret data," the company said.

ODPi, a nonprofit organization that seeks to accelerate the open ecosystem of Big Data solutions, announced that DataTorrent, IBM, Pivotal, SAS, Syncsort, WANdisco and Xavient have committed to the organization's Interoperable Compliance Program. "This makes it easier for enterprises to choose and adopt Big Data technologies and ensures these applications are interoperable across a wider range of commercial Apache Hadoop platforms," ODPi said.