Big Data Product Watch 9/25/15: Managed Services, Container Deployment and More

In a busy week on the Big Data front, Google announced a managed service for Hadoop and Spark, two technologies also addressed by BlueData's new container deployment functionality. Also coming out this week were Internet of Things (IoT) solutions, new open source NoSQL databases and more.

  • Google Cloud Dataproc is a new beta offering described as "a managed Hadoop MapReduce, Spark, Pig and Hive service designed to easily and cost effectively process big datasets."

    Running on the Google Cloud Platform, Cloud Dataproc is designed to reduce the complexity of the myriad tools used to analyze large amounts of data for business insights, addressing concerns such as deployment, scaling, monitoring, utilization and cost. It leverages open source tools for batch processing, querying, streaming and machine learning functionality.

    "Cloud Dataproc automation helps you create clusters quickly, manage them easily and save money by turning clusters off when you don't need them," the company said in a Wednesday blog post. "With less time and money spent on administration, you can focus on your jobs and your data."

    The offering comes with developer tools that provide multiple ways to manage clusters, such as a Web UI, the Google Cloud SDK and RESTful APIs, along with SSH access.

  • BlueData Software Inc. announced a new version of its infrastructure software platform for Big Data analytics that is designed to speed up deployment times. BlueData EPIC 2.0 adds Docker container functionality that was first introduced in free versions of BlueData's software.

    "This new fall release leverages Docker containers to simplify Big Data clusters, supports Apache Zeppelin notebooks and other new functionality for Apache Spark, and includes an enhanced App Store that provides one-click access to Big Data distributions and analytics tools," the company said in a statement. With the Docker functionality, BlueData said, enterprises can quickly deploy Hadoop or Spark technology in a lightweight container implementation. The company claimed that on-premises deployment times can be cut from several months to just a couple of days.

    BlueData said the containerized Big Data environments can be run on bare-metal physical servers or virtual machines, making container usage invisible to end users of its software.

  • At its Cassandra Summit 2015 conference, DataStax Inc., commercial steward of the Cassandra NoSQL database, unveiled a new version of its enterprise offering, released the open source graph database Titan 1.0 and announced the availability of a preview of Apache Cassandra 3.0.

    DataStax Enterprise 4.8 "supplies upgrades and enhancements for running transactional-analytical applications that require lightning-fast search," the company said, targeting its purpose-built database platform at Internet of Things (IoT), Web and mobile applications.

    Titan 1.0 was described as "a highly scalable open source graph database optimized for storing and querying graphs containing billions of vertices and edges distributed across a multi-machine cluster." New enhancements to the Titan project include support for TinkerPop and Gremlin 3.0 OLTP and OLAP query capability, support for Spark/Giraph/Hadoop OLAP operations, and an advanced query optimizer with a speedier query rewrite engine. The new Titan supports Cassandra 2.2, HBase 1.x and Elasticsearch 1.5 and features numerous performance optimizations and fixes, the company said.

    DataStax also congratulated the Apache Cassandra open source project for the first release candidate of Cassandra 3.0. "Cassandra 3.0 will deliver impressive storage savings over prior versions," said exec Jonathan Ellis. "Developers can also look forward to 3.0's new materialized views, which greatly simplify application development scenarios that use denormalized tables to optimize multiple query patterns."

  • Objectivity Inc., a high-performance distributed object-oriented database specialist, this week introduced ThingSpan, described as "the first purpose-built Information Fusion platform that simplifies and accelerates any organization’s ability to deploy Industrial IoT applications to enhance value derived from Big Data and Fast Data."

    Objectivity defines "Fast Data" as high-volume, time-sensitive data, a category that is growing with the advent of more Internet-connected sensors and devices.

    Objectivity said ThingSpan is designed to integrate with leading open source Big Data technologies such as the Hadoop Distributed File System (HDFS), YARN and Apache Spark. Through Spark-based abstraction, the company said, it provides a simpler and faster way to build, deploy and manage advanced analytics solutions.

About the Author

David Ramel is an editor and writer for Converge360.