Big Data Product Watch 6/2/2017: Enterprise Spark, MongoDB DBaaS, MariaDB TX, Cloudera PaaS, More

Here's a roundup of recent news about Big Data, including Pentaho scaling Spark across the enterprise, MongoDB expanding the reach of its Database-as-a-Service (DBaaS), MariaDB unifying its offerings, Cloudera's new Altus Platform-as-a-Service (PaaS) and more.

  • Pentaho announced it was scaling Apache Spark across the enterprise with the launch of Pentaho Business Analytics 7.1, which provides a drag-and-drop environment that supports Spark with all data integration steps. The Hitachi Group company said the release features adaptive execution -- letting developers choose an execution engine at run time -- which increases the number of people who can leverage Spark across the enterprise. Spark is supported initially, but the product will later allow developers to choose the best engine to execute specific data workloads.

    Figure: Pentaho's Adaptive Execution (source: Pentaho)

    "Other vendors require users to create Spark-specific data integration logic, often requiring advanced Java programming skills, at a time when developer talent shortages are a reality," Pentaho said. "With adaptive execution, Pentaho 7.1 makes big data developers two times productive and expands the profile of technology talent who can work with Spark across the enterprise."

    Also, the new release supports Microsoft Azure HDInsight, Azure SQL and Azure SQL Server, adding to the existing support for Amazon EMR. On the security front, the new version extends its existing enterprise-level security for Cloudera to Hortonworks, adding Kerberos Impersonation support, which the company said protects clusters from intrusion. Pentaho also announced enhanced visualization capabilities across the data pipeline.

  • MongoDB announced an expansion of its Atlas cloud-hosted DBaaS offering to 14 regions on the Amazon Web Services Inc. (AWS) platform, adding nine regions to the five regions previously available.

    "The new AWS regional capabilities allow customers to deploy their cloud databases locally for better performance, reduced costs and improved regulatory compliance," the company said. "Local, in-region backups are currently available in Dublin and the US Northeast, with comprehensive coverage planned for the end of 2017."

    The company also announced a new live migration service that it says makes it easier for developers to migrate existing MongoDB deployments to MongoDB Atlas.

    Furthermore, MongoDB said it was recently named a "Strong Performer" by Forrester Research in a new report covering the DBaaS market for Q2 2017.

    "Launched in June of last year, MongoDB Atlas has seen significant growth and already has thousands of users, including industry leaders such as eHarmony and Thermo Fisher Scientific," the company said.

  • MariaDB, which provides an open source transactional database solution designed for modern app development, announced the unification of its offerings under the new MariaDB TX 2.0.

    The new package encapsulates MariaDB Server, MariaDB MaxScale, database connectors, services and tools, the company said, including the new releases MariaDB Server 10.2 and MariaDB MaxScale 2.1. The company said the new product bundle is another step in its effort to support specific workloads, whether they are transactional, analytical or developer focused.

    Figure: MariaDB TX (source: MariaDB)

    "MariaDB Server, MariaDB MaxScale and MariaDB Cluster (Galera Cluster for MariaDB), along with MariaDB connectors and drivers, form the base technology in MariaDB TX," the company said in a blog post. "By bringing them together, we are building a modular, integrated platform rather than separate, independent products -- making them easier to deploy, easier to use and easier to manage ... together."

  • Cloudera, which was once characterized as one of the "Big 3" distributors of Hadoop-based offerings and which now calls itself "the provider of the leading modern platform for machine learning and advanced analytics," announced the release of Cloudera Altus.

    The company said the PaaS offering helps data engineers run large-scale data processing applications on public cloud platforms -- starting with AWS -- via on-demand infrastructure that speeds up the creation and operation of elastic data pipelines to power sophisticated, data-driven applications.

    Figure: Cloudera Altus Architecture (source: Cloudera)

    "The Cloudera Altus Data Engineering service simplifies the development and operations of elastic data pipelines; putting data engineering jobs front and center and abstracting infrastructure management and operations that can be both time consuming and complex," the company said. "Altus also reduces the risk associated with cloud migrations. It provides users with familiar tools packaged in an open, unified, enterprise-grade platform service that delivers common storage, metadata, security and management across multiple data engineering applications."

    Altus currently runs only on AWS -- where it is available in most regions -- but the company said it intends to extend support to other public clouds, such as Microsoft Azure, in the future.

    The initial release supports Apache Spark, Apache Hive on MapReduce2 and Hive on Spark.

  • IBM announced a new DBaaS toolkit to run on its Power Systems servers built with open technologies and designed for Big Data.

    The new Open Platform for DBaaS on IBM Power Systems offering is optimized to work with open source databases such as MongoDB, EDB PostgreSQL, MySQL, MariaDB, Redis, Neo4j and Apache Cassandra.

    "The new platform gives database administrators and developers the ability to easily deploy a fully configured private cloud with automated provisioning for open source database services," IBM said. "Users can easily gain the efficiency of a cloud delivery model, while also maintaining oversight and control of resource allocation and secure data policies. Because the Open DBaaS Platform is built on OpenStack, it can also easily be incorporated into the organization's hybrid cloud management strategy."

    The company said the new offering includes:

    • A self-service portal for end users to deploy their choice of the most popular open source community databases in minutes
    • An elastic cloud infrastructure for a highly scalable, automated, economical and reliable open platform for on-premises, private cloud delivery of DBaaS
    • A disk image builder tool for clients who want to build and deploy their own custom databases to the database image library
    • An open source, cloud-oriented operations manager with dashboards and tools to visualize, control, monitor, and analyze the physical and virtual resources
    • A turnkey, engineered solution comprised of compute, block and archive storage servers, JBOD disk drawers, OpenStack control plane nodes, and network switches pre-integrated with the open source DBaaS toolkit

  • Pythian announced its new "Kick Analytics as a Service" (Kick AaaS), which it described as "a customized analytics solution that integrates multiple data types from both internal and external sources, empowering businesses to access insights and derive value from their data."

    The global technology services company said its fully managed, end-to-end service running in the cloud brings together data from multiple sources in multiple formats for advanced analytics and visualization in a central data hub, targeting various classes of users from IT pros to data scientists to business intelligence (BI) specialists.

    "The services combine the latest technologies (such as cloud, analytics, automation and machine learning) and business-savvy expertise to deliver insights, automate processes and enhance products," the company said. "Pythian's certified cloud, Big Data and analytics experts help clients at every stage of their data journey -- from adopting a scalable platform to housing Big Data assets, to predicting outcomes through advanced analytics."

  • Redis Labs announced several new capabilities for its Redis Enterprise offering, designed to expand Redis usage in mission-critical applications. In addition to the enterprise product, the company is behind open source Redis, an in-memory data structure store that acts as a database, cache and message broker.

    The new capabilities for the enterprise product include:

    • A blueprint for using Redis Enterprise in IoT applications with the new Redis Stream data structure
    • Multi-master geographically distributed Redis Enterprise, implemented with Conflict-Free Replicated Data Types (CRDTs) for strong eventual consistency between replicated instances
    • Redis Cloud Private with a new zero-touch experience for Redis Enterprise Flash with multi-region, multi-cloud replication
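
    As a rough illustration of how a CRDT can give replicas strong eventual consistency, here is a minimal Python sketch of a grow-only counter, a textbook CRDT. It is not Redis Labs' implementation; the GCounter class, the demo function and the replica names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    replica_id: str  # hypothetical replica name, e.g. a region
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


def demo() -> Tuple[int, int]:
    # Two replicas accept writes independently (multi-master).
    us = GCounter("us-east")
    eu = GCounter("eu-west")
    us.increment(3)
    eu.increment(2)
    eu.increment(1)

    # Replicas exchange state in both directions.
    us.merge(eu)
    eu.merge(us)
    us.merge(eu)  # a repeated merge changes nothing

    return us.value(), eu.value()
```

    In this sketch, both replicas report the same total once they have exchanged state, even though each accepted writes independently and no coordination took place at write time.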

    The company also defined an end-to-end blueprint, designed to help organizations implement Redis in Internet of Things (IoT) applications, consisting of:

    • Device support -- Redis now supports Raspberry Pi and ARM processors suitable for small footprint IoT devices and endpoints.
    • Edge computing -- Small form factor clusters of Redis Enterprise capable of cost-effective handling (with Redis Enterprise Flash) of millions of IoT events and a variety of data types with Redis modules such as time-series, geo, graph, JSON, machine learning and search.
    • Datacenter/Cloud implementation -- Geographically distributed Redis Enterprise deployments in private environments or public clouds for large-scale, real-time IoT data.
    • End-to-end streaming data processing -- The Redis Stream data structure, API, Stream client, and modules provide aggregation, transformation, filtering and forwarding necessary for streaming data, and can be run on the devices, the edge and the datacenter.

  • Datasparc announced DBHawk 3.2.1, the latest version of its Web-based SQL editor.

    It works with Oracle, MSSQL, MySQL, DB2, PostgreSQL, AWS Redshift, Microsoft Azure, Teradata, Netezza, SAP Hana and other databases that are JDBC-compliant. It runs on Windows, Mac OS X and Linux platforms and can run with Docker, the company said.

    Datasparc emphasized the product's access control capabilities, which it said were more sophisticated than those of existing databases that rely on a simple access control model.

    "While working with various business users, it is not easy to grant access to a particular user to view only certain objects of the databases and maintaining this policy," said exec Dave Shaw in a statement. "We are happy to announce this new addition in the database security tools and this Web-based SQL tool will help many business users worldwide."

With some Big Data conferences and summits coming up soon, stay tuned for more news in this growing space.

About the Author

David Ramel is an editor and writer for Converge360.