Big Data Product Watch 1/28/16: Top Hadoop Distros, Free Training, Cloud Deployment and More

Here's a roundup of recent news from Cloudera Inc., MapR Technologies Inc., VoltDB Inc. and others, including an analyst's rankings of the top Hadoop distributions, expanded free Hadoop training, new products and more.

  • Cloudera announced the new Cloudera Director 2.0 for deploying and managing cloud-based Hadoop environments in the enterprise. Claiming to be the only Hadoop vendor supporting hybrid cloud implementations, Cloudera is providing Director 2.0 as a free download for customers using its CDH and Cloudera Enterprise distributions.

    Cloudera said the new Director edition lowers operating costs and simplifies running common Hadoop workloads in the cloud, such as Extract, Transform, Load (ETL) and modeling; business intelligence (BI) and analytics; and application delivery.

    Enhancements include support for Spot instances on the Amazon Web Services Inc. (AWS) cloud and Preemptible instances on Google Cloud Platform (GCP); support for Apache Hive and Apache Spark on Amazon Simple Storage Service (Amazon S3) (released with Cloudera 5.5); cluster cloning and repair; external database connectors and integration with Cloudera Enterprise backup and disaster recovery; customizable templates and configurations that make deployments easier to manage and repeat; and more.

    "Recent ESG research shows that the No. 1 spending priority amongst those responsible for their companies' strategic Big Data investments is leveraging cloud-based analytics offerings," the company quoted Nik Rouda, an analyst at Enterprise Strategy Group, as saying. "Cloudera Director 2.0 is very well aligned to this imperative, enabling businesses to manage Hadoop deployments across both cloud and on-premises environments, providing a robust and mature solution."

  • MapR announced an expansion of its free on-demand Hadoop and Spark training courses, intended to help address the continuing Big Data skills shortage that is said to be hampering enterprise Big Data initiatives.

    "As more organizations invest in Big Data, the shortage of available skills and capabilities will become more acute," the company quoted Gartner Inc. analysts Nick Heudecker and Lisa Kart as saying in a recent report. "Instead of facing a difficult recruiting market, organizations should focus on adapting available skills and engaging with established service providers to fill the skills gap."

    MapR said its classes -- used by more than 50,000 participants -- provide depth and content equivalent to instructor-led training, featuring hands-on exercises, labs and quizzes designed to give data professionals an engaging, interactive experience. The company also offers certification exams for targeted audiences that can lead to certified Hadoop and HBase professional designations for developers, analysts and others.

    "When we launched these free on-demand courses, we set an ambitious enrollment goal of 10,000 students, which is estimated to be a $50 million in-kind contribution to the open source community," said MapR exec Suzanne Ferry. "With a response five times over original projections, the On-Demand Training offerings are updated continually to address new technologies, like Spark, which in turn helps meet the ongoing skill development interests of company employees, industry consultants and individuals. Our comprehensive online curriculum is driving the emergence of a strong community in support of the latest advancements in Big Data technologies."

  • Coincidentally, both Cloudera and MapR were named as "Leaders" in a new report from analyst firm Forrester Research Inc.

    "Five top vendors have significantly improved their offerings," says the paid report, "The Forrester Wave: Big Data Hadoop Distributions, Q1 2016."

    Joining Cloudera and MapR in the "Leaders" category are Hortonworks Inc. and IBM, while Pivotal Software Inc. scored lower and was classified as a "Strong Performer."

    "Enterprise Hadoop is a market that is not even 10 years old, but Forrester estimates that 100 percent of all large enterprises will adopt it (Hadoop and related technologies such as Spark) for Big Data analytics within the next two years," says the report, authored by analysts Mike Gualtieri and Noel Yuhanna. "The stakes are exceedingly high for the pure-play distribution vendors Cloudera, Hortonworks and MapR Technologies, which have all of their eggs in the Hadoop basket. Currently, there is no absolute winner in the market; each of the vendors focuses on key features such as security, scale, integration, governance, and performance critical for enterprise adoption."

    Though Forrester sells the report, both Cloudera and MapR are offering it free to readers who supply registration information.

  • VoltDB -- eschewing the term "Big Data" in favor of "fast data" -- announced new geospatial query support in the latest release of its "world's fastest operational SQL database," VoltDB 6.0.

    Based on an in-memory, massively parallel database architecture, VoltDB 6.0 also features improved fast data ingestion and export connections; cross-datacenter "active/active" replication for improved availability and disaster recovery; and easier deployment through an improved VoltDB Management Center, which offers better cluster administration, workload balancing and monitoring.

    "The theme for the VoltDB v6.0 release is geo-distributed fast data, and we're pretty excited about the types of applications that v6 enables," said exec John Piekos in a blog post yesterday. "Our new geospatial SQL support, coupled with cross-datacenter replication, provides a transactional low latency/high throughput database foundation for today's emerging globally oriented fast data applications."

  • Paxata Inc., which provides an enterprise solution for interactive, self-service data preparation at scale with its Adaptive Data Preparation platform, announced its Winter '15 release.

    "Delivered on a flexible platform, the latest version provides business analysts with comprehensive data preparation capabilities and unparalleled governance and administrative controls," the company said in a statement yesterday. "Paxata's platform serves as the connected information layer within some of the world's largest, most complex infrastructures, whether used on-premises, in private clouds or in a company's proprietary cloud-based Data Preparation-as-a-Service (DPaaS) offering."

    Paxata said its product features security and a multi-tenant governance model that facilitates deployment in heterogeneous environments such as the Hortonworks Data Platform on YARN and with various flavors of Apache Spark. "The latest version also significantly improves how business analysts can find, access and apply data by delivering additional one-click capabilities powered by machine learning innovations," the company said.

  • Embarcadero Technologies Inc. announced ER/Studio 2016, a new version of its data architecture solution that features expanded data modeling capabilities.

    The company said the new product, aimed at companies of all sizes, combines multi-platform data modeling, design and reporting with cross-organizational team collaboration. Embarcadero also claims it is the first data modeling tool to represent Business Data Objects (BDOs) in data models.

    "With ER/Studio 2016, IT and business stakeholders gain increased visibility into corporate data assets through data model objects such as BDOs, centralized glossaries and terms and model image navigation, thereby improving data quality and consistency for better decision making across the enterprise," the company said in a statement. "Users can define and view BDOs, assign naming standards to models and sub-models and view interactive model images in ER/Studio Team Server, improving control of complex data environments."

    Embarcadero said new features include improved organization of model entities, extensive platform support and better usability.

    "Companies of all sizes need to collect, manage and maintain large amounts of data in data models and enterprise glossaries -- from both Big Data and relational database sources," said exec Ron Huizenga. "ER/Studio 2016 enables data professionals and business stakeholders to share and analyze essential models and metadata in complex relational and Big Data platforms, helping companies adapt to new technology platforms and collaborate with other stakeholders."

About the Author

David Ramel is an editor and writer for Converge360.