
Big Data Product Watch 8/4/17: Open Source Visualization, Cloud Reference Architecture, More

Here's an update on recent happenings in the world of Big Data analytics, featuring a new open source visualization tool, a cloud reference architecture, a developer's edition of a Big Data integrator and more.

  • Google last month released Facets, an open source visualization tool for working with machine learning training data.

    The tool was released in conjunction with the company's PAIR (People+AI Research Initiative) for "making people + AI partnerships productive, enjoyable and fair."

    "Getting the best results out of a machine learning (ML) model requires that you truly understand your data," Google said in a post last month. "However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more."

    [Image: Facets in Action (source: Google)]

    According to its GitHub project, Facets includes two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive.

    "The visualizations are implemented as Polymer Web components, backed by Typescript code and can be easily embedded into Jupyter notebooks or webpages," it says.

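    As a rough illustration of the kind of per-feature statistics Facets Overview presents (counts, missing values, ranges and most common values), here is a minimal sketch in plain Python. It does not call Facets or use its statistics format; `summarize_features` and the sample records are hypothetical, and the real tool computes a richer set of statistics and renders them interactively.

```python
from collections import Counter
from statistics import mean


def _is_number(value):
    # Treat ints and floats as numeric, but not booleans (bool subclasses int).
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize_features(rows):
    """Hypothetical helper: per-feature summary statistics for dict records.

    Numeric features get count, missing, min, max and mean; all other
    features get count, missing, number of unique values and the top value.
    """
    # Collect every feature name that appears in any record.
    features = sorted({key for row in rows for key in row})
    summary = {}
    for name in features:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        stats = {"count": len(present), "missing": len(values) - len(present)}
        if present and all(_is_number(v) for v in present):
            stats.update(min=min(present), max=max(present), mean=mean(present))
        else:
            counts = Counter(present)
            stats["unique"] = len(counts)
            stats["top"] = counts.most_common(1)[0][0] if counts else None
        summary[name] = stats
    return summary


# Hypothetical training records; None marks a missing value.
training_data = [
    {"age": 39, "workclass": "State-gov", "label": 0},
    {"age": 50, "workclass": "Self-emp", "label": 0},
    {"age": None, "workclass": "Private", "label": 1},
    {"age": 28, "workclass": "Private", "label": 1},
]

for feature, stats in summarize_features(training_data).items():
    print(feature, stats)
```

    A per-feature table like this is what surfaces problems such as a feature that is frequently missing or a categorical value that dominates the data, which is the sort of insight the Facets visualizations aim to make visible at a glance.
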
  • The Cloud Standards Customer Council yesterday announced a new paper describing reference-architectural elements and components for building out Big Data and analytics solutions in the cloud.

    The Cloud Customer Architecture for Big Data and Analytics paper covers:

    • Business reasons to adopt cloud computing for Big Data and analytics capabilities
    • Proven architecture patterns that have been deployed in successful enterprise Big Data analytics projects
    • An architectural overview of a Big Data analytics solution in a cloud environment with a description of the capabilities offered by cloud providers

    "Big Data analytics (BDA) and cloud computing are a top priority for CIOs," the organization said. "Many companies are experimenting with different cloud configurations to understand and refine requirements for their BDA solutions. The volume of Big Data streams has increased exponentially in the past several years and the majority of the streams are now hosted in the cloud. This offers a cost-effective delivery model for cloud-based analytics. Version 2.0 of the reference architecture has been expanded to support more BDA use cases, including cognitive computing."

  • Information Builders on Wednesday released a Developer's Edition of its iWay Big Data Integrator.

    "This limited edition enables developers to test the features of Hadoop and Apache Spark for their Big Data integration projects without the need for extensive knowledge or training in Hadoop, and accelerate time to value for Big Data projects," the company said. Information Builders describes itself as specializing in business intelligence (BI) and analytics, data integrity, and integration solutions.

    According to the company, the iWay Big Data Integrator eases the creation, management, and use of data lakes on the Hadoop framework, offering a modern, native approach to data integration on Hadoop through management functionality designed to ensure high levels of capability, compatibility and flexibility.

    Highlights of the new offering were listed as:
    • Simplified ingestion, replacement, and de-duplication of data sets using pipelines or native Spark, Flume support, and other techniques, all without programming.
    • Native Hadoop performance and resource negotiation by running under YARN's cluster management technology.
    • Support for a variety of big data integration use cases, including structured, defined, and streaming sources in batch and real-time via Spark and Hadoop.
    • Spark data operations allow organizations to shape data, while Hive and HQL transformations enable developers to modify or join tables.
    • Improved security and encryption, bringing sophisticated process management and governance to the Hadoop ecosystem.

    The Developer's Edition is available in a six-month trial.

About the Author

David Ramel is an editor and writer at Converge 360.