New Projects Fill in Big Data Gaps -- ADTmag

By David Ramel
March 10, 2015

As some original Apache Hadoop projects mature and graduate to commercial stewardship for further refinement, the open source Apache Software Foundation (ASF) and private companies are continually incubating and launching new open source projects to fill in the gaps in Big Data analytics.

The ASF yesterday announced a new version of a Big Data warehousing solution providing increased "SQL-on-Hadoop" functionality: Apache Tajo.

"Apache Tajo is used for low-latency and scalable ad hoc queries, online aggregation and extract-transform-load process (ETL) on large data sets stored on the Hadoop Distributed File System (HDFS) and other data sources," the ASF said. "By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities."

The SQL-on-Hadoop movement is a primary driver in the maturing Big Data ecosystem, as it expands from its NoSQL roots to be more inclusive and accessible.

Some of the key new features coming in Tajo v0.10.0 listed by the ASF include:

Oracle and PostgreSQL catalog store support.
Direct JSON file support.
HBase storage integration (allowing users to directly access HBase tables through Tajo).
Improved JDBC driver for easier use of JDBC applications.
Improved Amazon S3 support.

Early this year, the ASF announced Apache Flink had graduated to a top-level project, providing a system for "expressive, declarative, and efficient batch and streaming data processing and analysis."

"Apache Flink is an open source distributed data analysis engine for batch and streaming data," the ASF said. "It offers programming APIs in Java and Scala, as well as specialized APIs for graph processing, with more libraries in the making."

Also promoted to top-level project status in January was Apache Falcon, "an open Source Big Data processing and management solution for Apache Hadoop in use at Hortonworks, InMobi and Talend, among others." The project addresses data motion, data pipeline coordination, lifecycle management and data discovery, the ASF said.

The ASF also recently updated other key Big Data components, including Apache HBase, which last month graduated to v1.0, featuring a comprehensive API reorganization.

The private sector has also been releasing Big Data project updates. Last month, MapR Technologies Inc. -- one of the "big three" commercial vendors of Hadoop-based distributions -- teamed up with Mesosphere Inc. for a new Big Data framework called Myriad.

"Today, Mesosphere and MapR are proud to announce project Myriad, an open source framework for running YARN on Mesos that integrates the two major powerhouses in the datacenter -- Mesos and Hadoop -- and makes them fully compatible technologies," Mesosphere said.

Mesosphere -- which provides the Mesosphere Datacenter Operating System (DCOS) for managing large-scale datacenter and cloud resources -- said the project started out under the direction of the Apache Mesos project, but was to be submitted to the Apache incubator program in hopes of attaining independent status.

"Apache Mesos abstracts CPU, memory, storage and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively," the project's Web site states.
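To make the idea of abstracting resources away from individual machines more concrete, here is a toy Python sketch. It is not the Mesos or Myriad API. The `Machine`, `Task` and `place` names, the machine sizes and the first-fit strategy are all assumptions made up for illustration. The point is only that a workload asks a shared pool for CPU and memory rather than naming a specific machine.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical description of one node in the pool.
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    # Hypothetical workload that needs some CPU and memory.
    name: str
    cpus: float
    mem_mb: int


def place(tasks, machines):
    """First-fit placement over a shared pool (illustrative only).

    Returns a dict mapping each task name to the machine it landed on,
    or None when no machine has enough free CPU and memory left.
    """
    # Track remaining capacity per machine as [cpus, mem_mb].
    free = {m.name: [m.cpus, m.mem_mb] for m in machines}
    placement = {}
    for task in tasks:
        placement[task.name] = None
        for name, (cpus, mem_mb) in free.items():
            if task.cpus <= cpus and task.mem_mb <= mem_mb:
                free[name][0] -= task.cpus
                free[name][1] -= task.mem_mb
                placement[task.name] = name
                break
    return placement


# Made-up pool and workloads: a Hadoop-style service next to other jobs.
pool = [Machine("node-a", 4, 8192), Machine("node-b", 2, 4096)]
jobs = [
    Task("yarn-nodemanager", 3, 6144),
    Task("web", 2, 2048),
    Task("batch", 1, 1024),
    Task("huge", 8, 1024),
]
result = place(jobs, pool)
print(result)
```

In this sketch the Hadoop-style task and the unrelated tasks share the same two nodes, which is the kind of consolidation the Myriad announcement describes. A real scheduler also has to handle fault tolerance, elasticity and fairness, none of which is modeled here.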

MapR further expounded on the Myriad project. "Based on an open source and collaborative development effort between MapR, Mesosphere and eBay, Myriad is an open source project built on the vision of consolidating Big Data with other workloads in the datacenter into a single pool of resources for greater utilization and operational efficiency," the company said.

With last month's announcement of an Open Data Platform (ODP), another organization will be helping to fill in the gaps to provide missing Big Data analytics functionality. "The ODP will promote Big Data technologies based on open source software from the Apache Hadoop ecosystem and optimize testing among and across the ecosystem's vendors," cofounder Pivotal Software Inc. said in a news release. "These efforts will accelerate the ability of enterprises to build or implement data-driven applications."

About the Author

David Ramel is an editor and writer at Converge 360.
