Apache Hive Updated with SQL-on-Hadoop Features -- ADTmag

Apache Hive Updated with SQL-on-Hadoop Features

By David Ramel
April 22, 2014

Hortonworks Inc. yesterday announced a new version of Apache Hive, the open source data warehouse software running on top of Hadoop, with new SQL query features and performance improvements.

Hive, a former Hadoop subproject that has since graduated to top-level Apache project status, provides the infrastructure to conduct Big Data analytics with a SQL-based query language called HiveQL.

Hive 0.13 was described as a "significant release" by Harish Butani in a blog post announcing the new release. He said more than 70 members of the project conducted a major effort to implement more than 1,000 features or fixes.

In addition to new SQL features and myriad other improvements, Hive 0.13 was released in coordination with Apache Tez 0.4, an interactive alternative to the oft-maligned batch-oriented MapReduce programming model used to extract business insights from Big Data stores.

[Figure: Improvements in Hive 0.13 (source: Hortonworks blog post)]

"With the delivery of Hive on Tez, users have the option of executing queries on Tez," Butani said. "Tez's dataflow model on a DAG of nodes facilitates simpler, more efficient query plans, which translates to significant performance improvements and interactive query on Hive/Hadoop."
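To make the "DAG of nodes" idea a little more concrete, here is a toy sketch in Python. It is not Tez's API and does not reflect how Hive actually builds plans; the stage names (`scan_orders`, `scan_customers`, `join`, `aggregate`) are hypothetical. It only illustrates that once a query plan is expressed as a dependency graph, independent stages can be scheduled together and results can flow straight to downstream stages.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: two table scans feed a join, which feeds an aggregation.
# Each key maps a stage to the set of stages it depends on.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
}


def execution_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def ready_waves(dag):
    """Group stages into waves; stages in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(wave)
        sorter.done(*wave)  # mark the wave finished so dependents become ready
    return waves


order = execution_order(plan)
waves = ready_waves(plan)
```

In this sketch both scans land in the first wave, the join in the second and the aggregation in the third, which is the kind of structure a DAG-based engine can exploit.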

One of the improvements in Tez 0.4 is better Windows support. "The community fixed bugs and made changes to Tez so that it runs as smoothly on Windows as it does on Linux," said Bikas Saha in announcing the new release. "We hope this will encourage adoption of Tez on Windows-based systems."

Also announced in conjunction with the releases of Hive 0.13 and Tez 0.4 was the completion of the Stinger Initiative, a community effort to improve Hive with up to 100x performance boosts at petabyte scale while using SQL semantics familiar to developers and users. Butani said, "These improvements extend Hive beyond its traditional roots and brings true interactive SQL query to Hadoop."

Hive 0.13 introduces SQL standard-based authorization, and the SQL language was extended to support grant and revoke on entities. Other authorization-related improvements include support for show roles, user privileges and active privileges. Gaps in authorization checks have been plugged with a reworked, pluggable authorization API.

Other SQL-related improvements include support for features such as DECIMAL and CHAR types, permanent functions, common table expressions and more.
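Of the features listed above, common table expressions are the easiest to show in a few lines. The sketch below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that no Hive cluster is at hand; the `sales` table and its columns are hypothetical. The `WITH ... AS (...)` form is the standard SQL syntax for a common table expression, though details of Hive's own dialect may differ.

```python
import sqlite3

# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 50), ('west', 30);
    """
)

# The CTE names an intermediate result (region_totals) so the outer query
# can filter on it without repeating the aggregation as a nested subquery.
query = """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 100
    ORDER BY region
"""

rows = conn.execute(query).fetchall()
conn.close()
```

Only the `east` region passes the filter here, since its total of 150 exceeds 100 while `west` totals 30.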

Hive integrates with other tools via the popular JDBC interface, and version 0.13 improves on JDBC operations with support for job cancel and async execution.

"All of these Hive improvements mean that Hive 0.13 accepts a very large percentage of TPC-DS benchmark queries without rewrites," Butani said. TPC-DS is a decision support benchmark standard.

The development team improved its own efficiency by moving builds to the Apache Maven project management tool and by creating a new parallel testing framework and a new project wiki. The release also adds support for the Parquet file format.

Many other improvements were listed, and Butani said the huge revamp took a tremendous development effort. "Ultimately, over 145 developers representing 44 companies, from across the Apache Hive community, contributed over 390,000 lines of code to the project in just 13 months, nearly doubling the Hive codebase," he said.

About the Author

David Ramel is an editor and writer at Converge 360.