Hortonworks Improves Hadoop Developer Experience
- By David Ramel
- July 24, 2015
Hortonworks Inc. highlighted an improved experience for Big Data developers in its new open enterprise Apache Hadoop solution, released to general availability this week.
While listing features designed for Hadoop operators, the company also noted several benefits of particular interest to coders in the Hortonworks Data Platform (HDP) 2.3. Functionality around the SQL query language received special attention, the company said.
"While we care very much about making the lives of Hadoop operators easier, what are we doing about for folks who write SQL queries, develop Pig Script, and develop data pipelines?" the company said in discussing the preview release last month. "How does HDP 2.3 make their lives easier?"
Hortonworks said it worked with developers to enhance the platform for them, finding out what tools they currently used and what features they were requesting.
"The initial focus was on the SQL developer and looking at the most common tasks they perform," the company said. It came up with an integrated experience designed to:
- Build SQL queries.
- Provide a visual "explain plan."
- Allow an extended debugging experience when using the Tez execution engine.
The SQL-related improvements also include enhanced semantics for working with Apache Hive, a data warehouse infrastructure for Hadoop. "Hive adds time intervals and UNION semantics, 2.5x performance improvements and improved query scheduling, along with a more streamlined user interface for Hive within Ambari," the company said.
Speaking of Ambari -- a project for provisioning, managing and monitoring Hadoop clusters -- Hortonworks also announced that version 2.1 of that component was being made generally available. "Aside from delivering a breakthrough configuration and customization experience, Ambari 2.1 includes support for installing, managing and monitoring Apache Accumulo and Apache Atlas, along with expanded high-availability support for Apache Ranger and Apache Storm," the company said.
HDP 2.3 also brings a modern browser-based IDE approach to using Apache Pig -- a scripting platform used to process and analyze large data sets -- with a new Pig Latin Editor. Also, a new File Browser was added to work with the Hadoop Distributed File System (HDFS).
For working with Apache Falcon -- a feed processing and feed management system -- a new web-forms approach was implemented to speed development.
"The new Falcon UI also allows you to search and browse processes that have executed, visualize lineage and setup mirroring jobs to replicate files and databases between clusters or to cloud storage such as Microsoft Azure Storage," the company said on its "what's new" page for HDP 2.3.
New features designed for operators, as opposed to developers, include "smart configuration" functionality for HDFS, YARN, HBase and Hive, along with a YARN capacity scheduler and customized dashboards. YARN, along with HDFS, is a core component of Hadoop, providing the architectural backbone for various data processing engines. HBase is an open-source, distributed, non-relational database.
To further tune HDP 2.3 for enterprise use, Hortonworks also added security and governance enhancements, along with proactive support provided by Hortonworks SmartSense.
Company exec Tim Hall highlighted the give-and-take with the overall open source Hadoop developer community in developing its commercial distribution. "Literally, hundreds of developers have been collaborating with us to evolve each of the individual Apache Software Foundation (ASF) projects from the broader Apache Hadoop ecosystem," Hall said in a blog post this week. "The various project teams have coalesced these new facets into a comprehensive and open HDP, delivering both new features and closing out a wide variety of issues across Apache Hadoop and its related projects."
Hall listed those myriad projects in a graphic.
David Ramel is an editor and writer for Converge360.