Hortonworks Starts Hadoop Summit with Data Platform Update

With the Hadoop Summit kicking off today in San Jose, Calif., host Hortonworks Inc. announced several updates to its Big Data platform, including improved security, easier Spark analytics and developer productivity advancements.

Those updates and more are featured in Hortonworks Data Platform (HDP) 2.5, the latest release of the company's suite of fully open source Big Data component technologies.

A rising star in that ecosystem, Apache Spark, is easier to use in the enterprise with the new HDP 2.5, Hortonworks said.

"Analysts and data scientists can now use Apache Zeppelin, a Web-based notebook, for interactive data analytics and the creation of beautiful data-driven, interactive and collaborative documents with SQL, Scala, Python and more," the company said in a statement today. "HDP 2.5 also includes a technical preview of the latest version of Apache Spark from the community, which makes Spark dramatically easier, faster and smarter."

Hortonworks exec Tim Hall, in a blog post today, provided more information on the new functionality. He said the Hortonworks team learned Spark could benefit from some additional tools to help with data visualization and exploration to glean business insights.

Figure: The Hortonworks Data Platform (source: Hortonworks)

"To address the ease of use of Spark through visual tools, Hortonworks began working with the team from NFLabs within the Apache Zeppelin community in 2015," Hall said. "Zeppelin addresses use cases like data exploration, data discovery, and interactive code snippets while providing built-in visualization. We believe that Zeppelin has the potential to modern data science studio and it is particularly powerful in the context of Spark. The work we have focused on within the community has centered around enterprise readiness with a particular focus on security. Zeppelin now runs on a secure cluster and has basic authentication and authorization capabilities that allow it to be used within the enterprise."

Figure: What's New in HDP 2.5 (source: Hortonworks)

Under the "developer productivity" category, Hortonworks said HDP 2.5 includes a new Apache Phoenix Query Server that benefits developers using Apache Phoenix with HBase. Phoenix is a relational database engine supporting OLTP for Hadoop that uses Apache HBase as its backing data store. With the addition of the query server, developers now have a broader choice of development languages to use in querying HBase.

"In addition, Apache Storm allows for large scale deployments with advanced capabilities to address real-time stream processing such as native heartbeat support and automatic back pressure," Hortonworks said. "Further developer productivity advancements include new connectors for search and NoSQL databases. Storm also has streamlined operations for resource aware scheduling through Ambari views."

Yet more Apache components -- Apache Atlas and Apache Ranger -- provide the security and governance enhancements.

"HDP 2.5 delivers enterprise-ready features including the unique integration of industry-leading comprehensive security and trusted data governance to define and implement dynamic classification-based security policies," Hortonworks said. "Enterprises can use Apache Atlas to classify and assign metadata tags, which are then enforced through Apache Ranger to enable various access policies. In addition, Atlas now provides cross-component lineage."

Hall again provided more details. "By integrating Atlas with Ranger enterprises can now implement dynamic classification-based security policies, in addition to role-based security," he said. "Ranger's centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply this policy in real-time to the entire hierarchy of assets including databases, tables and columns, thereby preventing violations from occurring. Ranger also allows for location, time, and other dynamic policies to be defined."
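
To make the idea of classification-based policy more concrete, here is a minimal conceptual sketch in Python. It is not the Atlas or Ranger API: the `TagPolicy` class, the `ASSET_TAGS` catalog, the `POLICIES` table, the group names and the `is_allowed` function are all hypothetical names invented for illustration. It also assumes that tags are attached directly to an asset (a database, table or column), that untagged assets are allowed by default, and that a time-of-day window stands in for the "dynamic" conditions Hall mentions.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy keyed on a metadata tag rather than on a named asset."""
    allowed_groups: frozenset
    start: time = time(0, 0)        # access window opens (assumed dynamic condition)
    end: time = time(23, 59, 59)    # access window closes


# Hypothetical catalog: the kind of tag assignments a metadata service might hold.
ASSET_TAGS = {
    "sales": {"INTERNAL"},               # database-level tag
    "sales.customers": {"INTERNAL"},     # table-level tag
    "sales.customers.ssn": {"PII"},      # column-level tag
}

# Hypothetical policies: one per tag, applied wherever that tag appears.
POLICIES = {
    "INTERNAL": TagPolicy(frozenset({"analysts", "compliance"})),
    "PII": TagPolicy(frozenset({"compliance"}), time(9, 0), time(17, 0)),
}


def is_allowed(group, asset, now, policies=POLICIES, catalog=ASSET_TAGS):
    """Return True only if every tag on the asset permits this group at this time."""
    for tag in catalog.get(asset, set()):
        policy = policies.get(tag)
        if policy is None:
            continue  # assumption: a tag with no policy imposes no restriction
        if group not in policy.allowed_groups:
            return False
        if not (policy.start <= now <= policy.end):
            return False
    return True  # assumption: untagged assets are allowed by default
```

In this sketch, tagging a new column as `PII` in the catalog is enough to bring it under the existing `PII` policy, with no per-column rule to write. That is the appeal of the tag-based approach described above, though the real products implement it quite differently.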

In other news released at the summit, Hortonworks: named Microsoft Azure HDInsight as its Premier Connected Data Platforms cloud solution; announced a reseller partnership with business intelligence (BI) on Hadoop specialist AtScale; expanded its Partnerworks program with new Managed Service Providers and ISV/IHV partners; and announced the formation of "a new consortium to define and develop an open source genomics platform to accelerate genomics-based precision medicine in research and clinical care."

The summit, co-hosted by Yahoo, runs through Thursday.

About the Author

David Ramel is an editor and writer for Converge360.