Datameer Adds Governance to Tame 'Wild West' Hadoop

The Apache Hadoop/Big Data ecosystem has been too much like the "Wild West," according to Datameer Inc., with vendors each going their own way and producing a profusion of products while paying too little attention to data governance and security protocols.

In response, the company today announced new data governance capabilities for its Hadoop-native offering.

Datameer aims to address the increasing complexity of enterprise Big Data analytics, with different departments and workers developing separate data pipelines involving multiple data sources. This has led to a growing number of use cases, many of which involve sensitive data or require compliance measures to meet different regulations.

In this environment, the company said, it's important to recognize that data quality, consistency, access policies, standards, security, privacy, regulatory compliance and retention should be "must-have" capabilities throughout enterprise data initiatives.

"Today we're excited to announce that Big Data Governance is also becoming a part of the Datameer offering," Datameer exec Andrew Brust said in a blog post. "Features supporting audit, lineage, impact analysis, security and versioning are being added to Datameer, as is a listener-based REST API, facilitating the integration of these features with external governance tools and frameworks that exist today, and those which may emerge within the Big Data ecosystem in the future."

The governance capabilities are supplied through a new premium module that can be added to a Datameer Big Data environment, giving enterprises transparency into their data pipelines and providing IT with tools to conduct compliance audits to ensure internal and external regulations are met.

"Hadoop has been seen as the Wild West in which vendors have been developing different products for the ecosystem without really thinking about data governance and sophisticated security protocols," said CEO Stefan Groschupf. "With these new features we're driving home the point that we're serious about helping enterprises transform their business into data-driven organizations."

In the area of data security and privacy, Datameer said it provides measures that go beyond the intrinsic capabilities of the Hadoop Distributed File System (HDFS): Lightweight Directory Access Protocol (LDAP)/Active Directory integration; access control based on user roles; permissions; sharing; Apache Sentry 1.4 integration; and column- and row-level security and anonymization.
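To make the idea of role-based column and row security concrete, here is a minimal Python sketch. It is not Datameer's implementation or API; the names `RolePolicy`, `anonymize` and `apply_policy` are hypothetical, and the truncated, unsalted hash stands in for whatever anonymization a real product would use.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, mapping column name to value


@dataclass(frozen=True)
class RolePolicy:
    """Hypothetical per-role policy: what a role may see, and how."""
    visible_columns: frozenset              # columns returned unchanged
    masked_columns: frozenset = frozenset() # columns returned anonymized
    row_filter: Callable[[Row], bool] = lambda row: True  # row-level rule


def anonymize(value) -> str:
    # Illustrative only: an unsalted, truncated SHA-256 digest is not
    # strong anonymization; it just hides the raw value in this sketch.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_policy(rows, policy):
    """Return only the rows and columns the policy allows."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level security: drop rows the role may not see
        out = {}
        for column, value in row.items():
            if column in policy.masked_columns:
                out[column] = anonymize(value)  # column-level anonymization
            elif column in policy.visible_columns:
                out[column] = value
            # any other column is omitted entirely
        result.append(out)
    return result
```

In this sketch, an analyst role might see names and regions for European records only, with an identifier column replaced by a digest, while an administrator role would use a policy that exposes every column unmasked.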

For quality and consistency, Datameer provides data profiling, monitoring of data statistics, management of metadata and impact analysis. "Datameer's data profiling tools enable you to check and remediate issues like dirty, inconsistent or invalid data at any stage in a complex analytics pipeline, and provides transparency into every change, from the original dataset all the way through to the final visualization," the company said.
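As a rough illustration of what profiling a column for dirty or invalid data involves, the following Python sketch computes a few basic statistics. The function `profile_column` and its `is_valid` parameter are hypothetical and are not taken from Datameer's product.

```python
def profile_column(values, is_valid=lambda v: True):
    """Summarize one column: size, missing values, distinct values, invalid values.

    `is_valid` is a caller-supplied check applied to non-missing values only.
    """
    # Treat None and the empty string as missing.
    present = [v for v in values if v is not None and v != ""]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "invalid": sum(1 for v in present if not is_valid(v)),
    }


# Example: profile an email column with a deliberately crude validity check.
emails = ["a@x.com", "", None, "bad", "a@x.com"]
stats = profile_column(emails, is_valid=lambda v: "@" in v)
```

Running the same profile at each stage of a pipeline and comparing the results is one simple way to spot where missing or invalid values are introduced.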

The company also listed new functionality in the areas of data policies and standards, regulatory compliance, and retention and archiving.

"The world of Big Data, which includes Hadoop, needs to take data governance more seriously in order to become ready for enterprise-grade deployments," the company quoted Enterprise Management Associates exec John L. Myers as saying. "As more technologies join next-generation data management environments, open architectures such as Datameer's are going to be critical in meeting both internal and external data governance requirements to make those solutions enterprise ready."

About the Author

David Ramel is an editor and writer for Converge360.