Databricks Open Sources Project Aimed at Data Lake Reliability
- By Becky Nagel
- April 24, 2019
San Francisco, Calif.-based Databricks, founded by the original creators of Apache Spark, today announced the release of Delta Lake, an open source solution designed to provide "reliability for both batch and streaming data" for data lakes.
Data lakes are large storage repositories, often used by enterprises, that hold data in its "raw" or "natural" format in a flat structure -- unlike data warehouses, which are generally hierarchical and store data using folders or files -- with each item tagged with a unique identifier and metadata. The data can then be pulled for a variety of uses, whether data mining, machine learning, analytics or something else.
According to Databricks, while the architecture of data lakes offers enterprises benefits, reliability often isn't one of them. "Data reliability challenges derive from failed writes, schema mismatches and data inconsistencies when mixing batch and streaming data, and supporting multiple writers and readers simultaneously," the company explained in its announcement of Delta Lake.
Databricks said that Delta Lake offers better reliability "by managing transactions across streaming and batch data and across multiple simultaneous readers and writers."
"Delta Lakes can be easily plugged into any Apache Spark job as a data source, enabling organizations to gain data reliability with minimal change to their data architectures," the company continued. "Organizations no longer need to spend resources building complex and fragile data pipelines to move data across systems. Instead, developers can have hundreds of applications reliably upload and query data at scale."
More information and the Delta Lake code are now available for download.
Becky Nagel is the vice president of Web & Digital Strategy for 1105's Converge360 Group, where she oversees the front-end Web team and deals with all aspects of digital projects at the company, including launching and running the group's popular virtual summit and Coffee Talk series. She is an experienced tech journalist (20 years), and before her current position, was the editorial director of the group's sites. A few years ago she gave a talk at a leading technical publishers conference about how changes in Web browser technology would impact online advertising for publishers. Follow her on Twitter @beckynagel.