
Databricks Open Sources Project Aimed at Data Lake Reliability

San Francisco, Calif.-based Databricks, original creators of Apache Spark, today announced the release of Delta Lake, an open source solution designed to provide "reliability for both batch and streaming data" for data lakes.

Data lakes are large storage repositories, often used by enterprises, that hold data in its "raw" or "natural" format in a flat structure, with each item tagged with a unique identifier and metadata. Data warehouses, by contrast, are generally hierarchical and store data using folders or files. Data in a lake can then be pulled for a variety of uses, whether data mining applications, machine learning, analytics or something else.

According to Databricks, while the architecture of data lakes offers enterprises benefits, reliability often isn't one of them. "Data reliability challenges derive from failed writes, schema mismatches and data inconsistencies when mixing batch and streaming data, and supporting multiple writers and readers simultaneously," the company explained in its announcement of Delta Lake.

Databricks said that Delta Lake offers better reliability "by managing transactions across streaming and batch data and across multiple simultaneous readers and writers."

"Delta Lakes can be easily plugged into any Apache Spark job as a data source, enabling organizations to gain data reliability with minimal change to their data architectures," the company continued. "Organizations no longer need to spend resources building complex and fragile data pipelines to move data across systems. Instead, developers can have hundreds of applications reliably upload and query data at scale."

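For readers who want a sense of what "plugged into any Apache Spark job as a data source" might look like in practice, here is a minimal PySpark sketch. It is an illustration under stated assumptions, not code from the announcement: it assumes a Spark session started with the Delta Lake package available, and the helper names (write_delta, read_delta, stream_delta) and the example path are hypothetical.

```python
# Sketch only. Assumes `df` is a pyspark.sql.DataFrame and `spark` is a
# SparkSession launched with the Delta Lake package on its classpath.
# The helper names below are hypothetical, chosen for illustration.


def write_delta(df, path, mode="append"):
    """Write a DataFrame to `path` using the "delta" data source."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Load the table at `path` as an ordinary batch DataFrame."""
    return spark.read.format("delta").load(path)


def stream_delta(spark, path):
    """Open the same table as a streaming source."""
    return spark.readStream.format("delta").load(path)


# Example use inside an existing job (path is a placeholder):
#   write_delta(events_df, "/data/events")
#   daily = read_delta(spark, "/data/events")
#   live = stream_delta(spark, "/data/events")
```

In this sketch the only change to an existing job is the format name passed to the reader or writer, which is one way to read the company's claim of "minimal change" to existing data architectures.
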
More information and the Delta Lake code are now available for download here.

About the Author

Becky Nagel serves as vice president of AI for 1105 Media specializing in developing media, events and training for companies around AI and generative AI technology. She also regularly writes and reports on AI news, and is the founding editor of PureAI.com. She's the author of "ChatGPT Prompt 101 Guide for Business Users" and other popular AI resources with a real-world business perspective. She regularly speaks, writes and develops content around AI, generative AI and other business tech. Find her on X/Twitter @beckynagel.