Amazon Cloud Handles Data Lake Overhead -- ADTmag

By David Ramel
August 9, 2019

Data lakes became the de facto storage scheme for advanced Big Data analytics as the movement gained traction and enterprises faced the problem of housing different types and formats of data to be mined for business insights.

Now, for customers of the Amazon Web Services Inc. (AWS) cloud, creating, setting up and managing data lakes should be easier: AWS Lake Formation has graduated from preview to general availability, promising to relieve some of the associated drudgery.

According to AWS: "A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics -- from dashboards and visualizations to Big Data processing, real-time analytics, and machine learning to guide better decisions."

To reduce the complexity of creating a data lake and preparing data for analytics, the new service is designed to simplify and automate typically manual steps such as collecting, cleaning, and cataloging data.

"Customers can easily bring their data into a data lake from a variety of sources using pre-defined templates, automatically classify and prepare the data, and centrally define granular data access policies to govern access by the different groups within an organization," AWS said in a news release.

AWS broke down the traditionally manual process into three main steps:

Customers must clean and prepare the data -- including partitioning, indexing, and transforming the data -- to optimize the performance and cost that come with running analytics on the data.
Then, they have to set up data access roles and enforce security policies across their storage and each of their different analytics engines, and update the security policies when permissions change or new end users are added.
And, finally, customers are required to make the data available in a secure way to their data analysts so that they can analyze and process the data using any of the available analytics engines.
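
As a rough illustration of the second step, the sketch below shows why defining access rules once, in one place, saves work compared with configuring each analytics engine separately. It is a minimal Python model written under stated assumptions, not the Lake Formation API: the names PolicyStore, grant, revoke, allowed_columns and run_query are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Hypothetical central store mapping (principal, table) to readable columns."""
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, table: str, columns: set) -> None:
        # Add columns to whatever the principal can already read on this table.
        self.grants.setdefault((principal, table), set()).update(columns)

    def revoke(self, principal: str, table: str) -> None:
        # Drop every grant the principal holds on this table.
        self.grants.pop((principal, table), None)

    def allowed_columns(self, principal: str, table: str) -> set:
        # Return a copy so callers cannot mutate the stored grant.
        return set(self.grants.get((principal, table), set()))


def run_query(store: PolicyStore, engine: str, principal: str,
              table: str, requested: list) -> str:
    """Stand-in for any analytics engine: it checks the shared store first."""
    allowed = store.allowed_columns(principal, table)
    denied = sorted(set(requested) - allowed)
    if denied:
        raise PermissionError(
            f"{engine}: {principal} may not read {denied} in {table}"
        )
    return f"{engine}: reading {sorted(requested)} from {table}"
```

In this model, calling store.grant("analysts", "sales", {"region", "revenue"}) once is enough for every engine that goes through run_query to honor it, and a later store.revoke("analysts", "sales") takes effect everywhere at the same time. That is the kind of bookkeeping the article says customers otherwise repeat per engine.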

"These steps require customers to perform a lot of manual work, and as a result, most customers can take up to several months to set up a data lake," AWS said.

Most customers use Amazon S3 buckets for data lake storage, and Lake Formation works with several other AWS services including Amazon Redshift (data warehouse), Amazon Athena (serverless interactive query service) and AWS Glue (extract, transform, and load [ETL] service). Support for Apache Spark analytics with Amazon EMR will follow over the next few months, along with Amazon QuickSight (business intelligence service) and Amazon SageMaker (machine learning platform) support.

AWS Lake Formation doesn't incur any extra charges beyond the AWS services used with it, and is initially available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland) and Asia Pacific (Tokyo) regions.

More information is available in:

A new blog post that details how to create and set up a data lake with the new service
The "Data Lakes and Analytics on AWS" site
The "What Is a Data Lake?" article
A "Data Lake Foundation on AWS" quick start
The AWS Lake Formation site, which includes a FAQ and other resources

About the Author

David Ramel is an editor and writer at Converge 360.
