Amazon Simplifies Big Data Queries of Cloud Data with SQL
- By David Ramel
- November 30, 2016
The new Amazon Athena tool from Amazon Web Services Inc. (AWS) enables serverless queries of large amounts of data stored in Amazon Simple Storage Service (Amazon S3), obviating the need to spin up Hadoop clusters or set up data warehouses.
Amazon Athena was unveiled today at the company's re:Invent 2016 conference.
In a news release issued today, the company said, "With a few clicks in the AWS Management Console, customers can point Amazon Athena at their data stored in Amazon S3 and begin using standard SQL to run queries and get results in seconds. With Amazon Athena there are no clusters to manage and tune, no infrastructure to setup or manage, and customers pay only for the queries they run."
Athena promises to simplify the process of querying petabyte-scale data stored in standard data formats such as CSV, log files, JSON, Apache ORC and Apache Parquet.
Although it eliminates the need for standard Big Data tools such as Hadoop, Spark, Hive and Pig, which are found primarily in the open source Apache Software Foundation ecosystem, its underlying architecture is based on another open source project, the Presto distributed SQL engine originally developed at Facebook.
"Athena includes an interactive query editor to help get you going as quickly as possible," AWS spokesperson Jeff Barr said in a blog post today. "Your queries are expressed in standard ANSI SQL and can use JOINs, window functions, and other advanced features.
"You can run your queries from the AWS Management Console or from a SQL client such as SQL Workbench, and you can use Amazon QuickSight to visualize your data. You can also download and use the Athena JDBC driver and run queries from your favorite Business Intelligence tool."
An AWS FAQ explains more about the product, including which use cases are best suited to Athena as opposed to other Big Data services such as the Amazon Redshift data warehouse or more sophisticated data processing frameworks such as Amazon EMR.
"The announcement of AWS Athena provides further validation that the demand for data processing in the cloud has been exploding," said Bob Muglia, CEO of Snowflake Computing. "In fact, data processing and analytics has become one of the most important workloads in the cloud. Customers are demanding fast, flexible, and easy ways to store, access and analyze data in the cloud to drive business results."
Barr said Athena is currently available only in the US East (Northern Virginia) and US West (Oregon) regions, and will become available in other regions in the coming months.
Furthermore, he said, "You pay only for the queries that you run; you are charged based on the amount of data scanned by each query (the console will display this information after each query). This means that you can realize significant cost savings by compressing, partitioning, or converting your data to a columnar format."
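To illustrate the arithmetic behind that pricing model, here is a minimal Python sketch of how per-query cost scales with the amount of data scanned. The function name, the 4:1 compression ratio, the 3-of-30 column layout and the price of 5.0 per terabyte are all illustrative assumptions, not figures from the announcement; check current AWS pricing before relying on any number.

```python
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int, price_per_tb_usd: float) -> float:
    """Estimate the cost of one query billed purely on bytes scanned."""
    return (bytes_scanned / BYTES_PER_TB) * price_per_tb_usd


# Placeholder price (assumption, not taken from the article).
ASSUMED_PRICE_PER_TB_USD = 5.0

# Scenario 1: a query that scans 1 TB of uncompressed CSV.
raw_bytes = 1 * BYTES_PER_TB

# Scenario 2: the same data compressed at an assumed 4:1 ratio.
compressed_bytes = raw_bytes // 4

# Scenario 3: the same data in a columnar format, where the query reads
# only 3 of 30 equally sized columns (assumed layout, no compression).
columnar_bytes = raw_bytes * 3 // 30

raw_cost = estimate_query_cost(raw_bytes, ASSUMED_PRICE_PER_TB_USD)
compressed_cost = estimate_query_cost(compressed_bytes, ASSUMED_PRICE_PER_TB_USD)
columnar_cost = estimate_query_cost(columnar_bytes, ASSUMED_PRICE_PER_TB_USD)

for label, cost in [
    ("raw CSV", raw_cost),
    ("compressed", compressed_cost),
    ("columnar", columnar_cost),
]:
    print(f"{label}: ${cost:.2f}")
```

Because the charge depends only on bytes scanned, anything that lets a query read less data, such as compression, partition pruning or column selection, lowers the bill proportionally under these assumptions.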
About the Author
David Ramel is an editor and writer for Converge360.