New Amazon Service Uses SQL To Query Streaming Big Data -- ADTmag

By David Ramel
August 12, 2016

When the Big Data revolution began, there was Apache Hadoop, with its batch-oriented MapReduce processing engine, alongside scale-out NoSQL databases.

Ever since, the technology has been evolving, with an emphasis on incorporating streaming data, a need driven by the growing Internet of Things (IoT) and the petabytes of data its networked devices spew out. Streaming data, in turn, enables interactive, real-time analytics. Meanwhile, more SQL functionality was introduced (exemplified by SQL-on-Hadoop solutions), so developers and data scientists don't have to learn new query languages or query data programmatically through custom APIs. Along the way, managed services emerged to handle many of the details involved in running Big Data processing frameworks in-house, on premises.

These trends have converged in services such as Amazon Kinesis Analytics -- just announced by Amazon Web Services Inc. (AWS) -- which uses standard SQL to query streaming data.

The reason for the new service is simple, according to a blog post published yesterday by AWS evangelist Jeff Barr.

"We want you, whether you are a procedural developer, a data scientist, or a SQL developer, to be able to process voluminous clickstreams from Web applications, telemetry and sensor reports from connected devices, server logs, and more using a standard query language, all in real time!" Barr said.

"Today I am happy to be able to announce the availability of Amazon Kinesis Analytics," Barr continued. "You can now run continuous SQL queries against your streaming data, filtering, transforming and summarizing the data as it arrives. You can focus on processing the data and extracting business value from it instead of wasting your time on infrastructure. You can build a powerful, end-to-end stream processing pipeline in 5 minutes without having to write anything more complex than a SQL query."

The tool works with two other Kinesis components -- Firehose and Streams -- to provide real-time analysis via SQL queries. Firehose automatically loads streaming data into AWS services such as S3 (cloud storage), Redshift (data warehouse) and Amazon Elasticsearch Service (a search and analytics engine). Streams, meanwhile, lets developers build custom applications that process streaming data for a variety of needs.
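To make the data flow a little more concrete, here is a minimal Python sketch of the producer side: an application writing JSON events into a Kinesis stream that a Kinesis Analytics application could then read as its input. It assumes a boto3 Kinesis client (as created by `boto3.client("kinesis")`), whose `put_record` call takes `StreamName`, `Data` and `PartitionKey`. The `send_events` helper, the `RecordingClient` stand-in, the stream name `clickstream-demo` and the event fields are all hypothetical. So the sketch runs without AWS credentials, the client is passed in, and any object with a compatible `put_record` method will do.

```python
import json
from typing import Any, Iterable, List, Mapping


def send_events(client: Any, stream_name: str,
                events: Iterable[Mapping[str, Any]],
                key_field: str) -> int:
    """Write each event to a Kinesis stream as one JSON-encoded record.

    `client` is assumed to be a boto3 Kinesis client, or a stand-in
    exposing the same put_record keyword arguments. Returns the number
    of records sent.
    """
    sent = 0
    for event in events:
        client.put_record(
            StreamName=stream_name,
            # Kinesis records carry raw bytes; JSON is one common encoding.
            Data=json.dumps(event).encode("utf-8"),
            # Records that share a partition key are routed to the same shard.
            PartitionKey=str(event[key_field]),
        )
        sent += 1
    return sent


class RecordingClient:
    """Hypothetical offline stand-in that just records put_record calls."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def put_record(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.calls))}


if __name__ == "__main__":
    clicks = [
        {"user_id": "u1", "page": "/home"},
        {"user_id": "u2", "page": "/pricing"},
    ]
    # With credentials configured, a real boto3.client("kinesis") and an
    # existing stream name could be used here instead of the stand-in.
    print(send_events(RecordingClient(), "clickstream-demo", clicks, "user_id"))
```

Keying records by user ID is one reasonable choice for clickstreams, since it keeps each user's events together on one shard; a different key would suit other workloads.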

"Being able to continuously query and gain insights from this information in real-time -- as it arrives -- can allow companies to respond more quickly to business and customer needs," AWS said in a statement. "However, existing data processing and analytics solutions aren't able to continuously process this 'fast moving' data, so customers have had to develop streaming data processing applications -- which can take months to build and fine-tune -- and invest in infrastructure to handle high-speed, high-volume data streams that might include tens of millions of events per hour."

Kinesis Analytics follows a three-step workflow: configure an input stream from the console; write SQL queries using a built-in SQL editor and templates; and configure an output stream, specifying where the processed results should be loaded, such as the aforementioned S3, Redshift or Elasticsearch Service. The results can then be used to create alerts and respond to changing data, which is useful for IoT applications, for example. Built-in machine learning algorithms help here, providing stream processing functionality such as anomaly detection, top-K analysis and approximate distinct items, all exposed as SQL functions.
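Kinesis Analytics exposes top-K analysis as a SQL function, and how AWS implements it isn't described here. As a rough illustration of how a top-K summary can be maintained over an unbounded stream in fixed memory, the following Python sketch implements the Space-Saving heavy-hitters algorithm. It is a well-known technique swapped in for illustration, not necessarily what AWS uses, and the `SpaceSaving` class and `top_k` helper are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SpaceSaving:
    """Approximate top-K tracker using the Space-Saving algorithm.

    Keeps at most `capacity` counters. Any item whose true frequency
    exceeds n / capacity (n = items seen so far) is guaranteed to be
    tracked; stored counts may overestimate but never underestimate.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}

    def add(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the item with the smallest counter; the newcomer inherits
            # that count plus one, which is where overestimation comes from.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k (item, estimated_count) pairs, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


def top_k(stream: Iterable[Hashable], k: int,
          capacity: int) -> List[Tuple[Hashable, int]]:
    """Feed a whole stream through a SpaceSaving tracker and report the top k."""
    tracker = SpaceSaving(capacity)
    for item in stream:
        tracker.add(item)
    return tracker.top(k)
```

Memory stays bounded at `capacity` counters however many distinct items arrive, which is what makes this kind of summary practical on a stream; the trade-off is that counts for rarer items are only upper bounds.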

Like other AWS services, Kinesis Analytics infrastructure can be scaled up and down as needed and users pay for what they use.

Beyond IoT, Kinesis Analytics can be applied to scenarios such as serving personalized content to Web surfers based on clickstream data, or placing appropriate ads in real time. The most common usage patterns, AWS said, are time-series analytics, real-time dashboards, and real-time alerts and notifications.
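The real-time alerting pattern can be illustrated with a small Python sketch that flags a sensor reading when it strays too far from the recent rolling average. This is a deliberately simple rolling z-score rule chosen for illustration, not the anomaly-detection algorithm built into Kinesis Analytics; the `RollingAlert` class, the `alerts` helper, the 20-reading window and the 3-standard-deviation threshold are all assumptions.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, List, Tuple


class RollingAlert:
    """Flag values more than `threshold` population standard deviations
    away from the mean of the previous `window` values."""

    def __init__(self, window: int = 20, threshold: float = 3.0) -> None:
        if window < 2:
            raise ValueError("window must hold at least two values")
        self.history: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def check(self, value: float) -> bool:
        """Return True if `value` is anomalous, then add it to the history.

        No alerts are raised until the window has filled once.
        """
        alert = False
        if len(self.history) == self.history.maxlen:
            recent = list(self.history)
            mean = statistics.mean(recent)
            stdev = statistics.pstdev(recent)
            if stdev == 0:
                # A perfectly flat history has no spread; any change is anomalous.
                alert = value != mean
            else:
                alert = abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return alert


def alerts(readings: Iterable[float], window: int = 20,
           threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Return (position, value) for every reading that triggers an alert."""
    detector = RollingAlert(window, threshold)
    return [(i, v) for i, v in enumerate(readings) if detector.check(v)]
```

Because the spike itself is appended to the history after it is checked, a sustained shift in readings will gradually stop alerting as it becomes the new normal, which is often the desired behavior for dashboards and notifications.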

Barr provides a basic Kinesis Analytics example in his blog post, and Ryan Nienhuis offers more in-depth guidance in a blog post published yesterday -- the first of a two-part series -- titled "Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1."

That post demonstrates how Kinesis Analytics uses processing "windows" to control which records a query works on. These windows come in three types: tumbling windows, which cover fixed, non-overlapping intervals and suit periodic reports that summarize data over time; sliding windows, which suit monitoring and other kinds of trend detection; and custom windows, for when the best grouping isn't based on time.
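The difference between tumbling and sliding windows can be sketched in a few lines of Python over timestamped events. This is an in-memory illustration of the concepts, not Kinesis Analytics' SQL syntax; the function names, the integer-second timestamps and the assumption that events arrive in time order are all hypothetical. A tumbling window assigns each event to exactly one fixed bucket, while the sliding window here is recomputed at every event over the trailing interval, so consecutive windows overlap.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key); assumed sorted by time


def tumbling_counts(events: Iterable[Event], size: int) -> Dict[int, int]:
    """Count events per fixed, non-overlapping window of `size` seconds.

    Each window is keyed by its start time; every event lands in exactly one.
    """
    counts: Counter = Counter()
    for ts, _key in events:
        counts[ts - ts % size] += 1
    return dict(counts)


def sliding_counts(events: Iterable[Event], size: int) -> List[Tuple[int, int]]:
    """For each event, count the events in the trailing window (ts - size, ts].

    Windows overlap, so one event contributes to several results, which
    suits continuous monitoring and trend detection.
    """
    window: Deque[int] = deque()
    results: List[Tuple[int, int]] = []
    for ts, _key in events:
        window.append(ts)
        # Drop timestamps that have slid out of the trailing interval.
        while window[0] <= ts - size:
            window.popleft()
        results.append((ts, len(window)))
    return results


if __name__ == "__main__":
    sample = [(0, "a"), (10, "b"), (59, "a"), (60, "c"), (61, "a"), (130, "b")]
    print(tumbling_counts(sample, 60))
    print(sliding_counts(sample, 60))
```

With 60-second windows, the tumbling version yields one count per minute-long bucket, suitable for a periodic report, while the sliding version produces a fresh count with every event. Custom windows, the third type, would need a grouping rule other than time and are left out of this sketch.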

"Previously, real-time stream data processing was only accessible to those with the technical skills to build and manage a complex application," Nienhuis concluded. "With Amazon Kinesis Analytics, anyone familiar with the ANSI SQL standard can build and deploy a stream data processing application in minutes.

"This application you just built provides a managed and elastic data processing pipeline using Analytics that calculates useful results over streaming data. Results are calculated as they arrive, and you can configure a destination to deliver them to a persistent store like Amazon S3."

Nienhuis promised that part two of his blog series will delve into more advanced stream processing concepts.

About the Author

David Ramel is an editor and writer at Converge 360.
