News

Amazon DevOps Guru Release from AWS

Amazon Web Services (AWS) unveiled a slew of products and services during last week's re:Invent online conference, including its new Amazon DevOps Guru managed operations service.

DevOps Guru uses machine learning (ML) to automatically detect operational issues and recommend specific actions for remediation. It's designed to collect and analyze application metrics, logs, events, and traces to recognize behaviors that deviate from normal operating patterns, such as under-provisioned compute capacity, database I/O over-utilization, and memory leaks.
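
To make "deviating from normal operating patterns" concrete, here is a minimal sketch of one common baseline technique, a rolling z-score check. It is not Amazon's method: AWS has not published its models in this announcement, and the function name, window size, threshold, and sample latency values below are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, baseline_size=10, threshold=3.0):
    """Return indexes of samples that deviate sharply from the preceding window.

    Hypothetical helper for illustration only; not part of any AWS SDK.
    """
    anomalies = []
    for i in range(baseline_size, len(samples)):
        # Treat the previous `baseline_size` samples as "normal" behavior.
        window = samples[i - baseline_size:i]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        # Flag the sample if it sits more than `threshold` standard
        # deviations away from the baseline mean.
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up web latency readings in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 101, 100, 250, 99]
print(find_anomalies(latency_ms))
```

A fixed threshold like this is easy to reason about but blind to seasonality and slow drift, which is the kind of gap a trained model is meant to close.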

When Amazon DevOps Guru identifies anomalous app behavior that could cause outages or service disruptions, the company says, it alerts developers with issue details, such as the resources involved, issue timeline, and related events. The alerts go out via Amazon Simple Notification Service (SNS) and partner integrations with services like PagerDuty and Atlassian's Opsgenie, and they include specific recommendations for remediation to help developers quickly understand the potential impact and likely causes of the issue.

When the DevOps Guru service is deployed in a cloud environment, the company says, it can identify missing or misconfigured alarms, warn of approaching resource limits, and flag code and configuration changes that might cause outages. DevOps Guru also 'spotlights' problems like under-provisioned compute capacity, database I/O over-utilization, and memory leaks, and recommends remediating actions.

"Customers have asked us to continue adding services around areas where we can apply our own expertise on how to improve application availability and learn from the years of operational experience that we have acquired running Amazon.com," said Swami Sivasubramanian, VP of Amazon's Machine Learning group, in a statement. "With Amazon DevOps Guru, we have taken our experience and built specialized machine learning models that help customers detect, troubleshoot, and prevent operational issues while providing intelligent recommendations when issues do arise. This enables teams to immediately benefit from operational best practices Amazon has learned from running Amazon.com, saving customers the time and effort that would otherwise be spent configuring and managing multiple monitoring systems."

The DevOps Guru service not only analyzes system and app data to detect anomalies, but it also groups this data into 'operational insights' that include anomalous metrics, visualizations of application behavior over time, and recommendations on actions for remediation, the company says. The service also correlates and groups related application and infrastructure metrics tied to problems such as web app latency spikes, disks running out of space, bad code deployments, and memory leaks.
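
The grouping idea can be sketched in a few lines. The example below clusters anomalies that occur close together in time into a single "insight," which is only one plausible correlation rule. The announcement does not say how DevOps Guru actually correlates signals, so the `Anomaly` and `Insight` types, the five-minute gap, and the sample events are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    resource: str   # hypothetical resource name
    metric: str     # hypothetical metric name
    timestamp: int  # seconds since an arbitrary start time


@dataclass
class Insight:
    anomalies: list = field(default_factory=list)


def group_into_insights(anomalies, max_gap_seconds=300):
    """Group anomalies into insights when they occur within max_gap_seconds
    of the previous anomaly. Assumed rule, for illustration only."""
    insights = []
    for anomaly in sorted(anomalies, key=lambda a: a.timestamp):
        last = insights[-1].anomalies[-1] if insights else None
        if last is not None and anomaly.timestamp - last.timestamp <= max_gap_seconds:
            # Close in time to the previous anomaly: same insight.
            insights[-1].anomalies.append(anomaly)
        else:
            # Too far apart (or first anomaly): start a new insight.
            insights.append(Insight([anomaly]))
    return insights


events = [
    Anomaly("web-frontend", "latency_ms", 0),
    Anomaly("orders-db", "disk_free_pct", 120),
    Anomaly("batch-worker", "memory_mb", 4000),
]
insights = group_into_insights(events)
print(len(insights), [len(i.anomalies) for i in insights])
```

Collapsing the first two events into one insight means a single notification instead of two, which is the alarm-reduction effect the service is pitched on.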

The result is fewer redundant alarms, which helps users focus on high-severity issues. In the Amazon DevOps Guru console, users can review configuration change histories and deployment events, along with system and user activity, to arrive at a prioritized list of likely causes for an operational issue.

The service was also designed to provide intelligent recommendations with remediation steps, and it integrates with AWS Systems Manager for runbook and collaboration tooling, which helps users maintain applications and manage infrastructure for their deployments more effectively.

Paired with Amazon CodeGuru, another ML-powered developer tool that provides intelligent recommendations and identifies an application's most expensive lines of code, Amazon DevOps Guru brings the benefits of automated ML analysis to users' operational data, so that developers can more easily improve application availability and reliability, the company says.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].