Date: 2021-12-13

Chaos broke out last Tuesday (Dec. 7), and humanity reached the edge of annihilation, when services such as Amazon, Disney Plus, ‘League of Legends’, ‘PUBG’, Canva and Prime Video, along with Brazilian services such as C6 Bank and iFood, experienced instability and outages. According to Amazon Web Services (AWS), the cause was an overload triggered by an automated process on its network.

The explanation came in a report published on the company’s own website. “An automated activity to scale the capacity of one of the AWS services hosted on the AWS core network has triggered unexpected behavior from a large number of clients within the internal network,” says Amazon.

This behavior resulted in a large “surge of connection activity” that overloaded network devices between the internal network and the main AWS network, “resulting in delays in communication between these networks,” according to the report. The issue also hampered Amazon’s ability to see exactly what was wrong with the system and delayed the repair by seven hours.

Several services started having issues at the same time as Amazon Web Services. (Image: DownDetector/Reproduction)

Because Amazon’s Support Contact Center also runs on the AWS network, customers were unable to contact the company during the outage. The Service Health Dashboard, which AWS uses to provide status updates, was also affected, which delayed acknowledgment of the issue.

AWS says it is working to improve its response to outages and plans to release a revamped version of the Service Health Dashboard that should help customers receive timely updates when an outage occurs. “We want to apologize for the impact this event has had on our customers. While we are proud of our history of availability, we know how critical our services are to our customers, their applications and end users, and their business. We know that this event impacted many customers in significant ways. We will do everything possible to learn from this event and use it to further improve our availability,” the report concludes.

This is not the first time AWS has had an outage

Amazon Web Services experienced a failure of similar magnitude in November 2020, which took online services down. At the time, Amazon said the instability mainly hit the Kinesis Data Streams API and, as a consequence, caused several services that depend on it to fail.

According to the latest update at the time, the affected services included ACM, Amplify Console, API Gateway, AppMesh, AppStream2, AppSync, Athena, AutoScaling, Batch, CloudFormation, CloudTrail, CloudWatch, Cognito, Connect, DynamoDB, EventBridge, IoT Services, Lambda, LEX, Managed Blockchain, Marketplace, Personalize, Resource Groups, SageMaker, Support Console, Well Architected and Workspaces.

