Amazon on Friday issued a detailed analysis of, and an apology for, last week’s massive crash of its cloud service, an event that brought down dozens of websites.
The disruption to Amazon Web Services’ Elastic Compute Cloud, or EC2, limited customers’ access to much of the information stored in the company’s East Coast regional data centers. About 75 sites crashed because of the outage.
Until now, Amazon had stayed relatively silent about the cause. But after completing a post-mortem assessment of the mess, the company issued a technically detailed, 5,700-word explanation of what went wrong.
The event, the first prolonged, widespread outage EC2 has suffered since its launch five years ago, was a technical perfect storm. A mistake made by Amazon’s engineers triggered a cascade of other bugs and glitches.
“As with any complicated operational issue, this one was caused by several root causes interacting with one another,” Amazon wrote.
On April 21, AWS tried to upgrade capacity in one storage section of its regional network in Northern Virginia. That section is called an “availability zone.” There are multiple availability zones in each region, with information spread across several zones in order to protect against data loss or downtime.