What the AWS Outage Teaches Us


On Wednesday, Amazon Web Services experienced an outage, and when AWS goes down, so do a lot of household names. What does this say about Cloud computing? In real terms, it underlines that Cloud is critical infrastructure. It is not going away, and neither is the need for prudent IT management.

There is No 100%

We don’t agree with the commentators who railed against the Cloud in reaction to this news. What the recent outages at AWS, Google and Office 365 remind us is that in computing, 100% uptime is an impossible dream. Now, as ever, you need a backup plan. On the world’s trading floors, the banks we work with have largely adopted “Hybrid Cloud” models, in which some in-house computing is retained for mission-critical services. An alternative is to keep a backup Cloud provider. One or two banks were so far into building data centers when Cloud reached its tipping point that they added a distributed computing layer and became Cloud providers in their own right, while reserving enough capacity to support their own hybrid models.

Five 9s uptime has typically been the trading-floor technology mantra, especially in the age of high-frequency trading. Cloud SLAs generally offer four 9s, i.e. 99.99% uptime, with a service credit if the provider falls short. Measured over a year, that means a credit is due once unplanned downtime exceeds roughly 52 minutes and 35 seconds. Most agreements use shorter measurement periods, which shrink that downtime allowance in proportion. Still, there will be a lot of credits this year.
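The downtime arithmetic behind these SLA figures is easy to check. Below is a minimal Python sketch, assuming availability is a simple fraction of wall-clock time over the measurement period and a 365.25-day year; real SLAs define their own measurement windows and exclusions. The helper name `downtime_budget` is hypothetical, not part of any provider's tooling.

```python
from datetime import timedelta


def downtime_budget(nines: int, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` for an availability of `nines` 9s.

    Hypothetical helper: four 9s (99.99%) leaves 10**-4 of the period
    as permitted downtime, five 9s leaves 10**-5, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001
    # timedelta * float is exact up to microsecond rounding.
    return period * unavailability


if __name__ == "__main__":
    year = timedelta(days=365.25)  # assumption: average calendar year
    month = timedelta(days=30)     # assumption: 30-day monthly window
    print(downtime_budget(4, year))   # 0:52:35.760000
    print(downtime_budget(4, month))  # 0:04:19.200000
    print(downtime_budget(5, year))   # 0:05:15.576000
```

With a 365-day year the four-9s allowance is about 52 minutes 34 seconds instead, and over a 30-day month it drops to just over 4 minutes, which is why a single multi-hour incident can breach the SLA several times over.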

Brave New World

When it comes to downtime, people have short memories. Cloud offers huge scalability benefits, including the ability to scale down as well as up. It is flexible and cost-efficient: there is no depreciating asset, and in-house maintenance costs are reduced. While a Cloud outage is a major inconvenience, it is mild compared to the nightmare of a serious in-house data center fault. Even on security, Cloud can now outperform in-house computing. The result is that Cloud adoption will tend to shrink both a company’s risk register and its total cost of operations, especially as subscription services, rather than generic computing, become available in every area. Adoption will only continue.

Amazon was a pioneer when it launched AWS back in 2006, and the group of Virginia data centers affected by this latest incident is among its oldest. Despite regular updates about recovery time objectives, the company remained secretive about the nature and cause of the problem. The September outage that took down Netflix was due to human error; the implication is that this one wasn’t. Amazon now has 45% global market share for Cloud, with its closest rival, Microsoft, back at 19%. Incredibly, for the world’s dominant retailer, AWS generated 57% of Amazon’s profit last year.

Critical Cloud Infrastructure

The world has become very reliant on Cloud, and Cloud itself still depends on Internet service providers. That dependency has been acutely relevant for banks in 2020, as employees, including traders, have worked remotely over domestic ISP connections.

Cloud and the Internet must be considered critical infrastructure on a par with transport networks and the power grid. As with other utilities, it’s important to have policy in place that prioritizes security of supply. It is uncomfortable for some when hugely profitable firms are seen to drop the ball in serving enterprises, public services and the wider population. The temptation is to compare today’s services with the impossible dream of 100% uptime rather than with the more expensive, less dependable offerings that existed a decade ago. Maybe policy will be sharpened in the coming year, once the focus finally moves away from the pandemic. Whatever happens, in our view, firms that have bet on the Cloud have backed the right horse.