How much have cloud failures cost us?

Last week, the International Working Group on Cloud Computing Resiliency (IWGCR) released a study that calculated the economic impact of cloud outages. The research, which has tracked downtime in the market since 2007, had some startling results: in the past five years, 13 well-known cloud services have logged 568 hours of downtime, at an economic cost of more than $71.7 million.

That works out to an average of 7.5 hours of unavailability per service per year. There is good news, though: even with those numbers, the overall availability rate still sits at 99.9 percent. But it's obvious that gone are the days of the five nines (99.999 percent uptime), long seen as the standard in the telco world.
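It's worth seeing how those percentages fall out of the downtime figures. The sketch below derives availability from annual downtime hours; the 7.5-hour figure comes from the IWGCR numbers above, and the rest is simple arithmetic:

```python
# Availability as a function of annual downtime.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def availability(downtime_hours_per_year: float) -> float:
    """Return availability as a fraction (e.g. 0.999)."""
    return 1 - downtime_hours_per_year / HOURS_PER_YEAR

# 7.5 hours of downtime per year, per the IWGCR figures:
print(f"{availability(7.5):.4%}")  # roughly 99.91% -- the 'three nines' range

# By contrast, 'five nines' (99.999%) permits only about 5.3 minutes per year:
print(f"{(1 - 0.99999) * HOURS_PER_YEAR * 60:.1f} minutes of downtime")
```

The gap is stark: 7.5 hours a year versus roughly five minutes a year is the difference between three nines and five.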

Needless to say, it's time to look at what is causing these issues and what we can do to fix them. Below is my take on the problem: what I see as the main causes, followed by a corresponding list of practices that can consciously prevent downtime.

Issues:

  • Complications within the cloud stack and internal applications
  • Hardware failure
  • DDoS attacks
  • Human error
  • Improper change control processes
  • Switch and routing issues
  • Lack of proper proactive DR testing and capacity planning

Best practices:

  • Test for cloud outages and develop a contingency and disaster recovery plan
  • Implement a high-availability strategy that builds out redundancies, e.g. testing failover between availability zones, cloud environments, and data centers
  • Employ regular performance testing to ensure that performance remains consistent
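The first step toward the redundancy testing described above is simply probing each zone's health on a schedule. Here is a minimal sketch of such a check; the endpoint names and URLs are hypothetical placeholders, and a production setup would add retries, alerting, and historical logging:

```python
import urllib.request
import urllib.error

# Hypothetical health-check endpoints in different availability zones;
# replace with your own services' URLs.
ENDPOINTS = {
    "zone-a": "https://a.example.com/health",
    "zone-b": "https://b.example.com/health",
}

def check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout all count as down.
        return False

def healthy_zones() -> list[str]:
    """List the zones currently passing their health checks."""
    return [zone for zone, url in ENDPOINTS.items() if check(url)]
```

Run on a schedule, a check like this gives you the downtime record that DR drills and capacity planning depend on, rather than discovering an outage from customer complaints.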

Have any more questions about preventing outages? Feel free to reach out to me in an email.