Friday Thought: All Outages are Not Equal

Last week Google Docs experienced an outage lasting about 30 minutes.  Almost immediately, the “reconsider the cloud” articles and blogs began to appear.  Articles like this one on Ars Technica immediately lump the Google Docs outage with other cloud outages, including Amazon’s outage earlier this year and the ongoing problems with Microsoft’s BPOS and Office365 services.

And while no outages are good, they are not all the same.  In most cases, the nature of the outages and their impact reflect the nature of the architecture and the service provider.

  • The Google Docs outage was caused by a memory error and was exposed by an update.  Google acknowledged the error and resolved the issue in under 45 minutes.
  • Amazon’s outage was a network failure that took an entire data center off-line.  Customers that signed up for redundancy were not impacted.
  • Microsoft’s flurry of outages, including a 6-hour outage that took Microsoft almost 90 minutes to fully acknowledge, appears to be related to DNS, load, and other operational issues.

Why is it important to understand the cause and nature of the outage?  With this understanding, you can provide rational comparisons between cloud and in-house systems and between vendors.

Every piece of software has bugs and some bugs are more serious than others.  Google’s architecture enables Google to roll forward and roll back changes rapidly across their entire infrastructure.  The fact that a problem was identified and corrected in under an hour is evidence of the effectiveness of their operations and architecture.

To compare Google to in-house systems, consider that Microsoft releases bug fixes and updates monthly, and these generally require server reboots.  Depending on the size and use of each server (file/print, Exchange, etc.), multiple reboots may be necessary and reboots can run well over an hour.  In the last two years, over 50% of all “patch Tuesday” releases have been followed up with updates, emergency patches, or hot-fixes with the recommendation of immediate action.  Fixing a bug in one of Microsoft’s releases can take from hours to days.  Comparatively, under an hour is not so shabby.

When looking across cloud vendors, the nature of the outage is also important.  Amazon customers that chose not to pay extra for redundancy knowingly assumed a small risk that their systems could become unavailable due to a large error or event.  Just like any IT decision, each business must make a cost/benefit analysis.

Customers should understand the level of redundancy provided with their service and the extra costs involved to ensure better availability.

The most troubling of the cloud outages are Microsoft’s.  Why?  Because the causes appear to relate to an inability to manage a high-volume, multi-tenant infrastructure.  Just like you cannot watch TV without electricity, you cannot run online services (or much of anything on a computer) without DNS.  That Microsoft continues to struggle with DNS, routing, and other operational issues leads me to believe that their infrastructure lacks the architecture and operating procedures to prove reliable.

Should cloud outages make us wary? Yes and no.  Yes to the extent that customers should understand what they are buying with a cloud solution — not just features and functions, but ecosystem.  No, to the extent that when put in perspective, cloud solutions are still generally proving more reliable and available than in-house systems.



Friday Thought: Maybe the Backup Should Be The Primary

When Hurricane Irene seemed like a bigger threat to the Mid-Atlantic and Northeast, I started receiving emails with emergency contact information from non-profits I work with, organizations to which I belong, businesses I use, and even customers of Cumulus Global.  While some noted likely or planned closings, most were providing alternate means of communication “just in case” power outages caused their email server and phones to go down or be unreachable.

Every single one of these alternate email addresses ended in @gmail.com.  Go figure!  When businesses and non-profits need an email service that will be available during the storm and that can be accessed from phones and tablets as easily as from computers, they turn to Gmail.

In-house email servers are susceptible to power outages, Internet downtime, and other local or regional crises.  Gmail is not.  Gmail runs redundantly across many geographically dispersed data centers.  And while it is easy to seamlessly connect your iPhone, Android, or Blackberry, all you really need is an Internet connection and a browser.

For all of the organizations that went out of their way to tell me about their backup email service, the backup service is more reliable and effective than their in-house system.   Why then wouldn’t they switch?

I’m not talking about Gmail, either.  I’m talking about businesses and non-profits moving to Google Apps for Business and Google Apps for Education, respectively.

For 501c3 non-profits and schools, Google Apps for Education is free.  You get better service and save money.  And, we can help you migrate your data and your team.  Other non-profits are eligible for discounts; contact me to find out more.

For businesses, our Google Apps for Business packages, with end user support, start at less than the equivalent of $10 per user per month.

Think of the benefits of having your email on the most reliable, most accessible communication and collaboration platform available.  Think of your peace of mind knowing that your organization, its employees, its customers, and its constituents will be able to communicate without jumping through hoops.

Migration is quick and painless.  Email or call us toll free (866-356-1202).  Let’s discuss how we can help you.