Risky Office365 Update to Take 10 Months

According to reports on ZDNet, Microsoft will begin rolling out updates to Office 365 as early as this month.  The rollout of new features will take 10 months and should be complete by November 2013.

Included in the update:

  • Office 2013 Components
  • Updates to the Exchange Management Console
  • Office Web Apps will move to the latest versions
  • Lync will include multi-person video conferencing

Many customers, however, will need to upgrade desktops and on-premises servers.  Using the new management console features requires an update to Exchange, and the Office 2013 components require Windows 7 or Windows 8 on the desktop.

This upgrade is a test for Microsoft and its customers, and it has triggered a series of webinars and meetings between Microsoft and its partners.  Under the prior BPOS service, Microsoft never provided an upgrade; instead, customers had to start over with Office365.  With this upgrade, Microsoft is testing its ability to perform, on a massive, multi-tenant scale, the same system upgrades its customers want to avoid.

In comparison, Google rolled out Hangouts and Hangouts on Air to Google Apps for Business customers over a three-week period with zero customer disruption.  The difference is that Google Apps is designed for continuous innovation and enhancement, while the underlying architecture of Office365 relies on old models of complex, periodic service packs and upgrades applied to virtual servers on shared hardware.

An outdated architecture plus higher costs of ownership and use: is that a winning combination for your business?


Friday Thought: All Outages are Not Equal

Last week Google Docs experienced an outage lasting about 30 minutes.  Almost immediately, the “reconsider the cloud” articles and blogs began to appear.  Articles like this one on Ars Technica immediately lump the Google Docs outage in with other cloud outages, including Amazon’s outage earlier this year and the ongoing problems with Microsoft’s BPOS and Office365 services.

And while no outage is good, not all outages are the same.  In most cases, the nature and impact of an outage reflect the nature of the service provider and its architecture.

  • The Google Docs outage was caused by a memory error and was exposed by an update.  Google acknowledged the error and resolved the issue in under 45 minutes.
  • Amazon’s outage was a network failure that took an entire data center offline.  Customers that signed up for redundancy were not affected.
  • Microsoft’s flurry of outages, including a 6-hour outage that Microsoft took almost 90 minutes to fully acknowledge, appears to be related to DNS, load, and other operational issues.

Why is it important to understand the cause and nature of an outage?  With this understanding, you can make rational comparisons between cloud and in-house systems, and between vendors.

Every piece of software has bugs, and some bugs are more serious than others.  Google’s architecture enables the company to roll changes forward and back rapidly across its entire infrastructure.  The fact that the problem was identified and corrected in under an hour is evidence of the effectiveness of its operations and architecture.

For a comparison with in-house systems, consider that Microsoft releases bug fixes and updates monthly, and these generally require server reboots.  Depending on the size and role of each server (file/print, Exchange, etc.), multiple reboots may be necessary, and reboots can run well over an hour.  In the last two years, over 50% of all “patch Tuesday” releases have been followed by updates, emergency patches, or hot-fixes carrying a recommendation of immediate action.  Fixing a bug in one of Microsoft’s releases can take from hours to days.  By comparison, under an hour is not so shabby.

When comparing cloud vendors, the nature of the outage is also important.  Amazon customers that chose not to pay extra for redundancy knowingly assumed a small risk that their systems could become unavailable due to a large error or event.  As with any IT decision, each business must make its own cost/benefit analysis.

Customers should understand the level of redundancy provided with their service and the extra costs involved to ensure better availability.

The most troubling of the cloud outages are Microsoft’s.  Why?  Because the causes appear to relate to an inability to manage a high-volume, multi-tenant infrastructure.  Just as you cannot watch TV without electricity, you cannot run online services (or much of anything on a computer) without DNS.  That Microsoft continues to struggle with DNS, routing, and other operational issues leads me to believe that its infrastructure lacks the architecture and operating procedures needed to prove reliable.

Should cloud outages make us wary?  Yes and no.  Yes, to the extent that customers should understand what they are buying with a cloud solution: not just features and functions, but an entire ecosystem.  No, to the extent that, when put in perspective, cloud solutions are still generally proving more reliable and available than in-house systems.



Friday Thought: Is Microsoft Afraid of a Fair Fight?

I do not condemn Microsoft for promoting its cloud services.  Nor do I think they are wrong to compare their services to others, including those from Google.  Watching their marketing efforts, I do wonder if Microsoft is afraid of a fair fight.  Here is why …

In an effort to create viral support for Office365, Microsoft has produced several videos on YouTube.  These videos attempt a humorous comparison of Office365 to other services.  This video, for example, is making the rounds on IT discussion forums because it claims to compare Office365 and Google Apps.

Using Fear, Uncertainty, and Doubt (or “FUD”) is a time-honored sales technique, and it can be quite effective.  This video, however, is intentionally deceptive, comparing Office365 as a paid service against the free versions of Gmail and Google Apps.  Microsoft’s claims about ads are false when applied to Google Apps for Business, for Education, and for Government, and Microsoft knows this.

Why would Microsoft blur a comparison between Office365 and Google Apps?

Why would Microsoft shy away from a fair comparison?

Google Apps for Business costs less than comparable Office365 capabilities

Google Apps integrates with Office 2003/2007/2010 for added features

  • Office365 requires Office 2010 licenses for full feature access

Google Apps has 1 pricing plan for each type of customer (business, government, education, non-profit)

  • Office365 has 11 pricing plans spread over 2 types of licenses; you cannot switch license types once you start using the service

Google Apps customers always receive the latest updates and versions, with incremental, scheduled releases every few weeks

  • Companies using Microsoft BPOS (based on Exchange & SharePoint 2007) have no upgrade path to Office365 (based on Exchange & SharePoint 2010) short of starting over with a full data migration project

Google Apps is designed for 100% availability (24x7x365) and achieved 99.984% availability in 2010

  • Office365 still requires scheduled and emergency maintenance windows that interrupt service to users
  • Less than 6 weeks after launch, Office365 had an Exchange outage affecting most users in North America for between 3 and 5 hours
  • In August 2010, Microsoft’s BPOS service in North America had more than 40 hours of scheduled and unscheduled down time

Google Apps was designed from the ground up to be a secure, reliable, multi-tenant service in which all users have access to the latest features.

  • Office365 is a modified version of Microsoft’s “2010” generation of Exchange, SharePoint, and other services
  • The technology has been in development for more than 3 years and was originally designed for in-house, single-tenant servers
  • New features arrive months apart and only with service packs and upgrades
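To put the 99.984% figure in concrete terms, here is a small back-of-the-envelope calculation. It is a sketch under stated assumptions: it treats the 2010 figure as measured over the full calendar year (365 days, since 2010 was not a leap year), and the helper name `downtime_minutes` is purely illustrative. It converts an availability percentage into implied minutes of downtime and sets that next to the single 3-to-5-hour Office365 outage mentioned above.

```python
# Convert an availability percentage into the downtime it implies.
# Assumption: the 99.984% figure covers all of calendar year 2010
# (365 days, not a leap year). Helper names here are illustrative only.

HOURS_IN_2010 = 365 * 24  # 8,760 hours


def downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Minutes of downtime implied by an availability percentage over a period."""
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return unavailable_fraction * period_hours * 60.0


google_2010 = downtime_minutes(99.984, HOURS_IN_2010)
print(f"Google Apps, 2010: about {google_2010:.0f} minutes of downtime for the year")

# For scale: the single Office365 Exchange outage lasted 3 to 5 hours.
for hours in (3, 5):
    print(f"One {hours}-hour outage: {hours * 60} minutes")
```

Under these assumptions, 99.984% works out to roughly 84 minutes of downtime across all of 2010, less than the 180 to 300 minutes of that one Office365 outage.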

Looking at the current differences between Google Apps and Office365, I understand Microsoft’s marketing strategy.  Do you?

Tuesday Take-Away: The True Role of the SLA

As you look towards cloud solutions for more cost effective applications, infrastructure, or services, you are going to hear (and learn) a lot about Service Level Agreements, or SLAs.  Much of what you will hear is a big debate about the value of SLAs and what SLAs offer you, the customer.

Unfortunately, some vendors are framing the value of their SLAs in terms of the compensation customers receive when the vendor fails to meet its service level commitments.  The best example of this attitude is Microsoft’s comparison of its cash payouts to Google’s SLA, which provides free days of service.  Microsoft touts its cash refunds as a better response to failure.  Why any company would send out a marketing message that begins with “When we fail …” is beyond me.  But that is a subject for another post someday.

That said, Microsoft, and the customers who are comforted by its compensation, are missing the point of the SLA entirely.  Any compensation for excessive downtime is irrelevant next to the actual cost and impact on your business.  And unless a vendor is failing miserably and often, the compensation itself is not going to change the vendor’s track record.

The true role of the SLA is to communicate the vendor’s commitment to providing you with service that meets defined expectations for Performance, Availability, and Reliability (PAR).  The SLA should also communicate how the vendor defines and sets priorities for problems and how it will respond based on those priorities.  A good SLA sets expectations and defines the method of measuring whether those expectations are met.

To continue with the Microsoft and Google example: Microsoft sets an expectation that you will have downtime.  While the downtime is normally scheduled in advance, it may not be.  Google, in contrast, sets an expectation that you should have no downtime, ever.  The details follow.

Microsoft’s SLA is typical in that it excludes maintenance windows, periods when the system will be unavailable for scheduled or emergency maintenance.  While Microsoft does not schedule these windows on a regular weekly or monthly basis, it does promise to give you reasonable notice of maintenance windows.  The SLA, however, allows Microsoft to declare emergency maintenance windows with little or no notice.

In August 2010, Microsoft’s BPOS service had 6 emergency maintenance windows, totaling more than 10 hours, in response to customers losing connectivity to the service, along with 30 hours of scheduled maintenance windows.  Customers therefore experienced more than 40 hours of downtime that month, all within the boundaries of the SLA and its expectations.  On August 17, 2011, Microsoft experienced a data center failure that cost its Office365 customers in North America Exchange access for as long as five hours.  The system was down for 90 minutes before Microsoft acknowledged the outage.
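A quick calculation shows how maintenance-window exclusions change the picture. This is a simplified sketch, not Microsoft’s actual SLA formula: it assumes all of the roughly 40 hours of August 2010 downtime (10 emergency plus 30 scheduled) fell inside declared maintenance windows, and that the SLA simply does not count downtime inside those windows. The function name `availability_pct` is hypothetical; real SLAs define their measurement periods and exclusions in their own terms.

```python
# Raw availability vs. SLA-measured availability when maintenance windows
# are excluded. Assumptions: all ~40 hours of August 2010 downtime fell in
# declared maintenance windows, and the SLA ignores downtime inside them.
# This is an illustrative simplification; real SLA formulas vary.

AUGUST_HOURS = 31 * 24  # 744 hours in August


def availability_pct(period_hours: float, downtime_hours: float,
                     excluded_hours: float = 0.0) -> float:
    """Percent of the period counted as available.

    Downtime that falls inside excluded (maintenance) windows is not counted.
    """
    counted_downtime = max(downtime_hours - excluded_hours, 0.0)
    return 100.0 * (period_hours - counted_downtime) / period_hours


raw = availability_pct(AUGUST_HOURS, downtime_hours=40)
sla = availability_pct(AUGUST_HOURS, downtime_hours=40, excluded_hours=40)
print(f"What users experienced: {raw:.2f}% available")
print(f"What the SLA measures:  {sla:.2f}% available")
```

With these assumptions, users saw about 94.62% availability for the month while the SLA could still register 100%. That gap is exactly why the exclusions matter more than the compensation clause.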

Google’s SLA sets an expectation of system availability 24x7x365, with no scheduled downtime for maintenance and no emergency maintenance windows.

The difference in SLAs sets a very different expectation and makes a statement about how each vendor builds, manages, and provides the services you pay for.

When comparing SLAs, understand the role of maintenance windows and other “exceptions” that give the vendor an out.  Also, look at the following.

  • Definitions for critical, important, normal, and low priority issues
  • Initial response times for issues based on priority level
  • Target time to repair for issues based on priority level
  • Methods of communicating system status and health
  • Methods of informing customers of issues and actions/results

Remember, if you need to use the compensation clause, your vendor has already failed.