
Microsoft Acknowledges Security Best Practice Failures


It was an easy post to miss in the run-up to the Thanksgiving holiday. On November 25, we posted the results of an Electronic Frontier Foundation (EFF) survey detailing how Microsoft fails to meet 4 out of 5 security best practices for its cloud service data centers and its customers' data (Google and Dropbox were the only surveyed vendors that met all 5 criteria).

This week, Microsoft acknowledged that not all customer data in its data centers is encrypted, either at rest or in transit within and between data centers. In a ZDNet article dated December 5th, Chris Dunkett reports that Microsoft will not fully protect stored user data until the end of 2014.

The article also quotes Brad Smith, Microsoft general counsel and executive vice president, legal and corporate affairs, stating that Microsoft will work "…with other companies across the industry to ensure that data traveling between services — from one email provider to another, for instance — is protected." In other words, Microsoft is acknowledging that it does not currently support STARTTLS, an industry security best practice.

Microsoft actively positions itself as the "enterprise knowledgeable" competitor to a "consumer-centric" Google, pointing to the large data centers it runs for its own operations. Once again, however, Microsoft fails to realize that the methods and practices used to run its own data centers do not translate to multi-tenant data centers hosting customer data.

 

Incompetence 16; Microsoft 0

 

Last week, Microsoft's new Outlook.com service suffered its second major outage since its launch earlier this year. The most recent outage, a 16-hour fiasco affecting Outlook.com, Hotmail, and SkyDrive users, was caused by a botched firmware update that led to overheating servers in one of its data centers. As reported in PC World, the switch-over to alternate servers also failed.

This outage follows a 9½-hour Outlook.com outage in February that Microsoft acknowledged on Twitter but neglected to note on its status dashboard. February also saw a major Azure outage, caused when Microsoft failed to renew and install new SSL security certificates (a certificate mistake it had also made one year earlier). In November, the Office 365 service was down for most of a day when Microsoft was unable to allocate adequate resources.

This string of outages, all due to operational errors and architectural limitations, raises serious questions about Microsoft's ability to manage a multi-tenant data center.

The outages also raise questions about Microsoft's integrity with respect to marketing and customer expectations. While Microsoft promotes Office 365 and its other services as redundant, these outages demonstrate that service reliability is facility-dependent.

 

Microsoft Azure Fail! Will Customers Bail?

 

Once again, a flagship Microsoft cloud service blows through the Service Level Agreement like a blizzard through the Midwest. The February 22nd outage, which affected all Azure users worldwide, lasted more than 12 hours.

The culprit:  Microsoft failed to purchase and replace expiring SSL certificates.  In other words, Microsoft neglected to renew one of the most basic components that secure the Azure service.

As noted on RedmondMag.com:

“Furious customers wanted to know how something as simple as renewing a SSL cert could fall through the cracks. Even worse, how could that become a single point of failure capable of bringing down the entire service throughout the world?”

Once again, an operational error left thousands of customers in the dark. And this outage is one in a string of major service outages, including:

Microsoft described the issue as "a breakdown in our procedures." If not for the disruption and financial impact on thousands of companies, this statement might be considered almost comical. Ironically, a different certificate error was behind a major Azure outage in February 2012.

To put this in perspective, how would you respond if your internal IT department had Microsoft’s track record of catastrophic failure?

 

It is difficult to trust that Microsoft has the operational maturity and rigor to design and manage multi-tenant, hosted services. The Azure outage, and others like it, demonstrate immaturity, negligence, or incompetence. Do the reasons matter, given the frequency and impact? With certificate-related outages in two consecutive annual renewal cycles, it is hard to believe that Microsoft is learning from its mistakes.