Where is Your Cloud Bandwidth Bottleneck?

When speaking with companies and schools about moving to cloud solutions like Google Apps and Google Cloud Storage, we are often asked about bandwidth demands.  Many organizations worry that their current Internet connections are not sufficient for cloud computing.  While most organizations already have more than enough Internet bandwidth, they may still have performance bottlenecks from their internal network.

Many small and mid-size enterprises try to save money on infrastructure by choosing consumer grade and so-called “SMB” products.  In many instances, these products are not designed to handle business traffic.

WiFi Access Points: Low-end WiFi Access Points, or WAPs, are not designed for a large number of connections or for heavy traffic.  While these devices claim they can support dozens of devices, the reality is that their antenna systems, channel management, and software are not up to the task.  These devices can bog down with collisions, reducing the effective bandwidth to near zero with as few as 5 or 10 active users.

Switches and Hubs: The same load considerations exist for low-end switches and hubs, particularly those with slower back-planes and less memory.  Traffic bursts can overload these devices, creating “collisions” that slow down your network.

Routers: Many entry level and SMB routers do not have the processor or back-plane speed needed to meet the traffic demands of today’s networks.  The router between your network and the Internet needs to be fast, able to buffer traffic, and able to provide network services.  While changing to cloud solutions may not dramatically alter the amount of traffic, it changes the pattern.  An underpowered router can slow traffic like a broken toll booth gate.

For most small and mid-size businesses, network performance planning for cloud solutions should start at the ends and work towards the middle.  Look at your Direct Internet Access capacity and your WiFi and move inwards to the router, hubs, and switches.  A well planned network will improve performance, reliability, and productivity.

Tuesday Take-Away: The True Role of the SLA

As you look towards cloud solutions for more cost effective applications, infrastructure, or services, you are going to hear (and learn) a lot about Service Level Agreements, or SLAs.  Much of what you will hear is a big debate about the value of SLAs and what SLAs offer you, the customer.

Unfortunately, some vendors frame the value of their SLAs based on the compensation customers receive when the vendor fails to meet its service level commitments.  The best example of this attitude is Microsoft’s comparison of its cash payouts to Google’s SLA, which provides free days of service.  Microsoft touts its cash refunds as a better response to failure.  Why any company would send out a marketing message that begins with “When we fail …” is beyond me.  But that is a subject for another post someday.

That said, Microsoft and the customers who are comforted by the compensation are totally missing the point of the SLA in the first place.  Any compensation for excessive downtime is irrelevant with respect to the actual cost and impact on your business.  And unless a vendor is failing miserably and often, the compensation itself is not going to change the vendor’s track record.

The true role of the SLA is to communicate the vendor’s commitment to providing you with service that meets defined expectations for Performance, Availability, and Reliability (PAR).  The SLA should also communicate how the vendor defines and sets priorities for problems and how it will respond based on those priorities.  A good SLA will set expectations and define the method of measuring whether those expectations are met.

To continue with the Microsoft and Google example: Microsoft sets an expectation that you will have downtime.  While the downtime is normally scheduled in advance, it may not be.  Google, in contrast, sets an expectation that you should have no downtime, ever.  The details follow.

Microsoft’s SLA is typical in that it excludes maintenance windows, periods of time when the system will be unavailable for scheduled or emergency maintenance.  While Microsoft does not schedule these windows on a regular weekly or monthly basis, it does promise to give you reasonable notice for maintenance windows.  The SLA, however, allows Microsoft to declare emergency maintenance windows with little or no notice.

In August 2010, Microsoft’s BPOS service had 6 emergency maintenance windows, totaling more than 10 hours, in response to customers losing connectivity to the service, along with 30 hours of scheduled maintenance windows.  Customers experienced more than 40 hours of downtime that month, all of it within the boundaries of the SLA and the expectations it sets.  On August 17, 2011, Microsoft experienced a data center failure that resulted in loss of Exchange access for its Office365 customers in North America for as long as five hours.  The system was down for 90 minutes before Microsoft acknowledged this as an outage.
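
To put those August 2010 numbers in perspective, here is a rough back-of-the-envelope calculation in Python.  It is only a sketch: the function and variable names are made up for illustration, "more than 10 hours" is treated as exactly 10, and it assumes excluded maintenance time is simply subtracted from the downtime total.  Real SLAs define their own measurement rules, so check the actual contract language.

```python
# Illustrative sketch only: function and variable names are hypothetical,
# and real SLAs define their own measurement rules.

HOURS_IN_AUGUST = 31 * 24  # 744 hours in a 31-day month


def availability_percent(period_hours, downtime_hours, excluded_hours=0.0):
    """Percent of the period the service counts as available.

    excluded_hours is downtime the SLA does not count (for example,
    maintenance windows). Assumption: excluded time is simply removed
    from the downtime total; actual SLAs may calculate this differently.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    counted_downtime = max(downtime_hours - excluded_hours, 0.0)
    return 100.0 * (period_hours - counted_downtime) / period_hours


# Figures from the August 2010 example; "more than" is treated as a floor.
emergency_hours = 10.0
scheduled_hours = 30.0
total_downtime = emergency_hours + scheduled_hours

# What the customer experienced: every hour of downtime counts.
experienced = availability_percent(HOURS_IN_AUGUST, total_downtime)

# What the SLA measures if all maintenance windows are excluded.
measured = availability_percent(HOURS_IN_AUGUST, total_downtime,
                                excluded_hours=total_downtime)

print(f"Experienced availability: {experienced:.2f}%")
print(f"SLA-measured availability: {measured:.2f}%")
```

Under these assumptions, customers saw roughly 94.6% availability for the month, while a calculation that excludes every maintenance window would still report 100%.  That gap is why the exclusions matter more than the compensation.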

Google’s SLA sets an expectation for system availability 24x7x365, with no scheduled downtime for maintenance and no emergency maintenance windows.

The difference in SLAs sets a very different expectation and makes a statement about how each vendor builds, manages, and provides the services you pay for.

When comparing SLAs, understand the role of maintenance windows and other “exceptions” that give the vendor an out.  Also, look at the following.

  • Definitions for critical, important, normal, and low priority issues
  • Initial response times for issues based on priority level
  • Target time to repair for issues based on priority level
  • Methods of communicating system status and health
  • Methods of informing customers of issues and actions/results

Remember, if you need to use the compensation clause, your vendor has already failed.