Follow-Up to Exchange Domains 20 and 21 Service Outage

We deeply regret and apologize for the recent outage and the disruption it caused to the businesses of our impacted customers and partners. They entrust us with their critical business communications, and we take this responsibility seriously.

This post explains the cause of the outage, the comprehensive measures we are taking to prevent it from recurring, and how we are improving our communications with customers. A full Reason for Outage (RFO) was sent to impacted customers on Wednesday, 4/21. Under our service level agreement, their accounts will be proactively credited by Friday, 4/23.

Summary of the Reason for Outage (RFO)

At approximately 6:15 a.m. PT on Thursday, 4/15, a hardware failure occurred on one of the storage area networks (SANs) in Intermedia’s New Jersey datacenter. The service processor in one of the SAN’s controller nodes failed, which shifted the SAN’s entire load to the service processor on the redundant controller node. That single service processor did not have enough spare capacity to handle the load of all the systems connected to the SAN, which caused performance issues in Domains 20 and 21.

For customers on Domain 21, a backlog of email developed rapidly, causing major mail delivery problems throughout Thursday, 4/15.

For customers on Domain 20, the backlog was large enough that it took 32 hours to clear. By approximately 2 p.m. PT on 4/16, all systems were functioning normally and mail delivery was again considered “real-time.”

Corrective Actions

Our SAN vendor analyzed the system logs for the event and determined that the service processor failure was caused by a bug specific to the firmware version running on the system. The vendor performed an emergency upgrade to a newer firmware version that fixes this bug. We are also taking additional corrective actions to ensure the SAN has enough spare capacity to keep performing without degradation if a single hardware component fails.

Improving Communications

Intermedia received significant constructive feedback about our communication throughout the outage. We recognize how important it is to proactively communicate timely, detailed information that clearly explains the impact on our customers’ service. We also acknowledge that our current client notification tools and processes are more reactive than proactive.

We have taken a number of steps in response, including developing a new client notification tool that Technical Support will use to proactively notify and communicate with clients during a service interruption. The tool, which includes automated SMS (text message) notification, will be released next week and put into operation in May. We are also revising our communication processes to ensure that clear, non-technical information on service impact is provided alongside technical details.

About Cynthia Greenberg

Cynthia joined Intermedia as head of communications in September 2009. In this role she develops and manages the company’s communications, including PR/media relations, social media relations, crisis communications, corporate communications and internal communications. Before joining Intermedia, Cynthia spent many years as a Vice President at Ogilvy PR Worldwide, a top-10 marketing/communications firm. There she was part of the firm’s Corporate Practice, serving professional services, financial services, healthcare and consumer companies; her clients included Adecco/Ajilon, LG Electronics and Kaplan. Cynthia has 15 years of experience developing and executing regional and global consumer PR campaigns and corporate communications strategies, and her in-house experience includes roles at Gap Inc., Accenture and KPMG LLP.