January 2, 2013
Disaster struck Netflix viewers first on Christmas Eve, then again on New Year’s Eve.
The company, which suffered an interruption of its video-streaming service Dec. 24, reported on New Year’s Eve that some customers were unable to access its DVD website because of a technical malfunction.
The glitch drew a flood of complaints on Twitter, and some users reported seeing the following message on the Netflix website: “The Netflix site is temporarily unavailable. Our engineers are working hard to bring the site back up as quickly as possible.”
According to an e-mailed statement from Netflix spokesman Joris Evers, the streaming service itself was never affected.
The timing, once again, was bad for the company: with many people off work and children out of school for the holidays, Dec. 31 was bound to be a high-volume viewing day for Netflix.
On Christmas Eve, the world’s biggest video-streaming service was offline from 3:30 p.m. until late Dec. 24.
Evers said the Christmas Eve outage was caused by problems at Amazon Web Services, Amazon’s cloud-hosting business, which is separate from its online retail store.
Amazon has since apologized for the mishap.
“We want to apologize,” said Amazon in a statement. “We know how critical our services are to our customers’ businesses, and we know this disruption came at an inopportune time for some of our customers.”
Amazon also posted the following explanation on its website:
“The service disruption began at 12:24 p.m. PST on Dec. 24 when a portion of the ELB state data was logically deleted. This data is used and maintained by the ELB control plane to manage the configuration of the ELB load balancers in the region (for example tracking all the backend hosts to which traffic should be routed by each load balancer). The data was deleted by a maintenance process that was inadvertently run against the production ELB state data. This process was run by one of a very small number of developers who have access to this production environment. Unfortunately, the developer did not realize the mistake at the time. After this data was deleted, the ELB control plane began experiencing high latency and error rates for API calls to manage ELB load balancers.”
Netflix also released a statement saying it is working on ways to keep its service running through outages that affect an entire region.
“Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two,” the statement reads. “We are working on ways of extending our resiliency to handle partial or complete regional outages.”