July 13, 2015
Apple and Rackspace have both had data centers hit by disasters in the past month. A chemical spill at one Apple location and a fire at another are not the kinds of disasters most businesses think of when planning for business continuity and disaster recovery.
Outages can be extremely costly, sometimes even leading to business failure. As we mentioned in “Disaster Recovery: Does Your Business Have What It Takes To Face The Worst Case Scenario?”, InformationWeek reports that “IT downtime can cost as much as $26 billion annually. In addition to this, the report estimates that 56 percent of enterprises in North America have poor disaster recovery plans.”
Data center outages occur for many reasons, from human error to natural disasters, and most businesses are not prepared. Half of the businesses surveyed in the report Managing Growth, Risk and the Cloud said they had experienced an outage in the past 10 years.
Causes of Extended Outages
The longest outages often occur in data centers that believe they are prepared. In the AS/400 environment, more often than not, the backups have never been tested and don’t work when they’re needed.
At one mainframe account I serviced, an excellent power backup system indirectly caused an extended outage, with some hardware down for 24 to 36 hours. Because the backup worked so well, none of the hardware at that site had lost power in years.
During a storm that knocked out city power, a system operator threw the wrong switch and crashed the entire room, including three CPUs and their attached peripherals. That hard crash triggered failures in multiple machines, failures that would have surfaced much earlier if the power had ever gone down before.
Power outages can be caused by storms, flooding, operator error, faulty backup generators, or, in Apple’s case, a car running into a power pole out on the street.
Even if you have backup power, computer rooms can and do go down, and when they do, data loss is always a possibility. In California, data center hardware is installed with special “earthquake” supports, but a major earthquake can still buckle foundations and topple equipment into the raised floor.
Have you ever seen an entire computer room lying over on its side after a raised-floor failure? I have, and it took every tech we had to set the equipment back up again. Fortunately, none of it toppled completely over and no data was lost, although it easily could have been.
Flooding isn’t caused only by rain. We once installed an IBM 4361 system in a room at a Palm Springs hotel for a demonstration of 3D CAD. During the night, a large water pipe burst and poured water into the system’s 3725 communications controller. Hotel staff alerted maintenance when they saw water pouring out from under the double doors of the locked room.
When we arrived, a janitor was standing in the water running a shop vac, in a room full of equipment powered by 220V. (Please never do that. Never stand in water that could be electrified.)
Fortunately, clean water does not necessarily destroy equipment. When a roof caved in and flooded a room full of terminals, I poured the water out, let the keyboards dry, and they still worked.
The customer’s staff were using those terminals with water wicking up their pant legs from the still-wet carpet! (Don’t do that, either! Electrocution is a real possibility.)
Disaster Recovery Processes Must Be Tested
“Disaster recovery” has such negative connotations that many prefer the term “business continuity plan” instead. Whatever you call these plans, many businesses are reluctant to test them because they fear the test itself will cause an outage.
Failing to test invites an outage that lasts much longer than it should, or one that causes serious business losses. If your staff is not comfortable testing your backups and disaster recovery process (a scary prospect without experience), hire a consultant or outsource the work to someone who is confident in implementing and testing the process.
How to Develop a Disaster Recovery Plan
If you do not have the experience in-house, seriously consider contracting with a company that does. If your team feels they can handle it, refer them to the article “How to Develop a Disaster Recovery Plan.”
They should also find this SlideShare presentation from Forsythe Technology useful: “The A to Z Guide to Business Continuity and Disaster Recovery.”
However painful it is, budget must be allocated to ensure your data and operations continue no matter what happens. Now that most businesses use the cloud, protection from outages is within reach of every business.
Make sure the decision makers know exactly what it would cost to leave your business unprotected against any type of outage. Calculate the cost of losing your customer data, or of being down for a week during your busiest season, as the Australian retailer Myer was during the 2014 holiday shopping season.
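To make that calculation concrete for decision makers, here is a minimal Python sketch of a back-of-the-envelope downtime estimate. The cost model (unrecovered revenue plus idle labor plus one-off recovery costs), the names `OutageScenario` and `outage_cost`, and every figure in the example are hypothetical assumptions for illustration, not data from any real outage. It also ignores harder-to-price losses such as departing customers or regulatory penalties, so treat the result as a floor rather than a ceiling.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """Hypothetical inputs for one outage scenario; replace with your own figures."""
    hours_down: float
    revenue_per_hour: float       # average revenue normally earned per hour
    revenue_lost_fraction: float  # share of that revenue never recovered (0.0 to 1.0)
    idle_staff: int               # employees who cannot work during the outage
    staff_cost_per_hour: float    # loaded hourly cost per idle employee
    recovery_cost: float = 0.0    # one-off costs: consultants, overtime, hardware, data re-entry


def outage_cost(s: OutageScenario) -> float:
    """Estimate direct outage cost as lost revenue + idle labor + recovery (an assumed model)."""
    if not 0.0 <= s.revenue_lost_fraction <= 1.0:
        raise ValueError("revenue_lost_fraction must be between 0 and 1")
    lost_revenue = s.hours_down * s.revenue_per_hour * s.revenue_lost_fraction
    idle_labor = s.hours_down * s.idle_staff * s.staff_cost_per_hour
    return lost_revenue + idle_labor + s.recovery_cost


if __name__ == "__main__":
    # Hypothetical example: one week (168 hours) down during the busiest season.
    busy_week = OutageScenario(
        hours_down=168,
        revenue_per_hour=10_000,
        revenue_lost_fraction=0.5,
        idle_staff=20,
        staff_cost_per_hour=40,
        recovery_cost=50_000,
    )
    print(f"Estimated cost: ${outage_cost(busy_week):,.0f}")  # Estimated cost: $1,024,400
```

Putting a number like this next to the annual cost of a tested backup and recovery plan usually makes the budget discussion much shorter.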
Budgeting to keep your business running is far less expensive than the consequences of failing to plan and test. Should your superiors refuse to act, make sure you have it in writing that it was their decision, not yours, to leave your company unprotected.