Disaster Planning… how much is enough? Part 4
One of our initiatives right now is to build a Disaster Recovery (DR) site. Unfortunately, those cost money, so the people signing the checks want to know what the project is going to provide before approving funds. This is where things get tricky. It is a DR site: it is going to allow the IT infrastructure that the company relies on to remain operational in a worst-case scenario. Sounds about right? Such a statement may very well get checks written… but it could also very easily land you in big trouble down the road. There are so many dependencies and interconnections among the variables in the big picture that a lot of thought and investigation is needed to truly know what your vulnerabilities are and to ensure they are covered.
So let’s take a typical DR setup, for a typical company. This may not reflect your situation exactly, but it is likely going to give you food for thought, and that is all I can hope to do.
SamCo is a company that houses all of its servers in a central location at HQ. They have multiple offices, all connected to HQ. Let’s focus on a specific database server, ServerDB. ServerDB has all the typical precautionary configurations: RAID, redundant power, etc. Data is written to tape nightly and taken offsite.
The worry is that a disaster hits and the server is unavailable (for the sake of simplicity, all the other servers are still functioning). So you get another server, maybe even the exact same server hardware, and stick it at your shiny new DR site.
Using some replication solution, all data from ServerDB is automatically sent to ServerDB2. Everything is golden, and you feel ready for anything (so long as ‘anything’ only affects ServerDB!). But just because ServerDB can see ServerDB2, and they are replicated, doesn’t necessarily mean you are done. How do all the clients connect to ServerDB? How are they going to be “programmed” to connect to ServerDB2 when needed? Maybe this is a policy; maybe there is a technological solution. Oh wait, users in HQ can connect to ServerDB2, but what about the other offices? There may be network changes you need to make.
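To make the “technological solution” option a little more concrete, here is a minimal sketch of a client that tries ServerDB first and falls back to ServerDB2. The host names, the port, and the `first_reachable` helper are all my own placeholders, not anything SamCo actually runs, and a real setup might well handle this with DNS or a load balancer instead.

```python
import socket

# Hypothetical addresses for illustration only; substitute your own hosts and port.
CANDIDATES = [
    ("serverdb.hq.example", 5432),    # primary at HQ
    ("serverdb2.dr.example", 5432),   # replica at the DR site
]


def first_reachable(candidates, timeout=2.0, probe=None):
    """Return the first (host, port) that answers, or None if none do.

    `probe` is an optional callable (host, port) -> bool so the logic can be
    exercised without touching the network. By default a plain TCP connect
    is used as the health check.
    """
    def tcp_probe(host, port):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    check = probe or tcp_probe
    for host, port in candidates:
        if check(host, port):
            return (host, port)
    return None
```

Note that a successful TCP connect only proves that something is listening. It says nothing about whether the office the client sits in has a route to the DR site in the first place, which is exactly the question raised above.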
With that taken care of, everything is certainly golden now, right? Maybe we should look at various “disaster” scenarios…
A VIP user goes and deletes a VIP file/data element on ServerDB. Ah, this one is simple, you don’t even need your DR site for this… get the backup tape and restore the data. Oh wait, the backup tape failed. Good thing you have a very expensive DR site to which all the data is replicated. Oh, that’s right, replicated servers replicate the data, but they also replicate the deletions. Better hope Mr. VIP didn’t have a hand in signing off on the DR setup, otherwise he is likely to determine that since it didn’t allow him to recover from his disaster, it wasn’t a worthwhile purchase, and the person who requested the DR site may be in for a bit of trouble.
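The difference between a replica and a backup is easy to show with a toy model. This is only a sketch: the `ReplicatedStore` class and its methods are invented for illustration, and real replication products behave in far more nuanced ways. The point is simply that a delete flows to the replica just like a write does, while a point-in-time snapshot does not change afterwards.

```python
import copy


class ReplicatedStore:
    """Toy model: a primary, a synchronous replica, and point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = []  # oldest first

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value      # replication copies writes...

    def delete(self, key):
        self.primary.pop(key, None)
        self.replica.pop(key, None)    # ...and it copies deletions too

    def snapshot(self):
        # A snapshot is frozen at the moment it is taken (think nightly tape).
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore(self, key):
        """Bring back `key` from the most recent snapshot that still has it."""
        for snap in reversed(self.snapshots):
            if key in snap:
                self.write(key, snap[key])
                return True
        return False


store = ReplicatedStore()
store.write("vip_report", "quarterly numbers")
store.snapshot()                 # the nightly backup, assuming it succeeded
store.delete("vip_report")       # Mr. VIP's bad day
gone_from_replica = "vip_report" not in store.replica
recovered = store.restore("vip_report")
```

If the `snapshot()` call is removed (the failed tape), `restore` returns `False` and the replica is no help at all.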
Or maybe the situation is a bit different. Perhaps HQ suffers from an issue which causes ServerDB to go down, and also a loss of connectivity to the DR site. The DR site isn’t very useful if HQ can’t connect to it. No problem, you anticipated this, and asked for redundant networking. However, the decision was made that HQ could keep busy without ServerDB/ServerDB2, so the redundant networking wasn’t approved. Trouble is, all of the remote sites connect to HQ in a star topology. With HQ down, none of them can access ServerDB2 at the DR site either. Not good.
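The star topology problem is the kind of thing you can check on paper, or with a few lines of code, before the budget meeting. Below is a rough sketch that models sites as nodes and WAN links as edges, then asks whether one site can still reach another when a given site is down. The office names and the `can_reach` helper are made up for the example.

```python
from collections import deque

# Hypothetical star topology: every site, including DR, hangs off HQ.
LINKS = {("HQ", "OfficeA"), ("HQ", "OfficeB"), ("HQ", "DR")}


def can_reach(links, source, target, failed_nodes=frozenset()):
    """Breadth-first search over the links, skipping any failed sites."""
    if source in failed_nodes or target in failed_nodes:
        return False

    # Build an adjacency map from the surviving links only.
    adjacency = {}
    for a, b in links:
        if a in failed_nodes or b in failed_nodes:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


normal_day = can_reach(LINKS, "OfficeA", "DR")                         # via HQ
hq_down = can_reach(LINKS, "OfficeA", "DR", failed_nodes={"HQ"})       # stranded
with_direct_link = can_reach(LINKS | {("OfficeA", "DR")}, "OfficeA", "DR",
                             failed_nodes={"HQ"})                      # redundant path
```

Running the same check for every office against every single-site failure is a cheap way to find the spots where one outage takes the DR site down with it.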
There are countless scenarios and variations that could occur. I think the hardest part of a DR site/disaster preparedness strategy isn’t the technical aspects… there is always someone who can provide the answer to a question for a price. The difficult part is knowing what questions need asking and what problems need solving. You really need to do a very thorough analysis of all systems, and try to run through all the possible breakpoints in various combinations. No system is entirely safe. Even Gmail goes down from time to time. The trick is to identify the issues you consider most likely, and find the balance between financial outlay and potential return on investment for each scenario. Of course, when the scenario deemed too expensive to account for given its likelihood of occurrence actually happens, management will likely forget about all the caveats you presented them with when the DR site was approved. I’m still looking for an answer to that one myself!
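For what it’s worth, one simple way to put that balance in front of the check signers is to compare each scenario’s expected yearly loss against the yearly cost of covering it. The sketch below does just that. Every number in it is invented purely for illustration, the `Scenario` class is my own, and it makes the generous assumption that the mitigation removes the loss completely.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_probability: float      # chance of it happening in a given year
    loss_if_it_happens: float      # cost of one occurrence
    annual_mitigation_cost: float  # yearly cost of covering it

    @property
    def expected_annual_loss(self):
        return self.annual_probability * self.loss_if_it_happens

    @property
    def net_benefit(self):
        # Assumes the mitigation fully prevents the loss (a simplification).
        return self.expected_annual_loss - self.annual_mitigation_cost


# Made-up figures; plug in your own estimates.
SCENARIOS = [
    Scenario("accidental deletion", 0.5, 20_000, 2_000),
    Scenario("HQ network outage", 0.1, 150_000, 12_000),
    Scenario("total site loss", 0.01, 1_000_000, 40_000),
]


def rank_by_net_benefit(scenarios):
    """Most worthwhile mitigation first."""
    return sorted(scenarios, key=lambda s: s.net_benefit, reverse=True)


if __name__ == "__main__":
    for s in rank_by_net_benefit(SCENARIOS):
        print(f"{s.name}: expected loss {s.expected_annual_loss:,.0f}, "
              f"net benefit {s.net_benefit:,.0f}")
```

With these made-up figures the rare, catastrophic scenario comes out last, which is precisely the one management will remember if it ever happens. Keeping the table, and the sign-off on it, is probably the closest thing to an answer I have.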








