Thoughts on high availability
With all the news and buzz around vSphere it’s easy to get carried away by all the new stuff appearing. But even with all these new features we still need to think about High Availability and how to design the infrastructure.
Every time we design a virtual infrastructure, we design it for high availability. We enable HA, we put in a lot of network cards, and so on, to make the infrastructure resilient. vSphere even brings more high availability options, like Fault Tolerance (FT).
But what IS high availability? According to Wikipedia it is:
a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period.
Well, nice definition, but what does this mean for designing an infrastructure? Designing for High Availability is much, much more than just designing and using an HA virtual environment.
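To make the "degree of operational continuity during a given measurement period" a bit more tangible, here is a minimal sketch that turns an availability target into a downtime budget. The function name and the targets in the loop are my own illustration, not figures from any SLA or from VMware.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime budget in minutes for a target availability.

    availability_pct is the target as a percentage (for example 99.9),
    period_days is the length of the measurement period in days.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    period_minutes = period_days * 24 * 60
    # The budget is the fraction of the period we are allowed to be down.
    return period_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; pick the ones your business actually asks for.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% over a year leaves about {minutes:.1f} minutes of downtime")
```

A 99.9% target over a 365-day year leaves roughly 8.8 hours of downtime in total, which is a useful number to put in front of the business before deciding how much redundancy is really necessary.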
We design the physical systems with redundant power supplies and multiple paths to network and storage. In our highly virtualized environments we take stuff like HA from VMware for granted, but is it really safe and highly available? And is it really necessary?
- What if you are using something like Site Recovery Manager from VMware to create a disaster recovery site in case your main equipment room totally fails because of a fire? Do you have desktops at your DR site? Can your users use the phone there? Are there enough seats and desks?
- What if the air conditioning in your main equipment room fails? Do you really need a spare in the MER, or is it enough to have alerting?
What it means for me is that you take as many precautions as possible to prevent an outage of your IT infrastructure, and make sure that an outage, if it still occurs, has minimal effect on the business. For some components this means that you have hot spares or redundant components to cope with failure. For other components it may be enough to have good monitoring and alerting.
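If you want a rough feel for what a redundant component buys you, the usual back-of-the-envelope calculation looks like the sketch below. It assumes the copies fail independently, which is often not true in practice (two power supplies on the same feed, two hosts in the same burning equipment room), so treat the outcome as an upper bound. The function names and the 0.99 figures are made up for illustration.

```python
from math import prod
from typing import Iterable


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` identical components when one working copy is enough.

    Assumes failures are independent, which is an optimistic simplification.
    `single` is the availability of one component as a fraction (0.0 to 1.0).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0.0 and 1.0")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def chain_availability(parts: Iterable[float]) -> float:
    """Availability of a chain in which every part must work (no redundancy)."""
    return prod(parts)


if __name__ == "__main__":
    # Hypothetical numbers: one host at 99% versus two redundant hosts.
    print(f"single host:       {redundant_availability(0.99, 1):.4f}")
    print(f"two hosts:         {redundant_availability(0.99, 2):.4f}")
    # Two non-redundant parts in a row are worse than either part alone.
    print(f"host plus storage: {chain_availability([0.99, 0.99]):.4f}")
```

The second function is the reason a single non-redundant item, like that one air conditioning unit, can undo a lot of expensive redundancy elsewhere in the chain.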
It also means that you have to consider other items as well: who can do what, who is allowed in the datacenter, who decides which patches are deployed, and so on.
The message I want to give you: if you design for high availability, take as much into consideration as needed. For the things you can’t control, write them down and let someone else decide whether to accept the risk or solve the problem. Ask if there are DR runbooks in place; you don’t want to do the work twice ;)
More and more companies rely on High Availability because people use the systems 24/7, so it’s really important to consider how your design can support that behaviour. It’s all for the business and the users in the end.
And last: High Availability is much more than just a technical solution, although today’s software is making it easier.