Yesterday I ran into VMware HA problems again. With my past HA troubleshooting experience, I thought configuring HA wouldn’t be a problem any more, but unfortunately it bit me in the ass again.

We all know VMware ESX implementations where not all information, access, etc. is available when you start the installation. That was exactly the case on this project. I only had four ESX hosts and an EqualLogic SAN, and the network was limited to the rack, so only the ESX hosts and the SAN were connected. The network guys had to establish a connection to the HQ, but to do that they had to change IP addresses.

I installed the complete VMware Infrastructure with the new addresses and waited for the connection to be made so I could add the servers, like vCenter, to the corporate Active Directory.

Last weekend the network connection was established, but they had given me the wrong IP addresses, so I had to change all IP addresses from 10.120. to 10.130.. No problem, 15 minutes of work per ESX host and all is well. Indeed, all was well until I enabled HA on the cluster.

I almost smashed my laptop against the wall, yelling ‘NOT AGAIN!’ but the professional in me accepted the challenge. I checked all the familiar things like HA settings, FT_HOST files, DNS settings, etc., but after an hour I decided to go for the quick fix: re-install vCenter server and start from scratch. After half an hour and a fresh vCenter installation I enabled HA again and ……………………………….. FAILED!

Luckily, at that moment Anne Jan called. He had seen this behavior two weeks earlier (and he didn’t write a blog about it ;-)). He told me that changing the IP address on the ESXi console is not sufficient. When you change the IP address on the console and restart the management network, you can connect to, manage and add ESX hosts to vCenter, but you cannot enable HA because the /etc/hosts file still contains the old IP address.
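
To make the check less error-prone, here is a minimal Python sketch of how you could spot (and rewrite) stale entries in a copy of a hosts file. This is just an illustration, not the procedure I actually used: the host names, the full addresses in `SAMPLE` and the function names `stale_entries` and `rewrite_prefix` are all made up for the example, and it assumes the usual hosts layout of an IP address followed by one or more names.

```python
# Hypothetical hosts file content; the names and full addresses are invented.
SAMPLE = """\
127.0.0.1 localhost
10.120.0.11 esx01.example.local esx01
"""


def _address(line):
    """Return the address field of a hosts line, or None for blank/comment lines."""
    fields = line.split("#", 1)[0].split()
    return fields[0] if fields else None


def stale_entries(hosts_text, old_prefix):
    """Return (line_number, line) pairs whose address starts with old_prefix."""
    stale = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        address = _address(line)
        if address and address.startswith(old_prefix):
            stale.append((number, line))
    return stale


def rewrite_prefix(hosts_text, old_prefix, new_prefix):
    """Return hosts_text with old_prefix swapped for new_prefix in address fields."""
    result = []
    for line in hosts_text.splitlines():
        address = _address(line)
        if address and address.startswith(old_prefix):
            new_address = new_prefix + address[len(old_prefix):]
            # Replace only the first occurrence, which is the address field.
            line = line.replace(address, new_address, 1)
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    for number, line in stale_entries(SAMPLE, "10.120."):
        print(f"line {number} still has the old address: {line}")
    print(rewrite_prefix(SAMPLE, "10.120.", "10.130."))
```

Only the address field is touched, so comments and host names stay exactly as they were. I would run something like this against a copy of the file and compare the result before changing anything on a host.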

After manually changing the /etc/hosts file on all four hosts, enabling HA was a piece of cake, but due to a bug in ESXi it took me a few hours 🙁