NSX security group misconfiguration
Recently I ran into an issue in which a tenant was not able to connect to their virtual machine from their office network. The tenant environment sits on an infrastructure built with VMware NSX and the vRealize suite.
Now the solution to this problem wasn’t really rocket science, but since I am fairly new to NSX and had written down the situation, I thought why not share it here as well.
When I received the call, the only thing that was mentioned was that the tenant was unable to connect to the virtual machine using any service. But first things first, let’s describe the basic layout of the situation so we are all on the same page.
So now that we have the network part written down, let’s head on to the security policies.
|Name|Source|Destination|Service|
|---|---|---|---|
|Internal to Internal|“Internal” security group|“Internal” security group|Any|
|Internal to External|“Internal” security group|“External” security group|Any|
|External to Internal|“External” security group|“Internal” security group|Any|
The firewall rules mentioned here are not the ones used in the real situation. For the sake of keeping this article nice and clean I dumbed them down to these three.
Figuring it out
The first thing I noticed was that there was an IPsec connection between the tenant office and the ESG. So before going any further I checked all the status flags for the tunnel in the vSphere Web Client.
Next I checked whether the virtual machine could reach the internet and other virtual machines on the same subnet. That worked without any problem, so at the network level everything seemed OK. I was curious whether packets were being dropped before reaching the virtual machine.
Using the “Debug packet display interface vNic” command on both the inside and outside interfaces of the DLR and ESG, I found that packets were reaching the virtual machine, but the virtual machine was not sending any replies. From this I concluded that either the virtual machine’s OS or its distributed firewall was denying the connection. The OS firewall was disabled, so only the distributed firewall was left to inspect.
At first glance all the firewall policies seemed to be fine, but after taking a closer look at the security groups there was one that caught my interest.
The security group called “External” was based on “IP sets” that can be defined in the “grouping objects” of the NSX manager. There were two IP sets available in the security group:
- Any: this one was defined as 0.0.0.0/0 thus including every IP address
- Private: in this set the private ranges were defined: 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16
In the security group the “Any” IP set was used in the include section, while the “Private” IP set was used in the exclude section.
Applying this knowledge to the security policy, the virtual machines were basically allowed to connect to external addresses, but not to private addresses. Since the customer was trying to connect from their own subnet over the IPsec tunnel, their workstations were stuck somewhere in the middle.
The workstations were not members of the “External” security group, but neither were they part of the “Internal” security group, since they did not have the security tag used in the datacenter.
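The include/exclude behaviour of the “External” security group can be sketched in a few lines of Python. This is only a rough model of the membership logic, not how NSX evaluates it internally; the function names and the workstation address 192.168.10.25 are made up for illustration, and I am assuming the tenant office uses a private (RFC 1918) range.

```python
import ipaddress

# IP sets as described above
ANY = [ipaddress.ip_network("0.0.0.0/0")]
PRIVATE = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def in_ip_set(address, ip_set):
    """Return True if the address falls inside any network of the IP set."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ip_set)


def is_external_member(address):
    """Model of the "External" group: include "Any", exclude "Private"."""
    return in_ip_set(address, ANY) and not in_ip_set(address, PRIVATE)


print(is_external_member("8.8.8.8"))        # True: public address
print(is_external_member("192.168.10.25"))  # False: hypothetical office workstation
```

A workstation in a private range matches the include section but is removed again by the exclude section, so it never becomes a member of “External”.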
To resolve this, I created a new IP set containing the peer subnet of the tenant’s IPsec tunnel, and then used this IP set in a newly created security group. After the tenant added this new security group to the right security policies, they were able to connect to the virtual machines.
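To illustrate why the extra security group makes the difference, here is a small Python sketch of the simplified rule set before and after the change. Everything specific in it is an assumption for the sake of the example: the peer subnet 192.168.10.0/24, the VM address 10.10.0.5, modelling the tag-based “Internal” group as a subnet, the added rule from the tenant office to “Internal”, and the idea that traffic matching no rule is dropped (which is what the observed behaviour suggested).

```python
import ipaddress


def net_group(*cidrs, exclude=()):
    """Build a membership test from included and excluded CIDR ranges."""
    included = [ipaddress.ip_network(c) for c in cidrs]
    excluded = [ipaddress.ip_network(c) for c in exclude]

    def member(address):
        ip = ipaddress.ip_address(address)
        return any(ip in n for n in included) and not any(ip in n for n in excluded)

    return member


PRIVATE = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

external = net_group("0.0.0.0/0", exclude=PRIVATE)
internal = net_group("10.10.0.0/24")        # stand-in for the tag-based group
tenant_office = net_group("192.168.10.0/24")  # hypothetical IPsec peer subnet


def allowed(rules, src, dst):
    """Traffic is allowed if any (source group, destination group) rule matches."""
    return any(src_group(src) and dst_group(dst) for src_group, dst_group in rules)


original_rules = [
    (internal, internal),
    (internal, external),
    (external, internal),
]
# Assumed fix: one extra rule from the new group to the internal VMs
fixed_rules = original_rules + [(tenant_office, internal)]

workstation, vm = "192.168.10.25", "10.10.0.5"
print(allowed(original_rules, workstation, vm))  # False
print(allowed(fixed_rules, workstation, vm))     # True
```

With the original three rules the workstation matches no source group at all; once the peer subnet has its own group and rule, the same traffic is matched and allowed.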