Monday, April 04 2011 13:03 Written by VMGuru
Approximately two years ago, a community conversation was kicked off by Arnim Van Lieshout's blog post on memory management. Over 31,000 blog hits later, this topic still remains one of the most talked-about subjects in VMware virtualization. At the end of the day, a rogue memory limit is still NOT a good scenario to have in your ESX environment, and we consistently run across the situation when talking with partners and customers, simply due to a lack of education on how setting a memory limit can ultimately impact the performance of an entire host.
The first question that keeps coming up is "Don't memory management methods only kick in when there is contention?" My answer to this is two-fold. First, I've only seen the VMkernel wait for contention when looking at shares to determine priority, not to execute a method of memory savings/management. Secondly, we need to define "contention". In this particular case, it is when a Guest OS needs more resources than the VMkernel can assign to it at a point in time. This can come from a lack of available resources, or from a forced restriction (like a limit) on how much the VMkernel can give to a guest. That's the one thing about a limit in VMware... The limit is a hard value. There is no "if someone else isn't using it, we will let you go above it"; it's static and absolute.
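To make the "static and absolute" point concrete, here is a minimal sketch of that allocation rule in Python. This is NOT VMkernel code; the function name and parameters are mine, purely for illustration. The key behavior is that free host memory above the limit is never handed out:

```python
# Illustrative sketch only (not actual VMkernel code): a hard limit caps
# the grant no matter how much free memory the host has available.

def grant_memory_mb(requested_mb: int, limit_mb: int, host_free_mb: int) -> int:
    """Return how much memory this hypothetical scheduler would grant."""
    # The limit is absolute: even if the host has gigabytes free,
    # the guest never receives more than limit_mb.
    return min(requested_mb, limit_mb, host_free_mb)

# Guest asks for 800 MB; the host has 8 GB free, but the limit is 512 MB.
print(grant_memory_mb(800, 512, 8192))   # -> 512
# Below the limit, requests are satisfied normally.
print(grant_memory_mb(256, 512, 8192))   # -> 256
```

Note there is no branch for "is anyone else using it?" anywhere in the rule; that is exactly what makes a limit different from shares.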
OK, so let's review what's going on here with a scenario I've seen all too often. I configure my template with 512MB of memory and start deploying a bunch of VMs with 1024MB of memory. For whatever reason, whether it is a bug (yes, even in a world where we are running vSphere 4.1, bugs from ESX 2.x are still quite prevalent) or a static misconfiguration on the part of the VMware admin, all my VMs go out with a hard limit of 512MB of memory and an assignment of 1024MB.
As my virtual machine boots up and loads its applications, it runs along happily until it hits that 512MB limit. At this point, the OS and applications don't know anything about a 512MB limit (they think they have the full 1024MB that is assigned to the VM). They are going to keep requesting the use of more and more memory. The VMkernel, whose job it is to assign memory, is simply going to say "No, you have a limit, and you will stick to that".
The guest simply doesn't know that it is being rejected; it just knows it cannot get access to more memory. As far as it is concerned, it still sees 512MB sitting there unused. This is naturally going to cause a performance hit to your applications, as they are being deprived of memory resources. This is where "contention" by my definition enters the equation. The Guest OS believes it should have access to more resources and demands them. The VMkernel simply will not allocate any more memory due to the hard limit.
This is where things get complicated, and I can ONLY speak from experience and not with 100% certainty about the process, but I have seen and resolved this issue enough times to make me pretty damn sure I am correct. Since the guest keeps pressuring the VMkernel, and the VMkernel has no more memory than it is allowed to assign to the guest, the VMkernel does the next best thing, which is to trigger the balloon driver inside the Guest OS.
As you can see in the previous image, in order for the Guest OS to actually swap effectively, the balloon needs to inflate by the full amount of memory so the guest can clean up, keep only the more frequently accessed pages in memory, and move everything else to its OS swap file. When it comes to ballooning, the VMkernel seems to be capable of breaking its memory limit restriction for this purpose. When this happens, you will see a HUGE spike in balloon memory for that virtual machine.
The balloon will effectively grow to ((memoryAssigned - memoryLimit) + memoryFreedbyBalloon) before it deflates and frees up the OS memory. Within 10 minutes, the process should be complete, and the balloon will deflate, leaving the OS with access to more memory from the VMkernel, again, up to the specified limit.
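Plugging the numbers from my 512MB-limit scenario into that formula makes the spike easy to picture. A quick worked example (the function is just the formula above restated, nothing more):

```python
def balloon_target_mb(memory_assigned_mb: int,
                      memory_limit_mb: int,
                      memory_freed_by_balloon_mb: int) -> int:
    """Balloon size per the formula in the article:
    ((memoryAssigned - memoryLimit) + memoryFreedbyBalloon)."""
    return (memory_assigned_mb - memory_limit_mb) + memory_freed_by_balloon_mb

# 1024 MB assigned, 512 MB hard limit, and the guest freed 128 MB
# under balloon pressure:
print(balloon_target_mb(1024, 512, 128))  # -> 640
# Even if the guest frees nothing, the balloon still covers the gap:
print(balloon_target_mb(1024, 512, 0))    # -> 512
```

In other words, the balloon always covers at least the gap between assigned memory and the limit, which is why the spike looks so dramatic in your charts.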
Of course, my workloads are never happy and always want more memory, so this process is going to repeat itself and the OS is going to hit the limit again.
This time, the VMkernel is nearly out of options; it will try to balloon again, but it will not work. Starting with ESX 4.1, VMware implemented a feature called memory compression. In this instance, the VMkernel sets aside a certain percentage of each VM's memory to be used as a compression cache. By default, this value is 10%, but it can be adjusted in the advanced settings up to a maximum of 100%. When ballooning fails, the VMkernel will grab less-frequently accessed memory pages, compress them, and store them in the compression cache, which resides in local memory. Keep in mind that having a compression cache will increase overall memory overhead, but it will significantly mitigate the impact of severe overcommitment and provides one final fail-safe before the ESX host has to swap memory to disk. If you would like to read a more in-depth blog post on how memory compression in vSphere 4.1 functions, I HIGHLY recommend Gabe's excellent blog post.
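Sizing the compression cache is simple arithmetic; here is a small sketch (again illustrative, not ESX code) showing what the default 10% works out to for a couple of common VM sizes:

```python
def compression_cache_mb(vm_memory_mb: int, cache_pct: int = 10) -> int:
    """Compression cache size: a percentage of the VM's configured memory.
    Default is 10%; the article notes it can be raised up to 100%."""
    assert 0 < cache_pct <= 100
    return vm_memory_mb * cache_pct // 100

print(compression_cache_mb(1024))       # -> 102 (MB, at the 10% default)
print(compression_cache_mb(4096, 25))   # -> 1024 (MB, with the setting raised to 25%)
```

So my 1024MB VM from the scenario above would carve out roughly 102MB as its compression cache, which is overhead to remember when sizing hosts.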
When all else fails, and things are well and truly hitting the fan in memory overcommitment, the VMkernel has no choice but to swap memory pages to disk. What you will see in ESXTOP or your monitoring tool is that the balloon is going to significantly inflate again. Instead of deflating within 10 minutes, it is going to stay inflated and keep trying to get the guest to page. This causes a very major performance hit to your VM and is the final warning that your VM workload is about to use the VMware swap file. This is already beyond the point of no return, as virtual machine performance will be quite seriously degraded.
Assuming the memory swapping is the result of a rogue memory limit, you MUST know what you are looking for in order to adequately troubleshoot the issue. vMotioning the workload will have no effect, as the limit will follow the VM. Unless you know how to detect and resolve this issue, the VM (or, as is far too often the case, multiple VMs) will be degraded. If too many VMs on a single ESX server have this condition, the host itself will grind to a halt. After detecting and resolving this issue in a "worst case" scenario, many customers are able to run more than double the number of VM workloads on their system with noticeably improved performance. Far too often it goes undetected because people either 1) don't know it's an issue or 2) don't know how to effectively troubleshoot the problem.
At the end of the day, it is always safer for your infrastructure to lower the assigned amount of memory of a VM than to mess around with setting limits to restrict it! I would personally never recommend that a customer use a memory limit; I would only concede after explaining the risks, and would make sure the issue and its fix are documented as one of the first troubleshooting steps. There are some simple ways to quickly detect and remediate these memory issues using a simple PowerShell script, which we will discuss in a separate article.
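The PowerShell script itself is for the separate article, but the check it performs is straightforward. Here is a hedged sketch of that detection logic in plain Python, working on a hand-built inventory list rather than a live vCenter connection (the field names and the `find_rogue_limits` helper are mine, for illustration only):

```python
def find_rogue_limits(vms: list) -> list:
    """vms: list of dicts with 'name', 'assigned_mb', and 'limit_mb'
    (limit_mb is None for 'unlimited'). Flags any VM whose hard
    memory limit is set below its assigned memory."""
    return [vm["name"] for vm in vms
            if vm["limit_mb"] is not None and vm["limit_mb"] < vm["assigned_mb"]]

# Hypothetical inventory mirroring the 512MB-limit scenario in the article:
inventory = [
    {"name": "web01", "assigned_mb": 1024, "limit_mb": 512},   # rogue limit
    {"name": "db01",  "assigned_mb": 4096, "limit_mb": None},  # unlimited
    {"name": "app01", "assigned_mb": 2048, "limit_mb": 2048},  # limit == assigned
]
print(find_rogue_limits(inventory))  # -> ['web01']
```

A real script would pull the assigned memory and the configured limit for every VM from vCenter and apply exactly this comparison, then reset any flagged limits to unlimited.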
The last part, in which you write that the VMkernel first tries memory compression and uses VMkernel swap as a last resort, is not completely correct. Just before memory is swapped out to disk, ESX will compress the memory; that is, after ballooning and TPS have not resolved the memory contention. With memory compression, ESX will try to compress the 4K swap-candidate pages. When a compression ratio of more than 50% is obtained, the page (now 2K or less) will be stored in the compression cache. If a compression ratio of more than 50% is not obtained, the 4K page will be swapped out to disk. If at a later stage this compressed page is accessed by a VM, it will first be decompressed and removed from the compression cache.
See my post on: http://www.gabesvirtualworld.c...phere-4-1/
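The 50% decision rule described in the comment above can be sketched in a few lines of Python. This is only a model of the decision, using zlib as a stand-in compressor (ESX uses its own algorithm); the function name and return shape are mine:

```python
import os
import zlib

PAGE_SIZE = 4096  # a 4K swap-candidate page

def compress_or_swap(page: bytes) -> tuple:
    """Model of the 50% rule: if the page compresses to half its size
    or less, it goes to the compression cache; otherwise the full 4K
    page is swapped out to disk."""
    assert len(page) == PAGE_SIZE
    compressed = zlib.compress(page)
    if len(compressed) <= PAGE_SIZE // 2:  # fits in a 2K cache slot
        return ("cache", compressed)
    return ("swap", page)

# A zeroed page compresses extremely well and lands in the cache;
# random data is essentially incompressible and gets swapped.
print(compress_or_swap(b"\x00" * PAGE_SIZE)[0])   # -> cache
print(compress_or_swap(os.urandom(PAGE_SIZE))[0]) # -> swap (almost certainly)
```

This also shows why the cache is laid out in 2K slots: only pages that beat the 50% ratio are worth keeping in memory at all.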
@Ryan: If by allocated memory you mean the amount of memory you assign to the VM, then the maximum limit is always equal to the assigned memory, and if you set the limit equal to the assigned memory, there will be no impact on the VM. However, it will impact DRS, since there is a difference for DRS between "unlimited" and a set value when doing its calculations. This means there will be a slightly higher load on vCenter while DRS is doing its calculations.
So the max limit is equal to assigned memory regardless of the "limit" setting? One of my VMs has been set in a way that indicates otherwise (see pic):
I'm trying to troubleshoot some host memory usage alarms (occurring hourly) and I think they are due to improper VM settings.
According to that pic and my theory, your VM should have at least 32768MB of RAM.
But my advice is to always stay away from memory and CPU limits unless you really, really know what you're doing and need them. There are some PowerShell scripts available that check all your VMs for default settings and can help you correct them.