Hypervisor Overhead – Reservable vs Raw Compute Resources

While working on capacity planning for one of my clients I came across a poorly documented fact about reservable resources in a vSphere cluster. The common way to calculate available compute resources (CPU and RAM) is to take the physical CPU and RAM of one host, multiply by the number of hosts in the cluster and subtract the HA failover capacity. However, this is not correct, because it ignores the resources that vmkernel processes reserve for themselves and that therefore cannot be reserved by user workloads. This matters in a service provider cloud environment where tenants pay for their allocated and reserved resources: resources that cannot be reserved cannot be sold, which means lower ROI.

Real-life example: an 8-host cluster where each host has two 8-core CPUs @ 2.899 GHz and 384 GB RAM. Theoretically this should give 371072 MHz of CPU capacity and 3072 GB of RAM. However, the Resource Allocation tab of the cluster in vSphere Client shows a total cluster capacity of only 330424 MHz and 2988 GB RAM.
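The arithmetic behind these numbers can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the variable names are mine and nothing here talks to vCenter.

```python
# Hardware figures from the example cluster.
HOSTS = 8
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 8
CORE_MHZ = 2899            # 2.899 GHz per core
RAM_GB_PER_HOST = 384

# Theoretical (raw) cluster capacity.
raw_cpu_mhz = HOSTS * SOCKETS_PER_HOST * CORES_PER_SOCKET * CORE_MHZ
raw_ram_gb = HOSTS * RAM_GB_PER_HOST

# Totals shown on the cluster's Resource Allocation tab.
reported_cpu_mhz = 330424
reported_ram_gb = 2988

# Capacity that exists physically but cannot be reserved.
missing_cpu_mhz = raw_cpu_mhz - reported_cpu_mhz
missing_ram_gb = raw_ram_gb - reported_ram_gb

print(raw_cpu_mhz, raw_ram_gb)            # 371072 3072
print(missing_cpu_mhz, missing_ram_gb)    # 40648 84
```

Averaged over the eight hosts, that is roughly 5 GHz of CPU and 10.5 GB of RAM per host that cannot be reserved.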

Cluster capacity

KB article 1033443 describes this behavior, with a title almost as long as the article itself: "Cluster level memory capacity on Resource Allocation tab is less than the sum of the memory available for virtual machines for ESX hosts in the cluster". Unfortunately it does not explain why resources are missing or how many.

As hinted above, vmkernel processes reserve some resources for themselves. If you select a single host in vSphere Client and go to Configuration > System Resource Allocation, you will see the value of System Resource Reservation, by default 301 MHz for CPU and 0 MB for RAM.

System Resource Allocation

However, this view does not tell the whole story. If you switch from Simple to Advanced view (top right), you are presented with a tree of resource pools, each with its own reservations.

System Resource Allocation - Advanced

At the root of the tree is the host resource pool, whose reservation covers all the theoretical physical resources and equals its limit. Below it are four child resource pools at the same level:

  • idle, which is always empty
  • system (this is the one we could edit on the previous page), which contains sub-resource pools for the low-level kernel, drivers and similar, one per process
  • vim, which contains sub-resource pools for the host management processes (hostd, vpxa, DCUI, …) that used to run in the Console OS in the ESX 4 classic days
  • user, which is available for the VMs deployed on the host

All these second-level resource pools and all their child resource pools have expandable reservations, which means that if a child requests more resources than are available in its resource pool, the pool will try to get more from its parent. The topmost parent is the host resource pool. The system and management (VIM) processes are started as soon as the host boots, before any VM workloads are placed on it, and so they take their share of the available host resources first.
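To make the mechanism concrete, here is a toy model of expandable reservations in Python. It is a deliberately simplified sketch, not how vmkernel is implemented: the `Pool` class, its fields and the 2000 MHz / 3000 MHz reservations for system and vim are hypothetical values chosen for illustration. Only the 46384 MHz host capacity comes from the example above (16 cores at 2899 MHz).

```python
class Pool:
    """Toy resource pool: own capacity plus an optional expandable reservation."""

    def __init__(self, name, capacity=0, expandable=True, parent=None):
        self.name = name
        self.capacity = capacity      # MHz this pool currently holds
        self.committed = 0            # MHz already handed out from this pool
        self.expandable = expandable
        self.parent = parent

    def reserve(self, mhz):
        """Try to reserve mhz; borrow any shortfall from the parent if expandable."""
        free = self.capacity - self.committed
        if mhz > free:
            shortfall = mhz - free
            if not (self.expandable and self.parent
                    and self.parent.reserve(shortfall)):
                return False          # nothing changes on failure
            self.capacity += shortfall
        self.committed += mhz
        return True


# Root pool: reservation equals limit, so it cannot expand any further.
host = Pool("host", capacity=46384, expandable=False)
system = Pool("system", parent=host)
vim = Pool("vim", parent=host)
user = Pool("user", parent=host)

# System and management processes start at boot and reserve first.
system.reserve(2000)   # hypothetical figure
vim.reserve(3000)      # hypothetical figure

print(user.reserve(46384))   # False: the full raw capacity is no longer there
print(user.reserve(41384))   # True: only what system and vim left over
```

Because every second-level pool draws on the same root, whatever system and vim take at boot is gone from the user pool before the first VM is powered on.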

You can easily see that some processes, such as hostd or vpxa, reserve a relatively significant amount of resources. How significant depends on the size of the host: in my small lab environment 36% of CPU and 20% of RAM could not be reserved for VMs. In a big environment like the example above, the CPU overhead is about 11% but the memory overhead only about 3%.
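The percentages for the big cluster follow directly from the raw and reported totals quoted earlier. A minimal sketch (the `overhead_pct` helper is my own name, not a vSphere term):

```python
def overhead_pct(raw, reservable):
    """Share of raw capacity that cannot be reserved, in percent."""
    return 100 * (raw - reservable) / raw


# Raw vs. reservable totals of the 8-host example cluster.
cpu_overhead = overhead_pct(371072, 330424)   # MHz
ram_overhead = overhead_pct(3072, 2988)       # GB

print(f"CPU overhead: {cpu_overhead:.1f}%")   # CPU overhead: 11.0%
print(f"RAM overhead: {ram_overhead:.1f}%")   # RAM overhead: 2.7%
```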

It should also be noted that as the hypervisor gets more and more intelligent (VXLAN VTEP, vApp firewalling, antivirus inspection, vSAN, etc.) the overhead will go up, and capacity planning should account for it.
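One way to fold the overhead into a capacity plan is sketched below. It rests on two assumptions of mine that are not from the measurements above: HA failover capacity is expressed as whole spare hosts, and the overhead is applied as a flat fraction measured on hosts of the same size. The `reservable_capacity` function and its parameters are hypothetical names.

```python
def reservable_capacity(per_host, hosts, ha_failover_hosts, overhead_fraction):
    """Estimate capacity that can actually be reserved (and sold).

    per_host          -- raw capacity of one host (MHz or GB)
    hosts             -- number of hosts in the cluster
    ha_failover_hosts -- hosts held back for HA (assumption: whole hosts)
    overhead_fraction -- measured hypervisor overhead, e.g. 0.11 for 11%
    """
    usable_hosts = hosts - ha_failover_hosts
    return per_host * usable_hosts * (1 - overhead_fraction)


# Example cluster, CPU only: 46384 MHz per host, 8 hosts, one HA spare.
naive = reservable_capacity(46384, 8, 1, 0.0)      # ignores overhead
planned = reservable_capacity(46384, 8, 1, 0.11)   # with ~11% CPU overhead

print(naive)             # 324688.0 MHz
print(round(planned))    # 288972 MHz
```

The overhead fraction should be re-measured whenever host size or the set of hypervisor services changes, since it is not a constant.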
