vCloud Director offers three consumption types of cloud resources (organization virtual datacenters), the so-called allocation models. These are:
- Reservation Pool, which is mostly used when a whole cluster (and its resources) is dedicated to the tenant
- Pay-As-You-Go, which is similar to the Amazon cloud consumption model, where the tenant gets resources only when a particular VM is deployed
- Allocation Pool, where only a percentage of the provider resources is dedicated to the tenant
In the first release, vCloud Director 1.0, a provider virtual datacenter (provider vDC) had to be backed by either a DRS cluster or one of its child resource pools. A vSphere cluster can scale up to 32 hosts, which means the provider vDC has a maximum compute capacity determined by the maximum cluster size. Tenants get their compute as chunks of the provider vDC, called organization vDCs, and are limited by the available free capacity in the cluster. One of the characteristics of the cloud delivery model is the ability to scale so that tenants can consume cloud resources in an elastic way. However, if the cluster backing the provider vDC is full and the tenant asks for more resources, the provider has to create a new organization vDC for the tenant from a different provider vDC, backed by another resource pool from another cluster. This is not an ideal solution for the tenant, as managing multiple organization vDCs is more demanding from a capacity management perspective.
The second major release of vCloud Director (version 1.5) introduced the notion of an elastic vDC. A provider vDC could be backed by multiple resource pools and was thus no longer bound to one vSphere cluster. However, only Pay-As-You-Go organization vDCs were able to take advantage of this elasticity; Allocation and Reservation organization vDCs were still bound to the primary resource pool of the provider vDC.
The third and most recent major release of vCloud Director, version 5.1, extends this elasticity to Allocation Pool organization vDCs as well. The consumption models rely on vSphere resource management (virtual machine and resource pool reservations and limits), and the way the Allocation Pool leverages them has changed significantly, hence the need for this post.
First, I would recommend reading Frank Denneman's and Chris Colotti's recently released white paper, VMware vCloud Director Resource Allocation Models, which describes the mapping of all three organization vDC allocation models to vSphere resource management for vCloud Director 1.5.x. Let me now quickly recap how the Allocation Pool works in vCloud Director 1.5.x:
- The provider creates an Allocation Pool organization vDC with CPU and memory allocations, of which only a percentage is guaranteed (separately for CPU and for memory).
- At the vSphere level, a child resource pool is created in the primary resource pool of the provider vDC, with CPU and memory limits equal to the org vDC allocations and reservations equal to guaranteed percentage × allocated value.
- When the tenant deploys a VM in this organization vDC, no VM CPU limit or reservation is set; the VM memory reservation, however, is set to memory guaranteed percentage × VM memory, with the memory limit equal to the VM memory size.
- This results in very flexible use of CPU resources: the tenant can overallocate vCPUs as he wishes, because the organization vDC resource pool ensures that total CPU usage does not exceed the tenant's CPU allocation. He cannot, however, overallocate memory, as the VM memory reservations must be backed by resource pool memory reservations.
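The 1.5.x mapping above can be sketched in a few lines of Python. This is purely illustrative, not a vCloud Director API; the function names and dictionary keys are my own, and guarantees are expressed as fractions (e.g. 0.5 for 50 %).

```python
# Illustrative sketch of the vCloud Director 1.5.x Allocation Pool mapping.
# All names and keys are hypothetical; guarantees are fractions (0.5 = 50 %).

def org_vdc_rp_settings(cpu_alloc_mhz, mem_alloc_mb, cpu_guarantee, mem_guarantee):
    """Child resource pool created under the provider vDC's primary RP."""
    return {
        "cpu_limit_mhz": cpu_alloc_mhz,                        # limit = org vDC CPU allocation
        "cpu_reservation_mhz": cpu_alloc_mhz * cpu_guarantee,  # guaranteed % x allocation
        "mem_limit_mb": mem_alloc_mb,                          # limit = org vDC memory allocation
        "mem_reservation_mb": mem_alloc_mb * mem_guarantee,    # guaranteed % x allocation
    }

def vm_settings(vm_mem_mb, mem_guarantee):
    """Per-VM settings: CPU is left unmanaged, memory is reserved and limited."""
    return {
        "cpu_reservation_mhz": 0,                          # no VM CPU reservation
        "cpu_limit_mhz": None,                             # no VM CPU limit
        "mem_reservation_mb": vm_mem_mb * mem_guarantee,   # guaranteed % x VM memory
        "mem_limit_mb": vm_mem_mb,                         # limit = VM memory size
    }

# Example: a 10 GHz / 20 GB org vDC with 50 % CPU and 75 % memory guarantees
rp = org_vdc_rp_settings(10000, 20480, 0.5, 0.75)
vm = vm_settings(4096, 0.75)
```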
An elastic Allocation Pool (in vCloud Director 5.1) implies that the organization vDC can consist of two or more resource pools. This means that CPU and memory usage can no longer be controlled at the resource pool level the way it was in vCloud Director 1.5.x. The organization vDC can be fragmented into many resource pools, and vCloud Director has to distribute the organization vDC guaranteed resources (reservations) and allocations (limits) among them. Each resource pool's entitlements are therefore based on the virtual machines running in it. This is similar to the way the Pay-As-You-Go model works, where resource pool settings are changed every time a VM is powered on in it. Doing this for memory is not a big problem, because memory is not as elastic a resource as CPU: it is hard to take memory away from a VM that is not using it and give it to another VM, whereas it is very easy to redistribute unused CPU cycles. Unfortunately, there is an impact on the way CPU resources are distributed. The Allocation model now has to rely on a vCPU speed setting, which must be defined when the organization vDC is created. It is then possible to calculate the entitlement of each VM in CPU MHz and sum these up into the entitlement of each resource pool. This results in resource sharing only among the VMs running in a particular resource pool. If some organization vDC resources are undeployed, they cannot be used by the running VMs in the organization vDC.
What happens at the vSphere level?
- After creation, the organization vDC is backed by one resource pool in the primary resource pool of the provider vDC. The organization vDC resource pool has a 0 MHz CPU reservation and limit with no expandable reservation, and a 0 MB RAM reservation with expandable reservation and unlimited memory.
- When a vApp is deployed (but not yet powered on), it might be placed into another resource pool backing the provider vDC, the one with the most free capacity, where available networks and datastores are also considered. vCloud Director performs admission control to check whether the organization vDC has available resources to power on the vApp. If the vApp is placed into a new provider vDC resource pool, an organization vDC resource pool is again created there with a 0 MHz CPU reservation and limit with no expandable reservation, and a 0 MB RAM reservation, unlimited, with expandable reservation.
- When the vApp is powered on, its resource pool (RP) reservations are increased by its guaranteed allocation:
New RP CPU reservation = original RP CPU reservation + org vDC vCPU speed × VM number of vCPUs × CPU percentage guaranteed
New RP RAM reservation = original RP RAM reservation + VM RAM × memory percentage guaranteed + VM memory overhead
Memory still has expandable reservations and still no memory limit set, but the CPU limit is increased by the VM vCPU allocation:
New RP CPU limit = original RP CPU limit + org vDC vCPU speed × VM number of vCPUs
The memory limit on the resource pool does not need to be set, because vCloud Director performs the memory admission control and the VM-level memory allocation acts as the natural limit. CPU is the more fluid resource, therefore its limit is set only at the resource pool level (calculated from the vCPU speed) and is not set at the VM level (this is different from the Pay-As-You-Go model).
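The power-on formulas above can be put together in a small Python sketch. Again, the names are hypothetical and not any vCloud API; guarantees are fractions. Note that the CPU reservation grows only by the guaranteed share of the vCPU allocation, while the CPU limit grows by the full vCPU allocation.

```python
# Illustrative sketch of the RP deltas vCloud Director 5.1 applies when a vApp
# is powered on in an elastic Allocation Pool org vDC. Names are hypothetical.

def power_on_vm(rp, num_vcpus, vm_mem_mb, vm_mem_overhead_mb,
                vcpu_speed_mhz, cpu_guarantee, mem_guarantee):
    """Apply the reservation/limit deltas for one powered-on VM to its RP."""
    vm_cpu_alloc_mhz = vcpu_speed_mhz * num_vcpus
    rp["cpu_reservation_mhz"] += vm_cpu_alloc_mhz * cpu_guarantee  # guaranteed share
    rp["cpu_limit_mhz"] += vm_cpu_alloc_mhz                        # full vCPU allocation
    rp["mem_reservation_mb"] += vm_mem_mb * mem_guarantee + vm_mem_overhead_mb
    # The RP memory limit stays unset (unlimited, expandable reservation).
    return rp

# Freshly created org vDC resource pool: 0 reservations, 0 MHz CPU limit
rp = {"cpu_reservation_mhz": 0, "cpu_limit_mhz": 0, "mem_reservation_mb": 0}
# Power on a 2 vCPU / 4 GB VM (100 MB overhead assumed) with a 2 GHz vCPU
# speed, 50 % CPU guarantee and 75 % memory guarantee:
power_on_vm(rp, num_vcpus=2, vm_mem_mb=4096, vm_mem_overhead_mb=100,
            vcpu_speed_mhz=2000, cpu_guarantee=0.5, mem_guarantee=0.75)
```

After the call, the RP carries a 2000 MHz CPU reservation, a 4000 MHz CPU limit, and a 3172 MB memory reservation.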
1. Use the same CPUs for all hosts in the same provider vDC to guarantee consistent performance.
2. vCloud Director capacity management is now more important, as capacity guarantees are not enforced at the vSphere level. The provider can create an organization vDC whose guaranteed resources exceed the available resources at the provider vDC level; an information box is displayed when this happens.
3. The tenant cannot overcommit his organization vDC, neither in memory (which was possible in vCloud Director 1.x versions) nor in CPU (this is new). If the tenant buys 10 GHz of CPU power with a 2 GHz vCPU speed, he can deploy VMs with at most 5 vCPUs in total.
4. The CPU and memory resource guarantee percentages enable the provider to overallocate his resources. This is different from point 2, as exceeding the guaranteed resources will not result in a provisioning error; the vSphere resource sharing mechanisms will come into play instead.
5. The vCPU speed must be chosen carefully: too high a value can limit the number of vCPUs deployed (see point 3), while too low a value will result in poor VM performance if the VM is one of only a few deployed in a particular resource pool. On the other hand, it gives the provider more control over the minimum VM size deployed in the cloud (the tenant cannot deploy a ½ vCPU VM).
6. When upgrading from an older vCloud Director release, consider the impact on existing Allocation Pool organization vDCs.
7. When a VM is powered off and then powered on again, it might get relocated to another resource pool if the original resource pool ran out of resources. This is a more costly operation.
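The CPU admission control described above is easy to illustrate: with a fixed vCPU speed, the org vDC CPU allocation puts a hard cap on the total number of vCPUs the tenant can power on. A hedged sketch (the function names are mine, not part of any vCloud API):

```python
# Illustrative sketch of org vDC CPU admission control with a fixed vCPU speed.
# Function names are hypothetical, not part of any vCloud API.

def max_vcpus(cpu_alloc_mhz, vcpu_speed_mhz):
    """Total vCPUs a tenant can power on within the org vDC CPU allocation."""
    return cpu_alloc_mhz // vcpu_speed_mhz

def can_power_on(deployed_vcpus, new_vcpus, cpu_alloc_mhz, vcpu_speed_mhz):
    """Admission check: would the new VM exceed the org vDC CPU allocation?"""
    return (deployed_vcpus + new_vcpus) * vcpu_speed_mhz <= cpu_alloc_mhz

# 10 GHz allocation with a 2 GHz vCPU speed allows 5 vCPUs in total:
print(max_vcpus(10000, 2000))           # 5
print(can_power_on(4, 2, 10000, 2000))  # False: 6 vCPUs would need 12 GHz
```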