In vCloud Director it is possible to configure vApp leases. The maximums are set by system admin at Organization level (in Policies), which can be lowered by Org Admin (at org level) and set by vApp owner at the vApp level. A vApp has runtime lease (for how long it will be in running state) and storage lease (for how long it will consume storage once it is not running).
vApp leases are very useful in test & dev or lab environments to make sure abandoned, unused VMs are not running and taking resources.
When vApp lease is coming to an end, its owner gets a reminder via email (how many days before expiration can be configured in User Preferences) and can optionally reset vApp lease to avoid its stopping or deletion.
By default expired running vApp is put into suspended state which means its memory content is saved to datastores. This ensures fully consistent state upon consequent power on of the vApp. This however might not be always needed especially in dev/lab situations – the memory content could take lots of storage space and for example saving 16 GB RAM VM to datastore could also create IO performance impact. As of vCloud Director 8.20 the Organization Administrator can instead change the default runtime expiry action to power off. The setting is done at Org level and must be done via API by setting the element <PowerOffOnRuntimeLeaseExpiration> of OrgLeaseSettingsType to true. The API version must be at least 25.0.
vCloud Director 8.20 allows deployment of Org VDC Edge Gateways in 4 different form factors from Compact to X-Large where each provides different level of performance and consumes different amount of resources.
As these Edge Gateways are deployed by NSX Manager which allows setting custom reservations for CPU and RAM via an API call PUT https://<NSXManager>/api/4.0/edgePublish/tuningConfiguration, it is also possible in vCloud Director to set custom reservations.
Why would you change the default reservations? Reservations at VM (Edge) level reserve the resources for itself which means no other VM can utilize them in case they are unused. They basically guarantee certain level of service that the VM (Edge) from performance perspective will always deliver. In service provider environments oversubscription provides ROI benefits and if the service provider can guarantee enough resources at cluster scale, than the VM level reservations can be set lower if at all.
This can be accomplished by tuning the networking.gatewayMemoryReservationMultiplier and networking.gatewayCpuReservationMultiplier settings via cell-management-tool from vCloud Director cell. By default the CPU multiplier is set to 64 MHz per vCPU and the Memory multiplier to 0.5.
By default Edge Gateways will be deployed with the following reservation settings:
The following command will change memory multiplier to 10%:
Note: The new reservation settings are applicable only for newly deployed Org VDC Edge Gateways. Redeploying existing edges will not change their reservation settings. You must either use NSX API to do so, or modify Org VDC Edge Gateway form factor (e.g. change Large to Compact and then back to Large) which is not so elegant as it will basically redeploy the Edge twice.
Also note that NSX 6.2 and NSX 6.3 have different sizing of Quad Large Edge. vCloud Director 8.20 is by default set for the NSX 6.3 size which is 2 GB RAM (as opposed to NSX 6.2 value of 1 GB RAM). It is possible to change the default for the reservation calculation by editing networking.full4GatewayMemoryMb setting to value ‘1024’
About two years ago I have written a blog post describing how service provider (with cloud based on vCloud Director) can replace hardware without tenants actually noticing this. The provider can stand up new vSphere cluster with new hardware and migrate whole Provider VDC to it with running virtual machines. In my mind this is one of the distinguishing features from other cloud management systems.
Recently I had an opportunity to test this in large environment with vCloud Director 5.5 and would like to report how it is even more simplified compared to vCloud Director 5.1 which the former post was based on. No more API calls are needed and everything can be done from vCloud Administrator GUI in the following five steps:
Prepare new cluster and create new provider VDC on it.
Merge it with the old provider VDC. Disable its resource pool(s).
Migrate all VMs from the old resource pool to the new one in vCloud Director (not in vSphere!!!)
Redeploy all Edge Gateways running on the old hardware.
Detach the old resource pool.
While the first 4 steps are identical as in vCloud Director 5.1 the step 5 is where the simplification is. The last step takes care of both migration of catalog templates and deletion of shadow VMs.
As before this is possible only with Elastic VDCs (Pay-As-You-Go or Elastic Allocation Pool).
This little known fact came up already twice recently so I will write just a short post about it.
In the past when you created allocation pool Organization Virtual Datacenter (Org VDC) and allocated to it for example 20 GB RAM it actually did not allow you to deploy VMs with total memory sum of 20 GB due to virtualization memory overhead being charged against the allocation as well. This behavior forced service providers to add additional 5%-15% to the memory Org VDC allocation. This was also very confusing for the end users who were complaining why their VM cannot power on.
With the elastic allocation pool VDC changes which came with vCloud Director 5.1 it is no longer an issue. The reason is that in the non-elastic VDC it is vSphere (and Org VDC resource pool object with a limit set) who does the admission control – i.e. how many VMs can be deployed into it. In the elastic VDC it is actually vCloud Director who is responsible for the decision if a VM can be deployed into particular VDC.
So to sum it up: if you use elastic allocation pool the tenant can use it up to the last MB. The virtualization VM memory overhead is charged to the provider who must take it into account when doing capacity management.
This is second part of the article. Read the first part here.
In the first part we have established why it is important to monitor resource utilization of workloads deployed in the public cloud and hinted how we can get some hypervisor metrics. In the second part I want to present an architecture example how it can be achieved.
The above diagram describes the whole setup. I am monitoring three workloads (VMs) deployed in a public cloud – in this case vCloud Hybrid Service. I have also deployed vFabric Hyperic into the same cloud, which collects the workload metrics through Hyperic agents. vCenter Operations Manager which is installed on premise is then collecting the metrics from Hyperic database and is displaying them in custom dashboards while doing its own magic (super metrics, dynamic thresholding and analytics).
I have also created custom Hyperic plugin which collects Windows VM performance metrics through VMware Tools Guest SDK.
Deployment and Configuration
vFabric Hyperic Deployment
Create Org VDC network in public cloud for Hyperic Deployment.
Upload vFabric Hyperic 5.7.1 Server installation and vFabric Hyperic 5.7.1 DB installation in OVF format to a public cloud catalog.
Deploy Hyperic DB first to Org VDC network first and note the IP address it has been allocated.
Deploy Hyperic Server and enter the Hyperic DB IP address when asked.
Install Hyperic Agent to all VMs that are going to be monitored.
vFabric Hyperic Configuration
Connect to Hyperic GUI (http://<hyperic server IP>:7080). As I have used jumpbox in the public cloud I did not needed to open the port to the internet, otherwise create the appropriate DNAT and firewall rule to reach the server from outside.
Go to Administration > Plugin Manager > and click Add/Update Plugin(s) button and upload vmPerfMon-plugin.xml custom plugin:
<server name="VM Performance Counters"
<!-- You always need availability metric, so just pick some service -->
<!-- Template filter is passed to metrics -->
<!-- Using object filter to reduce amount of xml -->
<filter name="object" value="VM Memory"/>
<metric name="Memory Reservation" alias="Memory Reservation in MB" units="MB"/>
<metric name="Memory Limit" alias="Memory Limit in MB" units="MB"/>
<metric name="Memory Shares" alias="Memory Shares"/>
<metric name="Memory Active" alias="Memory Active in MB" units="MB"/>
<metric name="Memory Ballooned" alias="Memory Ballooned in MB" units="MB"/>
<metric name="Memory Swapped" alias="Memory Swapped in MB" units="MB"/>
<!-- Win perf object is changed, using new one -->
<filter name="object" value="VM Processor"/>
<!-- Processor object needs instance information to access -->
<filter name="instance" value="_Total"/>
<!-- Giving new template since we now need instance info -->
<metric name="CPU Reservation in MHz" alias="Reservation in MHz"/>
<metric name="CPU Limit in MHz" alias="Limit in MHz"/>
<metric name="CPU Shares" alias="Shares"/>
<metric name="Effective VM Speed in MHz" alias="Effective VM Speed in MHz" indicator="true"/>
<metric name="Host processor speed in MHz" alias="Host processor speed in MHz"/>
The plugin should be automatically distributed through agents to the workloads. However we need to configure it. In the Resources tab browse to the workload and in the Tools Menu select New Server. Type a name, in the Server Type drop down find Windows VM Performance Counters and type something in the Install path field.
After clicking OK immediately click on Configuration Properties and check Auto-discover services.
To start collecting data we need to configure collection interval for the metrics. Go to Monitor > Metric Data and click Show All Metrics button. Select the metrics and at the bottom input collection interval and submit.
Now when we are collecting data we could create indicators, create alerts, etc., However Hyperic is just a collection engine for us. We will feed its data to vCenter Operations Manager.
vCenter Operations Manager Configuration
Assuming there is already an existing on-premise installation of vCenter Operations Manager we need to configure it to collect data from the cloud Hyperic instance. To do this first we need to open Edge Gateway firewall and create DNAT and firewall rule to the Hyperic DB server (TCP port 5432).
Now we need to download Hyperic vCOps Adapter from here.
To install go to vC Ops admin interface and install it through Update tab.
Then go to the vC Ops custom interface and in Admin > Support click Describe icon.
Next we need to configure the adapter. Go to Environment > Configuration > Adapter Instances and add new Hyperic instance for the public cloud. Then configure the (public) IP address of Hyperic DB, port and credentials.
When finished with all configurations we can create custom dashboards from the collected metrics in vC Ops. This is however out of scope of this article as it depends on the use case.
As an example above I am showing real CPU usage of one cloud VM as reported by the hypervisor. The host physical core speed is 2.2 GHz, however the effective VM speed was varying between 2.5 – 2.7 GHz (max turbo frequency of Intel E5-2660 CPU is 3 GHz). If I would look at the internal Windows GuestOS task manager metric I would see just meaningless 100% CPU utilization.