vCloud Director completely abstracts the underlying virtual resources from its consumers, who get compute and storage resources represented by virtual datacenters (VDCs) with a given tiered profile (e.g. gold, silver, bronze). The provider, however, must care about the actual physical hardware, and from time to time faces hardware upgrades and migrations.
Fortunately it is possible to migrate whole Provider VDCs from old hardware to new non-disruptively, with no or minimal impact on the vCloud customers and no downtime of their VMs running in the cloud.
vCloud Director 5.1 has two features that help to accomplish this: elastic VDC (VDCs spanning multiple vSphere clusters) and merging of Provider VDCs. I already wrote about elastic VDCs in the post about Allocation Pool changes so please read that article first if that concept is new to you.
At a high level, the online migration process from the old to the new hardware works like this:
- Let’s say that the Provider VDC (PVDC) called GoldVDC is backed by Cluster1 consisting of old hardware.
- A new Cluster2 is created with new hardware.
- A new PVDC is created from Cluster2 – let us call it GoldVDCnew – and the old GoldVDC is merged into it. Although we could add the new Cluster2 directly to GoldVDC, this would not allow us to retire the old hardware, as it is not possible to detach the primary resource pool from a PVDC.
- We can now rename PVDC GoldVDCnew to GoldVDC and disable Cluster1. This has no impact on running VMs; however, any newly deployed or powered-on VMs are already placed on Cluster2.
- Finally, we migrate all the workloads from Cluster1 to Cluster2 and then detach Cluster1 from the PVDC.
The actual migration between clusters (or resource pools) has to take into account the following five object types that exist in the VDCs – vApps, catalog templates, catalog media images, Edge Gateways and vApp Edges.
vCloud Director does not actually use vSphere vApp objects. From the point of view of the vSphere infrastructure, vCloud vApps are just a collection of VMs, so we only need to migrate the VMs. This cannot be done from within vSphere, because vCloud Director keeps track of which resource pool each VM is placed in; additionally, vCloud Director needs to apply the proper resource pool reservations and limits based on the Org VDC allocation type. There is, however, a migrate option in vCloud Director that can be used, either from the GUI or with the API (see the end of this article). Note: the migration leverages vMotion with shared storage. It is not possible to migrate this way between clusters without shared storage, even though vSphere 5.1 has so-called Enhanced vMotion (aka shared-nothing vMotion).
Migration of catalog templates is more difficult. Again, a vCloud template is quite different from a vSphere template – vCloud templates are basically powered-off VMs. Although migration at the vSphere level seems less harmful than in the previous case, because no resource pool settings must be configured (catalog VMs are never powered on), we would still encounter a problem when trying to detach Cluster1, as vCloud Director keeps track of VM-to-resource-pool associations.
Unfortunately the GUI migration process used for vApp VMs above cannot be used here. The GUI workaround is to open each catalog and move each catalog VM to the same catalog. This basically creates a clone of the VM, which gets registered to a Cluster2 host, while the original VM is deleted. This is, however, a very expensive operation from the storage perspective: the cloning needs extra temporary space and generates quite a lot of storage I/O traffic. Fast provisioning (linked clones) can help here.
The second alternative is to use the same API call as in the first case. Although this is not documented, it works (see the example at the end of the article).
ISO or FLP (floppy) images are stored on vSphere datastores in a special folder: <VCD Instance Name>/media/<org ID>/<org VDC ID>/media-<media ID>.<ISO|FLP>. vSphere (vCenter) does not keep track of these objects in any way; vCloud Director stores the datastore moref and the media folders in its database. The media upload is done by the vCloud Director cells over NFC – the VMware Network File Copy protocol – via any ESX host that is connected to the datastore. Therefore, as long as Cluster2 has access to the media datastores, nothing needs to be migrated.
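As a toy illustration of the folder convention just described (the instance name and IDs below are placeholders, not values from a real environment; vCloud Director assigns its own identifiers):

```python
# Toy illustration of the vCloud Director media folder convention described
# above. All identifiers are placeholders; real IDs are assigned by
# vCloud Director and stored in its database alongside the datastore moref.
def media_path(vcd_instance, org_id, org_vdc_id, media_id, ext):
    """Build the datastore-relative path of an uploaded ISO/FLP image."""
    return f"{vcd_instance}/media/{org_id}/{org_vdc_id}/media-{media_id}.{ext}"

# Example: an ISO image uploaded to a (hypothetical) Org VDC
print(media_path("VCD01", "org-42", "vdc-7", "1234", "iso"))
# VCD01/media/org-42/vdc-7/media-1234.iso
```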
Gateway and vApp Edges are running VMs placed in the System vDC resource pool in Cluster1. If a vApp with a routed vApp network is powered off, the particular vApp Edge is destroyed; when the vApp is started again, a new vApp Edge with identical configuration is deployed, and it would be placed into the new Cluster2. A simple vMotion between clusters seems to work at first, but it is definitely not recommended: vShield Manager keeps track of which cluster each Edge is deployed to, and any major Edge configuration change (an upgrade to a new version or a full configuration push) would try to deploy the Edge to the original cluster.
An Edge redeploy is an operation with minimal impact on the network flows going through the virtual router. A new Edge VM is deployed by vShield Manager, the identical configuration is pushed to it, and then the networks are disconnected from the old Edge and connected to the new one. This might cause the loss of an IPsec VPN connection or a load-balanced session; otherwise the disruption is minimal. The Edge redeploy, however, cannot be done directly from vShield Manager (too bad, as there is a nice script for this: see KB 2035939), because vShield Manager knows nothing about restrictions made in vCloud Director on the PVDC (the disabled Cluster1). The Edge Gateway redeploy and routed/fenced network reset must therefore be done from vCloud Director – either from the GUI (however, it is not trivial to find all the running vApp Edges) or with the vCloud API.
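As a rough sketch of the vCloud API side, the redeploy and reset actions are plain POSTs against action URLs. This is an illustration only: the URL patterns follow the vCloud API 5.1 admin conventions, while the base URL and IDs are placeholders, and authentication (the x-vcloud-authorization session header) and task polling are left out.

```python
# Hedged sketch: building the vCloud API 5.1 action URLs for redeploying an
# Edge Gateway and resetting a routed Org VDC network. The base URL and IDs
# are placeholders; POSTing these URLs with a valid session token returns a
# Task that must be polled for completion.
def edge_redeploy_url(base_url, gateway_id):
    """Action URL that redeploys one Edge Gateway when POSTed."""
    return f"{base_url}/admin/edgeGateway/{gateway_id}/action/redeploy"

def network_reset_url(base_url, network_id):
    """Action URL that resets (redeploys) a routed Org VDC network."""
    return f"{base_url}/admin/network/{network_id}/action/reset"

print(edge_redeploy_url("https://vcd.example.com/api", "gw-1"))
# https://vcd.example.com/api/admin/edgeGateway/gw-1/action/redeploy
```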
There are some limitations or considerations that need to be taken into account:
- VDC elasticity currently (version 5.1) works only within a vCenter Datacenter, and all clusters need to use the same distributed switch for external networks and the network pool.
- Reservation Pool allocation type Org VDCs do not currently support VDC elasticity (those workloads cannot be migrated).
- Both clusters should have access to the same storage. If storage migration is required, do it as an independent second step.
- vSphere vMotion restrictions apply: if the new hardware has a newer generation CPU, leverage EVC and lower the compatibility of the new cluster to the old hardware. Once the old hardware is retired, the EVC mode can be raised and any restarted (full power cycle required) or new VMs can take advantage of it. Obviously, migrations between different CPU architectures (AMD vs. Intel) are not possible.
- 1 GHz of an old CPU is not equal to 1 GHz of a newer generation CPU. Therefore, do not mix them in elastic VDCs except for the migration reasons mentioned above. This could also impact Chargeback – the customer will get different (higher) performance for the same cost.
vCloud API Examples
As mentioned above, vCloud VMs can – and vCloud templates should – be migrated with an API call. The request looks like this:
POST API-URL/admin/extension/resourcePool/id/action/migrateVms, with the request body containing a MigrateParams element.
VM migration example:
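A minimal sketch of what the request body might look like, assuming the vCloud API 5.1 extension schema; the VM hrefs are placeholders, and omitting the target ResourcePoolRef (so that vCloud Director picks an enabled resource pool itself) is an illustrative assumption:

```python
# Hedged sketch of a MigrateParams request body for the migrateVms call.
# The XML namespace follows the vCloud API 5.1 extension schema; the VM
# hrefs are placeholders. Omitting ResourcePoolRef is assumed to let
# vCloud Director choose the target resource pool.
def build_migrate_params(vm_hrefs):
    """Return the MigrateParams XML listing the VMs to migrate."""
    ns = "http://www.vmware.com/vcloud/extension/v1.5"
    refs = "\n".join(f'  <VmRef href="{href}"/>' for href in vm_hrefs)
    return f'<MigrateParams xmlns="{ns}">\n{refs}\n</MigrateParams>'

print(build_migrate_params(["https://vcd.example.com/api/vApp/vm-1"]))
```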
Template migration example:
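Per the note above, catalog templates can be migrated with the very same call – the VmRef simply points at a template VM. The following sketch assembles the full request (URL, headers, body); the resource pool ID, template href and media type are assumptions based on the 5.1 extension API, and sending the POST with a valid session token is not shown:

```python
# Hedged sketch: the same migrateVms call used for a catalog-template VM.
# Resource pool ID, template-VM href and Content-Type value are placeholders
# / assumptions; the actual POST (with the x-vcloud-authorization header)
# is left to the caller.
def migrate_vms_request(base_url, resource_pool_id, vm_href):
    """Return (url, headers, body) for migrating one (template) VM."""
    url = f"{base_url}/admin/extension/resourcePool/{resource_pool_id}/action/migrateVms"
    headers = {
        "Content-Type": "application/vnd.vmware.admin.migrateVmParams+xml",
        "Accept": "application/*+xml;version=5.1",
    }
    body = (
        '<MigrateParams xmlns="http://www.vmware.com/vcloud/extension/v1.5">\n'
        f'  <VmRef href="{vm_href}"/>\n'
        "</MigrateParams>"
    )
    return url, headers, body

url, headers, body = migrate_vms_request(
    "https://vcd.example.com/api", "rp-1",
    "https://vcd.example.com/api/vApp/vm-template-1")
print(url)
# https://vcd.example.com/api/admin/extension/resourcePool/rp-1/action/migrateVms
```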