How to Move (Live) vApps Across Org VDCs

VMware Cloud Director has a little-known, API-only feature that allows moving vApps across Org VDCs while they are running. The feature was purposely built for the NSX-V to NSX-T Migration Tool, but it can serve other use cases as well, which is why I want to shed more light on it here.

Let's start by mentioning that vApp migration across Org VDCs has been around forever – in the UI you can select an existing vApp and find the Move command in its action menu. But that is something completely different – in the background that method performs a (vSphere) clone operation followed by deletion of the source VM(s). It is therefore slow, requires the vApp to be powered off and creates a new identity for the vApp and its VMs after the move (their UUIDs change). The UI uses the API method POST /vdc/{id}/action/cloneVApp with the IsSourceDelete flag set to true.

So the above method is *not* the subject of this article – instead we will talk about the API method POST /vdc/{id}/action/moveVApp.

The main differences are:

  • vMotion (e.g. live, shared-nothing and cross-vCenter) is used
  • identity of vApp and VM does not change (UUID is retained)
  • vApp can be in running state
  • VMs can be connected to named (independent) disks
  • Fast provisioning (linked clones) is supported

The moveVApp API is fairly new and still evolving. For example, VMware Cloud Director 10.3.2 added support for moving routed vApps. Moving running encrypted vApps will be supported in the future. So be aware that there might be limitations depending on your VCD version.

The vApp can be moved across Org VDCs, Provider VDCs, clusters and vCenters of the same tenant, but it will not work across associated Orgs, for example. It also cannot be used for moving vApps across clusters or resource pools within the same Org VDC (for that, use the Migrate VM UI/API). Obviously, the underlying vSphere platform must support vMotion across the involved clusters or vCenters. A change of NSX backing (V to T) is also supported.

The API method is called on the target Org VDC endpoint with a quite elaborate payload that must describe which vApp is being moved, what the target network configuration will look like (obviously the parent Org VDC networks will change) and which storage, compute or placement policies will be used by every vApp VM at the target.

Note that if a VM is connected to media (ISO), the media must be accessible to the target Org VDC (the ISO is not migrated).

An example is worth 1000 words:

POST https://{{host}}/api/vdc/5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4/action/moveVApp

Content-Type:application/vnd.vmware.vcloud.MoveVAppParams+xml
Accept:application/*+xml;version=36.2

<?xml version="1.0"?>
<MoveVAppParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns7="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns8="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:ns9="http://www.vmware.com/schema/ovf">
  <Source href="https://vcd-01a.corp.local/api/vApp/vapp-96d3a015-4a08-4c59-93fa-384b41d4e453"/>
  <NetworkConfigSection>
    <ns7:Info>The configuration parameters for logical networks</ns7:Info>
        <NetworkConfig networkName="vApp-192.168.40.0">
            <Configuration>
                <IpScopes>
                    <IpScope>
                        <IsInherited>false</IsInherited>
                        <Gateway>192.168.40.1</Gateway>
                        <Netmask>255.255.255.0</Netmask>
                        <SubnetPrefixLength>24</SubnetPrefixLength>
                        <IsEnabled>true</IsEnabled>
                        <IpRanges>
                            <IpRange>
                                <StartAddress>192.168.40.2</StartAddress>
                                <EndAddress>192.168.40.99</EndAddress>
                            </IpRange>
                        </IpRanges>
                    </IpScope>
                </IpScopes>
                <ParentNetwork href="https://vcd-01a.corp.local/api/admin/network/1b8a200b-7ee7-47d5-81a1-a0dcb3161452" id="1b8a200b-7ee7-47d5-81a1-a0dcb3161452" name="Isol_192.168.33.0-v2t"/>
                <FenceMode>natRouted</FenceMode>
                <RetainNetInfoAcrossDeployments>false</RetainNetInfoAcrossDeployments>
                <Features>
                    <FirewallService>
                        <IsEnabled>true</IsEnabled>
                        <DefaultAction>drop</DefaultAction>
                        <LogDefaultAction>false</LogDefaultAction>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM6</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
                                <VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM5</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
                                <VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>Allow all outgoing traffic</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Any>true</Any>
                            </Protocols>
                            <DestinationPortRange>Any</DestinationPortRange>
                            <DestinationIp>external</DestinationIp>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>internal</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                    </FirewallService>
                    <NatService>
                        <IsEnabled>true</IsEnabled>
                        <NatType>portForwarding</NatType>
                        <Policy>allowTraffic</Policy>
                        <NatRule>
                            <Id>65537</Id>
                            <VmRule>
                                <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                                <ExternalPort>2222</ExternalPort>
                                <VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <InternalPort>22</InternalPort>
                                <Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                        <NatRule>
                            <Id>65538</Id>
                            <VmRule>
                                <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                                <ExternalPort>22</ExternalPort>
                                <VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <InternalPort>22</InternalPort>
                                <Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                    </NatService>
                </Features>
                <SyslogServerSettings/>
                <RouterInfo>
                    <ExternalIp>192.168.33.2</ExternalIp>
                </RouterInfo>
                <GuestVlanAllowed>false</GuestVlanAllowed>
                <DualStackNetwork>false</DualStackNetwork>
            </Configuration>
            <IsDeployed>true</IsDeployed>
        </NetworkConfig>
  </NetworkConfigSection>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-fa47982a-120a-421a-a321-62e764e10b80"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.2</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.3</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:30</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-a1f87b29-60e7-45ee-86e2-5b749a81ed19"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.3</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:37</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
</MoveVAppParams>
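
If you prefer to script the call, the request above can be wrapped in a few lines of Python using only the standard library. This is a minimal sketch: the function name build_move_vapp_request and the bearer-token authentication are my assumptions for illustration, while the URL, headers and API version are taken from the example. The payload passed in the usage part is only a placeholder, not a valid body.

```python
import urllib.request


def build_move_vapp_request(host, target_vdc_id, payload_xml, token):
    """Prepare (but do not send) the moveVApp POST against the target Org VDC."""
    url = f"https://{host}/api/vdc/{target_vdc_id}/action/moveVApp"
    return urllib.request.Request(
        url,
        data=payload_xml.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/vnd.vmware.vcloud.MoveVAppParams+xml",
            "Accept": "application/*+xml;version=36.2",
            # Assumption: a bearer token obtained from an earlier login call.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    request = build_move_vapp_request(
        "vcd-01a.corp.local",
        "5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4",
        # Placeholder only; the real payload is the MoveVAppParams document above.
        "<MoveVAppParams xmlns='http://www.vmware.com/vcloud/v1.5'/>",
        "TOKEN",
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would submit it and start the move.
```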

In our case this is a routed vApp with two VMs, both connected to the same routed vApp network named vApp-192.168.40.0, with a set of port forwarding NAT rules and firewall policies configured on the vApp router.

  • As said above it is a POST call against the target Org VDC – in our case 5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4.
  • The payload starts with the source vApp (vapp-96d3a015-4a08-4c59-93fa-384b41d4e453).
  • Then follows the NetworkConfigSection. Here we are describing the target vApp network topology. In general this section should be identical to the one in the source vApp, the only difference being that ParentNetwork must refer to an Org VDC network in the target Org VDC. So in our case we are describing the subnet and IP pools of the vApp network (vApp-192.168.40.0), its new parent Org VDC network (Isol_192.168.33.0-v2t) and the way these two are connected (bridged or natRouted). As we are using a routed vApp, it is natRouted in our case. Then follow the (optional) routed vApp features such as firewall policies or NAT rules. They should be pretty self-explanatory and again they are usually identical to those in the source vApp's NetworkConfigSection. Note that rules referring to VM objects use VAppScopedVmId, a random-looking UUID that changes every time the vApp is moved.
    We should highlight that IP addresses allocated to the vApp (its VMs or vApp routers) from the source Org VDC network are retained during the migration (and must be available in the target Org VDC network static IP pool).
  • After the NetworkConfigSection follow the details of every vApp VM (SourcedItem): which of the vApp network(s) defined above the VM network interface(s) will connect to (with which IP/MAC and IPAM mode) and which storage, placement and compute policies (StorageProfile, VdcComputePolicy and ComputePolicy) the VM should use. For the NIC section you usually take the equivalent info from the source VM. The vApp network name must be the one defined in the NetworkConfigSection. For the policies you must obviously use target Org VDC policies as these will change.
  • BTW, the storage policy can also be defined at the disk level with the DiskSettings element (the following excerpt shows the case when a named disk is connected):
            <DiskSettings>
                <DiskId>2016</DiskId>
                <SizeMb>8</SizeMb>
                <UnitNumber>0</UnitNumber>
                <BusNumber>1</BusNumber>
                <AdapterType>3</AdapterType>
                <ThinProvisioned>true</ThinProvisioned>
                <Disk href="https://vcd-01a.corp.local/api/disk/567bdd04-4905-4a62-95e7-9f4850f85240" id="urn:vcloud:disk:567bdd04-4905-4a62-95e7-9f4850f85240" type="application/vnd.vmware.vcloud.disk+xml" name="Disk1"/>
                <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/1f8bf2df-d28c-4bec-900c-726f20507b5b"/>
                <overrideVmDefault>true</overrideVmDefault>
                <iops>0</iops>
                <VirtualQuantityUnit>byte</VirtualQuantityUnit>
                <resizable>true</resizable>
                <encrypted>false</encrypted>
                <shareable>false</shareable>
                <sharingType>None</sharingType>
            </DiskSettings>
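
Since the NetworkConfigSection is usually a copy of the source vApp's section with only the ParentNetwork changed, that step is easy to automate. Here is a minimal sketch, assuming you already have the source section as an XML string; the function name retarget_parent_network is mine, and deriving the id attribute from the last segment of the href simply mirrors what the example payload shows.

```python
import xml.etree.ElementTree as ET

VCLOUD_NS = "http://www.vmware.com/vcloud/v1.5"


def retarget_parent_network(section_xml, new_href, new_name):
    """Return the section with every ParentNetwork pointing at the target Org VDC network."""
    ET.register_namespace("", VCLOUD_NS)  # serialize vCloud elements without a prefix
    root = ET.fromstring(section_xml)
    for parent in root.iter(f"{{{VCLOUD_NS}}}ParentNetwork"):
        parent.set("href", new_href)
        parent.set("name", new_name)
        # In the example payload the id equals the last path segment of the href.
        parent.set("id", new_href.rstrip("/").rsplit("/", 1)[-1])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Shortened, hypothetical source section; a real one carries the full Configuration.
    source = (
        f'<NetworkConfigSection xmlns="{VCLOUD_NS}">'
        '<NetworkConfig networkName="vApp-192.168.40.0"><Configuration>'
        '<ParentNetwork href="https://vcd-01a.corp.local/api/admin/network/old-id"'
        ' id="old-id" name="source-network"/>'
        "<FenceMode>natRouted</FenceMode>"
        "</Configuration></NetworkConfig></NetworkConfigSection>"
    )
    print(retarget_parent_network(
        source,
        "https://vcd-01a.corp.local/api/admin/network/1b8a200b-7ee7-47d5-81a1-a0dcb3161452",
        "Isol_192.168.33.0-v2t",
    ))
```

Everything else in the section (IP scopes, firewall and NAT rules) passes through untouched, which matches the "identical except for ParentNetwork" guidance above.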

The actual vApp migration triggers an asynchronous operation that takes some time to complete. If you observe what is happening in VCD and vCenter you will see that a new temporary “-generated” vApp is created in the target Org VDC and the VMs are first migrated there. In the case of routed vApps, the vApp routers (edge service gateways or Tier-1 gateways) must be deployed as well. When all the vApp VMs have been moved, the source vApp is removed, the target vApp with the same identity is created and the VMs from the generated vApp are relocated there. If all goes as expected, the generated vApp is then removed.
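
Because the operation is asynchronous, a script has to wait for it to finish. The following is a rough sketch of such a polling loop, not the official client logic. It assumes that the task document exposes its state in a status attribute and that the listed values are the terminal ones. How the task XML is fetched is left to a callable you supply (fetch_task_xml is a hypothetical name).

```python
import time
import xml.etree.ElementTree as ET

# Assumption: these are the terminal values of the task's "status" attribute.
TERMINAL_STATES = {"success", "error", "canceled", "aborted"}


def wait_for_task(fetch_task_xml, interval_s=5.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll until the task reaches a terminal state and return that state."""
    waited = 0.0
    while True:
        status = ET.fromstring(fetch_task_xml()).get("status")
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"task still '{status}' after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Canned replies stand in for real GET calls on the task URL.
    replies = iter(['<Task status="running"/>', '<Task status="success"/>'])
    print(wait_for_task(lambda: next(replies), sleep=lambda seconds: None))
```

Injecting the fetch and sleep functions keeps the loop testable without a live VCD.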

Shout-out to Julian – the engineering brain behind this feature.

vROps Tenant App Upgrade Issue

While performing the vROps Tenant App 2.6.2 upgrade in my lab I encountered the following error:
Failed to install updates(Error while running installation tests).

A quick check of /opt/vmware/var/log/vami/updatecli.log shows that the appliance is running out of free space on the root (/) partition.

24/02/2022 15:01:34 [INFO] Running /opt/vmware/var/lib/vami/update/data/job/32/test_command
Verifying packages…
Preparing packages…
installing package tenant-app-8.6.0-18724818.noarch needs 1231MB on the / filesystem
24/02/2022 15:01:41 [ERROR] Failed with exit code 56576
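
If you want to spot this condition without reading the log by hand, a small script can pull the space requirement out of the log and print it next to the current free space. This is just a sketch: the regular expression is derived from the single log line shown above, and I have not verified whether the reported number is the total required or only the shortfall, so the script only reports both values.

```python
import re
import shutil

# Pattern derived from the "needs 1231MB on the / filesystem" line above.
NEEDS_RE = re.compile(r"needs (\d+)MB on the (\S+) filesystem")


def space_requirements(log_text):
    """Return {mount_point: largest MB figure} reported in the update log."""
    needed = {}
    for match in NEEDS_RE.finditer(log_text):
        size_mb, mount = int(match.group(1)), match.group(2)
        needed[mount] = max(size_mb, needed.get(mount, 0))
    return needed


def free_mb(path):
    """Free space on the filesystem holding `path`, in MB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


if __name__ == "__main__":
    sample = ("installing package tenant-app-8.6.0-18724818.noarch "
              "needs 1231MB on the / filesystem")
    for mount, size_mb in space_requirements(sample).items():
        print(f"{mount}: installer reports {size_mb} MB, {free_mb(mount)} MB currently free")
```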

This happens because the Tenant App runs as Docker containers and the images of older versions have not been purged. In my particular case there are more than 7 GB of Docker images on the filesystem:

root@tenantapp [ /var/lib/docker/overlay2 ]# du -h -d 0
7.5G    .

root@tenantapp [ /var/lib/docker/overlay2 ]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.2-19235005      057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-db-cassandra   latest              057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-plugin         2.6.2-19235004      de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-plugin         latest              de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.1-18326916      3b7ef9b0c10c        7 months ago        597MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
vmware/vrops-vcd-tenant-app-plugin         2.6.1-18326915      f97bc56c3d61        7 months ago        286MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.0-17922920      0d5eb9de1cb7        10 months ago       581MB
vmware/vrops-vcd-tenant-app-ui             2.6.0-17922920      3ffdeee597ca        10 months ago       354MB
vmware/vrops-vcd-tenant-app-plugin         2.6.0-17922919      b23bd4eb6a2d        10 months ago       268MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.5.0-16990343      af72dbf16623        16 months ago       536MB
vmware/vrops-vcd-tenant-app-ui             2.5.0-16990343      62b09bd2a0a2        16 months ago       252MB
vmware/vrops-vcd-tenant-app-plugin         2.5.0-16941875      1217f67efd9d        17 months ago       190MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.4.0-15996298      a0d906a5cc5a        22 months ago       494MB
vmware/vrops-vcd-tenant-app-ui             2.4.0-15996298      777fe7bc0c1f        22 months ago       240MB
vmware/vrops-vcd-tenant-app-plugin         2.4.0-15996297      b85369dbf061        22 months ago       180MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.3.0-14826918      556121e468da        2 years ago         466MB
vmware/vrops-vcd-tenant-app-ui             2.3.0-14826918      eb77c613e9ad        2 years ago         224MB
vmware/vrops-vcd-tenant-app-plugin         2.3.0-14826917      e598e66d4818        2 years ago         158MB

After checking with Tenant App engineering, the problem has been fixed in the newest (8.6.1) version, which purges the old images upon successful upgrade. But if you hit the issue you will need to clean up the old images with the following command:

docker image rm -f <image ID>
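
With this many images, picking the IDs by hand is error-prone, so here is a small helper sketch that works on the text output of docker image ls. It rests on one assumption taken from the listing above: the images of the current version are the ones that also carry the latest tag, so every image ID without a latest tag is treated as stale. The function name is mine, and the script only prints the removal commands rather than running them.

```python
import re


def stale_image_ids(image_ls_output):
    """Return IDs of images that are not also tagged 'latest', in listing order."""
    rows = []
    for line in image_ls_output.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # Columns are separated by two or more spaces ("5 weeks ago" has single spaces).
        repository, tag, image_id = re.split(r"\s{2,}", line.strip())[:3]
        rows.append((repository, tag, image_id))
    current = {image_id for _, tag, image_id in rows if tag == "latest"}
    stale = []
    for _, _, image_id in rows:
        if image_id not in current and image_id not in stale:
            stale.append(image_id)
    return stale


if __name__ == "__main__":
    listing = """\
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
"""
    for image_id in stale_image_ids(listing):
        print(f"docker image rm -f {image_id}")
```

Review the printed commands before running them.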

BTW, if you delete the wrong images you can always recreate them with the following commands:

docker load -i /opt/vmware/app/vrops-vcd-tenant-app-ui.tar.gz
docker load -i /opt/vmware/db/vrops-vcd-tenant-app-db-cassandra.tar.gz
docker load -i /opt/vmware/plugin/vrops-vcd-tenant-app-plugin.tar.gz

Update 3/3/2022
I noticed that the Self Health page in the Support section of the Tenant App admin UI did not display any running services even though they (the Docker containers) were running properly. After checking with engineering, this can be fixed by modifying the permissions of the docker.sock file with:

chmod 666 /run/docker.sock

Upgrading VMware Cloud Director with Single API Call

Today I upgraded two VMware Cloud Director environments, each with three appliances, to version 10.3.2 with just two API calls. All that thanks to VMware Cloud Provider Lifecycle Manager.

curl --location --request PUT 'https://172.28.59.10:9443/api/v1/lcm/environment/vcd-env-2/product/vcd-1/upgrade?action=UPGRADE' \
--header 'Content-Type: application/json' \
--header 'JSESSIONID: 4E908BE08C282AF45B1CF5BB6736FE32' \
--data-raw '{
    "upgradeDetails": {
        "targetVersion": "10.3.2",
        "additionalProperties": {
            "keepBackup": true
        }
    }
}'

As I have blogged about the VMware Cloud Provider Lifecycle Manager (VCP LCM) in the past, I just want to highlight how it handles frequent updates of the solutions it manages. VCP LCM is now at version 1.2 and is deployed as an appliance. It is updated about twice a year. However, when one of the solutions it manages (VMware Cloud Director, Usage Meter, Tenant App) gets a new update, a small LCM interop update bundle is released (VCP LCM download page, Driver and Tools section) that adds support for updating the newly released solution(s). That way there is no lag or need to wait for a new (big) VCP LCM release.

So in my case all I had to do was just download and apply (unzip and execute) the new LCM interop bundle, download the VCD 10.3.2 update file to my VCP LCM repo (NFS) and trigger the API update call mentioned above.
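
With more than one environment to upgrade, the curl call is easy to turn into a loop. This is a minimal Python sketch of the same request using only the standard library; the URL, headers and body mirror the curl example, whereas the function name and the second environment name (vcd-env-1) are hypothetical.

```python
import json
import urllib.request


def build_upgrade_request(lcm_host, environment, product, target_version,
                          session_id, keep_backup=True):
    """Prepare (but do not send) the VCP LCM upgrade call shown in the curl example."""
    url = (f"https://{lcm_host}:9443/api/v1/lcm/environment/{environment}"
           f"/product/{product}/upgrade?action=UPGRADE")
    body = {
        "upgradeDetails": {
            "targetVersion": target_version,
            "additionalProperties": {"keepBackup": keep_backup},
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "JSESSIONID": session_id},
    )


if __name__ == "__main__":
    # "vcd-env-1" is a made-up name for the second environment.
    for environment in ("vcd-env-1", "vcd-env-2"):
        request = build_upgrade_request(
            "172.28.59.10", environment, "vcd-1", "10.3.2", "SESSION-ID")
        print(request.get_method(), request.full_url)
        # urllib.request.urlopen(request) would trigger the upgrade.
```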

The interop bundles are versioned independently from VCP LCM itself, are cumulative and check whether the underlying VCP LCM supports the bundle (for example, LCM Interop bundle 1.2.1 can be installed on top of VCP LCM 1.2 or 1.2.0.1 but not on 1.1). This can be seen in the interop_bundle_version.properties file (inside the zipped .lcm file).

product.version=1.2.0,1.2.0.1
vcplcm_interop_bundle.build_number=19239142
vcplcm_interop_bundle.version=1.2.1
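
To check a bundle before applying it, the properties file can be read with a few lines of Python. The sketch below treats product.version as a comma-separated list of supported VCP LCM versions and does an exact string match; that is my reading of the file, not necessarily how the installer itself performs the check.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def bundle_supports(properties_text, lcm_version):
    """True if lcm_version is listed in product.version (exact match, an assumption)."""
    supported = parse_properties(properties_text).get("product.version", "")
    return lcm_version in [v.strip() for v in supported.split(",") if v.strip()]


if __name__ == "__main__":
    sample = (
        "product.version=1.2.0,1.2.0.1\n"
        "vcplcm_interop_bundle.build_number=19239142\n"
        "vcplcm_interop_bundle.version=1.2.1\n"
    )
    for version in ("1.2.0", "1.2.0.1", "1.1.0"):
        print(version, bundle_supports(sample, version))
```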

I should mention that VCP LCM only supports environments that it created. It does have import functionality, but that is to import existing VCP LCM deployed environments as it does not (currently) keep their state when it is rebooted.

So what actually happens when the update is triggered with the API call? At a high level: VCP LCM first checks that the environment to be updated (the VCD installation) is running properly, that it can access all its cells, etc. Then it shuts down the VCD service and the database and creates a snapshot of all cells for quick rollback if anything goes wrong. Next it restarts the database and creates a regular backup, which is saved to the VCD transfer share. Update binaries are then uploaded and executed on every cell, followed by the database schema upgrade. Cells are rebooted and checks are performed that VCD came up properly with the correct version. If so, the snapshots can be removed and optionally the regular backup as well.

Happy upgrades!

Layer 2 VPN to the Cloud – Part III

I feel like it is time for another update on VMware Cloud Director (VCD) capabilities for establishing an L2 VPN between an on-prem location and an Org VDC. The previous blog posts were written in 2015 and 2018 and do not reflect the changes related to the use of NSX-T as the underlying cloud network platform.

The primary use case for L2 VPN to the cloud is migration of workloads to the cloud, where the L2 VPN tunnel is established temporarily until all VMs on a single network have been migrated. The secondary use case is disaster recovery, but I feel that running L2 VPN permanently is not the right approach.

But that is not the topic of today’s post. VCD has supported setting up L2 VPN on the tenant’s Org VDC Gateway (Tier-1 GW) since version 10.2; however, it is still a hidden, API-only feature (the GUI is finally coming soon … in VCD 10.3.1). The actual setup is not trivial as the underlying NSX-T technology first requires an IPSec VPN tunnel to be established to secure the L2 VPN client-to-server communication. VMware Cloud Director Availability (VCDA) version 4.2 is an add-on disaster recovery and migration solution for tenant workloads on top of VCD, and it simplifies the setup of both the server (cloud) and client (on-prem) L2 VPN endpoints from its own UI. To reiterate, VCDA is not needed to set up L2 VPN, but it makes it much easier.

The screenshot above shows the VCDA UI plugin embedded in the VCD portal. You can see that three L2 VPN sessions have been created on VDC Gateway GW1 (NSX-T Tier-1 backed) in the ACME-PAYG Org VDC. Each session uses a different L2 VPN client endpoint type.

The on-prem client can be an existing NSX-T Tier-0 or Tier-1 GW, an NSX-T autonomous edge or a standalone Edge client. Each requires a different type of configuration, so let me discuss each separately.

NSX-T Tier-0 or Tier-1 Gateway

This is mostly suitable for tenants who are running an existing NSX-T environment on-prem. They will need to set up both the IPSec and L2 VPN tunnels directly in NSX-T Manager, which makes it the most complicated process of the three options. On either the Tier-0 or Tier-1 GW they first need to set up the IPSec VPN and L2 VPN client services; then the L2 VPN session must be created with the local and remote endpoint IPs and the Peer Code, which must be retrieved beforehand via the VCD API (it is not available in the VCDA UI, but will be available in the VCD UI in 10.3.1 or newer). The peer code contains all the necessary configuration for the parent IPSec session in Base64 encoding.
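
Since the peer code is just Base64, it can be decoded locally if you are curious what it carries. A minimal sketch follows; I am not documenting the inner format here, so the function tries JSON and otherwise falls back to returning the decoded text, and the sample value in the usage part is made up.

```python
import base64
import json


def decode_peer_code(peer_code):
    """Base64-decode a peer code; return parsed JSON if it is JSON, else the raw text."""
    text = base64.b64decode(peer_code).decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


if __name__ == "__main__":
    # Made-up stand-in; a real peer code comes from the VCD API.
    fake_peer_code = base64.b64encode(b'{"demo": true}').decode("ascii")
    print(decode_peer_code(fake_peer_code))
```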

Lastly, the local NSX-T segments to be bridged to the cloud can be configured for the session. The parent IPSec session will be created automagically by NSX-T and after a while you should see green status for both the IPSec and L2 VPN sessions.

Standalone Edge Client

This option leverages the very light (150 MB) OVA appliance that can be downloaded from the NSX-T download website and actually works with both NSX-V and NSX-T L2 VPN server endpoints. It does not require any NSX installation. It provides no UI, and its configuration must be done at the time of deployment via OVF parameters. Again, the peer code must be provided.

Autonomous Edge

This is the preferred option for non-NSX environments. An autonomous edge is a regular NSX-T edge node that is deployed from an OVA but is not connected to NSX-T Manager. During the OVA deployment the Is Autonomous Edge checkbox must be checked. It provides its own UI and much better performance and configurability. Additionally, the client tunnel configuration can be done via the VCDA on-premises appliance UI. You just need to deploy the autonomous edge appliance; VCDA will discover it and let you manage it from then on via its UI.

With this option there is no need to retrieve the Peer Code, as the VCDA plugin retrieves all the necessary information from the cloud site.