Console Proxy Traffic Enhancements

VMware Cloud Director provides direct access to tenants' VM consoles by proxying the vSphere console traffic from the ESXi hosts running the workload through the VCD cells and a load balancer to the end-user browser or console client. This is a fairly complex process that requires a dedicated TCP port (8443 by default), a dedicated certificate and a load balancer configuration without SSL termination (SSL pass-through).

The dedicated certificate requirement is especially annoying, as any change to this certificate cannot be done at the load balancer level; it must be performed on every cell in the VCD server group, and the cells then need to be restarted.

VMware Cloud Director 10.3.3, however, showcases a newly improved console proxy for the first time. It is still an experimental feature and therefore not enabled by default, but it can be enabled in the Feature Flags section of the provider Administration UI.

By enabling it, you switch to the enhanced console proxy implementation, which gives you the following benefits:
  • Console proxy traffic now goes over the default HTTPS port 443 together with UI/API traffic, so there is no need for a dedicated port/IP/certificate.
  • This traffic can be SSL terminated at the load balancer, removing the need for the specific load balancing configuration that required SSL pass-through of port 8443 (see the load balancer sketch at the end of this section).
  • The Public Addresses Console Proxy section is irrelevant and no longer used.

The following diagram shows the high-level implementation (credit and shout-out go to Francois Misiak – the brain behind the new functionality).

As this feature has not yet been tested at scale, it is marked as experimental, but it is expected to become the default console proxy mechanism in the next major VMware Cloud Director release. Note that you will still be able to revert to the legacy implementation if needed.
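To illustrate the simpler load balancing this enables, below is a minimal HAProxy sketch with SSL termination on port 443 only; the cell addresses, certificate path and health-check URL are illustrative assumptions for this example, not a prescribed configuration.

# Minimal HAProxy sketch (assumption: addresses and certificate path are placeholders).
# The legacy console proxy needed a second, SSL pass-through frontend on port 8443;
# with the enhanced console proxy everything rides this single 443 frontend.
frontend vcd_https
    bind *:443 ssl crt /etc/haproxy/certs/vcd-public.pem
    mode http
    default_backend vcd_cells

backend vcd_cells
    mode http
    balance leastconn
    option httpchk GET /cloud/server_status
    server cell1 10.0.0.11:443 ssl verify none check
    server cell2 10.0.0.12:443 ssl verify none check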

How to Move (Live) vApps Across Org VDCs

VMware Cloud Director has a secret, not well known, API-only feature that allows moving vApps across Org VDCs while they are running. This feature was purposely built for the NSX-V to NSX-T Migration Tool, but it can be used for other use cases as well, hence the reason to shed more light on it here.

We should start by mentioning that vApp migration across Org VDCs has been around forever – in the UI you can select an existing vApp and you will find the Move command in its action menu. But that is something completely different – that method performs a (vSphere) cloning operation in the background and then deletes the source VM(s). It is therefore slow, requires the vApp to be powered off and creates a new identity for the vApp and its VMs after the move (their UUIDs change). The UI uses the API method POST /vdc/{id}/action/cloneVApp with the IsSourceDelete flag set to true.
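For completeness, a minimal sketch of what that clone-and-delete call looks like is below; the target VDC id and vApp href are placeholders and only the elements relevant to the move are shown:

POST https://{{host}}/api/vdc/{target-vdc-id}/action/cloneVApp

Content-Type:application/vnd.vmware.vcloud.cloneVAppParams+xml
Accept:application/*+xml;version=36.2

<?xml version="1.0"?>
<CloneVAppParams xmlns="http://www.vmware.com/vcloud/v1.5" name="moved-vApp" deploy="false" powerOn="false">
  <Source href="https://vcd-01a.corp.local/api/vApp/vapp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"/>
  <IsSourceDelete>true</IsSourceDelete>
</CloneVAppParams>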

So the above method is *not* the subject of this article – instead we will talk about the API method POST /vdc/{id}/action/moveVApp.

The main differences are:

  • vMotion (live, shared-nothing and cross-vCenter, as needed) is used
  • the identity of the vApp and VMs does not change (UUIDs are retained)
  • the vApp can be in a running state
  • VMs can be connected to Named (independent) disks
  • fast provisioning (linked clones) is supported

The moveVApp API is fairly new and still evolving. For example, VMware Cloud Director 10.3.2 added support for moving routed vApps. Movement of running encrypted vApps will be supported in the future. So be aware there might be limitations depending on your VCD version.

The vApp can be moved across Org VDCs, Provider VDCs, clusters and vCenters of the same tenant, but it will not work across associated Orgs, for example. It also cannot be used for moving vApps across clusters/resource pools within the same Org VDC (use the Migrate VM UI/API for that). Obviously, the underlying vSphere platform must support vMotion across the involved clusters or vCenters. Changing the NSX backing (V to T) is also supported.

The API method is called against the target Org VDC endpoint with a fairly elaborate payload that must describe which vApp is being moved, what the target network configuration will look like (obviously, the parent Org VDC networks will change) and which storage, compute or placement policies will be used by each vApp VM at the target.

Note that if a VM is connected to media (an ISO), the ISO must be accessible from the target Org VDC (it is not migrated).

An example is worth 1000 words:

POST https://{{host}}/api/vdc/5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4/action/moveVApp

Content-Type:application/vnd.vmware.vcloud.MoveVAppParams+xml
Accept:application/*+xml;version=36.2

<?xml version="1.0"?>
<MoveVAppParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns7="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns8="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:ns9="http://www.vmware.com/schema/ovf">
  <Source href="https://vcd-01a.corp.local/api/vApp/vapp-96d3a015-4a08-4c59-93fa-384b41d4e453"/>
  <NetworkConfigSection>
    <ns7:Info>The configuration parameters for logical networks</ns7:Info>
       <NetworkConfig networkName="vApp-192.168.40.0">
            <Configuration>
                <IpScopes>
                    <IpScope>
                        <IsInherited>false</IsInherited>
                        <Gateway>192.168.40.1</Gateway>
                        <Netmask>255.255.255.0</Netmask>
                        <SubnetPrefixLength>24</SubnetPrefixLength>
                        <IsEnabled>true</IsEnabled>
                        <IpRanges>
                            <IpRange>
                                <StartAddress>192.168.40.2</StartAddress>
                                <EndAddress>192.168.40.99</EndAddress>
                            </IpRange>
                        </IpRanges>
                    </IpScope>
                </IpScopes>
                <ParentNetwork href="https://vcd-01a.corp.local/api/admin/network/1b8a200b-7ee7-47d5-81a1-a0dcb3161452" id="1b8a200b-7ee7-47d5-81a1-a0dcb3161452" name="Isol_192.168.33.0-v2t"/>
                <FenceMode>natRouted</FenceMode>
                <RetainNetInfoAcrossDeployments>false</RetainNetInfoAcrossDeployments>
                <Features>
                    <FirewallService>
                        <IsEnabled>true</IsEnabled>
                        <DefaultAction>drop</DefaultAction>
                        <LogDefaultAction>false</LogDefaultAction>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM6</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
                                <VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM5</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
                                <VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>Allow all outgoing traffic</Description>
                            <Policy>allow</Policy>
                            <Protocols>
                                <Any>true</Any>
                            </Protocols>
                            <DestinationPortRange>Any</DestinationPortRange>
                            <DestinationIp>external</DestinationIp>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>internal</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                    </FirewallService>
                    <NatService>
                        <IsEnabled>true</IsEnabled>
                        <NatType>portForwarding</NatType>
                        <Policy>allowTraffic</Policy>
                        <NatRule>
                            <Id>65537</Id>
                            <VmRule>
                                <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                                <ExternalPort>2222</ExternalPort>
                                <VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <InternalPort>22</InternalPort>
                                <Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                        <NatRule>
                            <Id>65538</Id>
                            <VmRule>
                                <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                                <ExternalPort>22</ExternalPort>
                                <VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
                                <VmNicId>0</VmNicId>
                                <InternalPort>22</InternalPort>
                                <Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                    </NatService>
                </Features>
                <SyslogServerSettings/>
                <RouterInfo>
                    <ExternalIp>192.168.33.2</ExternalIp>
                </RouterInfo>
                <GuestVlanAllowed>false</GuestVlanAllowed>
                <DualStackNetwork>false</DualStackNetwork>
            </Configuration>
            <IsDeployed>true</IsDeployed>
        </NetworkConfig>
  </NetworkConfigSection>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-fa47982a-120a-421a-a321-62e764e10b80"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.2</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.3</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:30</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-a1f87b29-60e7-45ee-86e2-5b749a81ed19"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.3</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:37</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
</MoveVAppParams>

In our case this is a routed, two-VM vApp where both VMs are connected to the same routed vApp network named vApp-192.168.40.0, with a set of port-forwarding NAT rules and firewall policies configured on the vApp router.

  • As mentioned above, it is a POST call against the target Org VDC – in our case 5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4.
  • The payload starts with the source vApp (vapp-96d3a015-4a08-4c59-93fa-384b41d4e453).
  • Then follows the NetworkConfigSection. Here we describe the target vApp network topology. In general, this section should be identical to the source vApp payload, with the only difference being that ParentNetwork must refer to an Org VDC network from the target Org VDC. So in our case we describe the subnet and IP pools of the vApp network (vApp-192.168.40.0), its new parent Org VDC network (Isol_192.168.33.0-v2t) and the way the two are connected (bridged or natRouted). As we are using a routed vApp, it is natRouted in our case. Then follow the (optional) routed vApp features such as firewall policies or NAT rules. They should be fairly self-explanatory and, again, are usually identical to the source vApp's NetworkConfig section. Note that the VM object rules use VAppScopedVmId, a random-looking UUID that changes every time the vApp is moved.
    We should highlight that IP addresses allocated to the vApp (its VMs or vApp routers) from the source Org VDC network are retained during the migration (and must be available in the target Org VDC network's static IP pool).
  • After the NetworkConfigSection follow the details of each vApp VM (SourcedItem) – to which of the vApp network(s) defined above the VM's network interface(s) will connect (and with which IP/MAC and IPAM mode), and which storage, placement and compute policies (StorageProfile, VdcComputePolicy and ComputePolicy) it should use. For the NIC section you usually take the equivalent info from the source VM. The vApp network name must be the one defined in the NetworkConfigSection. For the policies you must obviously use the target Org VDC policies, as these will change.
  • By the way, the storage policy can also be defined at the disk level with the DiskSettings element (the following excerpt shows the case when a named disk is connected):
            <DiskSettings>
                <DiskId>2016</DiskId>
                <SizeMb>8</SizeMb>
                <UnitNumber>0</UnitNumber>
                <BusNumber>1</BusNumber>
                <AdapterType>3</AdapterType>
                <ThinProvisioned>true</ThinProvisioned>
                <Disk href="https://vcd-01a.corp.local/api/disk/567bdd04-4905-4a62-95e7-9f4850f85240" id="urn:vcloud:disk:567bdd04-4905-4a62-95e7-9f4850f85240" type="application/vnd.vmware.vcloud.disk+xml" name="Disk1"/>
                <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/1f8bf2df-d28c-4bec-900c-726f20507b5b"/>
                <overrideVmDefault>true</overrideVmDefault>
                <iops>0</iops>
                <VirtualQuantityUnit>byte</VirtualQuantityUnit>
                <resizable>true</resizable>
                <encrypted>false</encrypted>
                <shareable>false</shareable>
                <sharingType>None</sharingType>
            </DiskSettings>

The actual vApp migration triggers an async operation that takes some time to complete. If you observe what is happening in VCD and vCenter, you will see that a new temporary "-generated" vApp is created in the target Org VDC and the VMs are migrated there first. In the case of routed vApps, the vApp routers (edge service gateways or Tier-1 gateways) must be deployed as well. When all the vApp VMs have been moved, the source vApp is removed, the target vApp with the same identity is created, and the VMs from the generated vApp are relocated into it. If all goes as expected, the generated vApp is removed.
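If you want to script the whole operation, here is a minimal shell sketch that submits the payload and polls the resulting task; the credentials, the token header parsing and the way the task href is extracted from the response are assumptions for illustration and may need adjusting to your environment and API version.

#!/usr/bin/env bash
# Hedged sketch: submit the moveVApp payload and wait for the async task to finish.
HOST="vcd-01a.corp.local"
VDC_ID="5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4"

# Authenticate and capture the bearer token returned in the response headers.
TOKEN=$(curl -sk -X POST "https://${HOST}/api/sessions" \
  -H "Accept: application/*+xml;version=36.2" \
  -u 'administrator@system:VMware1!' -D - -o /dev/null \
  | awk -F': ' 'tolower($1)=="x-vmware-vcloud-access-token" {print $2}' | tr -d '\r')

# Submit the move; the task href is grepped out of the returned XML.
TASK=$(curl -sk -X POST "https://${HOST}/api/vdc/${VDC_ID}/action/moveVApp" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/*+xml;version=36.2" \
  -H "Content-Type: application/vnd.vmware.vcloud.MoveVAppParams+xml" \
  --data-binary @moveVAppParams.xml \
  | grep -o 'https://[^"]*/api/task/[^"]*' | head -1)

# Poll until the task reaches a terminal state.
while true; do
  STATUS=$(curl -sk -H "Authorization: Bearer ${TOKEN}" \
    -H "Accept: application/*+xml;version=36.2" "${TASK}" \
    | grep -o 'status="[^"]*"' | head -1 | cut -d'"' -f2)
  echo "moveVApp task status: ${STATUS}"
  case "${STATUS}" in success|error|aborted) break ;; esac
  sleep 20
done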

Shout-out to Julian – the engineering brain behind this feature.

vROps Tenant App Upgrade Issue

While performing a vROps Tenant App 2.6.2 upgrade in my lab I encountered the following error:
Failed to install updates(Error while running installation tests).

A quick check of /opt/vmware/var/log/vami/updatecli.log shows that the appliance is running out of free space on the root (/) partition.

24/02/2022 15:01:34 [INFO] Running /opt/vmware/var/lib/vami/update/data/job/32/test_command
Verifying packages…
Preparing packages…
installing package tenant-app-8.6.0-18724818.noarch needs 1231MB on the / filesystem
24/02/2022 15:01:41 [ERROR] Failed with exit code 56576

The reason this happens is that the Tenant App runs as a docker container and the older image versions have not been purged. In my particular case I can see over 7 GB of docker images on the filesystem:

root@tenantapp [ /var/lib/docker/overlay2 ]# du -h -d 0
7.5G    .

root@tenantapp [ /var/lib/docker/overlay2 ]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.2-19235005      057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-db-cassandra   latest              057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-plugin         2.6.2-19235004      de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-plugin         latest              de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.1-18326916      3b7ef9b0c10c        7 months ago        597MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
vmware/vrops-vcd-tenant-app-plugin         2.6.1-18326915      f97bc56c3d61        7 months ago        286MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.0-17922920      0d5eb9de1cb7        10 months ago       581MB
vmware/vrops-vcd-tenant-app-ui             2.6.0-17922920      3ffdeee597ca        10 months ago       354MB
vmware/vrops-vcd-tenant-app-plugin         2.6.0-17922919      b23bd4eb6a2d        10 months ago       268MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.5.0-16990343      af72dbf16623        16 months ago       536MB
vmware/vrops-vcd-tenant-app-ui             2.5.0-16990343      62b09bd2a0a2        16 months ago       252MB
vmware/vrops-vcd-tenant-app-plugin         2.5.0-16941875      1217f67efd9d        17 months ago       190MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.4.0-15996298      a0d906a5cc5a        22 months ago       494MB
vmware/vrops-vcd-tenant-app-ui             2.4.0-15996298      777fe7bc0c1f        22 months ago       240MB
vmware/vrops-vcd-tenant-app-plugin         2.4.0-15996297      b85369dbf061        22 months ago       180MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.3.0-14826918      556121e468da        2 years ago         466MB
vmware/vrops-vcd-tenant-app-ui             2.3.0-14826918      eb77c613e9ad        2 years ago         224MB
vmware/vrops-vcd-tenant-app-plugin         2.3.0-14826917      e598e66d4818        2 years ago         158MB

After checking with Tenant App engineering: the problem has been fixed in the newest (8.6.1) version, which purges the old images upon a successful upgrade. But if you hit the issue, you will need to clean up the old images with the following command:

docker image rm -f <image ID>
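If there are many leftover versions, a one-liner along these lines can batch the cleanup; the filters (the "latest" tag and the currently installed 2.6.2 build) are assumptions based on the listing above, so always review the image list before deleting anything.

# Hedged sketch: remove all old Tenant App images, keeping the "latest"-tagged
# ones and the currently installed 2.6.2 build (adjust to your running version).
docker image ls --format '{{.Repository}}:{{.Tag}} {{.ID}}' \
  | grep 'vrops-vcd-tenant-app' \
  | grep -v ':latest' \
  | grep -v '2.6.2' \
  | awk '{print $2}' | sort -u \
  | xargs -r docker image rm -f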

By the way, if you delete the wrong images you can always recreate them with the following commands:

docker load -i /opt/vmware/app/vrops-vcd-tenant-app-ui.tar.gz
docker load -i /opt/vmware/db/vrops-vcd-tenant-app-db-cassandra.tar.gz
docker load -i /opt/vmware/plugin/vrops-vcd-tenant-app-plugin.tar.gz

Update 3/3/2022
I noticed that the Self Health page in the Support section of the Tenant App admin UI did not display any running services even though they (the docker containers) were running properly. After checking with engineering, this can be fixed by modifying the permissions of the docker.sock file with:

chmod 666 /run/docker.sock

(Screenshots: Self Health page before and after the fix.)

Upgrading VMware Cloud Director with Single API Call

Today I upgraded two VMware Cloud Director environments, each with 3 appliances, to version 10.3.2 with just two API calls. All thanks to VMware Cloud Provider Lifecycle Manager.

curl --location --request PUT 'https://172.28.59.10:9443/api/v1/lcm/environment/vcd-env-2/product/vcd-1/upgrade?action=UPGRADE' \
--header 'Content-Type: application/json' \
--header 'JSESSIONID: 4E908BE08C282AF45B1CF5BB6736FE32' \
--data-raw '{
    "upgradeDetails": {
        "targetVersion": "10.3.2",
        "additionalProperties": {
            "keepBackup": true
        }
    }
}'

As I have blogged about VMware Cloud Provider Lifecycle Manager (VCP LCM) in the past, I just want to highlight how it handles frequent updates of the solutions it manages. VCP LCM is now at version 1.2 and is deployed as an appliance. It is updated about twice a year. However, when one of the solutions it manages (VMware Cloud Director, Usage Meter, Tenant App) releases a new update, a small LCM interop update bundle is published (VCP LCM download page, Drivers and Tools section) that provides support for updating to the newly released solution(s). That way there is no lag or need to wait for a new (big) VCP LCM release.

So in my case all I had to do was download and apply (unzip and execute) the new LCM interop bundle, download the VCD 10.3.2 update file to my VCP LCM repository (NFS) and trigger the API update call mentioned above.

The interop bundle(s) are versioned independently from VCP LCM itself, are cumulative and check whether the actual underlying VCP LCM supports the bundle (for example, LCM interop bundle 1.2.1 can be installed on top of VCP LCM 1.2 or 1.2.0.1 but not on 1.1). This can be seen in the interop_bundle_version.properties file (inside the zipped .lcm file).

product.version=1.2.0,1.2.0.1
vcplcm_interop_bundle.build_number=19239142
vcplcm_interop_bundle.version=1.2.1

I should mention that VCP LCM only supports environments that it created itself. It does have an import functionality, but that is for re-importing environments that VCP LCM deployed, as it does not (currently) keep their state across a reboot.

So what actually happens when the update is triggered with the API call? At a high level: VCP LCM first checks that the environment to be updated (the VCD installation) is running properly, that it can access all its cells, and so on. Then it shuts down the VCD service and database and creates a snapshot of all cells for a quick rollback if anything goes wrong. It then restarts the database and creates a regular backup, which is saved to the VCD transfer share. The update binaries are then uploaded and executed on every cell, followed by the database schema upgrade. The cells are rebooted and checks are performed to confirm that VCD came up properly with the correct version. If so, the snapshots can be removed, and optionally the regular backup as well.
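To confirm the result you can read the environment back through the same API; the sketch below assumes a corresponding GET endpoint exists at the environment path used in the upgrade call above, so check the VCP LCM API reference for the exact resource shape.

# Hedged sketch: the GET path is extrapolated from the upgrade URL and may differ.
curl -k --location --request GET \
  'https://172.28.59.10:9443/api/v1/lcm/environment/vcd-env-2' \
  --header 'JSESSIONID: 4E908BE08C282AF45B1CF5BB6736FE32'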

Happy upgrades!