Control System Admin Access to VMware Cloud Director

When VMware Cloud Director is deployed in public environment setup it is a good practice to restrict the system admin access only for specific networks so no brute force attack can be triggered against the publicly available UI/API end points.

There is actually a relatively easy way to achieve this via any web application firewall (WAF) with URI access filter. The strategy is to protect only the provider authentication end points which is much easier than to try to distinguish between provider and tenant URIs.

As the access (attack) can be done either through UI or API the solution should address both. Let us first talk about the UI. The tenants and provider use specific URL to access their system/org context but we do not really need to care about this at all. The UI is actually using (public) APIs so there is nothing needed to harden the UI specifically if we harder the API endpoint. Well, the OAuth and SAML logins are exception so let me tackle them separately.

So how can you authenticate to VCD via API?

Integrated Authentication

The integrated basic authentication consisting of login/password is used for VCD local accounts and LDAP accounts. The system admin (provider context) uses /cloudapi/1.0.0/sessions/provider API endpoint while the tenants use /cloudapi/1.0.0/sessions.

The legacy (common for both providers and tenant) API endpoint /api/sessions has been deprecated since API version 33.0 (introduced in VCD 10.0). Note that deprecated does not mean removed and it is still available even with API version 36.x so you can expect to be around for some time as VCD keeps backward compatible APIs for few years.

You might notice that there is in a Feature Flags section the possibility to enable “Legacy Login Removal”.

Feature Flags

Enabling this feature will disable legacy login both for tenants and providers however only if you use alpha API version (in the case of VCD 10.3.3.1 it is 37.0.0-alpha-1652216327). So this is really only useful for testing your own tooling where you can force the usage of that particular API version. The UI and any 3rd party tooling will still use the main (supported) API versions where the legacy endpoint will still work.

However, you can forcefully disable it for provider context for any API version with the following CMT command (run from any cell, no need to restart the services):

/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v true

The providers will need to use only the new cloudapi/1.0.0/providers/session endpoint. So be careful as it might break some legacy tools!

API Access Token Authentication

This is a fairly new method of authentication to VCD (introduced in version 10.3.1) that uses once generated secret token for API authentication. It is mainly used by automation or orchestration tools. The actual method of generating session token requires access to the tenant or provider oauth API endpoints:

/oauth/tenant/<tenant_name>/token

/oauth/provider/token

This makes it easy to disable provider context via URI filter.

SAML/OAuth Authentication via UI

Here we must distinguish the API and UI behavior. For SAML, the UI is using /login/org/<org-name>/… endpoint. The provider context is using the default SYSTEM org as the org name. So we must filter URI starting with /login/org/SYSTEM.

For OAuth the UI is using the same endpoint as API access token authentication /oauth/tenant vs /oauth/provider. /login/oauth?service=provider

For API SAML/OAuth logins cloudapi/1.0.0/sessions vs cloudapi/1.0.0/sessions/provider endpoints are used.

WAF Filtering Example

Here is an example how to set up URI filtering with VMware NSX Advanced Load Balancer.

  1. We need to obviously set up VCD cell (SSL) pool and Virtual Service for the external IP and port 443 (SSL).
  2. The virtual service application profile must be set to System-Secure-HTTP as we need to terminate SSL sessions on the load balancer in order to inspect the URI. That means the public SSL certificate must be uploaded to load balancer as well. The cells can actually use self signed certs especially if you use the new console proxy that does not require SSL pass through and works on port 443.
  3. In the virtual service go to Policies > HTTP Request and create following rules:
    Rule Name: Provider Access
    Client IP Address: Is Not: <admin subnets>
    Path: Criteria – Begins with:
    /cloudapi/1.0.0/sessions/provider
    /oauth/provider
    /login/oauth?service=provider
    /login/org/SYSTEM
    Content Switch: Local response – Status Code: 403.
WAF Access Rule

And this is what you can observe when trying to log in via integrated authentication from non-authorized subnets:

And here is an example of SAML login:

Console Proxy Traffic Enhancements

VMware Cloud Director provides direct access to tenant’s VM consoles via proxying the vSphere console traffic from ESXi hosts running the workload, through VCD cells, load balancer to the end-user browser or console client. This is fairly complex process that requires dedicated TCP port (by default 8443), certificate and a load balancer configuriation without SSL termination (SSL pass-through).

Especially the dedicated certificate requirement is annoying as any change to this certificate cannot be done at the load balancer level, but must be performed on every cell in the VCD server group and those need to be restarted.

However, VMware Cloud Director 10.3.3 for the first time showcases newly improved console proxy. It is still an experimental feature and therefore not enabled by default, but can be accessed in the Feature Flags section of the provider Administration.

By enabling it, you switch to the enhanced console proxy implementation that gives you the following benefits:
  • Console proxy traffic is now going over the default HTTPS 443 port together with UI/API. That means no need for dedicated port/IP/certificate.
  • This traffic can be SSL terminated at the load balancer. This means no need for specific load balancing configuration that needed the SSL pass through of port 8443.
  • The Public Addresses Console Proxy section is irrelevant and not used

The followin diagram shows the high level implementation (credit and shout-out goes to Francois Misiak – the brain behind the new functionality).

As this feature has not yet been tested at scale it is marked as experimental but it is expected that this will be the default console proxy mechanism starting in the next major VMware Cloud Director release. Note that you will still be able to revert to the legacy one if needed.

How to Move (Live) vApps Across Org VDCs

VMware Cloud Director has secret not well known API only feature that allows to move vApps across Org VDCs while they are running. This feature has been purposefully made for the NSX-V to NSX-T Migration Tool, but can be used for other use cases hence the reason here to shed more lights on it.

We should start with mentioning that vApp migration across Org VDCs has been around since forever – in the UI you can select an existing vApp and you will find out Move command in the action menu. But that is something completely different – that method does in the background (vSphere) cloning operation with deletion of the source VM(s). Thus it is slow, requires vApp to be powered off and creates new identity for the vApp and VMs after the move (their UUIDs will change). The UI is using API method POST /vdc/{id}/action/cloneVApp with flag IsSourceDelete set to true.

So the above method is *not* the subject of this article – instead we will talk about API method POST /VDC/{id}/action/moveVApp.

The main differences are:

  • vMotion (e.g. live, share nothing and cross vCenter) is used
  • identity of vApp and VM does not change (UUID is retained)
  • vApp can be in running state
  • VMs can be connected to Named (independent) disks
  • Fast provisioning (linked clones) support

The moveVApp API is fairly new and still evolving. For example VMware Cloud Director 10.3.2 added support for move router vApps. Movement of running encrypted vApps will be supported in the future. So be aware there might be limitations based on your VCD version.

The vApp can be moved across Org VDCs/Provider VDCs/clusters, vCenters of the same tenant but it will not work across associated Orgs for example. It also cannot be used for moving vApps across clusters/resource pools in the same Org VDC (for that use Migrate VM UI/API). Obviously the underlying vSphere platform must support vMotions across the involved clusters or vCenters. NSX backing (V to T) change is also supported.

The API method is using the target Org VDC endpoint with quite elaborate payload that must describe which vApp is being moved, how will the target network configuration look like (obviously parent Org VDC networks will change) and what storage, compute or placement policies will be used by every vApp VM at the target.

Note that if a VM is connected to a media (ISO) it must be accessible to the target Org VDC (the ISO is not migrated).

An example is worth 1000 words:

POST https://{{host}}/api/vdc/5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4/action/moveVApp

Content-Type:application/vnd.vmware.vcloud.MoveVAppParams+xml
Accept:application/*+xml;version=36.2

<?xml version="1.0"?>
<MoveVAppParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns7="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns8="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:ns9="http://www.vmware.com/schema/ovf">
  <Source href="https://vcd-01a.corp.local/api/vApp/vapp-96d3a015-4a08-4c59-93fa-384b41d4e453"/>
  <NetworkConfigSection>
    <ns7:Info>The configuration parameters for logical networks</ns7:Info>
       <NetworkConfig networkName="vApp-192.168.40.0">
            <Configuration>
                <IpScopes>
                    <IpScope>
                        <IsInherited>false</IsInherited>
                        <Gateway>192.168.40.1</Gateway>
                        <Netmask>255.255.255.0</Netmask>
                        <SubnetPrefixLength>24</SubnetPrefixLength>
                        <IsEnabled>true</IsEnabled>
                        <IpRanges>
                            <IpRange>
<StartAddress>192.168.40.2</StartAddress>
<EndAddress>192.168.40.99</EndAddress>
                            </IpRange>
                        </IpRanges>
                    </IpScope>
                </IpScopes>
                <ParentNetwork href="https://vcd-01a.corp.local/api/admin/network/1b8a200b-7ee7-47d5-81a1-a0dcb3161452" id="1b8a200b-7ee7-47d5-81a1-a0dcb3161452" name="Isol_192.168.33.0-v2t"/>
                <FenceMode>natRouted</FenceMode>
                <RetainNetInfoAcrossDeployments>false</RetainNetInfoAcrossDeployments>
                <Features>
                    <FirewallService>
                        <IsEnabled>true</IsEnabled>
                        <DefaultAction>drop</DefaultAction>
                        <LogDefaultAction>false</LogDefaultAction>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM6</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
<VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
<VmNicId>0</VmNicId>
<IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>ssh-VM5</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Tcp>true</Tcp>
                            </Protocols>
                            <DestinationPortRange>22</DestinationPortRange>
                            <DestinationVm>
<VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
<VmNicId>0</VmNicId>
<IpType>assigned</IpType>
                            </DestinationVm>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>Any</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                        <FirewallRule>
                            <IsEnabled>true</IsEnabled>
                            <Description>Allow all outgoing traffic</Description>
                            <Policy>allow</Policy>
                            <Protocols>
<Any>true</Any>
                            </Protocols>
                            <DestinationPortRange>Any</DestinationPortRange>
                            <DestinationIp>external</DestinationIp>
                            <SourcePortRange>Any</SourcePortRange>
                            <SourceIp>internal</SourceIp>
                            <EnableLogging>false</EnableLogging>
                        </FirewallRule>
                    </FirewallService>
                    <NatService>
                        <IsEnabled>true</IsEnabled>
                        <NatType>portForwarding</NatType>
                        <Policy>allowTraffic</Policy>
                        <NatRule>
                            <Id>65537</Id>
                            <VmRule>
<ExternalIpAddress>192.168.33.2</ExternalIpAddress>
<ExternalPort>2222</ExternalPort>
<VAppScopedVmId>e61491e5-56c4-48bd-809a-db16b9619d63</VAppScopedVmId>
<VmNicId>0</VmNicId>
<InternalPort>22</InternalPort>
<Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                        <NatRule>
                            <Id>65538</Id>
                            <VmRule>
<ExternalIpAddress>192.168.33.2</ExternalIpAddress>
<ExternalPort>22</ExternalPort>
<VAppScopedVmId>88445b8a-a9c4-43d5-bfd8-3630994a0a88</VAppScopedVmId>
<VmNicId>0</VmNicId>
<InternalPort>22</InternalPort>
<Protocol>TCP</Protocol>
                            </VmRule>
                        </NatRule>
                    </NatService>
                </Features>
                <SyslogServerSettings/>
                <RouterInfo>
                    <ExternalIp>192.168.33.2</ExternalIp>
                </RouterInfo>
                <GuestVlanAllowed>false</GuestVlanAllowed>
                <DualStackNetwork>false</DualStackNetwork>
            </Configuration>
            <IsDeployed>true</IsDeployed>
        </NetworkConfig>
  </NetworkConfigSection>
  <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-fa47982a-120a-421a-a321-62e764e10b80"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.2</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.3</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:30</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
    <SourcedItem>
    <Source href="https://vcd-01a.corp.local/api/vApp/vm-a1f87b29-60e7-45ee-86e2-5b749a81ed19"/>
    <InstantiationParams>
      <NetworkConnectionSection>
        <ns7:Info>Network Connection Section</ns7:Info>
        <PrimaryNetworkConnectionIndex>0</PrimaryNetworkConnectionIndex>
                <NetworkConnection network="vApp-192.168.40.0" needsCustomization="false">
                    <NetworkConnectionIndex>0</NetworkConnectionIndex>
                    <IpAddress>192.168.40.3</IpAddress>
                    <IpType>IPV4</IpType>
                    <ExternalIpAddress>192.168.33.2</ExternalIpAddress>
                    <IsConnected>true</IsConnected>
                    <MACAddress>00:50:56:28:00:37</MACAddress>
                    <IpAddressAllocationMode>POOL</IpAddressAllocationMode>
                    <SecondaryIpAddressAllocationMode>NONE</SecondaryIpAddressAllocationMode>
                    <NetworkAdapterType>VMXNET3</NetworkAdapterType>
                </NetworkConnection>
      </NetworkConnectionSection>
    </InstantiationParams>
    <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/bdf68bda-8ab9-4ec1-970a-fafc34cdcf5b"/>
  </SourcedItem>
</MoveVAppParams>

In our case this is routed two VM vApp where both VMs are connected to the same routed vApp network named vApp-192.168.40.0 with set of port forwarding NAT rules and FW policies configured on the vApp router.

  • As said above it is a POST call against the target Org VDC – in our case 5b2abda9-aa2e-4745-a33b-b4b8fa1dc5f4.
  • The payload starts with the source vApp (vapp-96d3a015-4a08-4c59-93fa-384b41d4e453).
  • The follows the NetworkConfig section. Here we are describing the target vApp network topology. In general that section should be identical to the source vApp payload with the only difference being the ParentNetwork must refer to an Org VDC network from the target Org VDC. So in our case we are describing the subnet and IP pools of the vApp network (vApp-192.168.40.0), its new parent Org VDC network (Isol_192.168.33.0-v2t) and the way these two are connected (bridged or natRouted). As we are using routed vApp it is natRouted in our case. Then follow (optional) routed vApp features such as firewall policies or NAT rules. They should be pretty self explanatory and again they are usually identical to the source vApp section from the NetworkConfig. Note that VM object rules use VAppScopedVmId that is random looking UUID that changes every time the vApp is moved.
    We should highlight that IP addresses allocated to the vApp (its VMs or vApp routers) from the source Org VDC network are retained during the migration (and must be available in the target Org VDC network static IP pool).
  • After the NetworkConfigSection follow details of every vApp VM (SourcedItem) – to which vApp network(s) defined above the VM network interface(s) will connect (with which IP/MAC and IPAM mode) and which storage, placement and compute policies (StorageProfile, VdcComputePolicy and ComputePolicy) it should use. For the NIC section you usually take the source VM equivalent info. The vApp network name must be the one defined in the NetworkConfig section. For the policies you must obviously use target Org VDC policies as these will change.
  • BTW storage policy can be also defined at the disk level with DiskSetting element (the followin excerpt shows when named disk is connected)
            <DiskSettings>
                <DiskId>2016</DiskId>
                <SizeMb>8</SizeMb>
                <UnitNumber>0</UnitNumber>
                <BusNumber>1</BusNumber>
                <AdapterType>3</AdapterType>
                <ThinProvisioned>true</ThinProvisioned>
                <Disk href="https://vcd-01a.corp.local/api/disk/567bdd04-4905-4a62-95e7-9f4850f85240" id="urn:vcloud:disk:567bdd04-4905-4a62-95e7-9f4850f85240" type="application/vnd.vmware.vcloud.disk+xml" name="Disk1"/>
                <StorageProfile href="https://vcd-01a.corp.local/api/vdcStorageProfile/1f8bf2df-d28c-4bec-900c-726f20507b5b"/>
                <overrideVmDefault>true</overrideVmDefault>
                <iops>0</iops>
                <VirtualQuantityUnit>byte</VirtualQuantityUnit>
                <resizable>true</resizable>
                <encrypted>false</encrypted>
                <shareable>false</shareable>
                <sharingType>None</sharingType>
            </DiskSettings>

The actual vApp migration triggers async operation that takes some time to complete. If you observe what is happening in VCD and vCenter you will see that a new temporary “-generated” vApp is created in the target Org VDC with the VMs being first migrated there. In case of routed vApps the vApp routers (edge service gateways or Tier-1 gateways) must be deployed as well. When all the vApp VMs are moved the source vApp is removed and the target vApp with the same identity is created and the VMs from generated vApp are relocated there. If all goes as expected the generated vApp is removed.

Shout-out to Julian – the engineering brain behind this feature.

vROps Tenant App Upgrade Issue

While performing vROps Tenant App 2.6.2 upgrade in my lab I have encounter the following error:
Failed to install updates(Error while running installation tests).

Quick check of the /opt/vmware/var/log/vami/updatecli.log shows that the appliance is running out of free space on the root / partition.

24/02/2022 15:01:34 [INFO] Running /opt/vmware/var/lib/vami/update/data/job/32/test_command
Verifying packages…
Preparing packages…
installing package tenant-app-8.6.0-18724818.noarch needs 1231MB on the / filesystem
24/02/2022 15:01:41 [ERROR] Failed with exit code 56576

The reason why this is happening is that the tenant app runs as a docker container and the older versions have not been purged. I can see in my particular case I have above 7 GB of docker images on the filesystem:

root@tenantapp [ /var/lib/docker/overlay2 ]# du -h -d 0
7.5G    .

/var/lib/docker/overlay2 ]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.2-19235005      057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-db-cassandra   latest              057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-plugin         2.6.2-19235004      de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-plugin         latest              de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.1-18326916      3b7ef9b0c10c        7 months ago        597MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
vmware/vrops-vcd-tenant-app-plugin         2.6.1-18326915      f97bc56c3d61        7 months ago        286MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.0-17922920      0d5eb9de1cb7        10 months ago       581MB
vmware/vrops-vcd-tenant-app-ui             2.6.0-17922920      3ffdeee597ca        10 months ago       354MB
vmware/vrops-vcd-tenant-app-plugin         2.6.0-17922919      b23bd4eb6a2d        10 months ago       268MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.5.0-16990343      af72dbf16623        16 months ago       536MB
vmware/vrops-vcd-tenant-app-ui             2.5.0-16990343      62b09bd2a0a2        16 months ago       252MB
vmware/vrops-vcd-tenant-app-plugin         2.5.0-16941875      1217f67efd9d        17 months ago       190MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.4.0-15996298      a0d906a5cc5a        22 months ago       494MB
vmware/vrops-vcd-tenant-app-ui             2.4.0-15996298      777fe7bc0c1f        22 months ago       240MB
vmware/vrops-vcd-tenant-app-plugin         2.4.0-15996297      b85369dbf061        22 months ago       180MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.3.0-14826918      556121e468da        2 years ago         466MB
vmware/vrops-vcd-tenant-app-ui             2.3.0-14826918      eb77c613e9ad        2 years ago         224MB
vmware/vrops-vcd-tenant-app-plugin         2.3.0-14826917      e598e66d4818        2 years ago         158MB

After checking with Tenant App engineering, the problem has been fixed in the newest (8.6.1) version that does purge the old images upon successful upgrade. But if you hit the issue you will need to cleanup the old images with the follwing command:

docker image rm -f <image ID>

BTW if you delete wrong images you can always recreate them with the following commands:

docker load -i /opt/vmware/app/vrops-vcd-tenant-app-ui.tar.gz
docker load -i /opt/vmware/db/vrops-vcd-tenant-app-db-cassandra.tar.gz
docker load -i /opt/vmware/plugin/vrops-vcd-tenant-app-plugin.tar.gz

Update 3/3/2022
I have noticed the Self Health page on Tenant App admin UI in the Support section did not display any running services even though they (the docker containers) were running properly. After checking with engineering this can be fixed by modifying permissions of docker.sock file with:

chmod 666 /run/docker.sock

Before fix
After fix