vROps Tenant App Upgrade Issue

While performing vROps Tenant App 2.6.2 upgrade in my lab I have encounter the following error:
Failed to install updates(Error while running installation tests).

Quick check of the /opt/vmware/var/log/vami/updatecli.log shows that the appliance is running out of free space on the root / partition.

24/02/2022 15:01:34 [INFO] Running /opt/vmware/var/lib/vami/update/data/job/32/test_command
Verifying packages…
Preparing packages…
installing package tenant-app-8.6.0-18724818.noarch needs 1231MB on the / filesystem
24/02/2022 15:01:41 [ERROR] Failed with exit code 56576

The reason why this is happening is that the tenant app runs as a docker container and the older versions have not been purged. I can see in my particular case I have above 7 GB of docker images on the filesystem:

root@tenantapp [ /var/lib/docker/overlay2 ]# du -h -d 0
7.5G    .

/var/lib/docker/overlay2 ]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.2-19235005      057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-db-cassandra   latest              057345d369fd        5 weeks ago         634MB
vmware/vrops-vcd-tenant-app-ui             2.6.2-19235005      4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-ui             latest              4e90d15d3116        5 weeks ago         396MB
vmware/vrops-vcd-tenant-app-plugin         2.6.2-19235004      de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-plugin         latest              de4cb469fb65        5 weeks ago         309MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.1-18326916      3b7ef9b0c10c        7 months ago        597MB
vmware/vrops-vcd-tenant-app-ui             2.6.1-18326916      b66e34b5d59b        7 months ago        368MB
vmware/vrops-vcd-tenant-app-plugin         2.6.1-18326915      f97bc56c3d61        7 months ago        286MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.6.0-17922920      0d5eb9de1cb7        10 months ago       581MB
vmware/vrops-vcd-tenant-app-ui             2.6.0-17922920      3ffdeee597ca        10 months ago       354MB
vmware/vrops-vcd-tenant-app-plugin         2.6.0-17922919      b23bd4eb6a2d        10 months ago       268MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.5.0-16990343      af72dbf16623        16 months ago       536MB
vmware/vrops-vcd-tenant-app-ui             2.5.0-16990343      62b09bd2a0a2        16 months ago       252MB
vmware/vrops-vcd-tenant-app-plugin         2.5.0-16941875      1217f67efd9d        17 months ago       190MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.4.0-15996298      a0d906a5cc5a        22 months ago       494MB
vmware/vrops-vcd-tenant-app-ui             2.4.0-15996298      777fe7bc0c1f        22 months ago       240MB
vmware/vrops-vcd-tenant-app-plugin         2.4.0-15996297      b85369dbf061        22 months ago       180MB
vmware/vrops-vcd-tenant-app-db-cassandra   2.3.0-14826918      556121e468da        2 years ago         466MB
vmware/vrops-vcd-tenant-app-ui             2.3.0-14826918      eb77c613e9ad        2 years ago         224MB
vmware/vrops-vcd-tenant-app-plugin         2.3.0-14826917      e598e66d4818        2 years ago         158MB

After checking with Tenant App engineering, the problem has been fixed in the newest (8.6.1) version that does purge the old images upon successful upgrade. But if you hit the issue you will need to cleanup the old images with the follwing command:

docker image rm -f <image ID>

BTW if you delete wrong images you can always recreate them with the following commands:

docker load -i /opt/vmware/app/vrops-vcd-tenant-app-ui.tar.gz
docker load -i /opt/vmware/db/vrops-vcd-tenant-app-db-cassandra.tar.gz
docker load -i /opt/vmware/plugin/vrops-vcd-tenant-app-plugin.tar.gz

Update 3/3/2022
I have noticed the Self Health page on Tenant App admin UI in the Support section did not display any running services even though they (the docker containers) were running properly. After checking with engineering this can be fixed by modifying permissions of docker.sock file with:

chmod 666 /run/docker.sock

Before fix
After fix

VMware Cloud Provider Lifecycle Manager

VMware Cloud Provider Lifecycle Manager is a new product just released in version 1.1. The version 1.0 was not generaly available and thus not widely known. Let me therefore briefly describe what it is and what it can do.

As the name indicates its main goal is to simplify deployment and lifecycle of VMware’s Cloud Provider solutions. Currently in scope are:

  • VMware Cloud Director (10.1.x or 10.2.x)
  • Usage Meter (4.3 and 4.4)
  • vRealize Operations Tenant App (2.4 and 2.5)
  • RabbitMQ (Bitnami based)

The product itself ships as a stateless Docker image that can be deployed as a container for example in Photon OS VM. It has no GUI, but provides REST API. The API calls support the following actions:

  • Deployment of an environment that can consist of one or more products (VCD, UM, …)
  • Upgrade of an environment and product
  • Certificate management
  • Node managment (adding, removing, redeploying nodes)
  • Integration management (integration of a specific products with others)

The image below shows most of the Postman Collection API calls available:

The whole environment (or its product subset) is described in JSON format that is supplied in the API payload. The example below shows payload to deploy VCD with three cells, includes necessary certificates, target vSphere environment and integration with vSphere, NSX-T and RabbitMQ including creation of Provider VDC.

{
    "environmentName": "{{vcd_env_id}}",
    "products": [
        {
            "properties": {
                "installationId": 1,
                "systemName": "vcd-1-vms",
                "dbPassword": "{{password}}",
                "keystorePassword": "{{password}}",
                "clusterFailoverMode": "MANUAL",
                "publicAddress": {
                    "consoleProxyExternalAddress": "{{vcd_lb_ip}}:8443",
                    "restApiBaseHttpUri": "http://{{vcd_lb_ip}}",
                    "restApiBaseUri": "https://{{vcd_lb_ip}}",
                    "tenantPortalExternalHttpAddress": "http://{{vcd_lb_ip}}",
                    "tenantPortalExternalAddress": "https://{{vcd_lb_ip}}"
                },
                "adminEmail": "admin@vcd-test.com",
                "adminFullName": "admin",
                "nfsMount": "{{vcd_nfs_mount}}"
            },
            "certificate": {
                "product": {
                    "certificate": "{{vcd_cert}}",
                    "privateKey": "{{vcd_cert_key}}"
                },
                "restApi": {
                    "certificate": "{{vcd_cert}}"
                },
                "tenantPortal": {
                    "certificate": "{{vcd_cert}}"
                }
            },
            "integrations": [
                {
                    "integrationId": "vcd-01-to-vc-01",
                    "datacenterComponentType": "VCENTER",
                    "hostname": "{{vcenter_hostname}}.{{domainName}}",
                    "integrationUsername": "administrator@vsphere.local",
                    "integrationPassword": "{{vc_password}}",
                    "properties": {
                        "providerVdcs": {
                                "PVDC-1": {
                                "description": "m01vc01-comp-rp",
                                "highestSupportedHardwareVersion": "vmx-14",
                                "isEnabled": true,
                                "clusterName": "{{vc_cluster}}",
                                "resourcePoolname": "{{pvdc_resource_pool}}",
                                "nsxIntegration": "vcd-01-to-nsx-01",
                                "storageProfile":[
                                    "{{pvdc_storage_profile}}"
                                ],
                                "networkPoolname":"NP-1"
                            }
                        }
                    }
                },
                {
                    "integrationId": "vcd-01-to-nsx-01",
                    "datacenterComponentType": "NSXT",
                    "hostname": "{{nsxt_hostname}}.{{domainName}}",
                    "integrationUsername": "admin",
                    "integrationPassword": "{{nsx_password}}",
                    "properties": {
                        "networkPools": {
                            "NP-1": "{{pvdc_np_transport_zone}}"
                        },
                        "vcdExternalNetworks": {
                            "EN-1": {
                                "subnets": [
                                    {
                                        "gateway": "192.168.91.1",
                                        "prefixLength": 24,
                                        "dnsServer1": "",
                                        "ipRanges":  [
                                            {
                                                "startAddress": "192.168.91.150",
                                                "endAddress": "192.168.91.200"
                                            }
                                        ]
                                    }
                                ],
                                "description": "ExternalNetworkCreatedViaVCDBringup",
                                "tier0Name": "{{pvdc_ext_nw_tier0_gw}}"
                            }
                        }
                    }
                },
                {
                    "integrationId": "vcd-01-to-rmq-01",
                    "productType": "RMQ",
                    "hostname": "{{rmq_lb_name}}.{{domainName}}",
                    "port": "{{rmq_port_amqp_ssl}}",
                    "integrationUsername": "svc_vcd",
                    "integrationPassword": "{{password}}",
                    "properties": {
                        "amqpExchange": "systemExchange",
                        "amqpVHost": "/",
                        "amqpUseSSL": true,
                        "amqpSslAcceptAll": true,
                        "amqpPrefix": "vcd"
                    }
                }
            ],
            "productType": "VCD",
            "productId": "{{vcd_product_id}}",
            "version": "10.1.2",
            "license": "{{vcd_license}}",
            "adminPassword": "{{password}}",
            "nodes": [
                {
                    "hostName": "{{vcd_cell_1_name}}.{{domainName}}",
                    "vmName": "{{vcd_cell_1_name}}",
                    "rootPassword": "{{password}}",
                    "gateway": "{{vcd_mgmt_nw_gateway}}",
                    "nics": [
                        {
                            "ipAddress": "{{vcd_cell_1_ip}}",
                            "networkName": "vcd-dmz-nw",
                            "staticRoutes": []
                        }, {
                            "ipAddress": "{{vcd_cell_1_mgmt_ip}}",
                            "networkName": "vcd-mgmt-nw",
                            "staticRoutes": []
                        }
                    ]
                },
                {
                    "hostName": "{{vcd_cell_2_name}}.{{domainName}}",
                    "vmName": "{{vcd_cell_2_name}}",
                    "rootPassword": "{{password}}",
                    "gateway": "{{vcd_mgmt_nw_gateway}}",
                    "nics": [
                        {
                            "ipAddress": "{{vcd_cell_2_ip}}",
                            "networkName": "vcd-dmz-nw",
                            "staticRoutes": []
                        }, {
                            "ipAddress": "{{vcd_cell_2_mgmt_ip}}",
                            "networkName": "vcd-mgmt-nw",
                            "staticRoutes": []
                        }
                    ]
                }
            ]
        }
    ],
    "deploymentInfrastructures": {
        "infra1": {
            "vcenter": {
                "vcenterName": "mgmt-vc",
                "vcenterHost": "{{vcenter_hostname}}.{{domainName}}",
                "vcenterUsername": "administrator@vsphere.local",
                "vcenterPassword": "{{vc_password}}",
                "datacenterName": "{{vc_datacenter}}",
                "clusterName": "{{vc_cluster}}",
                "resourcePool": "{{vc_res_pool}}",
                "datastores": [
                    "{{vc_datastore}}"
                ],
                "networks": {
                    "vcd-dmz-nw": {
                        "portGroupName": "{{vcd_dmz_portgroup}}",
                        "gateway": "{{vcd_dmz_gateway}}",
                        "subnetMask": "{{vcd_dmz_subnet}}",
                        "domainName": "{{domainName}}",
                        "searchPath": [
                            "{{domainName}}"
                        ],
                        "useDhcp": false,
                        "dns": [
                            "{{dns}}"
                        ],
                        "ntp": [
                            "{{ntp}}"
                        ]
                    },
                    "vcd-mgmt-nw": {
                        "portGroupName": "{{vcd_mgmt_nw_portgroup}}",
                        "gateway": "{{vcd_mgmt_nw_gateway}}",
                        "subnetMask": "{{vcd_mgmt_nw_subnet}}",
                        "useDhcp": false
                    }
                }
            }
        }
    }
}

The JSON payload structure is similar for other products. It starts with the environment definition and then follows with a specific product and its product type (VCD, RMQ, TenantApp, Usage Meter). Each has its own set of properties. Integrations section defines for example which tenant VC and NSX should be registered, RabbitMQ etc. Then follows the description of each node to be deployed while referring to Deployment Infrastructure section that is at the end of the JSON and describes the vSphere environent where the nodes can be deployed.

During the bring up the Lifecycle Manager will perform various set of tests and validations to see if the payload is correct and if the referenced environments are accessible. Then it will go on with the actual deployment process. For that it needs to have access to file repository of OVA images (for the bring up) or patch/upgrade files (for lifecycle). This must be manually downloaded to the Docker VM or mounted via NFS.

For the day 2 operations (certificate changes, node manipulations, etc.) an environment must first be imported (as mentioned before the Lifecycle Manager is stateless and forgets everything when rebooted). During the import the same payload as for deployment is provided and checks are performed that the actual environment matches the imported one. Once the state is in the container memory day 2 command can be run. And a six cell VMware Cloud Director deployment can be upgraded with a single API call!

The actual architecture of the deployment is quite flexible. The Lifecycle Manager itself does not prescribe or deploys any networks, load balancers or NFS shares. All those must be prepared up front. I have tested deployment on top of VMware Cloud Foundation 4 (see here) but that is not a hard requirement. Brown field environments are not supported, but nothing is really stopping you to try to describe your existing environment in the JSON and import it.

If you plan to deploy and manage VMware Cloud Director at scale give it a try. And as this is the first public release we have a lot to look forward in the future.