Enhanced Security for Cross vCenter Operations in VMware Cloud Director

Starting with VMware Cloud Director 10.4.1 VM and vApp operations that happen across Provider VDCs backed by different vCenter Server instances require that both vCenter Server trust each other certificates.

Which operations are affected?

Any VM move, clone or vApp move, clone and template deployments that happen via UI or API where source and destinations are managed by different vCenter Server.

If involved vCenter Server do not trust each other certificates, those operations will fail with an error:

Underlying system error: com.vmware.vim.binding.vim.fault.SSLVerifyFault

In the VCD cell logs you will see something similar to:

2022-11-18 17:15:46,023 | ERROR    | vim-proxy-activity-pool-482 | RelocateVmActivity             | Underlying system error: com.vmware.vim.binding.vim.fault.SSLVerifyFault | requestId=9554d239-a457-48bd-988d-4aa7856cdba2,request=POST https://10.196.235.31/api/vdc/ceb9bd94-171e-4d3f-a390-248791cbb218/action/moveVApp,requestTime=1668791732601,remoteAddress=10.30.65.110:53169,userAgent=PostmanRuntime/7.29.2,accept=application/*+xml;version 37.0 vcd=b4856f34-94bb-4462-8d1b-469483f98f69,task=46d10139-134e-4823-a674-dd0e10a5565a activity=(com.vmware.vcloud.backendbase.management.system.TaskActivity,urn:uuid:46d10139-134e-4823-a674-dd0e10a5565a) activity=(com.vmware.vcloud.vdc.impl.MoveVAppActivity,urn:uuid:a8c5d945-9ffc-44b7-aafb-54a634e0dd08) activity=(com.vmware.vcloud.vdc.impl.LinkVMsToTargetVAppActivity,urn:uuid:ed3c8d9f-69db-466e-bbbd-9046d996d3e6) activity=(com.vmware.vcloud.vdc.impl.MoveVmUsingVmotionActivity,urn:uuid:6839cde5-cff4-424f-9d24-940b0b78ef45) activity=(com.vmware.ssdc.backend.services.impl.RelocateVmActivity,urn:uuid:dd9d8304-c6e8-45eb-8ef0-65a17b765785) activity=(com.vmware.vcloud.fabric.storage.storedVm.impl.RelocateStoredVmByStorageClassActivity,urn:uuid:ff420cab-5635-45a0-bdf0-fee6fe33eb7e) activity=(com.vmware.vcloud.fabric.storage.storedVm.impl.RelocateStoredVmByDatastoreActivity,urn:uuid:6cde303d-ae82-4a3c-9ecc-cceb0efcdbe9) activity=(com.vmware.vcloud.val.internal.impl.RelocateVmActivity,urn:uuid:73b44746-52c3-4143-bb29-1d22b532cc55)
com.vmware.ssdc.library.exceptions.GenericVimFaultException: Underlying system error: com.vmware.vim.binding.vim.fault.SSLVerifyFault
        at com.vmware.ssdc.library.vim.LmVim.createGenericVimFaultException(LmVim.java:329)
        at com.vmware.ssdc.library.vim.LmVim.Convert(LmVim.java:445)
        at com.vmware.ssdc.library.vim.LmVim.Convert(LmVim.java:499)
        at com.vmware.vcloud.val.taskmanagement.AsynchronousTaskWaitActivity.getResultIfTaskAlreadyCompleted(AsynchronousTaskWaitActivity.java:449)
        at com.vmware.vcloud.val.taskmanagement.AsynchronousTaskWaitActivity$InitialPhase.invoke(AsynchronousTaskWaitActivity.java:123)
        at com.vmware.vcloud.activity.executors.ActivityRunner.runPhase(ActivityRunner.java:175)
        at com.vmware.vcloud.activity.executors.ActivityRunner.run(ActivityRunner.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: (vim.fault.SSLVerifyFault) {
   faultCause = null,
   faultMessage = null,
   selfSigned = false,
   thumbprint = B3:E8:35:77:E3:65:20:1A:BD:F2:DB:41:08:34:58:51:07:E2:AA:53
}

The system administrator thus must make sure that when there is more than one resource vCenter Server managed by VMware Cloud Director that there is mutual trust across all such vCenter Server instances. To reiterate, this must be ensured at vCenter Server level – no change is needed in VMware Cloud Director.

Typically, vCenter Servers use their own VMCA to issue their (VC and ESXi) certificates. Each such VMCA certificate must be imported to the other vCenter Server. The alternative is to use publicly signed CA instead of the self-signed VMCA. Then no action would be needed. And finally if you use your own Enterprise CA to issue VC certificates, then each VC just must trust the root Enterprise CA.

vCenter Server provides UI for certificate management in vSphere Client (Administration > Certificates > Certificate Management).

The source VMCA certificate can be retrieved via the download.zip from http://<source-VC>/certs/download.zip, extracted and uploaded via the ablove ADD UI.

Alternatively, the same can be accomplished via API from Developer Center > API Explorer by GET chain from source VC and upload it by POST trusted-root-chains to the target VC.

The third alternative is to use PowerCLI which also provides a way to manage vSphere certificates. An example script would look like this:

$sourceVC="vc01.fojta.com"
$targetVC="vc02.fojta.com"
connect-viserver $sourceVC
connect-viserver $targetVC
$trustedCerts = Get-VITrustedCertificate -server $sourceVC -VCenterOnly
foreach ($Cert in $trustedCerts)
  {Add-VITrustedCertificate -PemCertificateOrChain $cert.CertificatePEM -server $sourceVC -VCenterOnly -Confirm:$false}

As the trust must be mutual, if you have five resource vCenter Server each should have at least five trusted root certificates (one its own VMCA and four for the other VCs).

Custom vCenter Server Event and Alarm

Related to my previous post about monitoring Edge Gateways my customer asked me if he could leverage vCenter Server alarms as they are integrated with their monitoring and alerting infrastructure.

So basically is there a way to create vCenter alarm via scripted action (for example with PowerCLI)?

The answer is yes and it is not that difficult. There are two types of vCenter alarms: based on condition/state or an event. And it is possible to create custom user event via Loguserevent method of entity manager via vSphere API.

This is example of PowerCLI code the creates user event “Edge Gateway event” at the root datacenter folder.

$DCFolderEntity = Get-Folder -Name datacenters
$eventMgr = Get-View EventManager
$eventMgr.LogUserEvent($entity.ExtensionData.MoRef,"Edge Gateway event")

User logged event

Now it is easy to create custom alarm based on the user logged event. Create the alarm at the vCenter Server root level, as alarm type pick monitor vCenter Server and as triggers manually enter (type):

Event; vim.event.GeneralUserEvent 

That’s it.

Alarm

Graceful Shutdown of vCloud Director Cell

I have been challenged by one of my customers how to properly shutdown vCloud Director cell without any disruption of the service if multiple cells are used. Although we have KB article 2034994 about this particular subject it omits some important details.

When vCenter is connected to vCloud Director a VC Proxy service is started on one of the cells. The service is responsible for monitoring of active vCenter tasks and inventory updates which are then shared with other cells. Unless there is a network partition between the cells there is always one vCenter proxy service for one vCenter. Multiple VC Proxies can run on one cell. You can see which cell is running the VC Proxy service at the vCenter screen in the vCloud Director Admin interface.

vCenter Proxy

The screenshot shows two vCenters connected to vCloud Director with one having its vCenter proxy on vcloud1 cell and the second on vcloud2 cell.

If the VC Proxy service is not running most of the activities in the vCloud Director that require vCenter will not work properly. For example simple creation of a vApp with one VM will fail with message:

Folder vApp_system_34 (8ce90b57-da8b-4714-914b-5073457155b0) does not exist in our inventory, but vCenter Server claims that it does.

This is because the inventory listener on the VC Proxy was not running and vCloud Director could not verify successful creation of vApp folder in vCenter. When a cell with VC Proxy service dies the service fails over to a surviving cell. However that failover takes 5 minutes which is govern by vcloud:vcloud.heartbeat.failoverTimeoutMsecs property (stored in vCloud Director database). I am not aware if it is supported to change this value.

Anyway in order to shutdown a cell gracefully we need to move the VC Proxy service to another cell. This can be done by simple reconnect of vCenter and the move is very quick without any disruption of the running tasks.

Reconnect vCenter
Reconnect vCenter

I have observed that if possible different cell then the original and the least loaded (in terms of number of VC Proxy services) is chosen. This is also good for manually distributing the load if there are multiple vCenters and multiple cells (good practice is to have at least N+1 cells, where N is number of vCenters).

So what should be the correct graceful cell shutdown procedure?

  1. Make sure the cell is not running any VC Proxy service. No checkmark should be in the vCenter column of the Cloud Cells inventory in the vCloud Director Admin interface.Cloud CellsIf yes, then reconnect vCenters that have VC Proxy running on the cell.
  2. Quiesce the cell with the cell-management-tool:

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell –quiesce true

    where <user> is vCloud administrator username

  3. Monitor the number of outstanding active tasks on the cell and wait until it reaches 0.

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell –status
    Job count = 0
    Is Active = false

  4. Shutdown the cell. This can be done also with cell-management-tool. What I noticed is that it takes multiple attempts (usually two), as the first time only the watchdog service is terminated.

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell –shutdown

    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is running

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell –shutdown
    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is not running