Console Proxy Traffic Enhancements

VMware Cloud Director provides direct access to tenants’ VM consoles by proxying the vSphere console traffic from the ESXi hosts running the workload, through the VCD cells and a load balancer, to the end-user browser or console client. This is a fairly complex process that requires a dedicated TCP port (8443 by default), a dedicated certificate, and a load balancer configuration without SSL termination (SSL pass-through).

The dedicated certificate requirement is especially annoying, as any change to this certificate cannot be done at the load balancer level; it must be performed on every cell in the VCD server group, and the cells then need to be restarted.

However, VMware Cloud Director 10.3.3 for the first time showcases the newly improved console proxy. It is still an experimental feature and therefore not enabled by default, but it can be enabled in the Feature Flags section of the provider Administration UI.

By enabling it, you switch to the enhanced console proxy implementation, which gives you the following benefits:
  • Console proxy traffic now goes over the default HTTPS port 443 together with the UI/API traffic. That means no need for a dedicated port, IP address or certificate.
  • This traffic can be SSL terminated at the load balancer. That means no need for the specific load balancing configuration that required SSL pass-through on port 8443.
  • The Public Addresses Console Proxy section is irrelevant and no longer used.

The following diagram shows the high-level implementation (credit and shout-out go to Francois Misiak – the brain behind the new functionality).

As this feature has not yet been tested at scale, it is marked as experimental, but it is expected to become the default console proxy mechanism starting with the next major VMware Cloud Director release. Note that you will still be able to revert to the legacy implementation if needed.

How to Unregister NSX-V Manager from VMware Cloud Director

After a successful migration from NSX-V to NSX-T in VMware Cloud Director you might wish to unregister the NSX-V Manager and completely delete it from the vCenter. This is not so easy, as the whole VCD model was built on the assumption that vCenter Server and NSX-V Manager are tied together and thus retired together as well. This is obviously no longer the case, as you can now use NSX-T backed PVDCs or not use NSX at all (a new feature as of VCD 10.3).

VMware Cloud Director (version 10.3+) adds API support to unregister NSX-V Manager without removing the vCenter Server. To do so you need to use the OpenAPI PUT VirtualCenter call. You first have to run a GET call with the VC URN to retrieve its current configuration payload, remove the nsxVManager element, and then PUT it back.

Example:

GET https://{{host}}/cloudapi/1.0.0/virtualCenters/urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa

Response:

{
    "vcId": "urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa",
    "name": "vc-01a",
    "description": "",
    "username": "vcd@vsphere.local",
    "password": "******",
    "url": "https://vc-01a.corp.local",
    "isEnabled": true,
    "vsphereWebClientServerUrl": null,
    "hasProxy": false,
    "rootFolder": null,
    "vcNoneNetwork": null,
    "tenantVisibleName": "Site A",
    "isConnected": true,
    "mode": "IAAS",
    "listenerState": "CONNECTED",
    "clusterHealthStatus": "GREEN",
    "vcVersion": "7.0.0",
    "buildNumber": null,
    "uuid": "1da63a23-534a-4315-b3fa-29873d542ae5",
    "nsxVManager": {
        "username": "admin",
        "password": "******",
        "url": "http://192.168.110.24:443",
        "softwareVersion": "6.4.8"
    },
    "proxyConfigurationUrn": null
}

PUT https://{{host}}/cloudapi/1.0.0/virtualCenters/urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa

{
    "vcId": "urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa",
    "name": "vc-01a",
    "description": "",
    "username": "vcd@vsphere.local",
    "password": "******",
    "url": "https://vc-01a.corp.local",
    "isEnabled": true,
    "vsphereWebClientServerUrl": null,
    "hasProxy": false,
    "rootFolder": null,
    "vcNoneNetwork": null,
    "tenantVisibleName": "Site A",
    "isConnected": true,
    "mode": "IAAS",
    "listenerState": "CONNECTED",
    "clusterHealthStatus": "GREEN",
    "vcVersion": "7.0.0",
    "uuid": "1da63a23-534a-4315-b3fa-29873d542ae5",
    "proxyConfigurationUrn": null
}
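
If you prefer to script the whole flow, here is a minimal Python sketch of the same GET/modify/PUT sequence using the requests module. It is only an illustration under a few assumptions: the hostname, credentials, API version and VC URN below are placeholders you need to replace, and the bearer token is obtained via the standard CloudAPI provider session login.

#!/usr/bin/python3
# Sketch only: GET the vCenter Server record, drop the nsxVManager element, PUT it back.
# Placeholders: VCD URL, system administrator credentials, API version and VC URN.
import requests

VCD = "https://vcd.example.com"
VC_URN = "urn:vcloud:vimserver:cd0471d4-e48f-4669-8852-de1fdd2648aa"
ACCEPT = "application/json;version=36.0"  # adjust to your VCD API version

# Provider login - the bearer token is returned in a response header
login = requests.post(VCD + "/cloudapi/1.0.0/sessions/provider",
                      auth=("administrator@system", "password"),
                      headers={"Accept": ACCEPT})
login.raise_for_status()
headers = {"Accept": ACCEPT,
           "Authorization": "Bearer " + login.headers["X-VMWARE-VCLOUD-ACCESS-TOKEN"],
           "Content-Type": "application/json"}

# Retrieve the current vCenter Server configuration payload
vc = requests.get(VCD + "/cloudapi/1.0.0/virtualCenters/" + VC_URN, headers=headers)
vc.raise_for_status()
payload = vc.json()

# Remove the nsxVManager element and PUT the payload back
payload.pop("nsxVManager", None)
requests.put(VCD + "/cloudapi/1.0.0/virtualCenters/" + VC_URN,
             headers=headers, json=payload).raise_for_status()
print("NSX-V Manager unregistered from " + payload["name"])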

In order for the NSX-V Manager removal to succeed you must make sure that:

  • Org VDCs using the vCenter Server do not have any NSX-V objects (VXLAN networks, Edge Gateways, vApp or DHCP Edges)
  • Org VDCs using the vCenter Server do not use VXLAN network pool
  • There is no VXLAN network pool managed by the to-be-removed NSX-V Manager

If all of the above is satisfied, you will not need to remove existing Provider VDCs (even if they were using NSX-V in the past). They will become NSX-less (so you will not be able to use NSX-T objects in them). NSX-T backed PVDCs will not be affected at all.

VMware Cloud Director on VMware Cloud Foundation

There has been more and more interest lately among service providers in using VMware Cloud Foundation (VCF) as the underlying virtualization platform in their datacenters. VCF is getting more and more mature and offers automated lifecycle capabilities that service providers appreciate when operating infrastructure at scale.

I want to focus on how you would design and deploy VMware Cloud Director (VCD) on top of VCF, with a specific example. While whitepapers have been written on this topic, they do not go into the nitty-gritty details. This should not be considered a prescriptive architecture – it is just one way to skin a cat, meant to inspire your own design.

VCF 4.0 consists of a management domain – a smaller infrastructure with one vSphere 7 cluster, NSX-T 3 and vRealize components (vRealize Suite Lifecycle Manager, vRealize Operations Manager, vRealize Log Insight). It is also used for the deployment of the management components of workload domains, which are separate vSphere 7 + NSX-T 3 environments.

VCF has a prescribed architecture, based on VMware Validated Designs (VVD), for how all the management components are deployed. Some are on VLAN-backed networks, but some are on overlay logical segments created in NSX-T (VVD calls them application virtual networks – AVN) and routed via NSX-T Edge Gateways. The following picture shows the typical logical architecture of the management cluster, which we will start with:

Reg-MGMT and X-Reg-MGMT are overlay segments; the rest are VLAN networks.
VC Mgmt … Management vCenter Server
VC Res … Workload domain (resource) vCenter Server
NSX Mgmt … Management NSX-T Managers (3x)
Res Mgmt … Workload domain (resource) NSX-T Managers (3x)
SDDC Mgr … SDDC Manager
Edge Nodes … NSX-T Edge Node VMs (2x) that provide resources for Tier-0 and Tier-1 gateways and the load balancer
vRLCM … vRealize Suite Lifecycle Manager
vROps … vRealize Operations Manager (two or more nodes)
vROps RC … vRealize Operations Remote Collectors (optional)
vRLI … vRealize Log Insight (two or more nodes)
WS1A … Workspace ONE Access (formerly VIDM, one or more nodes)

Now we are going to add the VMware Cloud Director solution. I will focus on the following components:

  • VCD cells
  • RabbitMQ (needed for extensibility such as vROps Tenant App or Container Service Extension)
  • vRealize Operations Tenant App (provides multitenant vROps view in VCD and Chargeback functionality)
  • Usage Meter

I have followed these design principles:

  • the VCD solution will utilize overlay (AVN) networks
  • leverage existing VCF infrastructure when it makes sense
  • consider future scalability
  • separate internet traffic from management traffic

And here is the proposed design:

A new overlay segment (AVN) called VCD DMZ has been added to separate the internet traffic. It is routed via a separate Tier-1 gateway but connected to the existing Tier-0. The VCD cells (3 or more) have their primary (eth0) interface on this network, fronted by an NSX-T load balancer (running in its own Tier-1 gateway, similar to the vROps one). The vRealize Operations Tenant App VM sits on this network as well.

The existing Reg-MGMT network is used for the secondary interface of the VCD cells, for the Usage Meter VM, and for the vSAN File Services NFS share that the VCD cells require.

And finally, the cross-region X-Reg-MGMT segment is used for the RabbitMQ nodes (2 or more) in order to leverage the existing vROps load balancer and avoid having to deploy an additional one just for RabbitMQ.

Additional notes:

  • VCF deploys two NSX-T Edge nodes in a 2-node NSX-T Edge cluster. These currently cannot easily be scaled out. Therefore I would recommend deploying additional Edge nodes in a separate NSX-T Edge cluster (directly in NSX-T) for the DMZ Tier-1 gateway and the VCD load balancer. This guarantees compute and networking resources, especially for the load balancer, which will perform SSL termination (this might not apply if you choose a different load balancer, e.g. Avi). It also adds the possibility of deploying a separate Tier-0 for more north-south bandwidth.
  • vSAN FS NFS deployment is described here. Do not forget to enable MAC learning on the Reg-MGMT NSX-T logical segment (via a segment profile).
  • Both Tier-1 gateways can provide north-south firewalling for additional security.
  • As all the incoming internet traffic to VCD goes over the VCD load balancer, which provides source NAT, I have opted to put the default route of the VCD cells on the management interface, avoiding the need for static routes to separate tenant and management traffic.

Let me know in the comments if you plan VCD on VCF and if you are facing any challenges.

vCloud Director 9.7 JMS Certificate Issue

Are you still on vCloud Director 9.7 (VCD) in a multi-cell configuration? Then you are susceptible to a Java Message Service (JMS) certificate expiration issue. Read on.

Background

In a multi-cell setup the VCD cells need to communicate among themselves. They use a shared database, but for much faster and more efficient communication they also use an internal ActiveMQ message bus. It is used for activity sharing and vCenter Server event notifications. If the message bus is dysfunctional, operations slow almost to a halt. For this particular certificate issue you will see a message similar to this in the logs:

Could not accept connection from tcp://<primary-cell-IP:port> : javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

In vCloud Director 9.7 the bus communication became encrypted in preparation for other use cases (read here). On upgrade or new deployment of each cell, a new certificate was issued by the internal VCD_CA with a 365-day validity. In vCloud Director 10.0 or VMware Cloud Director 10.1 the certificate is regenerated upon upgrade and its validity is extended to 3 years.

To find out the certificate expiry dates, run the following command from any cell:


/opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates -status

It will print out the JMS certificate details for every cell:

Cell with UUID fd0d2ca0-e357-4aae-9a3b-1c1c5d143d05 and IP 192.168.3.12 has jms certificate: [
[
Version: V3
Subject: CN=vcd-node2.vmware.local
Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

Key: Sun RSA public key, 2048 bits
modulus: 25783371233977292378120920630797680107189459997753528550984552091043966882929535251723319624716621953887829800012972122297123129787471789561707788386606748136996502359020718390547612728910865287660771295203595369999641232667311679842803649012857218342116800466886995068595408721695568494955018577009303299305701023577567636203030094089380853265887935886100721105614590849657516345704143859471289058449674598267919118170969835480316562172830266483561584406559147911791471716017535443255297063175552471807941727578887748405832530327303427058308095740913857590061680873666005329704917078933424596015255346720758701070463
public exponent: 65537
Validity: [From: Wed Jun 12 15:38:11 UTC 2019,
To: Thu Jun 11 15:38:11 UTC 2020]

 

Yes, this particular cell’s certificate will expire on June 11, 2020 – in less than two months!
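
If a calendar reminder feels too fragile, the check can be scripted. The following sketch (my own, assuming the output format shown above and an arbitrary 60-day warning window) parses the status command output and warns when a certificate is getting close to its expiry date.

#!/usr/bin/python3
# Sketch: warn when a JMS certificate is close to expiring.
# Assumes the "To: ..." validity line format printed by cell-management-tool above.
import re
import subprocess
from datetime import datetime, timedelta

CMD = ["/opt/vmware/vcloud-director/bin/cell-management-tool", "jms-certificates", "-status"]
WARN_WINDOW = timedelta(days=60)

output = subprocess.run(CMD, capture_output=True, text=True, check=True).stdout

# Validity lines look like: "To: Thu Jun 11 15:38:11 UTC 2020]"
for match in re.finditer(r"To:\s+(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} UTC \d{4})", output):
    expiry = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Z %Y")
    days_left = (expiry - datetime.utcnow()).days
    if days_left < WARN_WINDOW.days:
        print("JMS certificate expires on {} - only {} days left!".format(expiry, days_left))
    else:
        print("JMS certificate valid until {} ({} days left)".format(expiry, days_left))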

The Fix

Set a calendar reminder and, when the certificate expiration day is approaching, run the following command:

/opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates --certgen

Or upgrade to vCloud Director 10.0 or newer.

Update 21/05/2020: KB 78964 has been published on this topic. Also, if the CA signing certificate has expired, you will need to disable SSL altogether, restart the cell, regenerate the certificate and re-enable SSL.

VMware Cloud Director: Push Notifications in Tenant Context

In VMware Cloud Director 10.1 (VCD), organization users can subscribe to event and task push notifications, which might be useful if the tenant needs to keep track of activity in the cloud or connect a CMDB or any other custom solution, and does not want to permanently poll the audit log via the API.

Access to notifications was in the past only in the realm of service providers, who needed to deploy RabbitMQ and connect their Cloud Director cells to it. They can still do so, and in fact have to if they need blocking tasks or use VCD API extensions (for example Container Service Extension, App Launch Pad or vRealize Operations Tenant App).

The new functionality is enabled by the internal Artemis ActiveMQ bus that runs on every VCD cell. The MQTT client connects to the public HTTPS endpoint and uses a WebSocket connection to the bus. Authentication is provided via the JWT authentication token. The official documentation provides some detail here, but not enough to actually set this up.

Therefore I want to demonstrate here, with a very simple Python script, how to set up the connection and start utilizing this feature.

The Python 3 script leverages the Pyvcloud module (22.0 or newer is required) and Paho MQTT Python Client. Both can be installed simply with pip.

pip install pyvcloud paho-mqtt

In the example, org admin credentials are used, which allows subscribing to all organization messages via the publish/<org UUID>/* subscription string. It can also be used by a system administrator after changing the subscription string to publish/*/*.

#!/usr/bin/python3

import paho.mqtt.client as mqtt
import json
import datetime
import pyvcloud.vcd.client
import pyvcloud.vcd.vdc

vcdHost = 'vcloud.fojta.com'
vcdPort = 443
path = "/messaging/mqtt"
logFile = 'vcd_log.log'

#org admin credentials
user = 'acmeadmin'
password = 'VMware1!'
org = 'acme'

credentials = pyvcloud.vcd.client.BasicLoginCredentials(user, org, password)
vcdClient = pyvcloud.vcd.client.Client(vcdHost+":"+str(vcdPort),None,True,logFile)
vcdClient.set_credentials(credentials)
accessToken = vcdClient.get_access_token()
headers = {"Authorization": "Bearer "+ accessToken}

if max(vcdClient.get_supported_versions_list()) < "34.0":
    exit('VMware Cloud Director 10.1 or newer is required')

org = vcdClient.get_org_list()
orgId = (org[0].attrib).get('id').split('org:',1)[1]

def on_message(client, userdata, message):
    event = message.payload.decode('utf-8')
    event = event.replace('\\','')
    eventPayload = event.split('payload":"',1)[1]
    eventPayload = eventPayload[:-2]
    event_json = json.loads(eventPayload)
    print(datetime.datetime.now())
    print (event_json)

# Enable for logging
# def on_log(client, userdata, level, buf):
#     print("log: ",buf)

client = mqtt.Client(client_id = "PythonMQTT",transport = "websockets")
client.ws_set_options(path = path, headers = headers)
client.tls_set_context(context=None)
# client.tls_insecure_set(True)
client.on_message=on_message
# client.on_log=on_log  #enable for logging
client.connect(host = vcdHost, port = vcdPort , keepalive = 60)
print('Connected')
client.subscribe("publish/"+orgId+"/*")
client.loop_forever()

Notice that the client needs to connect to the /messaging/mqtt path on the VCD endpoint and must provide a valid JWT authentication token in the header. That rules out some MQTT WebSocket clients that do not support custom headers (e.g. JavaScript ones).

The actual event is in JSON format, with a nested payload JSON providing the details. The code example prints the time the event was received and just the nested payload JSON. The script runs forever until interrupted with Ctrl+C.
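
The string splitting in on_message above is intentionally simple. If you prefer not to rely on the exact position of the payload attribute, the nested JSON can also be extracted with two json.loads calls – a small sketch of an alternative handler, assuming the outer message is valid JSON with a payload attribute holding the stringified event (it reuses the imports from the script above):

# Alternative on_message handler: parse the outer JSON message first,
# then the stringified event carried in its payload attribute.
def on_message(client, userdata, message):
    outer = json.loads(message.payload.decode('utf-8'))
    event_json = json.loads(outer['payload'])  # the payload value is itself a JSON string
    print(datetime.datetime.now())
    print(event_json)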

Note: The actual RabbitMQ extensibility configuration in VCD and the Non-blocking AMQP Notifications setting in the UI have no impact on this functionality and can be disabled if not used by the service provider.