vCloud Director 9.7 JMS Certificate Issue

Are you still running vCloud Director 9.7 (VCD) in a multi-cell configuration? Then you are susceptible to a Java Message Service (JMS) certificate expiration issue. Read on.

Background

In a multi-cell setup, VCD cells need to communicate among themselves. They use a shared database, but for much faster and more efficient communication they also use an internal ActiveMQ message bus. It is used for activity sharing and vCenter Server event notifications. If the message bus is dysfunctional, all operations slow almost to a halt. For this particular certificate issue you will see a message in the logs similar to:

Could not accept connection from tcp://<primary-cell-IP:port> : javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

In vCloud Director 9.7 the bus communication became encrypted in preparation for other use cases (read here). On upgrade or new deployment of each cell, a new certificate was issued by the internal VCD_CA with a 365-day validity. In vCloud Director 10.0 and VMware Cloud Director 10.1 the certificate is regenerated upon upgrade and its validity is extended to 3 years.

To find out the certificates' expiry dates, run the following command from any cell:


/opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates -status

It will print out the JMS certificate details for every cell:

Cell with UUID fd0d2ca0-e357-4aae-9a3b-1c1c5d143d05 and IP 192.168.3.12 has jms certificate: [
[
Version: V3
Subject: CN=vcd-node2.vmware.local
Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

Key: Sun RSA public key, 2048 bits
modulus: 25783371233977292378120920630797680107189459997753528550984552091043966882929535251723319624716621953887829800012972122297123129787471789561707788386606748136996502359020718390547612728910865287660771295203595369999641232667311679842803649012857218342116800466886995068595408721695568494955018577009303299305701023577567636203030094089380853265887935886100721105614590849657516345704143859471289058449674598267919118170969835480316562172830266483561584406559147911791471716017535443255297063175552471807941727578887748405832530327303427058308095740913857590061680873666005329704917078933424596015255346720758701070463
public exponent: 65537
Validity: [From: Wed Jun 12 15:38:11 UTC 2019,
To: Thu Jun 11 15:38:11 UTC 2020]

 

Yes, this particular cell’s certificate will expire on June 11, 2020 – in less than two months!
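If you manage several environments, you can wrap the status command in a small script that flags certificates nearing expiration. Below is a minimal sketch that assumes the output format shown above; the regular expression and the 30-day threshold are my own choices, not anything mandated by the tool.

#!/usr/bin/python3
# Minimal sketch: parse 'cell-management-tool jms-certificates -status' output
# and warn about certificates that expire soon (assumes the format shown above).
import datetime
import re
import subprocess

output = subprocess.run(
    ['/opt/vmware/vcloud-director/bin/cell-management-tool', 'jms-certificates', '-status'],
    capture_output=True, text=True).stdout

for expiry in re.findall(r'To: (.+?)\]', output):
    # example value: 'Thu Jun 11 15:38:11 UTC 2020'
    expires = datetime.datetime.strptime(expiry.strip(), '%a %b %d %H:%M:%S %Z %Y')
    days_left = (expires - datetime.datetime.utcnow()).days
    print('Certificate expires {} UTC ({} days left)'.format(expires, days_left))
    if days_left < 30:
        print('  -> regenerate this certificate soon!')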

The Fix

Set a calendar reminder and, when the certificate expiration date is approaching, run the following command.

/opt/vmware/vcloud-director/bin/cell-management-tool jms-certificates --certgen

Or upgrade to vCloud Director 10.0 or newer.

Update 21/05/2020: KB 78964 has been published on this topic. Also, if the CA signing certificate has expired, you will need to disable SSL altogether, restart the cell, regenerate the certificate and re-enable SSL.

VMware Cloud Director: Push Notifications in Tenant Context

In VMware Cloud Director 10.1 (VCD) organization users can subscribe to event and task push notifications, which might be useful if the tenant needs to keep track of activity in the cloud, connect a CMDB or any other custom solution, and does not want to permanently poll the audit log via the API.

Access to notifications was in the past only in the realm of service providers, who needed to deploy RabbitMQ and connect their Cloud Director cells to it. They can still do so and in fact have to if they need blocking tasks or use VCD API extensions (for example Container Service Extension, App Launch Pad or vRealize Operations Tenant App).

The new functionality is enabled by the internal Artemis ActiveMQ bus that runs on every VCD cell. The MQTT client connects to the public HTTPS endpoint and uses a WebSocket connection to the bus. Authentication is provided via the JWT authentication token. The official documentation provides some detail here, but not enough to actually set this up.

Therefore I want to demonstrate here, with a very simple Python script, how to set up the connection and start utilizing this feature.

The Python 3 script leverages the Pyvcloud module (version 22.0 or newer is required) and the Paho MQTT Python client. Both can be installed simply with pip.

pip install pyvcloud paho-mqtt

In the example, org admin credentials are used, which allows subscription to all organization messages via the publish/<org UUID>/* subscription string. The script can also be used by a system administrator by changing the subscription string to publish/*/*.

#!/usr/bin/python3

import paho.mqtt.client as mqtt
import json
import datetime
import pyvcloud.vcd.client
import pyvcloud.vcd.vdc

vcdHost = 'vcloud.fojta.com'
vcdPort = 443
path = "/messaging/mqtt"
logFile = 'vcd_log.log'

#org admin credentials
user = 'acmeadmin'
password = 'VMware1!'
org = 'acme'

credentials = pyvcloud.vcd.client.BasicLoginCredentials(user, org, password)
vcdClient = pyvcloud.vcd.client.Client(vcdHost+":"+str(vcdPort),None,True,logFile)
vcdClient.set_credentials(credentials)
accessToken = vcdClient.get_access_token()
headers = {"Authorization": "Bearer "+ accessToken}

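# MQTT push notifications require VCD API version 34.0 (VMware Cloud Director 10.1) or newer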
if max(vcdClient.get_supported_versions_list()) < "34.0":
    exit('VMware Cloud Director 10.1 or newer is required')

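# extract the org UUID from the org URN (urn:vcloud:org:<uuid>)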
org = vcdClient.get_org_list()
orgId = (org[0].attrib).get('id').split('org:',1)[1]

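# the message arrives as a JSON envelope carrying the actual event as an escaped JSON string in 'payload'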
def on_message(client, userdata, message):
    event = message.payload.decode('utf-8')
    event = event.replace('\\','')
    eventPayload = event.split('payload":"',1)[1]
    eventPayload = eventPayload[:-2]
    event_json = json.loads(eventPayload)
    print(datetime.datetime.now())
    print (event_json)

# Enable for logging
# def on_log(client, userdata, level, buf):
#     print("log: ",buf)

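# connect with MQTT over WebSocket to the /messaging/mqtt path, passing the JWT bearer token header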
client = mqtt.Client(client_id = "PythonMQTT",transport = "websockets")
client.ws_set_options(path = path, headers = headers)
client.tls_set_context(context=None)
# client.tls_insecure_set(True)
client.on_message=on_message
# client.on_log=on_log  #enable for logging
client.connect(host = vcdHost, port = vcdPort , keepalive = 60)
print('Connected')
client.subscribe("publish/"+orgId+"/*")
client.loop_forever()

Notice that the client needs to connect to the /messaging/mqtt path on the VCD endpoint and must provide a valid JWT authentication token in the header. That rules out some MQTT WebSocket clients that do not support custom headers (for example JavaScript browser clients).

The actual event is in JSON format, with a nested payload JSON providing the details. The code example prints the time when the event was received and just the nested payload JSON. The script runs forever until interrupted with Ctrl+C.
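If you prefer not to rely on the string manipulation above, a slightly more robust variant of the handler is sketched below. It assumes the envelope itself is valid JSON and that the nested event is carried as an escaped JSON string in its payload field, which is what the original handler works around.

def on_message(client, userdata, message):
    # the envelope is JSON; the actual event is an escaped JSON string in 'payload'
    envelope = json.loads(message.payload.decode('utf-8'))
    event_json = json.loads(envelope['payload'])
    print(datetime.datetime.now())
    print(event_json)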

Note: The actual RabbitMQ extensibility configuration in VCD and the Non-blocking AMQP Notifications setting in the UI have no impact on this functionality and can be disabled if not used by the service provider.

How to Migrate VMware Cloud Director from NSX-V to NSX-T

VMware Cloud Director as a cloud management solution is built on top of the underlying compute and networking platforms that virtualize the physical infrastructure. For the compute and storage part, VMware vSphere has always been used. The networking platform, however, is more interesting. It all started with vShield Edge, which was later rebranded to vCloud Networking and Security; Cisco Nexus 1000V was briefly an option, and currently NSX for vSphere (NSX-V) and NSX-T Data Center are supported.

VMware has announced the sunsetting of NSX-V (the current end of general support is planned for January 2022) and is fully committed going forward to the NSX-T Data Center flavor. The two NSX releases, while similar, are completely different products and there is no direct upgrade path from one to the other. So it is natural that all existing NSX-V users are asking how to migrate their environments to NSX-T.

NSX-T Data Center Migration Coordinator has been available for some time, but the way it works is quite destructive for Cloud Director and it cannot be used in such environments.

Therefore, with VMware Cloud Director 10.1, VMware is releasing a compatible migration tool called VMware NSX Migration for VMware Cloud Director.

The philosophy of the tool is the following:

  • Enable granular migration of tenant workloads and networking at Org VDC granularity with minimum downtime from NSX-V backed Provider VDC (PVDC) to NSX-T backed PVDC.
  • Check and allow migration of only supported networking features
  • Evolve with new releases of NSX-T and Cloud Director

In other words, it is not an in-place migration. The provider will need to stand up new NSX-T backed cluster(s) next to the NSX-V backed ones in the same vCenter Server. Also, the current NSX-T feature set in Cloud Director is not equivalent to NSX-V's; therefore there are networking features that cannot in principle be migrated. For a comparison of the NSX-V and NSX-T Cloud Director feature sets, see the table at the end of this blog post.

The service provider will thus need to evaluate which Org VDCs can be migrated today based on the existing limitations and functionality. Start with the simple Org VDCs and, as new releases are provided, migrate the rest.

How does the tool work?

  • It is a Python-based CLI tool that is installed and run by the system administrator. It uses public APIs to communicate with Cloud Director, NSX-T and vCenter Server to perform the migrations.
  • The environment must be prepared in such a way that there exists an NSX-T backed PVDC in the same vCenter Server as the source NSX-V PVDC and that their external networks are equivalent at the infrastructure level, as existing external IP addresses are seamlessly migrated as well.
  • The service provider defines which source Org VDC (NSX-V backed) is going to be migrated and what the target Provider VDC (NSX-T backed) is.
  • The service provider must prepare a dedicated NSX-T Edge Cluster whose sole purpose is to perform Layer 2 bridging of source and destination Org VDC networks. This Edge Cluster needs one node for each migrated network and must be deployed in the NSX-V prepared cluster, as it will perform VXLAN port group to NSX-T overlay (Geneve) logical segment bridging.
  • When the tool is started, it will first discover the source Org VDC feature set and assess if there are any incompatible (unsupported) features. If so, the migration will be halted.
  • Then it will create the target Org VDC and start cloning the network topology, establish bridging, disconnect the target networks and run basic network checks to see if the bridges work properly. If not, a roll-back is performed (bridges and the target Org VDC are destroyed).
  • In the next step the north/south networking flow is reconfigured to flow through the target Org VDC. This is done by disconnecting the source networks from the gateway and reconnecting the target ones. During this step a brief N/S network disruption is expected. Also notice that the source Org VDC Edge GW needs to be connected temporarily to a dummy network, as NSX-V requires at least one connected interface on the Edge at all times.
  • Each vApp is then vMotioned from the source Org VDC to the target one. As this is a live vMotion, no significant network/compute disruption is expected.
  • Once the provider verifies the correct functionality of the target VDC, she can manually trigger the cleanup step that migrates source catalogs, destroys the bridges and the source Org VDC, and renames the target Org VDC.
  • Rinse and repeat for the other Org VDCs.

Please make sure you read the release notes and user guide for the list of supported solutions and features. The tool will rapidly evolve – the short-term roadmap already includes pre-validation and roll-back features. You are also encouraged to provide early feedback to help VMware decide how the tool should evolve.

VMware Cloud Director 10.1: NSX-T Integration

This is an updated blog post of the original vCloud Director 10: NSX-T Integration to include all VMware Cloud Director 10.1 related updates.

Intro

VMware Cloud Director relies on the NSX network virtualization platform to provide on-demand creation and management of networks and networking services. NSX for vSphere has been supported for a long time and vCloud Director allows most of its features to be used by its tenants. However, as VMware slowly shifts away from NSX for vSphere and pushes forward the modern, fully rewritten NSX-T networking platform, I want to focus in this article on its integration with vCloud Director.

History

Let me start by highlighting that NSX-T is evolving very quickly. Each release (now at version 3.0) adds major new functionality. Contrast that with NSX-V, which is essentially feature complete in the sense that no major functionality changes are happening there. The fast pace of NSX-T development is a challenge for any cloud management platform as they have to play the catch-up game.

The first release of vCloud Director that supported NSX-T was 9.5. It supported only NSX-T version 2.3 and the integration was very basic. All vCloud Director could do was import NSX-T overlay logical segments (virtual networks) created manually by the system administrator. These networks were imported into a specific tenant Org VDC as Org VDC networks.

The next version of vCloud Director – 9.7 – supported only NSX-T 2.4 and, from the feature perspective, not much had changed. You could still only import networks. Under the hood, however, the integration used a completely new set of NSX-T policy-based APIs and there were some minor UI improvements in registering the NSX-T Manager.

vCloud Director version 10 for the first time introduced on-demand creation of NSX-T based networks and network services. NSX-T version 2.5 was required.

The latest Cloud Director version 10.1 is extending NSX-T support with new features.

Note: Cloud Director 10.1.0 does not support NSX-T 3.0. That support will come in the next patch release (10.1.1).

NSX-T Primer

While I do not want to go too deep into the actual NSX-T architecture, I fully expect that not all readers of this blog are fully familiar with NSX-T and how it differs from NSX-V. Let me quickly highlight the major points that are relevant for the topic of this blog post.

  • NSX-T is vCenter Server independent, which means it scales independently from the vCenter domain. NSX-T essentially communicates with ESXi hosts directly (they are called host transport nodes). The hosts must be prepared with NSX-T VIBs that are incompatible with NSX-V, which means a particular host cannot be used by NSX-V and NSX-T at the same time.
  • Overlay virtual networks use the Geneve encapsulation protocol, which is incompatible with VXLAN. The concepts of a Controller cluster that keeps state and of transport zones are very similar to NSX-V. The independence from VC mentioned in the previous point means the vSphere Distributed Switch cannot be used; instead NSX-T brings its own N-VDS switch. It also means that there is a concept of underlay (VLAN) networks managed by NSX-T. All overlay and underlay networks managed by NSX-T are called logical segments.
  • Networking services (such as routing, NATing, firewalling, DNS, DHCP, VPN and load balancing) are provided by Tier-0 or Tier-1 Gateways that are functionally similar to NSX-V ESGs but are not instantiated as dedicated VMs. Instead, they are services running on a shared Edge Cluster. The meaning of Edge Cluster is very different from its usage in the NSX-V context. An Edge Cluster is not a vSphere cluster; instead it is a cluster of Edge Transport Nodes where each Edge Node is a VM or a bare-metal host.
  • While T0 and T1 Gateways are similar, they are not identical, and each has a specific purpose or set of services it can offer. Distributed routing is implicitly provided by the platform unless a stateful networking service requires routing through a single point. T1 GWs are usually connected to a single T0 GW and that connection is managed automatically by NSX-T.
  • Typically you would have one or a small number of T0 GWs in ECMP mode providing north-south routing (the concept of a Provider Edge) and a large number of T1 GWs connected to the T0 GW, each for a different tenant to provide tenant networking (the concept of a Tenant Edge).

VMware Cloud Director Integration

As mentioned above since NSX-T is not vCenter Server dependent, it is attached to Cloud Director independently from VC.

(Geneve) network pool creation is the same as with VXLAN – you provide a mapping to an existing NSX-T overlay transport zone.


Now you can create a Provider VDC (PVDC) which is, as usual, mapped to a vSphere cluster or resource pool. A particular cluster used by a PVDC must be prepared for either NSX-V or NSX-T, and all clusters must share the same NSX flavor. It means you cannot mix NSX-V clusters with NSX-T clusters in the same PVDC. However, you can easily share NSX-V and NSX-T in the same vCenter Server; you will then just have to create multiple PVDCs. Although NSX-T can span VCs, a PVDC cannot – that limitation still remains. When creating an NSX-T backed PVDC you will have to specify the Geneve network pool created in the previous step.

Within PVDC you can start creating Org VDCs for your tenants – no difference there.

Org VDCs without routable networks are not very useful. To remedy this we must create external networks and Org VDC Edge Gateways. Here the concept differs quite a bit from NSX-V. Although you could deploy provider ECMP Edges with NSX-V as well (and I described here how to do so), it is mandatory with NSX-T. You will have to pre-create a T0 GW in NSX-T Manager (ECMP active-active is recommended). This T0 GW will provide external networking access for your tenants and should be routable from the internet. Instead of just importing an external network port group, as you would do with NSX-V, you will import the whole T0 GW into Cloud Director.

During the import you will also have to specify IP subnets and pools that the T0 GW can use for IP sub-allocation to tenants.

Once the external network exists, you can create tenant Org VDC Edge Gateways. The service provider can pick a specific existing NSX-T Edge Cluster for their placement.

T1 GWs are always deployed in an active-standby configuration; the placement of the active node is automated by NSX-T. The router interlink between T0 and T1 GWs is also created automatically by NSX-T. It is possible to disconnect an Org VDC Edge GW from the Tier-0 GW (this is for example used in the NSX-V to NSX-T migration scenario).

During the Org VDC Edge Gateway creation the service provider also allocates a range of IPs from the external network. Whereas with NSX-V these would actually be assigned to the Org VDC Edge Gateway uplink, this is not the case with NSX-T. Only once they are actually used in a specific T1 NAT rule will NSX-T automatically create a static route on the T0 GW and start routing to the correct T1 GW.

Tenant Networks

There are four major types of NSX-T based Org VDC networks, and three of them can be created via the UI:

  • Isolated: a Layer 2 segment not connected to a T1 GW. DHCP service is not available on this network (contrary to the NSX-V implementation).
  • Routed: a network that is connected to a T1 GW. The default is NAT-routed, which means its subnet is not announced to the upstream T0 GW and the only way to reach it from outside is to use a DNAT rule on the T1 GW with an allocated external IP address.
    Cloud Director version 10.1 introduces a fully routed network type; more on it below.
  • Imported: an existing NSX-T overlay logical segment can be imported (same as in VCD 9.7 or 9.5). Its routing/external connectivity must be managed outside of vCloud Director.
  • In OpenAPI (POST /1.0.0/OrgVdcNetwork) you will find one more network type: DIRECT_UPLINK. This is for a specific NFV use case. Such a network is connected directly to the T0 GW with an external interface. Note this feature is not officially supported!

Note that only Isolated and routed networks can be created by tenants.
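To illustrate the API mentioned above, here is a minimal Python sketch of creating an isolated Org VDC network through the CloudAPI. The orgVdcNetworks endpoint and the field names are written from memory of the 10.1 OpenAPI schema, so treat them as assumptions and verify them against the API documentation on your cells; the hostname, token and Org VDC URN are placeholders (a token can be obtained as shown in the Networking API section below).

import requests

vcd_host = 'vcloud.example.com'          # placeholder endpoint
access_token = '<JWT bearer token>'      # placeholder, obtain via the VCD API
org_vdc_urn = 'urn:vcloud:vdc:<uuid>'    # target Org VDC URN

# Assumed request body for an isolated network; check the OpenAPI schema for the exact model
network = {
    'name': 'isolated-net-01',
    'networkType': 'ISOLATED',
    'ownerRef': {'id': org_vdc_urn},
    'subnets': {'values': [{
        'gateway': '192.168.100.1',
        'prefixLength': 24,
        'ipRanges': {'values': [
            {'startAddress': '192.168.100.10', 'endAddress': '192.168.100.100'}]}}]}
}

r = requests.post(
    'https://{}/cloudapi/1.0.0/orgVdcNetworks'.format(vcd_host),
    json=network,
    headers={'Authorization': 'Bearer {}'.format(access_token),
             'Accept': 'application/json;version=34.0'})
r.raise_for_status()
print('Network creation accepted:', r.status_code)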

In the direct connect use case it is desirable to announce routed Org VDC networks upstream, so workloads are reachable directly without any NAT. This is possible in Cloud Director version 10.1, but requires a dedicated Tier-0 GW for the particular tenant. The provider must create a new Tier-0 GW, connect it to the tenant's particular direct connect transit VLAN and then, when deploying the Org VDC Edge GW, select the Dedicate External Network switch.

Cloud Director will make sure that the dedicated external network Tier-0 GW is not accessible to any other Org VDC Edge Gateway.

The tenant can then configure BGP routing on its Org VDC Edge GW, which is in fact set by Cloud Director on the dedicated Tier-0 GW (while Tier-0 to Tier-1 routes are auto-plumbed by NSX).

Tenant Networking Services

Currently the following T1 GW networking services are available to tenants:

  • Firewall (with IP Sets and Security Groups based on network objects)
  • NAT
  • DHCP (without binding and relay)
  • DNS forwarding
  • IPSec VPN: policy-based VPN with pre-shared key authentication is supported.

All other services are currently not supported. This might be because NSX-T has not implemented them yet, or because Cloud Director has not caught up yet. Expect big progress here with each new Cloud Director and NSX-T release.

Networking API

All NSX-T related features are available in the Cloud Director OpenAPI (CloudAPI). The pass-through API approach that you might be familiar with from the Advanced Networking NSX-V implementation is not used!
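To give a concrete starting point, here is a minimal Python sketch that authenticates against the CloudAPI and lists the Org VDC Edge Gateways. It assumes API version 34.0 (Cloud Director 10.1) and a system administrator account; the hostname and credentials are placeholders and error handling is omitted.

import requests

vcd_host = 'vcloud.example.com'                 # placeholder endpoint
user, org, password = 'admin', 'System', '...'  # placeholders

# The JWT bearer token is returned in the X-VMWARE-VCLOUD-ACCESS-TOKEN response header
session = requests.post(
    'https://{}/cloudapi/1.0.0/sessions/provider'.format(vcd_host),
    auth=('{}@{}'.format(user, org), password),
    headers={'Accept': 'application/json;version=34.0'})
session.raise_for_status()
token = session.headers['X-VMWARE-VCLOUD-ACCESS-TOKEN']

# List Org VDC Edge Gateways (paged result with a 'values' list)
gateways = requests.get(
    'https://{}/cloudapi/1.0.0/edgeGateways'.format(vcd_host),
    headers={'Authorization': 'Bearer {}'.format(token),
             'Accept': 'application/json;version=34.0'})
for gw in gateways.json().get('values', []):
    print(gw['name'], gw['id'])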

Feature Comparison

I have summarized all Cloud Director networking features in the following table for quick comparison between NSX-V and NSX-T.

What’s New in VMware Cloud Director 10.1

And it is time for another What’s New in (v)Cloud Director blog post. If you are not up to date you can find the older articles for versions 10, 9.7, 9.5 and 9.1 here.

Let us start with the “important” announcement – a name change. vCloud Director has been re-branded to VMware Cloud Director. Fortunately we keep the same (unofficial) acronym – VCD. The current version is 10.1, which might look like a small increase from the previous 10.0, but that is just marketing numbering, so do not put too much emphasis on it and assess for yourself whether it is a big release or not.

Interoperability

VMware has added support for NSX-T 3.0, but vSphere 7 support is missing. It is expected to come shortly in a major patch release (correction: NSX-T 3.0 support did not make it into this release either). You can upgrade your management clusters and dedicated vCenter Servers, just not those that are backing Provider VDCs.

UI

As previously announced, there is no more Adobe Flex UI (it cannot even be enabled with a secret switch). That shouldn’t be an issue, however, as the HTML 5 UI not only has 99.9% parity but is in fact significantly better than the Flex UI ever was. There are new features such as VM sizing and VM placement groups, advanced filtering, multiselect actions, badges, quick access to VM console or network cards, tasks and events in vApp details, Org VDC ACL, live import from vCenter Server, …

The UI team is no longer in feature parity mode, they are in make-it-better mode and doing a great job as can be seen from the screenshots below.


Platform Security

Certificate validation is now required for VC/NSX endpoints and will be required for LDAP in the next major release as well. It means your endpoints either have to have publicly trusted certificates, or you have to upload their signing certificate, or you have to approve their certificate on first use (when you add or edit such an endpoint). To ease the transition from older vCloud Director releases, you can run the cell-management-tool trust-infra-certs command after the upgrade, which will automatically retrieve and trust the certificates of all connected VC/NSX endpoints. If you forget to run this command, Cloud Director will not be able to talk to the VC/NSX endpoints!
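For reference, the command is invoked from a cell shell just like the other cell-management-tool examples in this post; it may take additional options to scope which endpoint types are trusted, so check its built-in help before running it:

/opt/vmware/vcloud-director/bin/cell-management-tool trust-infra-certs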

Appliance

While you can still use the Linux installer of Cloud Director with an external database, the appliance deployment factor has again been improved and there are fewer and fewer reasons not to use it, especially for greenfield deployments.

The appliance API (on port 5480) has been enhanced to monitor the status of services, to see which node is running the active node of the embedded database (useful for load balancing access to the database for external tools), to monitor filesystem storage and to trigger database node switchover (planned failover) or promotion (after a failure).

The appliance embedded database has automated failover functionality for the first time. It is disabled by default but you can enable it with the API. The appliance UI has also been improved and provides some of the API functionality.

Messaging Bus

As you might know, Cloud Director has an embedded messaging bus for inter-cell communication. In the past it was using ActiveMQ (ports 61611 and 61616). If the service provider wanted to use blocking tasks, notifications or vCloud API (or is it now VCloud API?!) extensions, then an external RabbitMQ messaging bus had to be deployed. In the current release ActiveMQ has been replaced with Artemis, which is also available externally for blocking tasks and notifications, so RabbitMQ is no longer needed for these two use cases (Update: blocking tasks did not make it into this release). Additionally, it can also be used by tenants.

Artemis uses the MQTT communication protocol and the connection to it can be made via a WebSocket session, which is authenticated with the regular Cloud Director authentication token mechanism.

Note that external RabbitMQ is still supported and still needed for vCloud API extensibility use case.

I have written full article on this feature here.

Networking

The NSX-T integration has been enhanced with routed Org VDC networks (the previous release supported only NAT-routed networks) with the BGP routing protocol. This feature currently requires a dedicated Tier-0 Gateway for the tenant.
IPSec VPN is now in the UI (pre-shared key authentication and policy-based VPN are supported). IP Sets and Security Groups have been split, with support for network objects that dynamically refer to all connected VMs.

The service provider can configure multiple NSX-T Edge Clusters and select which one will be used for a particular Tier-1 (Org VDC) Gateway. This enables separation of Tier-0 and Tier-1 Gateways to different Edge Clusters.

The NSX-V side of networking also has one new feature – you can now use Cross-VDC networking within the same vCenter Server. This for example enables multi-egress networks within a single Org VDC for the stretched-cluster use case.

Edit (14/4/2020): Each NSX-V backed Org VDC Edge GW can now be placed in a specific cluster via the API, which might be useful for the above use case. Previously you had to have the same Edge Cluster configuration for the whole Org VDC.

Storage

vSphere VM Encryption is now supported within Cloud Director. The encryption happens in the hypervisor, which means the data is encrypted both at rest and in flight. The encryption is set up via vCenter Server storage policies by enabling host-based rules. A compatible external key management server must be deployed and connected to vCenter Server. This means the feature is fully in the realm of the service provider and key management is not exposed to tenants.

Other Features

  • Proxying of dedicated vCenter Servers (so called Centralized Point of Management – CPoM feature) was improved with extra stats, more proxies and browser extension to simplify the usage
  • Support for VM (UI) and vApp (API only) live migration between Provider VDCs
  • Due to UI upgrade to Clarity 2+ custom themes will have to be recompiled
  • The provider can enable promiscuous mode and forged transmits on VXLAN backed Org VDC network (API only)
  • Blocking tasks for OpenAPI tasks.
  • Cloud Director 10.1 is the first release that enables automated NSX-V to NSX-T migration. More on that in a later blog post.