VMware Cloud Provider Lifecycle Manager 1.4

VMware has just released VMware Cloud Provider Lifecycle Manager version 1.4. I have blogged about the product in the past. I have been testing the version 1.4 in my lab and must say it is now significantly easier to use to deploy, manage new or brown field deployments of the VMware Cloud Director stack.

The UI has now almost full parity with the API. So it is now very easy to use the product from the UI for single environment management (at scale you would probably still want to use API). Additionally, existing VCD environments (either deployed buy VCPLCM or any other way) can be registered under management of VCPLCM. You will just need to provide a few basic information (solution endpoint address, passwords, etc.) which allows VCPLCM to run the discovery process and collect additional details (nodes, certificates, integrations, etc.). Once that is done you simply finish the registration by supplying missing details such as integration passwords or node names and their network port groups in VC inventory that could not be autodiscovered.

With VCD under management of VCPLCM the most annoying operations such as upgrades, certificate replacements or node additions/redeployments are very simple to perform and take just a few minutes.

Give it a try!

Metering External Network Traffic with NSX-T

It is common for service providers to meter internet traffic consumption of their tenants. In NSX-V environment you could get any interface statistics of an Edge Service Gateway via NSX API or vCloud Director Networking API:
GET /api/4.0/edges/{edgeId}/statistics/interfaces

Let us explore the ways you can do similar metering in NSX-T environment.

We have three points where we can meter external traffic:

  1. On Tier-0 or VRF uplink (if such object is dedicated to the tenant). The API to use is:

    GET https://{{host}}/policy/api/v1/infra/tier-0s/{tier-0 name}/locale-services/default/interfaces/{uplink-name}/statistics

    If the Tier-0/VRF is stretched across multiple Edge nodes you will need to aggregate the data across multiple uplinks.
  2. On Tier-0/VRF router downlink to a specific Tier-1 GW. This is useful if the Tier-0 is shared across multiple tenants and their Tier-1 GWs. The API is not documented in the API docs.

    GET https://{{host}}/policy/api/v1/infra/tier-0s/{tier-0 name}/tier-1-interface/statistics/summary?tier1_path=/infra/tier-1s/{tier-1 name}
  3. On CSP (Service Interface) of a Tier-1 (or Tier-0) GW for use cases where a particular VLAN is connected directly to tenant GW. The API to use is quite long and must include the enforcement point (edge node backing active Tier-1 GW)

    GET https://{{host}}/policy/api/v1/infra/tier-1s/{tier-1 name}/locale-services/{locale id}/interfaces/{CSP interface id}/statistics?enforcement_point_path=/infra/sites/default/enforcement-points/default&edge_path=/infra/sites/default/enforcement-points/default/edge-clusters/{edge-cluster-id}/edge-nodes/{edge-node-id}

Multitenant Logging with VMware Cloud Director

Topic of multitenant logging in VMware Cloud Director (VCD) especially with NSX-T came up recently in multiple conversations, so I will summarize the capabilities and options as they are today.

Audit and Event Logs

Information about VCD events such as user logins, vApp operation, VM power-on etc are available in the Audit and Event log. Tenant can access their particular logs via API – AuditTrail Events and Tasks (type=task query) or directly in the UI.

Event Log UI

An additional option is to receive events and notifications via MQTT message bus as was described here.

NSX-V Networking Logs

With NSX-V each tenant would have one or more Org VDC Gateways each backed by its own NSX-V Edge Service Gateway (single or two VMs in HA mode). The tenant could configure their own syslog endpoint to send all Edge logs there. As each Edge could have two syslog entries, the provider could set up their syslog endpoint as well to receive the same logs.

Tenant Edge Gateway Syslog Configuration
Provider Edge Gateway Syslog Configuration

And this is how the configuration looks in NSX-V:

Distributed firewall logs are generated in vmkernel of each ESXi host. It means they are sent to syslog configured at ESXi level which is not multitenant. However VCD configures a rule tag which is unique for each Org VDC (its UUID) so it is possible to filter logs for a specific tenant.

Org VDC UUID rule tag configured for each DFW entry in NSX-V

NSX-T Networking Logs

Org VDC gateways in NSX-T Org VDCs are backed by NSX-T Tier-1 gateways which are logical entities on shared NSX-T Edge Nodes. This means it is not possible to configure tenant’s syslog on them as was done with NSX-V. Therefore VCD will configure Log label for each GW firewall entry. The log is composit of first 10 characters of the UUID of the GW parent object (Org VDC or Data Center Group), 5 characters of the GW name and 5 characters of the GW UUID.

Note that you need at least NSX-T 3.2 and VCD 10.4.

For distributed firewall the situation is identical to NSX-V. The only difference it that the log label is composit of first 5 characters of the Data Center Group name and 27 characters of its UUID.

NSX-T DFW Log Label

Additional improvements are planned in NSX-T 4.1+ to allow multitenant access to other log types.

Log Filtering and Forwarding

To test the functionality vRealize Log Insight supports up to 10 log forwarders. These can be configured in the Log Management section. In the screenshot below you can see Distributed Firewall for NSX-V and NSX-T and Gateway Firewall for NSX-T forwarding to tenant syslog 10.0.2.29 configured.

vRealize Log Insight filtering and forwarding

Non-elastic Allocation Pool VDC with Flex?

This came up on provider Slack and I want to preserve it for future.

Question: Can the provider create Flex type Org VDC where the tenant would get allocation of certain CPU capacity (in GHz) and they could deploy as many VMs as they wish while limited by the CPU allocation?

Example: the VDC has 10 GHz allocation and the tenant is able to deploy 3 VMs each with 2 vCPUs at 2 GHz (which is in total 12 GHz) but they will altogether be able to consume only up to 10 GHz of physical vSphere resources.

This is basically the old legacy non-elastic allocation pool type VDC where the VDC is backed by single vCenter Server Resource Pool with the limit equal to the CPU allocation. As of VCD 5.1.2 there is ability to enable elasticity of such VDC but it behaves differently and is done at whole VCD instance level. So can you create similarly behaving VDC with the new Flex type (introduced in VCD 10.0).

Answer: Yes. Create non-elastic Flex VDC with the allocation and enable CPU limit. Then create VM sizing policy with vCPU speed set to 0 GHz and make it default for the Org VDC.

Control System Admin Access to VMware Cloud Director

When VMware Cloud Director is deployed in public environment setup it is a good practice to restrict the system admin access only for specific networks so no brute force attack can be triggered against the publicly available UI/API end points.

There is actually a relatively easy way to achieve this via any web application firewall (WAF) with URI access filter. The strategy is to protect only the provider authentication end points which is much easier than to try to distinguish between provider and tenant URIs.

As the access (attack) can be done either through UI or API the solution should address both. Let us first talk about the UI. The tenants and provider use specific URL to access their system/org context but we do not really need to care about this at all. The UI is actually using (public) APIs so there is nothing needed to harden the UI specifically if we harder the API endpoint. Well, the OAuth and SAML logins are exception so let me tackle them separately.

So how can you authenticate to VCD via API?

Integrated Authentication

The integrated basic authentication consisting of login/password is used for VCD local accounts and LDAP accounts. The system admin (provider context) uses /cloudapi/1.0.0/sessions/provider API endpoint while the tenants use /cloudapi/1.0.0/sessions.

The legacy (common for both providers and tenant) API endpoint /api/sessions has been deprecated since API version 33.0 (introduced in VCD 10.0). Note that deprecated does not mean removed and it is still available even with API version 36.x so you can expect to be around for some time as VCD keeps backward compatible APIs for few years.

You might notice that there is in a Feature Flags section the possibility to enable “Legacy Login Removal”.

Feature Flags

Enabling this feature will disable legacy login both for tenants and providers however only if you use alpha API version (in the case of VCD 10.3.3.1 it is 37.0.0-alpha-1652216327). So this is really only useful for testing your own tooling where you can force the usage of that particular API version. The UI and any 3rd party tooling will still use the main (supported) API versions where the legacy endpoint will still work.

However, you can forcefully disable it for provider context for any API version with the following CMT command (run from any cell, no need to restart the services):

/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v true

The providers will need to use only the new cloudapi/1.0.0/providers/session endpoint. So be careful as it might break some legacy tools!

API Access Token Authentication

This is a fairly new method of authentication to VCD (introduced in version 10.3.1) that uses once generated secret token for API authentication. It is mainly used by automation or orchestration tools. The actual method of generating session token requires access to the tenant or provider oauth API endpoints:

/oauth/tenant/<tenant_name>/token

/oauth/provider/token

This makes it easy to disable provider context via URI filter.

SAML/OAuth Authentication via UI

Here we must distinguish the API and UI behavior. For SAML, the UI is using /login/org/<org-name>/… endpoint. The provider context is using the default SYSTEM org as the org name. So we must filter URI starting with /login/org/SYSTEM.

For OAuth the UI is using the same endpoint as API access token authentication /oauth/tenant vs /oauth/provider. /login/oauth?service=provider

For API SAML/OAuth logins cloudapi/1.0.0/sessions vs cloudapi/1.0.0/sessions/provider endpoints are used.

WAF Filtering Example

Here is an example how to set up URI filtering with VMware NSX Advanced Load Balancer.

  1. We need to obviously set up VCD cell (SSL) pool and Virtual Service for the external IP and port 443 (SSL).
  2. The virtual service application profile must be set to System-Secure-HTTP as we need to terminate SSL sessions on the load balancer in order to inspect the URI. That means the public SSL certificate must be uploaded to load balancer as well. The cells can actually use self signed certs especially if you use the new console proxy that does not require SSL pass through and works on port 443.
  3. In the virtual service go to Policies > HTTP Request and create following rules:
    Rule Name: Provider Access
    Client IP Address: Is Not: <admin subnets>
    Path: Criteria – Begins with:
    /cloudapi/1.0.0/sessions/provider
    /oauth/provider
    /login/oauth?service=provider
    /login/org/SYSTEM
    Content Switch: Local response – Status Code: 403.
WAF Access Rule

And this is what you can observe when trying to log in via integrated authentication from non-authorized subnets:

And here is an example of SAML login: