New Networking Features in VMware Cloud Director 10.4.1

The recently released VMware Cloud Director 10.4.1 brings quite a few new features. In this article I want to focus on those related to networking.

External Networks on Org VDC Gateway (NSX-T Tier-1 GW)

External networks that are NSX-T segment backed (VLAN or overlay) can now be connected directly to an Org VDC Edge Gateway instead of being routed through the Tier-0 or VRF gateway the Org VDC GW is connected to. The connection is done via a service interface (aka Centralized Service Port – CSP) on the service node of the Tier-1 GW that is backing the Org VDC Edge GW. The Org VDC Edge GW still needs a parent Tier-0/VRF (now called Provider Gateway), although it can be disconnected from it.

What are some of the use cases for such a direct connection of the external network?

  • routed connectivity via dedicated VLAN to tenant’s co-location physical servers
  • transit connectivity across multiple Org VDC Edge Gateways to route between different Org VDCs
  • service networks
  • MPLS connectivity for direct connect, while internet access is provided via a shared provider gateway

The connection is configured by the system administrator. It can use only a single subnet, from which multiple IPs can be allocated to the Org VDC Edge GW. One is configured directly on the interface, while the others are just routed through when used for NAT or other services.

If the external network is backed by a VLAN segment, it can be connected to only one Org VDC Edge GW. This is due to an NSX-T limitation: a particular edge node can use a VLAN segment for only a single logical router instantiated on that edge node, and as VCD does not let you select particular edge nodes for the instantiation of Org VDC Edge Tier-1 GWs, it simply will not allow you to connect such an external network to multiple Edge GWs. If you are sharing the Edge Cluster with Tier-0 GWs, make sure that they do not use the same VLAN for their uplinks either. Typically you would use VLAN for the co-location or MPLS direct connect use case.

An overlay (Geneve) backed external network has no such limitation, which makes it a great fit for connectivity across multiple GWs for transit or service networks.

The NSX-T Tier-1 GW does not provide any dynamic routing capabilities, so routing to such a network can be configured only via static routes. Also note that the Tier-1 GW always has a default route (0.0.0.0/0) pointing towards its parent Tier-0/VRF GW. Therefore, if you want to set the default route to the segment backed external network, you need to use two more specific routes. For example:

0.0.0.0/1 next hop <MPLS router IP> scope <external network>
128.0.0.0/1 next hop <MPLS router IP> scope <external network>
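
Those two /1 prefixes are simply the default route split in half, so together they cover all destinations while being more specific than the 0.0.0.0/0 pointing to the parent Tier-0/VRF. A quick, purely illustrative check with Python's ipaddress module:

import ipaddress
# The two /1 prefixes are exactly the default route split in half; being more
# specific than 0.0.0.0/0 they win over the implicit default towards the
# parent Tier-0/VRF.
default = ipaddress.ip_network("0.0.0.0/0")
halves = list(default.subnets(prefixlen_diff=1))
print(halves)                                                         # 0.0.0.0/1 and 128.0.0.0/1
print(sum(n.num_addresses for n in halves) == default.num_addresses)  # True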

Slightly related to this feature is the ability to scope gateway firewall rules to a particular interface. This is done via the Applied To field:

You can select any CSP interface on the Tier-1 GW (used by an external network or non-distributed Org VDC networks) or nothing, which means the rule will be applied to the uplink to the Tier-0/VRF as well as to any CSP interface.

IP Spaces

IP Spaces is a big feature that will be delivered across multiple releases, with 10.4.1 being the first one. In short, IP Spaces are a new VCD object that allows managing individual IPs (floating IPs) and subnets (prefixes) independently across multiple networks/gateways. The main goal is to simplify the management of public IPs or prefixes for the provider as well as private subnets for the tenants. Additional benefits are the ability to route on a shared Provider Gateway (Tier-0/VRF), to use the same dedicated parent Tier-0/VRF for multiple Org VDC Edge GWs, and the ability to re-connect Org VDC Edge GWs to a different parent Provider Gateway.

Use cases for IP Spaces:

  • Self-service for request of IPs/prefixes
  • routed public subnet for Org VDC network on shared Provider Gateway (DMZ use case, where VMs get public IPs with no NATing performed)
  • IPv6 routed networks on a shared Provider Gateway
  • tenant dedicated Provider Gateway used by multiple tenant Org VDC Edge Gateways
  • simplified management of public IP addresses across multiple Provider Gateways (shared or dedicated) all connected to the same network (internet)

In the legacy model the system administrator would create subnets at the external network / provider gateway level and then add static IP pools from those subnets for VCD to use. IPs from those pools would be allocated to tenant Org VDC Edge Gateways. The IP Spaces mechanism instead creates standalone IP Spaces, which are collections of IP ranges (e.g. 192.168.111.128-192.168.111.255) and blocks of IP prefixes (e.g. 2 blocks of /28 – 192.168.111.0/28 and 192.168.111.16/28 – and 1 block of /27 – 192.168.111.32/27).
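
Purely to illustrate the concept (this is not the VCD API data model), the example IP Space above could be sketched as follows:

import ipaddress
# Illustrative sketch of the example IP Space from the text, not the VCD API model.
floating_ip_range = (ipaddress.ip_address("192.168.111.128"),
                     ipaddress.ip_address("192.168.111.255"))
prefix_blocks = [
    ipaddress.ip_network("192.168.111.0/28"),    # /28 block #1
    ipaddress.ip_network("192.168.111.16/28"),   # /28 block #2
    ipaddress.ip_network("192.168.111.32/27"),   # /27 block
]
# Floating IPs are handed out individually, prefix blocks are assigned whole.
floating_ip_count = int(floating_ip_range[1]) - int(floating_ip_range[0]) + 1
print(floating_ip_count)                         # 128 individually allocatable IPs
print([str(p) for p in prefix_blocks])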

A particular IP Space is then assigned to a Provider Gateway (NSX-T Tier-0 or VRF GW) as IP Space Uplink:

An IP Space can be used by multiple Provider Gateways.

A tenant Org VDC Edge Gateway connected to such an IP Space enabled provider gateway can then request floating IPs (from the IP ranges)

or assign an IP block to a routable Org VDC network, which results in route advertisement for such a network.

If, in the above case, such a network should also be advertised to the internet, the parent Tier-0/VRF needs to have route advertisement manually configured (see the IP-SPACES-ROUTING policy below), as VCD will not do so (contrary to the NAT/LB/VPN case).

Note: The IP / IP block assignments are done from the tenant UI. Tenants need to have new rights added to see those features in the UI.

The number of IPs or prefixes a tenant can request is managed via a quota system at the IP Space level. Note that the system administrator can always exceed the quota when making the request on behalf of the tenant.

A provider gateway must be designated for use with IP Spaces, so you cannot combine the legacy and IP Spaces methods of managing IPs on the same provider gateway. There is currently no way of converting a legacy provider gateway to an IP Spaces enabled one, but such functionality is planned for the future.

Tenants can create their own private IP Spaces, which are used with their Org VDC Edge GWs (and routed Org VDC networks) that are implicitly enabled for IP Spaces. This simplifies the creation of new Org VDC networks, where a unique prefix is automatically requested from the private IP Space. The uniqueness is important as it allows multiple Edge GWs to share the same parent provider gateway.
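
Conceptually the automatic prefix request behaves like pulling the next unused block from the private space; a purely illustrative sketch (the space and prefix sizes below are made up):

import ipaddress
# Illustrative only: unique prefixes keep Org VDC networks non-overlapping even
# when several Edge GWs share the same parent provider gateway.
private_space = ipaddress.ip_network("10.10.0.0/16")   # hypothetical private IP Space
free_prefixes = private_space.subnets(new_prefix=24)   # generator of unique /24 blocks
network_a = next(free_prefixes)                        # prefix for the first Org VDC network
network_b = next(free_prefixes)                        # prefix for the next one
print(network_a, network_b, network_a.overlaps(network_b))   # 10.10.0.0/24 10.10.1.0/24 False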

Transparent Load Balancing

VMware Cloud Director 10.4.1 adds support for Avi (NSX ALB) Transparent Load Balancing. This allows the pool members to see the source IP of the client, which might be needed for certain applications.

The actual implementation in NSX-T and Avi is fairly complicated and described in detail here: https://avinetworks.com/docs/latest/preserve-client-ip-nsxt-overlay/. This is due to the Avi service engine data path and the need to preserve the client source IP while routing the pool member return traffic back to the service engine node.

VCD hides most of the complexity, so only three steps are needed to actually enable the service:

  1. Transparent mode must be enabled at the Org VDC GW (Tier-1) level.
  2. The pool members must be created via an NSX-T Security Group (IP Set).
  3. Preserve client IP must be configured on the virtual service.

Due to the data path implementation there are quite a few restrictions when using this feature:

  • Avi version must be at least 21.1.4
  • the service engine group must be in A/S mode (legacy HA)
  • the LB service engine subnet must have at least 8 IPs available (/28 subnet – 2 for engines, 5 floating IPs and 1 GW) – see Floating IP explanation below
  • only in-line topology is supported, meaning a client accessing the LB VIP without going via the T0 > T1 path will not be able to access the VS
  • only IPv4 is supported
  • the pool members cannot be accessed via DNAT on the same port as the VS port
  • transparent LB should not be combined with non-transparent LB on the same GW as it will cause the health monitor to fail for the non-transparent LB
  • the pool member NSX-T security group should not be reused for another VS on the same port

Floating IPs

When transparent load balancing is enabled, the return traffic from the pool member cannot be sent directly to the client (source IP) but must go back to the service engine, otherwise asymmetric routing occurs and the traffic flows are broken. This is implemented in NSX-T via a N/S Service Insertion policy where the pool member traffic (defined via a security group) is redirected to the active engine via a floating IP instead of going to its default GW. Floating IPs come from the service engine network subnet but not from the DHCP range that assigns the service engine nodes their primary IPs. VCD dedicates 5 IPs from the LB service network range for floating IPs. Note that multiple transparent VIPs on the same SEG/service network share a floating IP.
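
Based on the counts above (1 gateway IP, 2 service engine IPs, 5 floating IPs), a /28 service network could be laid out roughly like this; only the counts come from the text, the ordering within the subnet is an assumption for illustration:

import ipaddress
# Illustrative layout of a /28 LB service engine network for transparent LB.
# Only the counts (1 GW, 2 engines, 5 floating IPs) come from the text; the
# ordering within the subnet is an assumption.
service_net = ipaddress.ip_network("192.168.255.0/28")   # hypothetical service network
hosts = list(service_net.hosts())                        # 14 usable addresses in a /28
gateway = hosts[0]                                       # gateway IP
engine_ips = hosts[1:3]                                  # 2 IPs for the service engine nodes (DHCP)
floating_ips = hosts[3:8]                                # 5 floating IPs reserved by VCD
print(gateway)
print(engine_ips)
print(floating_ips)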

Multitenant Logging with VMware Cloud Director

The topic of multitenant logging in VMware Cloud Director (VCD), especially with NSX-T, came up recently in multiple conversations, so I will summarize the capabilities and options as they are today.

Audit and Event Logs

Information about VCD events such as user logins, vApp operations, VM power-ons etc. is available in the Audit and Event log. Tenants can access their particular logs via the API – AuditTrail Events and Tasks (type=task query) – or directly in the UI.

Event Log UI
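
The same events can also be pulled programmatically; a minimal sketch, where the /cloudapi/1.0.0/auditTrail path, API version and field names are assumptions based on the current cloudapi layout and may need adjusting for your environment:

import requests
# Sketch only: list recent audit events. Host, token, API version and the
# auditTrail path/fields are assumptions; adjust to your environment.
VCD = "https://vcd.example.com"                     # hypothetical VCD endpoint
TOKEN = "<bearer token obtained from the sessions API>"
headers = {"Accept": "application/json;version=37.0",
           "Authorization": f"Bearer {TOKEN}"}
resp = requests.get(f"{VCD}/cloudapi/1.0.0/auditTrail",
                    headers=headers, params={"pageSize": 10}, verify=False)
for event in resp.json().get("values", []):
    print(event.get("timestamp"), event.get("eventType"))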

An additional option is to receive events and notifications via MQTT message bus as was described here.

NSX-V Networking Logs

With NSX-V each tenant would have one or more Org VDC Gateways, each backed by its own NSX-V Edge Service Gateway (a single VM or two VMs in HA mode). The tenant could configure their own syslog endpoint to have all Edge logs sent there. As each Edge can have two syslog entries, the provider could set up their own syslog endpoint as well to receive the same logs.

Tenant Edge Gateway Syslog Configuration
Provider Edge Gateway Syslog Configuration

And this is how the configuration looks in NSX-V:

Distributed firewall logs are generated in the vmkernel of each ESXi host. This means they are sent to the syslog configured at the ESXi level, which is not multitenant. However, VCD configures a rule tag, which is unique for each Org VDC (its UUID), so it is possible to filter logs for a specific tenant.

Org VDC UUID rule tag configured for each DFW entry in NSX-V
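
A trivial way to split the shared DFW log stream per tenant is therefore to filter on that rule tag; a minimal sketch (the log path and UUID are placeholders):

# Minimal sketch: filter a shared dfwpktlogs stream per tenant by the Org VDC
# UUID that VCD sets as the DFW rule tag. Path and UUID are placeholders.
ORG_VDC_UUID = "f1a2b3c4-0000-1111-2222-333344445555"   # hypothetical Org VDC UUID
with open("/var/log/dfwpktlogs.log") as logfile:         # hypothetical aggregated log file
    for line in logfile:
        if ORG_VDC_UUID in line:
            print(line, end="")                          # forward/store per-tenant lines here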

NSX-T Networking Logs

Org VDC gateways in NSX-T backed Org VDCs are backed by NSX-T Tier-1 gateways, which are logical entities on shared NSX-T Edge Nodes. This means it is not possible to configure a tenant's syslog on them as was done with NSX-V. Therefore VCD configures a log label for each GW firewall entry. The label is a composite of the first 10 characters of the UUID of the GW parent object (Org VDC or Data Center Group), 5 characters of the GW name and 5 characters of the GW UUID.
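
As a rough illustration of how such a label is built (only the character counts come from the text; the exact concatenation and any separators VCD uses are an assumption here):

# Rough illustration of the GW firewall log label composition described above.
# Only the character counts come from the text; separators are an assumption.
parent_uuid = "7a4bd7d2-8c1a-4f3e-9c2d-abcdef012345"    # hypothetical Org VDC / DC Group UUID
gw_name = "tenant-gw-01"                                # hypothetical Edge GW name
gw_uuid = "0d9e1f2a-3b4c-5d6e-7f80-9a0b1c2d3e4f"        # hypothetical Edge GW UUID
log_label = parent_uuid[:10] + gw_name[:5] + gw_uuid[:5]
print(log_label)                                        # 7a4bd7d2-8tenan0d9e1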

Note that you need at least NSX-T 3.2 and VCD 10.4.

For the distributed firewall the situation is identical to NSX-V. The only difference is that the log label is a composite of the first 5 characters of the Data Center Group name and 27 characters of its UUID.

NSX-T DFW Log Label

Additional improvements are planned in NSX-T 4.1+ to allow multitenant access to other log types.

Log Filtering and Forwarding

To test the functionality I used vRealize Log Insight, which supports up to 10 log forwarders. These can be configured in the Log Management section. In the screenshot below you can see forwarding of Distributed Firewall (NSX-V and NSX-T) and Gateway Firewall (NSX-T) logs to the tenant syslog 10.0.2.29.

vRealize Log Insight filtering and forwarding

Control System Admin Access to VMware Cloud Director

When VMware Cloud Director is deployed in a public environment it is good practice to restrict system admin access to specific networks only, so that no brute force attack can be mounted against the publicly available UI/API endpoints.

There is actually a relatively easy way to achieve this with any web application firewall (WAF) that supports URI access filtering. The strategy is to protect only the provider authentication endpoints, which is much easier than trying to distinguish between provider and tenant URIs.

As the access (attack) can come either through the UI or the API, the solution should address both. Let us first talk about the UI. The tenants and the provider use specific URLs to access their system/org context, but we do not really need to care about this at all. The UI actually uses the (public) APIs, so nothing is needed to harden the UI specifically if we harden the API endpoint. Well, the OAuth and SAML logins are an exception, so let me tackle them separately.

So how can you authenticate to VCD via API?

Integrated Authentication

The integrated basic authentication, consisting of login/password, is used for VCD local accounts and LDAP accounts. The system admin (provider context) uses the /cloudapi/1.0.0/sessions/provider API endpoint while the tenants use /cloudapi/1.0.0/sessions.
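
A minimal sketch of the provider login; the Accept version and the response header name follow the usual cloudapi conventions, but treat them as assumptions for your particular version:

import requests
# Sketch of integrated (basic) authentication against the provider endpoint.
# Host, credentials and API version are placeholders/assumptions.
VCD = "https://vcd.example.com"
resp = requests.post(f"{VCD}/cloudapi/1.0.0/sessions/provider",
                     auth=("administrator@system", "<password>"),
                     headers={"Accept": "application/json;version=37.0"},
                     verify=False)
token = resp.headers.get("X-VMWARE-VCLOUD-ACCESS-TOKEN")   # bearer token on success
print(resp.status_code, bool(token))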

The legacy API endpoint /api/sessions (common for both providers and tenants) has been deprecated since API version 33.0 (introduced in VCD 10.0). Note that deprecated does not mean removed – it is still available even with API version 36.x – so you can expect it to be around for some time, as VCD keeps APIs backward compatible for a few years.

You might notice that in the Feature Flags section there is the possibility to enable “Legacy Login Removal”.

Feature Flags

Enabling this feature will disable the legacy login both for tenants and providers, however only if you use the alpha API version (in the case of VCD 10.3.3.1 it is 37.0.0-alpha-1652216327). So this is really only useful for testing your own tooling, where you can force the usage of that particular API version. The UI and any 3rd party tooling will still use the main (supported) API versions, where the legacy endpoint will still work.

However, you can forcefully disable it for the provider context for any API version with the following CMT command (run from any cell, no need to restart the services):

/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v true

The providers will then need to use only the new /cloudapi/1.0.0/sessions/provider endpoint, so be careful as it might break some legacy tools!

API Access Token Authentication

This is a fairly new method of authentication to VCD (introduced in version 10.3.1) that uses a once-generated secret token for API authentication. It is mainly used by automation or orchestration tools. The actual method of generating a session token requires access to the tenant or provider OAuth API endpoints:

/oauth/tenant/<tenant_name>/token

/oauth/provider/token

This makes it easy to disable the provider context via a URI filter.
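
The token exchange itself is a standard OAuth refresh token grant against those endpoints; a hedged sketch (host and token value are placeholders, the grant_type follows the OAuth specification):

import requests
# Sketch: exchange an API access token for a session access token using the
# provider OAuth endpoint. Host and token value are placeholders.
VCD = "https://vcd.example.com"
API_TOKEN = "<API access token generated in VCD>"
resp = requests.post(f"{VCD}/oauth/provider/token",
                     headers={"Accept": "application/json"},
                     data={"grant_type": "refresh_token",
                           "refresh_token": API_TOKEN},
                     verify=False)
print(resp.status_code, resp.json().get("access_token", "")[:20])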

SAML/OAuth Authentication via UI

Here we must distinguish between the API and UI behavior. For SAML, the UI uses the /login/org/<org-name>/… endpoint. The provider context uses the default SYSTEM org as the org name, so we must filter URIs starting with /login/org/SYSTEM.

For OAuth the UI uses the same endpoints as API access token authentication (/oauth/tenant vs /oauth/provider), while the provider login itself is initiated via /login/oauth?service=provider.

For API SAML/OAuth logins cloudapi/1.0.0/sessions vs cloudapi/1.0.0/sessions/provider endpoints are used.

WAF Filtering Example

Here is an example of how to set up URI filtering with VMware NSX Advanced Load Balancer.

  1. We obviously need to set up a VCD cell (SSL) pool and a Virtual Service for the external IP and port 443 (SSL).
  2. The virtual service application profile must be set to System-Secure-HTTP as we need to terminate SSL sessions on the load balancer in order to inspect the URI. That means the public SSL certificate must be uploaded to the load balancer as well. The cells can actually use self-signed certificates, especially if you use the new console proxy that does not require SSL pass-through and works on port 443.
  3. In the virtual service go to Policies > HTTP Request and create the following rule:
    Rule Name: Provider Access
    Client IP Address: Is Not: <admin subnets>
    Path: Criteria – Begins with:
    /cloudapi/1.0.0/sessions/provider
    /oauth/provider
    /login/oauth?service=provider
    /login/org/SYSTEM
    Content Switch: Local response – Status Code: 403.
WAF Access Rule
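
A quick way to verify the rule is to probe the provider URIs from a non-authorized subnet and check that the load balancer answers with 403; a sketch (host is a placeholder):

import requests
# Sketch: probe the provider-context URIs from a non-authorized network and
# expect the WAF rule to answer with 403. Host is a placeholder.
VCD = "https://vcd.example.com"
provider_uris = [
    "/cloudapi/1.0.0/sessions/provider",
    "/oauth/provider",
    "/login/oauth?service=provider",
    "/login/org/SYSTEM",
]
for uri in provider_uris:
    resp = requests.get(f"{VCD}{uri}", verify=False, allow_redirects=False)
    print(uri, resp.status_code)          # 403 expected when the rule matches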

And this is what you can observe when trying to log in via integrated authentication from non-authorized subnets:

And here is an example of SAML login:

Layer 2 VPN to the Cloud – Part III

I feel like it is time for another update on VMware Cloud Director (VCD) capabilities regarding establishing L2 VPN between an on-prem location and an Org VDC. The previous blog posts were written in 2015 and 2018 and do not reflect changes related to the usage of NSX-T as the underlying cloud network platform.

The primary use case for L2 VPN to the cloud is migration of workloads to the cloud, where the L2 VPN tunnel is temporarily established until migration of all VMs on a single network is done. The secondary use case is disaster recovery, but I feel that running L2 VPN permanently is not the right approach.

But that is not the topic of today’s post. VCD does support setting up L2 VPN on the tenant’s Org VDC Gateway (Tier-1 GW) since version 10.2, however it is still a hidden, API-only feature (the GUI is finally coming soon … in VCD 10.3.1). The actual setup is not trivial as the underlying NSX-T technology requires an IPSec VPN tunnel to be established first to secure the L2 VPN client to server communication. VMware Cloud Director Availability (VCDA) version 4.2 is an add-on disaster recovery and migration solution for tenant workloads on top of VCD, and it simplifies the setup of both the server (cloud) and client (on-prem) L2 VPN endpoints from its own UI. To reiterate, VCDA is not needed to set up L2 VPN, but it makes it much easier.

The screenshot above shows the VCDA UI plugin embedded in the VCD portal. You can see three L2 VPN sessions have been created on VDC Gateway GW1 (NSX-T Tier-1 backed) in the ACME-PAYG Org VDC. Each session uses a different L2 VPN client endpoint type.

The on-prem client can be an existing NSX-T Tier-0 or Tier-1 GW, an NSX-T autonomous edge, or a standalone Edge client. Each requires a different type of configuration, so let me discuss each separately.

NSX-T Tier-0 or Tier-1 Gateway

This is mostly suitable for tenants who are running an existing NSX-T environment on-prem. They will need to set up both the IPSec and L2VPN tunnels directly in NSX-T Manager, which is the most complicated process of the three options. On either the Tier-0 or Tier-1 GW they will first need to set up the IPSec VPN and L2 VPN client services, then the L2VPN session must be created with the local and remote endpoint IPs and the Peer Code, which must be retrieved beforehand via the VCD API (it is not available in the VCDA UI, but will be available in the VCD UI in 10.3.1 or newer). The peer code contains all the necessary configuration for the parent IPSec session in Base64 encoding.
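
Since the peer code is just Base64-encoded configuration, it can be decoded and inspected before it is pasted into NSX-T Manager; a small sketch (the value below is a stand-in, not a real peer code):

import base64
# Sketch: the L2 VPN peer code retrieved from the VCD API is Base64-encoded
# configuration, so it can be decoded and reviewed before configuring on-prem.
peer_code = base64.b64encode(b'{"example": "IPSec session parameters"}').decode()  # stand-in value
decoded = base64.b64decode(peer_code).decode("utf-8", errors="replace")
print(decoded)    # with a real peer code this reveals the parent IPSec session config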

Lastly, the local NSX-T segments to be bridged to the cloud can be configured for the session. The parent IPSec session will be created automagically by NSX-T and after a while you should see a green status for both the IPSec and L2 VPN sessions.

Standalone Edge Client

This option leverages the very light (150 MB) OVA appliance that can be downloaded from the NSX-T download website and actually works with both NSX-V and NSX-T L2 VPN server endpoints. It does not require any NSX installation. It provides no UI and its configuration must be done at the time of deployment via OVF parameters. Again, the peer code must be provided.

Autonomous Edge

This is the preferred option for non-NSX environments. An autonomous edge is a regular NSX-T edge node that is deployed from OVA but is not connected to NSX-T Manager. During the OVA deployment the Is Autonomous Edge checkbox must be checked. It provides its own UI and much better performance and configurability. Additionally, the client tunnel configuration can be done via the VCDA on-premises appliance UI. You just need to deploy the autonomous edge appliance and VCDA will discover it and let you manage it from then on via its UI.

With this option there is no need to retrieve the Peer Code, as the VCDA plugin will retrieve all the necessary information from the cloud site.

New Networking Features in VMware Cloud Director 10.3

The previous VMware Cloud Director 10.2 release brought many new networking features, and the current 10.3 release continues in the same fashion. Let me give you a brief rundown.

UI Enhancements

The UI has been enhanced to surface formerly API-only features, such as the ability to configure dual stack IPv4/IPv6 networks:

or configure DHCP in gateway or network mode:

The service provider can now assign/change primary IP address of Org VDC Edge Gateway in the UI:

The Org VDC NSX-T Edge Cluster that is used to deploy DHCP service in network mode and vApp Edges (more on them later) can be set in the UI (previously Network Profile API > Services Edge Cluster).

It is also possible to configure (extend) an external network port group backing without using API.


New NSX-T Backed Provider VDC Features

As NSX-T backed PVDCs now support both Tier-0/VRF and port group backing for external networks, to avoid confusion the Tier-0/VRF GWs were separated into their own tab.

The port group backed external networks can be either traditional VDS port groups or NSX-T segments. The latter option gives the ability to use NSX-T distributed firewall on such an external network (provider managed directly in NSX-T).

Distributed Firewall now supports dynamic groups that can be defined utilizing VM Tags or VM names.

vApps now support routed vApp networks, including DHCP service on vApp isolated networks. This is achieved by deploying standalone Tier-1 GWs that are connected to Org VDC networks via a service interface. The Org VDC network must be overlay backed (not VLAN). vApp fencing is still not supported as NSX-T does not provide this functionality.

There are a few additional small enhancements, ranging from support for Guest VLAN tagging and reflexive NAT to DHCP pool management.

Provider VDC with no NSX

The creation of a Provider VDC no longer requires a network pool specification. Such a PVDC will thus not provide any NSX-V or NSX-T features (routing, DHCP, firewalling, load balancing). The Org VDC networks can then be backed by a VLAN network pool or use VDS backed imported direct networks.

NSX-V vs NSX-T Feature Parity

Let me conclude with the traditional NSX-V / NSX-T VCD feature comparison chart (new updates highlighted in green).