The topic of multitenant logging in VMware Cloud Director (VCD), especially with NSX-T, came up recently in multiple conversations, so I will summarize the capabilities and options as they are today.
Audit and Event Logs
Information about VCD events such as user logins, vApp operations, VM power-on, etc. is available in the Audit and Event log. Tenants can access their own logs via the API (AuditTrail Events and Tasks (type=task query)) or directly in the UI.
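As a rough illustration of the API route, the sketch below only builds the two request URLs. The paths /cloudapi/1.0.0/auditTrail and /api/query?type=task reflect my understanding of the VCD API and should be verified against the API reference for your version; the host name and the pageSize parameter are assumptions for the example.

```python
from urllib.parse import urlencode


def audit_trail_url(host: str, page_size: int = 25) -> str:
    """URL for the audit trail events (path is an assumption, check your API docs)."""
    query = urlencode({"page": 1, "pageSize": page_size})
    return f"https://{host}/cloudapi/1.0.0/auditTrail?{query}"


def task_query_url(host: str) -> str:
    """URL for the typed query that lists tasks (type=task)."""
    query = urlencode({"type": "task"})
    return f"https://{host}/api/query?{query}"


if __name__ == "__main__":
    # vcd.example.com is a placeholder host
    print(audit_trail_url("vcd.example.com"))
    print(task_query_url("vcd.example.com"))
```

Both calls would still need an authenticated session; the results are scoped to the organization of the logged-in user.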
An additional option is to receive events and notifications via MQTT message bus as was described here.
NSX-V Networking Logs
With NSX-V each tenant would have one or more Org VDC Gateways each backed by its own NSX-V Edge Service Gateway (single or two VMs in HA mode). The tenant could configure their own syslog endpoint to send all Edge logs there. As each Edge could have two syslog entries, the provider could set up their syslog endpoint as well to receive the same logs.
And this is how the configuration looks in NSX-V:
Distributed firewall logs are generated in the vmkernel of each ESXi host. This means they are sent to the syslog configured at the ESXi level, which is not multitenant. However, VCD configures a rule tag that is unique for each Org VDC (its UUID), so it is possible to filter logs for a specific tenant.
NSX-T Networking Logs
Org VDC gateways in NSX-T Org VDCs are backed by NSX-T Tier-1 gateways, which are logical entities on shared NSX-T Edge Nodes. This means it is not possible to configure a tenant’s syslog on them as was done with NSX-V. Therefore, VCD configures a log label for each GW firewall entry. The label is composed of the first 10 characters of the UUID of the GW parent object (Org VDC or Data Center Group), 5 characters of the GW name and 5 characters of the GW UUID.
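To make the label composition concrete, here is a minimal sketch. It assumes the three parts are simply concatenated in the order listed, that "5 characters" means the first 5, and that the UUID is used as written (including hyphens). None of this is confirmed above, so compare the result with a real label in NSX-T before relying on it.

```python
def gateway_log_label(parent_uuid: str, gw_name: str, gw_uuid: str) -> str:
    """Compose a gateway firewall log label from its three parts.

    Assumption: plain concatenation, leading characters of each part.
    """
    return parent_uuid[:10] + gw_name[:5] + gw_uuid[:5]


if __name__ == "__main__":
    # All identifiers below are made up for illustration
    label = gateway_log_label(
        parent_uuid="a1b2c3d4-e5f6-7890-abcd-ef0123456789",
        gw_name="GW1-edge",
        gw_uuid="0f9e8d7c-6b5a-4321-8765-43210fedcba9",
    )
    print(label)
```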
Note that you need at least NSX-T 3.2 and VCD 10.4.
For the distributed firewall the situation is identical to NSX-V. The only difference is that the log label is composed of the first 5 characters of the Data Center Group name and 27 characters of its UUID.
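Since the ESXi syslog is shared, the tenant separation has to happen by filtering on the label. The sketch below builds a distributed firewall label and keeps only the log lines that contain it. The concatenation order, the use of the first 27 UUID characters and the sample log lines are assumptions for illustration, not the actual log format.

```python
from typing import Iterable, List


def dfw_log_label(dc_group_name: str, dc_group_uuid: str) -> str:
    """Assumed layout: first 5 chars of the group name + first 27 chars of its UUID."""
    return dc_group_name[:5] + dc_group_uuid[:27]


def lines_for_tenant(lines: Iterable[str], label: str) -> List[str]:
    """Keep only the log lines that carry the given label."""
    return [line for line in lines if label in line]


if __name__ == "__main__":
    label = dfw_log_label("ACME-DCG", "12345678-90ab-cdef-1234-567890abcdef")
    # Made-up log lines, only the label placement matters here
    sample = [
        f"dfwpktlogs: match PASS 10.0.0.5 -> 10.0.0.9 {label}",
        "dfwpktlogs: match DROP 10.9.9.9 -> 10.0.0.9 OTHER-label",
    ]
    for line in lines_for_tenant(sample, label):
        print(line)
```

The same substring filter works for the NSX-V rule tag, where the Org VDC UUID takes the place of the label.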
Additional improvements are planned in NSX-T 4.1+ to allow multitenant access to other log types.
Log Filtering and Forwarding
To test the functionality, you can use vRealize Log Insight, which supports up to 10 log forwarders. These can be configured in the Log Management section. In the screenshot below you can see Distributed Firewall for NSX-V and NSX-T and Gateway Firewall for NSX-T configured to forward to the tenant syslog at 10.0.2.29.
When VMware Cloud Director is deployed in a public environment, it is good practice to restrict system admin access to specific networks so that no brute-force attack can be launched against the publicly available UI/API endpoints.
There is actually a relatively easy way to achieve this via any web application firewall (WAF) with a URI access filter. The strategy is to protect only the provider authentication endpoints, which is much easier than trying to distinguish between provider and tenant URIs.
As the access (attack) can be done either through the UI or the API, the solution should address both. Let us first talk about the UI. The tenants and the provider use specific URLs to access their system/org context, but we do not really need to care about this at all. The UI is actually using the (public) APIs, so there is nothing needed to harden the UI specifically if we harden the API endpoint. Well, the OAuth and SAML logins are an exception, so let me tackle them separately.
So how can you authenticate to VCD via API?
The integrated basic authentication consisting of login/password is used for VCD local accounts and LDAP accounts. The system admin (provider context) uses /cloudapi/1.0.0/sessions/provider API endpoint while the tenants use /cloudapi/1.0.0/sessions.
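The split between the two endpoints can be sketched as follows. The code only constructs the login request and does not send it. The user@org credential format, the Accept header with the API version and the host name are assumptions based on my understanding of the VCD API, so treat this as an illustration rather than a reference client.

```python
import base64
import urllib.request

PROVIDER_LOGIN = "/cloudapi/1.0.0/sessions/provider"
TENANT_LOGIN = "/cloudapi/1.0.0/sessions"


def build_login_request(host: str, user: str, org: str, password: str,
                        api_version: str = "36.0") -> urllib.request.Request:
    """Build (but do not send) a basic-auth session request.

    The provider endpoint is chosen when the org is the System org.
    """
    path = PROVIDER_LOGIN if org.lower() == "system" else TENANT_LOGIN
    credentials = f"{user}@{org}:{password}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Accept": f"application/json;version={api_version}",
    }
    return urllib.request.Request(f"https://{host}{path}", headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder host and credentials
    req = build_login_request("vcd.example.com", "administrator", "System", "secret")
    print(req.get_method(), req.full_url)
```

The point for the filtering discussion is that only the URL path differs between the provider and the tenant login.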
The legacy API endpoint /api/sessions (common for both providers and tenants) has been deprecated since API version 33.0 (introduced in VCD 10.0). Note that deprecated does not mean removed: it is still available even with API version 36.x, so you can expect it to be around for some time, as VCD keeps APIs backward compatible for a few years.
You might notice that the Feature Flags section offers the possibility to enable “Legacy Login Removal”.
Enabling this feature will disable the legacy login for both tenants and providers, but only if you use the alpha API version (in the case of VCD 10.3.3.1 it is 37.0.0-alpha-1652216327). So this is really only useful for testing your own tooling where you can force the usage of that particular API version. The UI and any 3rd party tooling will still use the main (supported) API versions, where the legacy endpoint will still work.
However, you can forcefully disable it for provider context for any API version with the following CMT command (run from any cell, no need to restart the services):
The providers will then need to use only the new /cloudapi/1.0.0/sessions/provider endpoint. So be careful as it might break some legacy tools!
API Access Token Authentication
This is a fairly new method of authentication to VCD (introduced in version 10.3.1) that uses a secret token, generated once, for API authentication. It is mainly used by automation or orchestration tools. The actual method of generating the session token requires access to the tenant or provider oauth API endpoints:
This makes it easy to disable provider context via URI filter.
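A minimal sketch of the token exchange request follows, again without sending anything. The full paths (/oauth/provider/token and /oauth/tenant/<org>/token) and the refresh_token grant body are my understanding of how the API token is exchanged and are assumptions here; only the /oauth/provider and /oauth/tenant prefixes are what matters for the filter.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_token_request(host: str, api_token: str,
                        org: Optional[str] = None) -> urllib.request.Request:
    """Build the request that exchanges an API token for a session token.

    org=None targets the provider context, otherwise the named tenant org.
    """
    if org is None:
        path = "/oauth/provider/token"
    else:
        path = f"/oauth/tenant/{urllib.parse.quote(org, safe='')}/token"
    body = urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": api_token}
    ).encode("ascii")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(f"https://{host}{path}", data=body,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder host and token
    print(build_token_request("vcd.example.com", "abc123").full_url)
    print(build_token_request("vcd.example.com", "abc123", org="acme").full_url)
```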
SAML/OAuth Authentication via UI
Here we must distinguish the API and UI behavior. For SAML, the UI is using /login/org/<org-name>/… endpoint. The provider context is using the default SYSTEM org as the org name. So we must filter URI starting with /login/org/SYSTEM.
For OAuth, the UI is using the same endpoints as API access token authentication (/oauth/tenant vs /oauth/provider) as well as /login/oauth?service=provider.
For API SAML/OAuth logins, the /cloudapi/1.0.0/sessions vs /cloudapi/1.0.0/sessions/provider endpoints are used.
WAF Filtering Example
Here is an example of how to set up URI filtering with VMware NSX Advanced Load Balancer.
Obviously, we need to set up the VCD cell (SSL) pool and a Virtual Service for the external IP and port 443 (SSL).
The virtual service application profile must be set to System-Secure-HTTP as we need to terminate SSL sessions on the load balancer in order to inspect the URI. That means the public SSL certificate must be uploaded to load balancer as well. The cells can actually use self signed certs especially if you use the new console proxy that does not require SSL pass through and works on port 443.
In the virtual service go to Policies > HTTP Request and create the following rule:
Rule Name: Provider Access
Client IP Address: Is Not: <admin subnets>
Path: Criteria – Begins with:
/cloudapi/1.0.0/sessions/provider
/oauth/provider
/login/oauth?service=provider
/login/org/SYSTEM
Content Switch: Local response – Status Code: 403.
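The logic of this rule can be modelled in a few lines of Python, which is handy for checking which requests would be blocked before touching the load balancer. This is only a simulation of the rule, not load balancer configuration. The admin subnets are placeholders, and matching the prefix against the path together with the query string is an assumption about how the "begins with" criterion is evaluated.

```python
import ipaddress
from typing import Iterable, Optional

# Provider-only authentication URIs from the rule
PROVIDER_PREFIXES = (
    "/cloudapi/1.0.0/sessions/provider",
    "/oauth/provider",
    "/login/oauth?service=provider",
    "/login/org/SYSTEM",
)


def response_status(client_ip: str, request_target: str,
                    admin_subnets: Iterable[str]) -> Optional[int]:
    """Return 403 when a provider login URI is requested from outside the
    admin subnets, otherwise None (the request is passed to the cells)."""
    ip = ipaddress.ip_address(client_ip)
    from_admin_net = any(ip in ipaddress.ip_network(net) for net in admin_subnets)
    is_provider_uri = request_target.startswith(PROVIDER_PREFIXES)
    return 403 if is_provider_uri and not from_admin_net else None


if __name__ == "__main__":
    admin = ["10.0.0.0/8"]  # placeholder admin subnet
    print(response_status("203.0.113.7", "/cloudapi/1.0.0/sessions/provider", admin))
    print(response_status("203.0.113.7", "/cloudapi/1.0.0/sessions", admin))
```

Note that the tenant endpoint /cloudapi/1.0.0/sessions is not caught by the provider prefix, which is exactly why filtering on the authentication endpoints is enough.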
And this is what you can observe when trying to log in via integrated authentication from non-authorized subnets:
I feel like it is time for another update on VMware Cloud Director (VCD) capabilities regarding establishing L2 VPN between on-prem location and Org VDC. The previous blog posts were written in 2015 and 2018 and do not reflect changes related to usage of NSX-T as the underlying cloud network platform.
The primary use case for L2 VPN to the cloud is migration of workloads to the cloud, when the L2 VPN tunnel is temporarily established until all VMs on a single network are migrated. The secondary use case is Disaster Recovery, but I feel that running L2 VPN permanently is not the right approach.
But that is not the topic of today’s post. VCD has supported setting up L2 VPN on a tenant’s Org VDC Gateway (Tier-1 GW) since version 10.2; however, it is still a hidden, API-only feature (the GUI is finally coming soon … in VCD 10.3.1). The actual setup is not trivial as the underlying NSX-T technology first requires an IPSec VPN tunnel to be established to secure the L2 VPN client-to-server communication. VMware Cloud Director Availability (VCDA) version 4.2 is an add-on disaster recovery and migration solution for tenant workloads on top of VCD, and it simplifies the setup of both the server (cloud) and client (on-prem) L2 VPN endpoints from its own UI. To reiterate, VCDA is not needed to set up L2 VPN, but it makes it much easier.
The screenshot above shows the VCDA UI plugin embedded in the VCD portal. You can see that three L2 VPN sessions have been created on VDC Gateway GW1 (NSX-T Tier-1 backed) in the ACME-PAYG Org VDC. Each session uses a different L2 VPN client endpoint type.
The on-prem client can be existing NSX-T tier-0 or tier-1 GW, NSX-T autonomous edge or standalone Edge client. And each requires different type of configuration, so let me discuss each separately.
NSX-T Tier-0 or Tier-1 Gateway
This is mostly suitable for tenants who are running an existing NSX-T environment on-prem. They will need to set up both the IPSec and L2 VPN tunnels directly in NSX-T Manager, which makes it the most complicated process of the three options. On either the Tier-0 or Tier-1 GW they will first need to set up the IPSec VPN and L2 VPN client services, then the L2 VPN session must be created with local and remote endpoint IPs and the Peer Code, which must be retrieved beforehand via the VCD API (it is not available in the VCDA UI, but will be available in the VCD UI in 10.3.1 or newer). The peer code contains all necessary configuration for the parent IPSec session in Base64 encoding.
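If you want to peek at what a retrieved peer code carries, decoding it is a one-liner. The sketch below makes no assumption about the structure of the decoded payload and simply returns it as text; the sample value is fabricated for the example. Since the payload holds the IPSec session configuration, it is probably wise to treat it as sensitive.

```python
import base64
import binascii


def decode_peer_code(peer_code: str) -> str:
    """Decode a Base64 peer code into text without interpreting its content."""
    try:
        raw = base64.b64decode(peer_code.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("peer code is not valid Base64") from exc
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Fabricated sample, not a real peer code
    sample = base64.b64encode(b"hypothetical-payload").decode("ascii")
    print(decode_peer_code(sample))
```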
Standalone Edge Client
This option leverages the very light (150 MB) OVA appliance that can be downloaded from NSX-T download website and actually works both with NSX-V and NSX-T L2 VPN server endpoints. It does not require any NSX installation. It provides no UI and its configuration must be done at the time of deployment via OVF parameters. Again the peer code must be provided.
NSX-T Autonomous Edge
This is the preferred option for non-NSX environments. The autonomous edge is a regular NSX-T edge node that is deployed from an OVA, but is not connected to NSX-T Manager. During the OVA deployment the Is Autonomous Edge checkbox must be checked. It provides its own UI and much better performance and configurability. Additionally, the client tunnel configuration can be done via the VCDA on-premises appliance UI. You just need to deploy the autonomous edge appliance, and VCDA will discover it and let you manage it from then on via its UI.
With this option there is no need to retrieve the Peer Code, as the VCDA plugin will retrieve all necessary information from the cloud site.
The previous VMware Cloud Director 10.2 release brought many new networking features, and the current one, 10.3, continues in the same fashion. Let me give you a brief rundown.
The UI has been enhanced to surface formerly API only features such as the ability to configure dual stack IPv4/IPv6 networks:
or configure DHCP in gateway or network mode:
The service provider can now assign/change primary IP address of Org VDC Edge Gateway in the UI:
It is also possible to configure (extend) an external network port group backing without using API.
New NSX-T Backed Provider VDC Features
As NSX-T backed PVDCs now support both Tier-0/VRF and port group backing for external networks, the Tier-0/VRF GWs were separated into their own tab to avoid confusion.
The port group backed external networks can be either traditional VDS port groups, or NSX-T segments. The latter option gives the ability to use NSX-T distributed firewall on such external network (provider managed directly in NSX-T).
Distributed Firewall now supports dynamic groups that can be defined utilizing VM Tag or VM name.
vApps support routed vApp networks including DHCP service on vApp isolated networks. This is achieved by deploying standalone Tier-1 GWs that are connected to Org VDC networks via service interface. The Org VDC network must be overlay backed (not VLAN). vApp fencing is still not supported as NSX-T does not provide this functionality.
There are also a few additional small enhancements, ranging from support for guest VLAN tagging and reflexive NAT to DHCP pool management.
Provider VDC with no NSX
The creation of a Provider VDC does not require a network pool specification anymore. Such a PVDC will thus not provide any NSX-V or NSX-T features (routing, DHCP, firewalling, load balancing). The Org VDC networks can then be backed by a VLAN network pool or use VDS backed imported direct networks.
NSX-V vs NSX-T Feature Parity
Let me conclude with traditional NSX-V / NSX-T VCD feature comparison chart (new updates highlighted in green).
The tool’s main purpose is to automate the migration of VMware Cloud Director Organization Virtual Data Centers that are NSX-V backed to an NSX-T backed Provider Virtual Data Center. The original article describes how exactly it is accomplished and what the impact on the migrated workloads is from the networking and compute perspective.
The migration tool is continually developed and additional features are added to either enhance its usability (improved roll back, simplified L2 bridging setup) or to support more use cases based on new features in VMware Cloud Director (VCD). And then there is a new assessment mode! Let me go into more details.
Directly Connected Networks
VCD release 10.2.2 added support for directly connected Organization VDC networks in NSX-T backed Org VDCs. Such networks are not connected to a VDC Gateway and instead are connected directly to a port group backed external network. The typical usage is for service networks, backup networks or colocation/MPLS networks where routing via the VDC Gateway is not desired.
The migration tool now supports migration of these networks. Let me describe how it is done.
The VCD external network in an NSX-V backed PVDC is port group backed. It can be backed by one or more port groups that are typically manually created VLAN port groups in vCenter Server, or they can also be VXLAN backed (the system admin would create an NSX-V logical switch directly in NSX-V and then use its DVS port groups for the external network). The system administrator can then create a directly connected network in the Org VDC that is connected to this external network. It inherits its parent’s IPAM (subnet, IP pools), and when a tenant connects a VM to it, the VM is just wired to the backing port group.
The migration tool first detects whether the migrated Org VDC direct network is connected to an external network that is also used by other Org VDCs, and behaves differently based on that.
Colocation / MPLS use case
If the external network is not used by any other Org VDC and the backing port group(s) is of VLAN type (if more port groups are used they must have the same VLAN), then the tool will create a logical segment in NSX-T in the VLAN transport zone (specified in the YAML input spec) and import it to the target Org VDC as an imported network. The reason why a direct connection to the external network is not used is to limit external network sprawl, as the imported network feature perfectly matches the original use case intent. After the migration the source external network is not removed automatically, and the system administrator should clean it up, including the vCenter backing port groups, at their convenience.
Note that no bridging is performed between the source and target network as it is expected the VLAN is trunked across source and target environments.
The diagram below shows the source Org VDC on the left and the target one on the right.
Service Network Use Case
If the external network is used by other Org VDCs, the import VLAN segment method cannot be used, as each imported Org VDC network must be backed by its own logical segment and have its own IPAM (subnet, pool). In this case the tool will just create a directly connected Org VDC network in the target VDC, connected to the same external network as the source. This requires that the external network is scoped to the target PVDC. If the target PVDC is using a different virtual switch, you will first need to create a regular VLAN backed port group there and then add it to the external network (API only currently). Also, only VLAN backed port groups can be used, as no bridging is performed for such networks.
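The two use cases boil down to a small decision, sketched below. This is my reading of the behavior described above, not the migration tool's actual code. The class, field and return value names are hypothetical, and the combinations the text does not describe (for example a VXLAN backed external network) are deliberately reported as not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExternalNetwork:
    name: str
    # One entry per backing port group; None models a non-VLAN (e.g. VXLAN) backing
    port_group_vlans: List[Optional[int]]
    used_by_other_org_vdcs: bool


def plan_direct_network(ext: ExternalNetwork) -> str:
    """Pick the target network type for a directly connected Org VDC network."""
    all_vlan = bool(ext.port_group_vlans) and None not in ext.port_group_vlans
    same_vlan = all_vlan and len(set(ext.port_group_vlans)) == 1
    if not ext.used_by_other_org_vdcs and same_vlan:
        # Colocation / MPLS use case: dedicated VLAN, import it as a segment
        return "imported-vlan-segment"
    if ext.used_by_other_org_vdcs and all_vlan:
        # Service network use case: reuse the shared external network
        return "direct-network"
    return "not-covered"


if __name__ == "__main__":
    print(plan_direct_network(ExternalNetwork("mpls", [120, 120], False)))
    print(plan_direct_network(ExternalNetwork("backup", [200], True)))
```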
The other big feature is the assessment mode. The main driver for this feature is to enable service providers to see how ready their environment is for the NSX-V to T migration and how much redesign will be needed. The assessment can be triggered against VCD 10.0, 10.1 or 10.2 environments and only requires VCD API access (the environment does not yet need to be prepared for NSX-T).
During the assessment, the tool checks all or a specified subset of NSX-V backed Org VDCs and assesses every feature used there that impacts migration viability. Then it provides a detailed and a summarized report where you can see what ratio of the environment *could* be migrated (once upgraded to the latest VCD 10.2.2). This is provided in Org VDC, VM and used RAM units.
The picture below shows example of the summary report:
Note that if there is one vApp in a particular Org VDC that cannot be migrated, the whole Org VDC is counted as not possible to migrate (in all metrics, including VM and RAM). Some features are categorized as blocking, as they are simply not supported by either the NSX-T backed Org VDC or the migration tool (yet), but some issues can be mitigated/fixed (see the remediation recommendations in the user guide).
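The counting rule has a noticeable effect on the summary numbers, which a small worked example makes clear. The data model and the sample figures below are made up, and the sketch mirrors only the rule described here, not the tool's real report format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VApp:
    vm_count: int
    ram_gb: float
    migratable: bool


@dataclass
class OrgVdc:
    name: str
    vapps: List[VApp]


def migratable_ratio(org_vdcs: List[OrgVdc]) -> Dict[str, float]:
    """Share of Org VDCs, VMs and RAM that could be migrated.

    One non-migratable vApp disqualifies its whole Org VDC in every metric.
    """
    total = {"org_vdcs": 0.0, "vms": 0.0, "ram_gb": 0.0}
    ready = {"org_vdcs": 0.0, "vms": 0.0, "ram_gb": 0.0}
    for vdc in org_vdcs:
        vms = sum(v.vm_count for v in vdc.vapps)
        ram = sum(v.ram_gb for v in vdc.vapps)
        total["org_vdcs"] += 1
        total["vms"] += vms
        total["ram_gb"] += ram
        if all(v.migratable for v in vdc.vapps):
            ready["org_vdcs"] += 1
            ready["vms"] += vms
            ready["ram_gb"] += ram
    return {k: (ready[k] / total[k] if total[k] else 0.0) for k in total}


if __name__ == "__main__":
    vdcs = [
        OrgVdc("vdc-a", [VApp(2, 8.0, True), VApp(1, 4.0, False)]),
        OrgVdc("vdc-b", [VApp(1, 4.0, True)]),
    ]
    print(migratable_ratio(vdcs))
```

In this example three of the four VMs sit in migratable vApps, yet only one VM counts as migratable because vdc-a is excluded as a whole.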
As mentioned, the migration tool is continuously developed and improved. Together with the next VMware Cloud Director version we can expect additional coverage of currently unsupported features. In particular, shared network support is high on the radar.