This is an updated version of the original vCloud Director 10: NSX-T Integration blog post, covering all VMware Cloud Director 10.1 related updates.
VMware Cloud Director relies on the NSX network virtualization platform to provide on-demand creation and management of networks and networking services. NSX for vSphere has been supported for a long time and vCloud Director allows tenants to use most of its features. However, as VMware slowly shifts away from NSX for vSphere and pushes forward the modern, fully rewritten NSX-T networking platform, I want to focus in this article on its integration with vCloud Director.
Let me start by highlighting that NSX-T is evolving very quickly, with each release (now at version 3.0) adding major new functionality. Contrast that with NSX-V, which is essentially feature complete in the sense that no major functionality changes are happening there. The fast pace of NSX-T development is a challenge for any cloud management platform, as they have to play catch-up.
The first release of vCloud Director that supported NSX-T was 9.5. It supported only NSX-T version 2.3 and the integration was very basic. All vCloud Director could do was import NSX-T overlay logical segments (virtual networks) created manually by the system administrator. These networks were imported into a specific tenant Org VDC as Org VDC networks.
The next version of vCloud Director, 9.7, supported only NSX-T 2.4 and from the feature perspective not much had changed: you could still only import networks. Under the hood, however, the integration used a completely new set of NSX-T policy-based APIs, and there were some minor UI improvements in registering the NSX-T Manager.
vCloud Director version 10 introduced on-demand creation of NSX-T based networks and network services for the first time. NSX-T version 2.5 was required.
The latest Cloud Director version, 10.1, extends NSX-T support with new features.
Note: Cloud Director 10.1.0 does not support NSX-T 3.0. That support will come in the next patch release (10.1.1).
While I do not want to go too deep into the actual NSX-T architecture, I fully expect that not all readers of this blog are familiar with NSX-T and how it differs from NSX-V. Let me quickly highlight the major points that are relevant to the topic of this blog post.
- NSX-T is vCenter Server independent, which means it scales independently of the vCenter domain. NSX-T essentially communicates with ESXi hosts directly (they are called host transport nodes). The hosts must be prepared with NSX-T VIBs that are incompatible with NSX-V, which means a particular host cannot be used by NSX-V and NSX-T at the same time.
- Overlay virtual networks use the Geneve encapsulation protocol, which is incompatible with VXLAN. The concepts of a Controller cluster that keeps state and of a transport zone are very similar to NSX-V. The independence from vCenter mentioned in the previous point means the vSphere distributed switch cannot be used; instead NSX-T brings its own N-VDS switch. It also means that there is a concept of underlay (VLAN) networks managed by NSX-T. All overlay and underlay networks managed by NSX-T are called logical segments.
- Networking services (such as routing, NAT, firewalling, DNS, DHCP, VPN, load balancing) are provided by Tier-0 or Tier-1 Gateways that are functionally similar to NSX-V ESGs but are not instantiated in dedicated VMs. Instead, they are services running on a shared Edge Cluster. The meaning of Edge Cluster is very different from its usage in the NSX-V context: an Edge Cluster is not a vSphere cluster, but a cluster of Edge Transport Nodes where each Edge Node is a VM or a bare-metal host.
- While T0 and T1 Gateways are similar, they are not identical; each has a specific purpose or set of services it can offer. Distributed routing is implicitly provided by the platform unless a stateful networking service requires routing through a single point. T1 GWs are usually connected to a single T0 GW, and that connection is managed automatically by NSX-T.
- Typically you would have one or a small number of T0 GWs in ECMP mode providing north-south routing (the concept of a Provider Edge) and a large number of T1 GWs connected to a T0 GW, each providing tenant networking for a different tenant (the concept of a Tenant Edge).
VMware Cloud Director Integration
As mentioned above, since NSX-T is not vCenter Server dependent, it is attached to Cloud Director independently of vCenter.
Geneve network pool creation is the same as with VXLAN: you provide a mapping to an existing NSX-T overlay transport zone.
Now you can create a Provider VDC (PVDC), which is as usual mapped to a vSphere cluster or resource pool. A particular cluster used by a PVDC must be prepared for either NSX-V or NSX-T, and all its clusters must share the same NSX flavor; you cannot mix NSX-V clusters with NSX-T clusters in the same PVDC. However, you can easily run NSX-V and NSX-T side by side in the same vCenter Server; you will then just have to create multiple PVDCs. Although NSX-T can span vCenters, a PVDC cannot – that limitation still remains. When creating an NSX-T backed PVDC you will have to specify the Geneve network pool created in the previous step.
Within PVDC you can start creating Org VDCs for your tenants – no difference there.
Org VDCs without routable networks are not very useful. To remedy this we must create external networks and Org VDC Edge Gateways. Here the concept differs quite a bit from NSX-V. Although you could deploy provider ECMP Edges with NSX-V as well (and I described here how to do so), it is mandatory with NSX-T. You will have to pre-create a T0 GW in NSX-T Manager (ECMP active-active is recommended). This T0 GW will provide external networking access for your tenants and should be routable from the internet. Instead of just importing an external network port group as you would with NSX-V, you import the whole T0 GW into Cloud Director.
During the import you will also have to specify the IP subnets and pools that the T0 GW can use for IP sub-allocation to tenants.
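To make the import step concrete, here is a minimal sketch of what the CloudAPI request body for importing a T0 GW as an external network with sub-allocation subnets could look like. The endpoint path, field names and all values are assumptions written in the CloudAPI style; check the schema of your Cloud Director release before using anything like this.

```python
# Hypothetical sketch: build the body for importing an NSX-T T0 GW as a
# Cloud Director external network. Field names and values are illustrative
# assumptions, not the verified 10.1 schema.
def build_external_network_payload(name, t0_backing_id, nsxt_manager_urn,
                                   gateway, prefix_length, ip_ranges):
    return {
        "name": name,
        "networkBackings": {
            "values": [{
                "backingId": t0_backing_id,          # T0 GW ID in NSX-T
                "backingType": "NSXT_TIER0",
                "networkProvider": {"id": nsxt_manager_urn},
            }]
        },
        "subnets": {
            "values": [{
                "gateway": gateway,
                "prefixLength": prefix_length,
                # Ranges Cloud Director may sub-allocate to tenant Edge GWs
                "ipRanges": {"values": [
                    {"startAddress": s, "endAddress": e} for s, e in ip_ranges
                ]},
            }]
        },
    }

payload = build_external_network_payload(
    "internet-t0", "t0-router-uuid", "urn:vcloud:nsxtmanager:example",
    "192.0.2.1", 24, [("192.0.2.10", "192.0.2.100")])
```

The `ipRanges` section corresponds to the pools mentioned above from which tenant Org VDC Edge Gateways get their IP allocations.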
Once the external network exists you can create tenant Org VDC Edge Gateways. The service provider can pick a specific existing NSX-T Edge Cluster for their placement.
T1 GWs are always deployed in active/standby configuration; the placement of the active node is automated by NSX-T. The router interlink between the T0 and T1 GWs is also created automatically by NSX-T. It is possible to disconnect an Org VDC Edge GW from the Tier-0 GW (this is for example used in the NSX-V to NSX-T migration scenario).
During Org VDC Edge Gateway creation the service provider also allocates a range of IPs from the external network. Whereas with NSX-V these would actually be assigned to the Org VDC Edge Gateway uplink, this is not the case with NSX-T. Only once they are actually used in a specific T1 NAT rule will NSX-T automatically create a static route on the T0 GW and start routing to the correct T1 GW.
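A T1 NAT rule of this kind can be sketched as the following request body for the Edge Gateway NAT endpoint of the CloudAPI. The field names follow the 10.x API style but are assumptions here; the addresses are placeholders.

```python
# Hypothetical sketch of a T1 DNAT rule as Cloud Director would push it to
# NSX-T. Field names are assumed, not a verified schema.
def build_dnat_rule(name, external_ip, internal_ip):
    return {
        "name": name,
        "ruleType": "DNAT",
        "enabled": True,
        # The allocated external IP; once used here, NSX-T plumbs a static
        # route on the T0 GW pointing this IP at the tenant's T1 GW.
        "externalAddresses": external_ip,
        "internalAddresses": internal_ip,
    }

rule = build_dnat_rule("web-dnat", "192.0.2.10", "10.0.0.5")
```

Creating such a rule is the moment the allocated external IP becomes reachable from the outside, as described above.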
There are four major types of NSX-T based Org VDC networks and three of them are available to be created via UI:
- Isolated: a Layer 2 segment not connected to a T1 GW. DHCP service is not available on this network (contrary to the NSX-V implementation).
- Routed: a network connected to a T1 GW. The default is NAT-routed, which means its subnet is not announced to the upstream T0 GW and the only way to reach it from outside is via a DNAT rule on the T1 GW using an allocated external IP address. Cloud Director version 10.1 introduces fully routed networks; more on that below.
- Imported: an existing NSX-T overlay logical segment can be imported (same as in VCD 9.7 or 9.5). Its routing/external connectivity must be managed outside of vCloud Director.
- In the OpenAPI (POST /1.0.0/OrgVdcNetwork) you will find one more network type: DIRECT_UPLINK. This is for a specific NFV use case; such a network is connected directly to a T0 GW external interface. Note this feature is not officially supported!
Note that only isolated and routed networks can be created by tenants.
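The network types above can be sketched as a small helper building the body for the Org VDC network OpenAPI call. The `networkType` enum values and field names are illustrative assumptions based on the endpoint mentioned above; verify them against the schema of your Cloud Director release.

```python
# Hypothetical helper for the OpenAPI Org VDC network creation body.
# The enum names below are assumptions mirroring the four types described
# in the text (isolated, NAT-routed, imported, direct uplink).
NETWORK_TYPES = {"ISOLATED", "NAT_ROUTED", "OPAQUE", "DIRECT_UPLINK"}

def build_org_vdc_network(name, org_vdc_urn, network_type,
                          gateway, prefix_length):
    if network_type not in NETWORK_TYPES:
        raise ValueError(f"unknown network type: {network_type}")
    return {
        "name": name,
        "orgVdc": {"id": org_vdc_urn},
        "networkType": network_type,
        "subnets": {"values": [{
            "gateway": gateway,
            "prefixLength": prefix_length,
        }]},
    }

net = build_org_vdc_network("app-net", "urn:vcloud:vdc:example",
                            "NAT_ROUTED", "10.0.0.1", 24)
```

A tenant-callable sketch would only ever pass the isolated or routed type, matching the restriction noted above.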
In the direct connect use case it is desirable to announce routed Org VDC networks upstream so workloads are reachable directly without any NAT. This is possible in Cloud Director version 10.1, but it requires a dedicated Tier-0 GW for the particular tenant. The provider must create a new Tier-0 GW, connect it to the tenant's direct connect transit VLAN, and then select the Dedicate External Network switch when deploying the Org VDC Edge GW.
Cloud Director will make sure that the dedicated external network Tier-0 GW is not accessible to any other Org VDC Edge Gateway.
The tenant can then configure BGP routing on its Org VDC Edge GW, which is in fact set by Cloud Director on the dedicated Tier-0 GW (while Tier-0 to Tier-1 routes are auto-plumbed by NSX-T).
Tenant Networking Services
Currently the following T1 GW networking services are available to tenants:
- Firewall (with IP Sets and Security Groups based on network objects)
- DHCP (without binding and relay)
- DNS forwarding
- IPSec VPN: policy-based with pre-shared key is supported.
All other services are currently not supported. This might be because NSX-T has not implemented them yet, or because Cloud Director has not caught up yet. Expect big progress here with each new Cloud Director and NSX-T release.
All NSX-T related features are available in the Cloud Director OpenAPI (CloudAPI). The pass-through API approach that you might be familiar with from the Advanced Networking NSX-V implementation is not used!
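To illustrate working with the CloudAPI directly, here is a minimal sketch of building the session request that yields a bearer token. The `/cloudapi/1.0.0/sessions/provider` path and the token response header exist in 10.x; the API version value, hostname and credentials are placeholders for this example.

```python
import base64
import urllib.request

# Minimal sketch of CloudAPI authentication: a provider session is created
# with HTTP Basic credentials and an Accept header pinning the API version.
# The bearer token is then returned in the X-VMWARE-VCLOUD-ACCESS-TOKEN
# response header and used for subsequent CloudAPI calls.
def build_session_request(host, user, org, password):
    creds = base64.b64encode(f"{user}@{org}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}/cloudapi/1.0.0/sessions/provider",
        method="POST",
        headers={
            "Accept": "application/json;version=34.0",
            "Authorization": f"Basic {creds}",
        },
    )

req = build_session_request("vcd.example.com", "administrator",
                            "system", "password")
# urllib.request.urlopen(req) would return the session; the token from the
# X-VMWARE-VCLOUD-ACCESS-TOKEN header goes into "Authorization: Bearer ..."
# on follow-up requests.
```

Contrast this with the NSX-V pass-through API, where callers had to speak the NSX API dialect through a VCD proxy endpoint; with CloudAPI everything is native Cloud Director modelling.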
I have summarized all Cloud Director networking features in the following table for a quick comparison between NSX-V and NSX-T.
13 thoughts on “VMware Cloud Director 10.1: NSX-T Integration”
Regarding external networks, would you expose public IP ranges directly to the T1?
In NSX-T, external IPs are routed by the T0 all the way to the T1. You can use individual IPs or ranges.
I really hope that things move forward fast, because right now the feature gap with NSX-V is embarrassing.
Have you tested the use of the NSX-T Distributed Firewall in 10.2? It says it's now supported but I cannot find the options to enable the Distributed Firewall anywhere despite following the VMware documentation.
Yes. I plan to have a blog post on that subject. Things to check: your Data Center group is created properly (Compute Provider Scope is defined at the VC or PVDC level), and user and organization rights include DFW.
Thanks for this, I was able to get it working with your comments and assistance from VMware support.
I added a Compute Provider Scope to my vCenter object in VCD. Not sure if this is required, but I just put a token value in there for testing. Once this was added to the vCenter object, it automatically populated on my Provider VDC. This alone had no impact on the DFW issue or an issue I was having with Org VDC groups.
An Org VDC group is definitely required. In my case, the group was created in the UI but threw a failure message due to a timeout in reaching VCD's public address. I could still see the group in the UI but it had no mention of DFW anywhere. I had to create an internal DNS record so the cells resolve the public address to the internal web IP of the cells. This allowed the Org VDC group to create correctly. Strange that DNS hasn't been an issue until now. It seems all other tasks in VCD go to the internal IP/name of the cells – this is the first task that has failed for me.
What is the correct design for DNS resolution of the public address internally? Should it be a split DNS round robin scenario as I’ve implemented now?
The VCD cell needs to be able to reach other VCD instances and itself (public endpoint) for "multisite=global" API calls and for catalog sync.
Our environment is using VCD with NSX-V now; my external networks are public IP addresses.
When I used VCD + NSX-V, tenants could perform self-service SNAT/DNAT directly on the Edge Gateway (public IP NATed to internal IP of the web server, and internal IP of the web server NATed to public IP), but I can't do that in NSX-T.
In my test scenario with VCD + NSX-T, the external network is a public IP address (T0) and the org networks (Geneve) are on the T1. When I configure SNAT/DNAT for my web server, traffic from the public network to the web server cannot pass through.
Is the architecture different? If yes, how can I achieve a similar architecture in NSX-T to what I had in NSX-V?
The T1 performs the NAT. The public NAT IP is advertised to the T0 and the T0 routes the public IP to the T1. The T0 does not own the public IPs.
Do you mean I need to create a routed network (public IP) on the T1?
Because based on my understanding the T0 faces my physical router, I thought the external network backed by the T0 created in VCD was supposed to be the public IP?
No. The T0 is imported as an external network into VCD. A public subnet is assigned in VCD to this external network. IPs from the subnet are allocated to Org VDC GWs. When such an IP is used for NAT or similar, the T1 will advertise it to the T0. The T0 needs to be able to route this IP (or the whole assigned subnet), which is usually accomplished by running BGP between the T0 and the upstream routers. In such a case the T0 would just have transit connections to the physical routers. I have other options described here: https://fojta.wordpress.com/2021/01/20/provider-networking-in-vmware-cloud-director/
Thank you very much.
After reading through the article you shared,
I managed to make it work so that the Tier-0 uplinks are directly connected to the external subnet port group.
I am still figuring out how to do the recommended option.
I would appreciate it if you could share more VCD + NSX-T architecture on your blog in the future.