Layer 2 VPN to the Cloud – Part III

I feel it is time for another update on VMware Cloud Director (VCD) capabilities for establishing an L2 VPN between an on-prem location and an Org VDC. The previous blog posts were written in 2015 and 2018 and do not reflect the changes brought by NSX-T as the underlying cloud network platform.

The primary use case for L2 VPN to the cloud is workload migration, where the L2 VPN tunnel is established temporarily until the migration of all VMs on a single network is done. The secondary use case is Disaster Recovery, but I feel that running L2 VPN permanently is not the right approach.

But that is not the topic of today’s post. VCD supports setting up L2 VPN on a tenant’s Org VDC Gateway (Tier-1 GW) from version 10.2, however it is still a hidden, API-only feature (the GUI is finally coming soon … in VCD 10.3.1). The actual setup is not trivial, as the underlying NSX-T technology first requires an IPSec VPN tunnel to be established to secure the L2 VPN client-to-server communication. VMware Cloud Director Availability (VCDA) version 4.2 is an add-on disaster recovery and migration solution for tenant workloads on top of VCD, and it simplifies the setup of both the server (cloud) and client (on-prem) L2 VPN endpoints from its own UI. To reiterate, VCDA is not needed to set up L2 VPN, but it makes it much easier.
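
For illustration, below is a minimal Python sketch of what the API-only workflow could look like using the requests library. The CloudAPI endpoint path and the payload fields are my assumptions modeled on the way other NSX-T edge gateway services are exposed; check the VCD 10.2+ API reference for the exact schema before using anything like this.

# Minimal sketch: create an L2 VPN server session on an NSX-T backed Org VDC
# Edge Gateway via the VCD CloudAPI. The endpoint path and payload fields are
# assumptions -- verify them against the VCD 10.2+ API reference.
import requests

VCD = "https://vcd.example.com"            # hypothetical VCD endpoint
TOKEN = "..."                              # bearer token from a prior login
GW_URN = "urn:vcloud:gateway:xxxx"         # Org VDC Edge Gateway (Tier-1) URN

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=35.0",
    "Content-Type": "application/json",
}

# Assumed tunnel definition; the real schema may differ.
tunnel = {
    "name": "l2vpn-to-onprem",
    "enabled": True,
    "sessionMode": "SERVER",
    "localEndpointIp": "203.0.113.10",     # external IP of the Org VDC Edge GW
    "remoteEndpointIp": "198.51.100.20",   # on-prem L2 VPN client endpoint
}

r = requests.post(
    f"{VCD}/cloudapi/1.0.0/edgeGateways/{GW_URN}/l2vpn",  # assumed path
    json=tunnel,
    headers=headers,
)
r.raise_for_status()
print("L2 VPN session creation accepted:", r.status_code)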

The screenshot above shows the VCDA UI plugin embedded in the VCD portal. You can see three L2 VPN sessions have been created on VDC Gateway GW1 (NSX-T Tier-1 backed) in the ACME-PAYG Org VDC. Each session uses a different L2 VPN client endpoint type.

The on-prem client can be an existing NSX-T Tier-0 or Tier-1 GW, an NSX-T Autonomous Edge, or a Standalone Edge client. Each requires a different type of configuration, so let me discuss them separately.

NSX-T Tier-0 or Tier-1 Gateway

This is mostly suitable for tenants who are running an existing NSX-T environment on-prem. They will need to set up both the IPSec and L2VPN tunnels directly in NSX-T Manager, which is the most complicated process of the three options. On either the Tier-0 or Tier-1 GW they will first need to set up the IPSec VPN and L2 VPN client services, then the L2VPN session must be created with the local and remote endpoint IPs and the Peer Code, which must be retrieved beforehand via the VCD API (it is not available in the VCDA UI, but will be available in the VCD UI in 10.3.1 or newer). The peer code contains all necessary configuration for the parent IPSec session in Base64 encoding.
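
To illustrate what that retrieval could look like, here is a small Python sketch that fetches the session and Base64-decodes the peer code for inspection. The endpoint path, response envelope and the peerCode field name are assumptions; only the Base64 handling is generic.

# Sketch: retrieve an L2 VPN session from the VCD API and decode its peer code.
# The endpoint path, response structure and 'peerCode' field name are
# assumptions -- only the Base64 decoding itself is generic.
import base64
import requests

VCD = "https://vcd.example.com"            # hypothetical VCD endpoint
GW_URN = "urn:vcloud:gateway:xxxx"         # Org VDC Edge Gateway (Tier-1) URN
headers = {
    "Authorization": "Bearer <token>",
    "Accept": "application/json;version=35.0",
}

r = requests.get(
    f"{VCD}/cloudapi/1.0.0/edgeGateways/{GW_URN}/l2vpn",   # assumed path
    headers=headers,
)
r.raise_for_status()
session = r.json()["values"][0]            # assumed response envelope
peer_code = session["peerCode"]            # Base64 blob to paste on-prem

# Decoding is only needed if you want to peek at the embedded IPSec parameters.
print(base64.b64decode(peer_code)[:200])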

Lastly, the local NSX-T segments to be bridged to the cloud can be configured for the session. The parent IPSec session will be created automagically by NSX-T, and after a while you should see a green status for both the IPSec and L2 VPN sessions.

Standalone Edge Client

This option leverages the very light (150 MB) OVA appliance that can be downloaded from the NSX-T download website and actually works with both NSX-V and NSX-T L2 VPN server endpoints. It does not require any NSX installation. It provides no UI and its configuration must be done at deployment time via OVF parameters. Again, the peer code must be provided.
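
As a rough, unofficial illustration of an unattended deployment, the sketch below drives ovftool from Python. The --prop keys are purely hypothetical placeholders; the real property names are defined in the appliance’s OVF descriptor, so inspect the OVA with ovftool first before scripting anything like this.

# Sketch: unattended deployment of the Standalone Edge OVA with ovftool driven
# from Python. All --prop keys below are HYPOTHETICAL placeholders -- the real
# property names come from the appliance's OVF descriptor.
import subprocess

cmd = [
    "ovftool",
    "--acceptAllEulas",
    "--datastore=datastore1",
    "--net:Trunk=dvpg-trunk",                      # hypothetical network mapping
    "--prop:l2vpn.peerCode=<BASE64_PEER_CODE>",    # hypothetical property key
    "--prop:l2vpn.serverAddress=203.0.113.10",     # hypothetical property key
    "standalone-edge.ova",
    "vi://administrator%40vsphere.local@vcenter.example.com/DC/host/Cluster",
]
subprocess.run(cmd, check=True)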

Autonomous Edge

This is the preferred option for non-NSX environments. An Autonomous Edge is a regular NSX-T edge node that is deployed from an OVA but is not connected to NSX-T Manager. During the OVA deployment the Is Autonomous Edge checkbox must be checked. It provides its own UI and much better performance and configurability. Additionally, the client tunnel configuration can be done via the VCDA on-premises appliance UI. You just need to deploy the Autonomous Edge appliance and VCDA will discover it and let you manage it from then on via its UI.

With this option there is no need to retrieve the Peer Code, as the VCDA plugin will retrieve all necessary information from the cloud site.

Layer 2 VPN to the Cloud – Part II

Almost 3 years ago I published an article on how to set up a layer 2 VPN between an on-prem vSphere environment and a vCloud Director Org VDC.

As both vCloud Director and NSX have evolved quite a bit since then and simplified the whole setup, here comes part II.

Let me first summarize the use case:

The tenant has an application that resides on 3 different VLAN based networks running in its own (vSphere) datacenter. The networks are routed with an existing physical router. The tenant wants to extend 2 of these networks to the cloud for cloud bursting or DR purposes, but not the 3rd one (for example because a physical database server runs there).

The following diagram shows the setup.

The main advancements are:

  • vCloud Director natively supports NSX L2 VPN (VCD 8.20 or newer needed).
  • NSX now (since 6.2) supports configuration of unstretched networks directly (no static routes are necessary anymore).
  • This means the full setup can be done by the tenant in a self-service fashion.

Here are the steps:

  • The tenant will deploy the freely available NSX Standalone Edge in its datacenter, connected to a trunk port with 2 VLANs mapped (10 and 11). Additional network configuration is necessary (forged transmits and promiscuous mode or sink port creation – see the link).
  • In the cloud Org VDC the tenant deploys two routed Org VDC networks with subnets and gateways identical to those of networks A and B. These networks must be connected to the Org VDC Edge GW via a subinterface (there can be up to 200 such networks on a single Edge). The Org VDC Edge must have advanced networking enabled.
  • The tenant enables and configures the L2VPN server on its Org VDC Edge GW. Note that this is a premium feature that the service provider must enable in the Organization first (see this blog post).
  • Before the L2VPN tunnel is established, the following must be taken into account:
    • The Org VDC Edge GW IP addresses are identical to those of the existing on-prem physical router. Therefore the Egress Optimization Gateway addresses must be entered in the Peer Site configuration. That will prevent the Org VDC Edge GW from sending ARP replies over the tunnel.
    • The same must be performed on the Standalone NSX Edge via the CLI (see the egress-optimize command here).
    • The non-stretched network (subnet C) must be configured on the Org VDC Edge GW so it knows that the subnet is reachable through the tunnel and not via its upstream interface(s). This option, however, is not in the vCloud UI; instead the vCloud networking API must be used.
      Edit 3/26/2018: This does not work for standalone NSX Edges. See the end of the article for more details.
      Alternatively, the provider could configure the non-stretched network directly in the NSX UI:
    • Finally, the tunnel can be established by configuring the L2VPN server details on the on-prem Standalone NSX Edge L2VPN client (endpoint IP, port, credentials, encryption) and providing the VLAN-to-tunnel mappings.
    • Note that to find the Org VDC network subinterface tunnel ID mapping, the vCloud API must be used again (see the sketch below):
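
Below is a hedged Python sketch of that lookup against the vCloud Director advanced networking (NSX proxy) API. The /network/edges/{id}/l2vpn/config path and the XML element names are assumptions modeled on the NSX edge API, so verify them against the API guide for your VCD version.

# Sketch: query the Org VDC Edge GW L2VPN configuration through the vCloud
# Director advanced networking (NSX proxy) API to list sub-interface to
# tunnel ID mappings. The path and XML element names are assumptions.
import requests
import xml.etree.ElementTree as ET

VCD = "https://vcd.example.com"            # hypothetical VCD endpoint
EDGE_ID = "<edge gateway id>"              # Org VDC Edge GW identifier
headers = {
    "x-vcloud-authorization": "<session token>",
    "Accept": "application/*+xml;version=29.0",
}

r = requests.get(f"{VCD}/network/edges/{EDGE_ID}/l2vpn/config", headers=headers)
r.raise_for_status()

# Walk the returned XML and print each sub-interface name with its tunnel ID
# (element names assumed from the NSX schema).
root = ET.fromstring(r.content)
for sub in root.iter("subInterface"):
    print(sub.findtext("name"), sub.findtext("tunnelId"))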

Edit 3/26/2018:

After multiple questions regarding unstretched networks and some testing I need to make some clarifications.

The routing of unstretched networks through the tunnel is achieved via static routes configured on the Edge GW. So in principle it still works the same way as described in the original article; the difference when doing it via the UI/API is that the setting of the IPs and routes is automatic.

The server Edge routing table looks like this:


show ip route

S 0.0.0.0/0 [1/0] via 10.0.2.254
C 10.0.2.0/24 [0/0] via 10.0.2.121
C 169.254.64.192/26 [0/0] via 169.254.64.193
C 169.254.255.248/30 [0/0] via 169.254.255.249
C 192.168.100.0/24 [0/0] via 192.168.100.1
C 192.168.101.0/24 [0/0] via 192.168.101.1
S 192.168.102.0/24 [1/0] via 169.254.64.194

show ip address

...

17: vNic_4094@br-sub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:50:56:88:31:21 brd ff:ff:ff:ff:ff:ff
    inet 169.254.64.193/26 brd 169.254.64.255 scope global vNic_4094
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe88:3121/64 scope link
       valid_lft forever preferred_lft forever

You can see that the 169.254.64.193 IP address was autoassigned to the vNic_4094 tunnel interface and a static route was set to route the unstretched network to the other side via IP 169.254.64.194. The assignment of the .194 address on the other Edge will happen only if that Edge is managed by NSX and is actually performing routing! This is in fact not true for the use case above (with a standalone Edge and an existing physical router). Therefore the following manual approach must be taken:

  1. Create an Org VDC transit network with an arbitrary small subnet (e.g. 169.254.200.0/29) in the cloud. Assign IP .1 as the gateway on the Org VDC Edge. This network will not be used for workloads; it is used just for routing to the unstretched network.
  2. Create a corresponding VLAN transit network on-prem. Assign IP .2 as its gateway interface on the existing router (note the IP addresses of the routing interfaces in #1 and #2 are different).
  3. Create the L2 VPN tunnel as before, however also stretch the transit network but do not optimize its GW (no need, as on-prem and cloud are using different IPs).
  4. Create static routes on the Org VDC Edge GW to route to the on-prem unstretched networks via the 169.254.200.2 transit network router IP (see the sketch below).
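
For completeness, here is a sketch of what step 4 could look like if scripted against the same vCloud networking API instead of the UI. The /network/edges/{id}/routing/config/static path and the XML element names are assumptions modeled on the NSX static routing schema, and a PUT typically replaces the whole static routing configuration, so handle it with care.

# Sketch: add a static route on the Org VDC Edge GW pointing the on-prem
# unstretched subnet to the transit network router IP (169.254.200.2).
# The path and XML element names are assumptions; note that a PUT usually
# replaces the entire static routing configuration.
import requests

VCD = "https://vcd.example.com"            # hypothetical VCD endpoint
EDGE_ID = "<edge gateway id>"
headers = {
    "x-vcloud-authorization": "<session token>",
    "Content-Type": "application/xml",
    "Accept": "application/*+xml;version=29.0",
}

static_routes = """
<staticRouting>
  <staticRoutes>
    <route>
      <network>192.168.102.0/24</network>
      <nextHop>169.254.200.2</nextHop>
    </route>
  </staticRoutes>
</staticRouting>
"""

r = requests.put(
    f"{VCD}/network/edges/{EDGE_ID}/routing/config/static",  # assumed path
    data=static_routes,
    headers=headers,
)
r.raise_for_status()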

Note that this approach is very similar to the one from the original blog post. The only difference is that we must create a separate transit network, as vCloud Director does not support multiple subnets on the same Edge GW interface.

NSX L2 Bridging Options

I have recently had multiple discussions about NSX and its Layer 2 bridging capabilities with various service providers. Let me summarize some important points and considerations about when you would use which option.

Why?

Let’s start with a simple question – why would you need layer 2 bridging? Here are some use cases:

  • The end-user wants to burst their application to the cloud but wants to keep certain components on-site, and because it is a legacy application it cannot be re-IP’d or requires single-subnet communication.
  • The service provider is building a new cloud offering next to a legacy business (collocation, managed services) and wants to enable existing customers to migrate or extend their workloads to the cloud seamlessly (no IP address changes).

How?

NSX offers three ways to bridge layer 2 networks.

Layer 2 VPN

This is a proprietary VPN solution that creates an encrypted tunnel across IP networks between Edge Service Gateways and stitches together one or more L2 networks. These Edge Gateways can be deployed in different management domains, and there is also the option of deploying a standalone Edge, which does not require an NSX license. This is great for the cloud bursting use case. I have blogged about L2VPN in the past here.

While this option is very flexible, it is also quite CPU intensive for both the L2 VPN Client and Server Edge VMs. This option provides up to 2 Gbps of throughput.

NSX Native L2 Bridging

The L2 bridge is created in the ESXi VMkernel by deploying a Logical Router control VM. The control VM is used only for the bridge configuration and for its pinning to a particular ESXi host. As the bridging happens in the VMkernel, it is possible to achieve impressive line-rate (10 Gbps) throughput.

It is only possible to bridge a VXLAN based logical switch with a VLAN based port group. The same physical uplink must be utilized, which means that the VLAN port group must be on the same vSphere Distributed Switch (vDS) that is prepared with the VXLAN VTEP and where the VXLAN logical switch portgroups are created.

L2 Bridge – VLAN and VXLAN portgroups are on the same vDS

The above fact prohibits a scenario where you would have an Edge Cluster with multiple pairs of uplinks connected to separate vDS switches – one for VLAN based traffic and the other for VXLAN traffic. You cannot create an NSX native L2 bridge instance between two vDS switches.

This is especially important for the collocation use case mentioned at the beginning. In order to use the L2 bridge, the customer VLAN must be connected to the Edge Cluster’s top-of-rack pair of switches.

If this is not possible, the service provider can use the L2 VPN option as a workaround – it is even possible to run both the L2 VPN Server and Client Edges on the same host, connected through a transit VXLAN network, where one Edge is connected to a trunk with VLAN networks from one vDS and the other to a trunk with VXLAN networks on another vDS. Unfortunately this has a performance impact (even if NULL-MD5 encryption is used) and should be used only for temporary migration use cases.

L2VPN interfaces

Hardware VTEP

The last bridging option discussed is a new feature of NSX 6.2. It is possible to extend the VXLAN logical switch all the way to a compatible hardware device (a switch from a VMware partner) that acts as a Layer 2 gateway and bridges the logical switch with a VLAN network. The device performing the function of the hardware VTEP is managed from NSX via the OVSDB protocol, while the control plane is still managed by the NSX Controllers. More details are in the following white paper.

As this option requires new, dedicated, NSX compatible switching hardware, it is more useful for permanent use cases.

Layer 2 VPN to the Cloud

When VMware NSX 6.0 came out more than a year ago, one of the great new features it had on top of its predecessor, VMware vCloud Network and Security (vCNS), was the L2VPN service on the Edge Service Gateway, which allows stretching layer 2 network segments between distant sites in different management domains. NSX 6.1 further enhanced the functionality by introducing the Standalone Edge, which can be deployed on top of vSphere without an NSX license and acts as an L2VPN client.

Many vCloud Service Providers are now deploying their public cloud with vCloud Director and NSX instead of vCNS, so I am often asked how they could leverage NSX in order to provide an L2VPN service to their tenants.

As of today neither vCloud Director 5.6 nor 8.0 (in beta at the moment) can deploy NSX Edges and manage the L2VPN service. However, it is still possible for the SP to provide L2VPN as a managed service for its tenants.

Let me demonstrate how it would work with the following artificial example.

The customer has an application that resides on 3 different VLAN based networks (subnets A, B and C) routed with an existing physical router. He would like to extend subnets A and B into the cloud and deploy VMs there. The VMs in the cloud should access the internet in order to connect to external SaaS services through the provider connection (egress optimization) but should still be able to reach a database running on subnet C, which is hosted on premises.

The diagram below shows the whole architecture (click for larger version):

L2VPN to the Cloud

On the left is the customer’s on-premises datacenter with the physical router and three VLAN based networks. On the right is the public cloud with the NSX design I proposed in one of my previous blog articles. The unimportant parts are grayed out; what is important is how the customer Org VDC and the NSX Edge Gateway are deployed.

  • The provider manually deploys a tenant-dedicated NSX Edge Service Gateway outside of vCloud Director and configures it based on the customer requirements. The provider creates two logical switches (VXLAN 5005 and 5006) which will be used for extending customer subnets A and B. The switches are trunked to the NSX Edge and the Edge interface IP addresses are assigned identical to the IP addresses of the on-premises physical router (a.a.a.1 and b.b.b.1).
  • The two logical switches are configured in vCloud Director as External Networks with the proper subnets A and B and pool of unused IPs.
  • Two Org VDC networks are created inside tenant’s Org VDC as directly connected to the two External Networks.
  • The L2VPN server is configured on the NSX Edge (with the encryption algorithm, secret, certificate and stretched interfaces). The Egress Optimization Gateway Address is also configured (both physical gateway IPs are entered – a.a.a.1 and b.b.b.1). This will filter ARP replies of the two gateways sharing the same IPs in the tunnel and allow the NSX Edge to act as the gateway to the internet (see the sketch after this list).
  • The tenant will install the Standalone Edge, which is distributed as an OVF, inside his datacenter and set it up: he must map the VLANs to tunnel IDs supplied by the provider and configure the Edge Server public IP, port and encryption details.
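
Since this Edge is managed by the provider directly in NSX (outside of vCloud Director), its L2VPN server configuration can also be reviewed programmatically. The sketch below assumes the NSX 6.x REST endpoint /api/4.0/edges/{edgeId}/l2vpn/config; treat the path, Edge ID and credential handling as assumptions and confirm them against the NSX API guide for your version.

# Sketch: fetch the L2VPN server configuration of the provider-managed NSX
# Edge directly from NSX Manager. The endpoint path is assumed from the
# NSX 6.x API guide; verify it for your version.
import requests

NSX_MGR = "https://nsxmanager.example.com"   # hypothetical NSX Manager
EDGE_ID = "edge-42"                          # hypothetical Edge identifier
AUTH = ("admin", "<password>")               # use proper credential handling

r = requests.get(
    f"{NSX_MGR}/api/4.0/edges/{EDGE_ID}/l2vpn/config",
    auth=AUTH,
    verify=False,  # lab only; use a trusted certificate in production
)
r.raise_for_status()
print(r.text)  # raw XML with listener IP/port, cipher and stretched interfaces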

Now what about subnet C? How can VMs deployed in the cloud get to it if the physical router is unreachable due to the enabled egress optimization? The following trick is used:

  • Another subnet, z.z.z.0/30, is used for a P2P connection between the NSX Edge in the cloud and the physical router.
  • IP address z.z.z.1/30 is configured on the physical router on one of the stretched subnets (e.g. A).
  • The second IP z.z.z.2/30 is configured on the NSX Edge on the same subnet.
  • Finally, a static route is created on the NSX Edge pointing subnet C to the next hop address z.z.z.1.

Some additional considerations:

  • In case the tenant has licensed NSX on-premises, he can obviously use a ‘full’ NSX Edge Service Gateway. The advantages are that it is much easier to deploy and configure, and that it can also stretch VXLAN based networks, as opposed to only VLANs, which is all the Standalone Edge supports.
  • The Standalone Edge can be connected either to a Standard Switch or a Distributed Switch. When a Standard Switch is used, promiscuous mode and forged transmits must be enabled on the trunk port group. VLAN ID 4095 (all) must be configured to pass multiple VLANs.
  • When a Distributed Switch is used, it is recommended to use a Sink Port instead of promiscuous mode. A Sink Port receives traffic with MAC addresses unknown to the vDS.
  • Sink Port creation is described here. It requires editing the vDS via the vCenter Managed Object UI. While the Sink Port can also be created with the net-dvs --EnableSink command directly on the ESXi host where the Standalone Edge VM is running, it is not recommended, as the host configuration can be overridden by vCenter Server.
  • The RC4-MD5 encryption cipher should not be used, as it is insecure and has been deprecated in NSX 6.2.