vCloud OpenAPI – Large Payload Issue with Load Balancer

With vCloud Director version 9, a new API (cloudapi) based on the OpenAPI specification was introduced alongside the legacy XML-based API. In vCloud Director 9.5, the API Explorer enables consumption of the API directly from the vCloud UI endpoint (read here). Most of the new features use this OpenAPI, such as H5 UI branding, extensions, vRealize Orchestrator service integrations, Cross VDC networking and Roles management.

OpenAPI is very simple to use: it is JSON based, with links provided in headers. However, there can be issues when a load balancer with SSL termination is involved, as the header or payload size may prevent the request or response from getting through the load balancer.

One such issue is documented in the vCloud Director 9.5 release notes. Attempting to edit Global Roles in the new H5 UI fails with an error:

unexpected character at line 1 column 1 of the JSON data.

In my case I am using an NSX Edge load balancer with SSL termination, and below is the error screenshot:

There are multiple workarounds described in the release notes, but none of them worked for me:

  • Increasing the header maximum on the Edge LB as described in KB 52553 did not help, as the number of headers is not the only issue in this particular scenario; the body payload size is as well.
  • Limiting the maximum page size in vCloud Director with cell-management-tool manage-config -n restapi.queryservice.maxPageSize -v 25 fixes the above API call, but the subsequent call made by the UI ignores the setting and its response again does not get through the LB.

After some investigation and troubleshooting I discovered that there is a way to increase the Edge LB buffer size above the default 32 KB with a call similar to the one in KB 52553:

PUT https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/systemcontrol/config

<systemControl>
    <property>lb.global.tune.http.maxhdr=1024</property>
    <property>lb.global.tune.bufsize=65536</property>
</systemControl>

The above call (NSX 6.4) was enough to fix the issue for me and I can now edit Global Roles in the UI:
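For reference, here is a minimal curl sketch of the same call; the NSX Manager address, credentials and Edge ID are placeholders to replace with your own values:

# Push the tuned load balancer settings to the Edge system control configuration
curl -k -u 'admin:<password>' -X PUT \
  -H 'Content-Type: application/xml' \
  -d '<systemControl>
        <property>lb.global.tune.http.maxhdr=1024</property>
        <property>lb.global.tune.bufsize=65536</property>
      </systemControl>' \
  https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/systemcontrol/config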


Layer 2 VPN to the Cloud – Part II

Almost 3 years ago I published an article on how to set up a layer 2 VPN between an on-prem vSphere environment and a vCloud Director Org VDC.

As both vCloud Director and NSX have evolved quite a bit since then, simplifying the whole setup, here comes part II.

Let me first summarize the use case:

The tenant has an application that resides on 3 different VLAN-based networks running in its own (vSphere) datacenter. The networks are routed by an existing physical router. The tenant wants to extend 2 of these networks to the cloud for cloud bursting or DR purposes, but not the 3rd one (for example, because a physical database server runs there).

The following diagram shows the setup.

The main advancements are:

  • vCloud Director natively supports NSX L2 VPN (VCD 8.20 or newer needed).
  • NSX now (since 6.2) supports configuration of unstretched networks directly (no static routes are necessary anymore).
  • This means the full setup can be done by the tenant in a self-service fashion.

Here are the steps:

  • The tenant deploys the freely available NSX Standalone Edge in its datacenter, connected to a trunk port with 2 VLANs mapped (10 and 11). Additional network configuration is necessary (forged transmits and promiscuous mode, or sink port creation; see the link).
  • In the cloud Org VDC the tenant deploys two routed Org VDC networks with the same subnets and gateways as networks A and B. These networks must be connected to the Org VDC Edge GW via subinterfaces (there can be up to 200 such networks on a single Edge). The Org VDC Edge must have advanced networking enabled.
  • The tenant enables and configures the L2VPN server on its Org VDC Edge GW. Note that this is a premium feature that the service provider must first enable in the Organization (see this blog post).
  • Before the L2VPN tunnel is established, the following must be taken into account:
    • The Org VDC Edge GW IP addresses are identical to those of the existing on-prem physical router. Therefore, Egress Optimization Gateway addresses must be entered in the Peer Site configuration. This prevents the Org VDC Edge GW from sending ARP replies over the tunnel.
    • The same must be done on the Standalone NSX Edge via the CLI (see the egress-optimize command here).
    • The non-stretched network (subnet C) must be configured on the Org VDC Edge GW so it knows that the subnet is reachable through the tunnel and not via its upstream interface(s). This option, however, is not in the vCloud UI; instead, the vCloud networking API must be used.
      Edit 3/26/2018: This does not work for standalone NSX Edges. See the end of the article for more details.
      Alternatively, the provider could configure the non-stretched network directly in the NSX UI:
    • Finally, the tunnel can be established by configuring the L2VPN server details on the on-prem Standalone NSX Edge L2VPN client (endpoint IP, port, credentials, encryption) and providing the VLAN-to-tunnel mappings.
    • Note that to find the Org VDC network subinterface-to-tunnel mapping, the vCloud API must be used again (a sketch follows below):
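A minimal sketch of that lookup, assuming the vCloud API for NSX proxy (/network/edges/<edge-id>) is available and that a vCloud session token has already been obtained; the hostname and Edge ID are placeholders and the exact element names may differ between versions:

# Retrieve the Org VDC Edge GW configuration through the vCloud API for NSX proxy;
# the trunk vnic's sub-interface entries should carry the tunnel IDs that map
# to the stretched Org VDC networks.
curl -k -H "x-vcloud-authorization: $TOKEN" \
  -H 'Accept: application/xml' \
  https://<vcloud-fqdn>/network/edges/<edge-id>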

Edit 3/26/2018:

After multiple questions regarding unstretched networks and some testing I need to make some clarifications.

The routing of unstretched networks through the tunnel is achieved via static routes configured on the Edge GW. So in principle it still works the same way as described in the original article; the difference when doing it via the UI/API is that the setting of the IPs and routes is automatic.

The server Edge routing table looks like this:


show ip route

S 0.0.0.0/0 [1/0] via 10.0.2.254
C 10.0.2.0/24 [0/0] via 10.0.2.121
C 169.254.64.192/26 [0/0] via 169.254.64.193
C 169.254.255.248/30 [0/0] via 169.254.255.249
C 192.168.100.0/24 [0/0] via 192.168.100.1
C 192.168.101.0/24 [0/0] via 192.168.101.1
S 192.168.102.0/24 [1/0] via 169.254.64.194

show ip address

...

17: vNic_4094@br-sub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:50:56:88:31:21 brd ff:ff:ff:ff:ff:ff
    inet 169.254.64.193/26 brd 169.254.64.255 scope global vNic_4094
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe88:3121/64 scope link
       valid_lft forever preferred_lft forever

You can see that the 169.254.64.193 IP address was auto-assigned to the vNic_4094 tunnel interface and a static route was set to route the unstretched network to the other side via IP 169.254.64.194. The assignment of the .194 address on the other Edge will happen only if that Edge is managed by NSX and is actually performing the routing! This is in fact not true for the use case above (with a standalone Edge and an existing physical router). Therefore the following manual approach must be taken:

  1. Create an Org VDC transit network with an arbitrary small subnet (e.g. 169.254.200.0/29) in the cloud. Assign IP .1 as the gateway on the Org VDC Edge. This network will not be used for workloads; it is used just for routing to the unstretched network.
  2. Create a corresponding VLAN transit network on-prem. Assign IP .2 as its gateway interface on the existing router (note that the IP addresses of the routing interfaces in #1 and #2 are different).
  3. Create the L2 VPN tunnel as before, but this time also stretch the transit network; do not optimize its GW (there is no need, as on-prem and cloud use different IPs).
  4. Create static routes on the Org VDC Edge GW to route to the on-prem unstretched networks via the 169.254.200.2 transit network router IP (see the sketch after the note below).

Note that this approach is very similar to the original blog post. The only difference is that we must create a separate transit network, as vCloud Director does not support multiple subnets on the same Edge GW interface.
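As a sketch of step 4, assuming subnet C is the 192.168.102.0/24 network from the routing table above and that the call is made directly against the NSX API (it could also go through the vCloud API for NSX proxy); element names may vary between NSX versions, and a PUT to this endpoint replaces the whole static routing configuration, so any existing routes must be included in the body:

# Route the unstretched subnet C via the on-prem transit network router IP
curl -k -u 'admin:<password>' -X PUT \
  -H 'Content-Type: application/xml' \
  -d '<staticRouting>
        <staticRoutes>
          <route>
            <network>192.168.102.0/24</network>
            <nextHop>169.254.200.2</nextHop>
          </route>
        </staticRoutes>
      </staticRouting>' \
  https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/routing/config/static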

vCloud Director 9: NSX Distributed Logical Router

vCloud Director version 9 introduces support for the last major missing NSX feature: the distributed logical router (DLR). DLR provides an optimized router that performs routing between different logical switches in a distributed fashion in the hypervisor. The routing always happens in the hypervisor running the source VM, which means that the traffic goes between at most two ESXi hosts (source and destination) and no tromboning through a third host running a router VM is necessary. Read here for a technical deep dive into how this works. This not only provides much better performance than traditional Edge GW routing, but also scales up to 1000 routed logical networks (as opposed to 10 on an Edge GW, or up to 209 if a trunk port is enabled).

Generally, DLR should be used only for routing between VXLAN-based logical switches, although NSX supports VLAN networks with certain caveats as well. Dynamic routing protocols are also supported and are managed by the Control VM of the DLR.

Now let’s look at how vCloud Director implements DLR. The main focus was on making DLR very simple to use and seamlessly integrated with the existing Org VDC networking concepts.

  • DLR is enabled on an Org VDC Edge Gateway, which must already be converted to advanced networking. You cannot use DLR without an Org VDC Edge Gateway! There must be one free interface on the Edge (you will see why later on).
  • Once DLR is enabled, a logical DLR instance is created in NSX in headless mode without a DLR Control VM (the instance is named in NSX vse-dlr-<GW name> (<UUID>)). vCloud Director can get away without a Control VM because dynamic routing is not necessary (see below).
  • The DLR instance uplink interface is connected to the Org VDC Edge GW with a P2P connection using the 10.255.255.248/30 subnet. The DLR uses the .250 IP address and the Org VDC Edge GW uses .249. This subnet is hardcoded and cannot overlap with existing Org VDC Edge GW subnets. Obviously the Org VDC Edge GW needs at least one free interface.
  • The DLR has its default gateway set to the Org VDC Edge GW interface (10.255.255.249).
  • New Org VDC networks can now be created in the Org VDC with the choice to attach them to the Edge Gateway (as a regular interface or a subinterface in a trunk) or to attach them to the DLR instance.
    For each distributed Org VDC network a static route is created on the Org VDC Edge Gateway pointing to the DLR uplink interface. This means there is no need for dynamic routing protocols on the DLR instance.

    Static Routes on NSX Edge GW

The diagram below shows the networking topology of such a setup.

In the example you can see three Org VDC networks: one traditional (blue, 10.10.10.0/24) attached directly to the Org VDC Edge GW, and two distributed (purple and orange, 192.168.0.0/24 and 192.168.1.0/24) connected through the DLR instance. The P2P connection between the Org VDC Edge GW and the DLR instance is green.

  • DHCP relay agents are automatically configured on the DLR instance for each distributed Org VDC network and point to the DHCP Relay Server: the Org VDC Edge GW interface (10.255.255.249). To enable the DHCP service for a particular distributed Org VDC network, a DHCP pool with the proper IP range just needs to be manually created on the Org VDC Edge Gateway (a sketch follows below the screenshot). If Auto Configure DNS is enabled, DHCP will provide the IP address of the Org VDC Edge P2P interface to the DLR instance.

    DHCP Configuration of DLR pools on the Edge GW
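For illustration only, here is a rough sketch of adding such a DHCP pool for the purple 192.168.0.0/24 network through the NSX Edge DHCP API (the same configuration can be done in the vCloud UI or via the vCloud API for NSX proxy); the IP range and gateway are assumed values based on the example diagram, the NSX Manager address and Edge ID are placeholders, and the payload may differ slightly between NSX versions:

# Add a DHCP IP pool for one of the distributed Org VDC networks
curl -k -u 'admin:<password>' -X POST \
  -H 'Content-Type: application/xml' \
  -d '<ipPool>
        <ipRange>192.168.0.50-192.168.0.100</ipRange>
        <defaultGateway>192.168.0.1</defaultGateway>
        <leaseTime>86400</leaseTime>
      </ipPool>' \
  https://<NSX-Manager>/api/4.0/edges/<Edge-ID>/dhcp/config/ippools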

Considerations

  • Up to 1000 distributed Org VDC networks can be connected to one Org VDC Edge GW (one DLR instance per Org VDC Edge GW).
  • Some networking features (such as L2 VPN) are not supported on the distributed Org VDC networks.
  • VLAN based Org VDC networks cannot be distributed. The Org VDC must use VXLAN network pool.
  • IPv6 is not supported by DLR.
  • vApp routed networks cannot be distributed.
  • The tenant can override the automatic DHCP and static route configurations done by vCloud Director for distributed networks on the Org VDC Edge GW. The tenant cannot modify the P2P connection between the Edge and DLR instance.
  • Disabling DLR on an Org VDC Edge Gateway is possible, but all distributed networks must be removed first.
  • Both enabling and disabling DLR on an Org VDC Edge Gateway are by default system-administrator-only operations. It is possible to grant these rights to a tenant with the granular RBAC introduced in vCloud Director 8.20.
  • The DLR feature is included in the base NSX license in the VMware Cloud Provider Program.

Edit 02/10/2017: Engineering (Abhinav Mishra) provided a way to change the P2P subnet between the Edge and the DLR. Add the following property value with the cell management tool (CMT):

$VCLOUD_HOME/bin/cell-management-tool manage-config -n gateway.dlr.default.subnet.cidr -v <subnet CIDR>

Example: $VCLOUD_HOME/bin/cell-management-tool manage-config -n gateway.dlr.default.subnet.cidr -v 169.254.255.248/30

No need for cell reboot.

Edit 03/10/2017: Existing Org VDC networks can be migrated between traditional, sub-interface based and DLR based networks in any direction in a non-disruptive way, with running VMs attached.


vCloud Director 9: Create VXLAN Network Pool

vCloud Director uses Network Pools to programmatically create on-demand L2 network segments for Org VDC and vApp networks. Network pools can be based on VLANs, VXLAN, port groups or the legacy (deprecated) vCloud Network Isolation (VCDNI) technology.

The VXLAN Network Pool is the recommended choice as it scales the best. Until version 9, vCloud Director would automatically create a new VXLAN Network Pool for each Provider VDC, backed by an NSX Transport Zone (again created automatically) scoped to the clusters that belong to the particular Provider VDC. This would create multiple VXLAN network pools and potentially cause confusion about which one to use for a particular Org VDC.

In vCloud Director 9 we have the option to create our own VXLAN network pool, backed by a manually created NSX Transport Zone scoped to the clusters we want (and using any control plane mode).
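As a rough sketch, such a Transport Zone can be created in the NSX UI or with an NSX API call along these lines; the NSX Manager address, the Transport Zone name and the cluster MoRef IDs (domain-cXX) are placeholders, and the payload details may differ slightly between NSX versions:

# Create a Transport Zone scoped to selected clusters with unicast control plane mode
curl -k -u 'admin:<password>' -X POST \
  -H 'Content-Type: application/xml' \
  -d '<vdnScope>
        <name>TZ-VCD</name>
        <clusters>
          <cluster><cluster><objectId>domain-c10</objectId></cluster></cluster>
          <cluster><cluster><objectId>domain-c11</objectId></cluster></cluster>
        </clusters>
        <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
      </vdnScope>' \
  https://<NSX-Manager>/api/2.0/vdn/scopes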

During creation of a Provider VDC we then have the choice to create a new VXLAN Network Pool (the legacy behavior) or use an existing one.

Advantages of the new feature are:

  • No more clutter from a large number of VXLAN network pools (if there are many Provider VDCs)
  • A simpler way to use hybrid or unicast control plane modes (vCloud Director would always default to multicast before)
  • Control over the scope of VXLAN networks, which is especially useful for sharing Org VDC networks between Org VDCs from different Provider VDCs
  • Adherence to the best practice of scoping the transport zone to the whole vDS (more here)

vCloud Architecture Toolkit for Service Provider Update

The vCloud Architecture Toolkit for Service Provider website has been updated with a new set of documents. All documents were re-branded with the new VMware Cloud Provider Program logos that replace the old vCloud Air Network brand.

My Architecting a VMware vCloud Director Solution for VMware Cloud Providers whitepaper has been refreshed to include vCloud Director 8.10 and 8.20 additions that were missing in the previous version. The current version of the document is 2.8, with an August 2017 release date.

Here is a summary of the new or updated topics:

  • Cell sizing
  • vCloud DB performance tips
  • New vCenter Chargeback Manager network metrics
  • vRealize Business for Cloud
  • vRealize Log Insight
  • vRealize Operations Manager
  • NSX Networking updates
  • Storage support
  • vCloud RBAC
  • Org VDC vSphere Resource Settings
  • VCDNI deprecation
  • New Org VDC Edge GW features
  • Distributed Firewall
  • VM Auto import
  • vCloud API for NSX
  • vCloud Director orchestrated upgrade

The document can be downloaded in PDF format or viewed online.