VCDNI to VXLAN Migration

vCloud Network Isolation (VCDNI or VCNI) is legacy mechanism to create overlay logical networks independently from physical networking underlay. It was originally used in VMware vCenter Lab Manager (where it was known as Cross Host Fencing). vCloud Director offers it as one of many mechanisms for creation of logical networks (next to VXLAN, VLAN and port group backings). VCDNI uses VMware proprietary MAC-in-MAC encapsulation done by vCloud Agent running in ESXi host vmkernel.

It has been for some time superseded by VXLAN technology which is much more scalable, provides better performance and is industry standard technology. VXLAN network pools have been available in vCloud Director since version 5.1.

VCDNI is consumed by manual creation of a vCloud Network Isolation backed Network Pool that is mapped to an underlay VLAN network with up to 1000 logical networks for each pool (VLAN).

As a deprecated and obsolete technology it is no longer supported in vSphere 6.5 and vCloud Director 8.20 is the last release that will support such network pools. vCloud Director 8.20 also provides simple mechanism to perform low-disruption migrations for Org VDC and vApp networks to VXLAN backed networks. Such migration must be done before upgrade to vSphere 6.5 (see more in KB 2148381).

The migration can be performed via UI or API by system administrator with Org VDC granularity.

Migration via UI

  1. For an Org VDC using VCDNI network pool open in the System tab – Manager & Monitor, Org VDC properties (note that doing the same from Org tab will not work).
    org-vdc
  2. Go to Network Pool & Services tab and change VCDNI backed network pool to VXLAN backed one and click OK.
    network-pool
  3. Again open Network Pool & Services tab of the Org VDC. Migrate to VXLAN button will now appear.
    migrate-to-vxlan
  4. Click the button, confirm the message and start the migration.
    confirmation
  5. After while the Org VDC status will change from busy to ready and the migration is finished. Details (and possible errors) can be reviewed in the Recent Tasks of the Audit Log.
    audit-log

Migration with vCloud API

Org VDC network migration is triggered by single API POST call at the Org VDC level.

POST /api/admin/vdc/<org VDC UUID>/migrateVcdniToVxlan
Content Type: application/vnd.vmware.admin.vdcnitovxlanmigration+xml

The Process

The following happens in the background when migration is triggered for each VCDNI backed network in an Org VDC:

  1. ‘Dummy’ VXLAN logical switch is created
  2. All VMs connected to VCDNI network are reconnected to the new VXLAN logical switch
  3. Edge Gateways connected to VCDNI network are connected to the new VXLAN logical switch
  4. Org VDC/vApp network backing is changed in vCloud DB to use the new VXLAN logical switch
  5. Original VCDNI port group is deleted

Small network disruption is expected during VM and Edge Gateway reconnections. The following Recent Tasks picture from vSphere Client shows what is happening at vCenter Server level and how much time each task could take. In the example there was one Org VDC network and one vApp network migrated with VM1 and Edge Gateway ACME-GW2 involved.

vc-recent-tasks

Automate ESXi Host VTEP Default Gateway

As discussed in my older article VXLAN routed transport network requires to set default gateway of vxlan stack on each ESXi host. While NSX has concept of IP Pools which allows automatic VTEP configuration (including gateway), older vCloud Network and Security (vShield) technology does not have this feature and VTEP IP address must be configure via DHCP or manually.

Following quick and dirty PowerCLI script shows how this can be automated at cluster level:


$hosts = Get-Cluster Cluster1 |Get-VMHost
foreach($vihost in $hosts){
$esxcli = get-vmhost $vihost | Get-EsxCli
$vihost.name
$result=$esxcli.network.ip.route.ipv4.add("10.40.0.1","vxlan","default")
$esxcli.network.ip.interface.ipv4.get("","vxlan")|format-list IPv4Address
$esxcli.network.ip.route.ipv4.list("vxlan")|Format-Table
}

The script sets vxlan stack default gateway to 10.40.0.1 on each host in the cluster ‘Cluster1‘ and displays each host name, VTEP IP address and vxlan routing table.

VXLAN Routing Script

Credit for esxcli to PowerCLI command conversion goes to Virten.net.

 

 

VXLAN on Routed Transport Network

One of the major benefits of VXLAN technology is that it allows creating virtual Layer 2 segments over Layer 3 routed networks. VTEPs (VXLAN Tunnel End Points) encapsulate and decapsulate ethernet frames of VMs on virtual networks and send them as UDP packets. However there still must be a mechanism that provides ability for sending VTEPs to find the receiving VTEPs for broadcast, unknown unicast and multicast (BUM) traffic.

In NSX we can use multicast, hybrid and unicast modes. Hybrid and unicast modes leverage controller cluster that has knowledge of the entire VTEP topology. However in vCloud Network and Security (vCNS) we can use only multicast mode.

While setting up a multicast in a flat layer 2 network is very easy and only requires enabling IGMP snooping and querrier on the physical switch infrastructure, routed multicast is much harder. That is why hybrid and unicast modes that NSX provides are so useful. In unicast mode all BUM traffic is replicated by VTEPs. In hybrid mode, multicast is used in each L2 segment of the transport network while unicast is used to send for replication of the traffic to the other segments.

In my recent VXLAN deployment we however had to stick to pure multicast mode as we used vCNS. To route multicast traffic the physical router was enabled to use PIM-SM (Protocol Independent Multicast in Sparse Mode) with rendezvous point. However it turned out that setting up the VTEPs is not straightforward and not very well documented with some misinformation in blog post I found on the web.

Each VTEP needs to have an IP address assigned. In vCNS the assignment happens over DHCP protocol only,

Auto-assigned VTEP IP address

NSX provides next to DHCP also ability to use network pools. As we were using vCNS and had no DHCP servers in the VXLAN transport network we had to go into each host and manually assign the VTEP vmkernel port IP address through vSphere client. Unfortunately this is not enough for routed communication on the transport network. Default gateway in the VXLAN network stack must be defined.

Missing gateway

The default gateway must be added through ESXi CLI interface as can be seen in above screenshot it is not configurable via GUI. Originally we created a static route to the other segment, but that is not enough (actually not needed at all) and instead the default gateway must be defined with the following command.

esxcli network ip route ipv4  add -n default -g 1.1.1.1 -N vxlan

where 1.1.1.1 is the gateway IP address and vxlan is the networking stack.

The verification that gateway is set properly can be done with net-vdl2 -l command.

net-vdl2

 

Troubleshooting Multicast with Linux

I was looking for lightweight tool which would help me with troubleshooting multicast on VXLAN transport network (underlay). While both vCNS and NSX have built in tools (pings of various sizes and broadcast packets) I needed something more flexibile where I could do arbitrary IGMP joins and leaves.

I used CentOS VM with one interface directly on transport network and software SMCRoute. This link contains binary package that works on RHEL/CentOS. Some other notes:

  • if you have multiple interfaces make sure the multicast is routed through the correct one:
    route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
  • I had to install also glibc package:

    yum -y install glibc.i686

  • Make sure the kernel supports multicast

    cat /boot/config-<kernel version> | grep CONFIG_IP_MULTICAST

  • Enable ICMP ECHO on broadcast/multicast

    sysctl net.ipv4.icmp_echo_ignore_broadcasts=0

  • Start the smcroute daemon first:
    smcroute -d

To join and leave a multicast group use -j and -l commands:
smcroute -j eth0 239.0.0.1
smcroute -l eth0 239.0.0.1

To check current memberships use:
netstat -ng

or

ip maddr

IGMP version can be changed with following command:

echo “2” > /proc/sys/net/ipv4/conf/eth0/force_igmp_version

Additional useful statistics about IGMP joins:

cat /proc/net/igmp

To see which hosts are member of particular IGMP group just ping it and see who replies:

[root@CentOS~]# ping 239.1.0.10
PING 239.1.0.10 (239.1.0.10) 56(84) bytes of data.
64 bytes from 1.1.0.1: icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from 1.1.0.3: icmp_seq=1 ttl=64 time=0.256 ms (DUP!)

Hosts 1.1.0.1 and 1.1.0.3 replied to ping on 239.1.0.10 multicast group.

How To Change VXLAN VTEP MTU Size and Teaming Policy

One of my customers has configured VXLAN in vCloud Director environment and then created multiple Provider and Org VDCs and deployed virtual networks. Then we found out that MTU and teaming policy configuration was set up incorrectly. Redeployment of the whole environment would take too much time, fortunately there is a way to do this without rip and replace approach.

First little bit of background. VXLAN VTEPs are configured in vShield Manager or in NSX Manager (via vSphere Web Client plugin) on cluster/distributed switch level. vShield/NSX Manager creates one distributed switch port group with given parameters (VLAN, teaming policy) and then for each host added to the cluster creates VTEP vmknic (with configured MTU size and DHCP/IP Pool addressing scheme). This means that teaming policy can be easily changed directly at vSphere level by direct edit of the distributed switch port group and MTU size can be changed on each host VTEP vmknic. However every new host deployed into the VXLAN prepared cluster would still use the wrong MTU size set in vShield/NSX Manager. Note that as there can be only one VTEP port group per distributed switch, clusters sharing the same vSwitch need to have identical VTEP teaming policy and VLAN ID.

The actual vCNS/NSX Manager VTEP configuration can be changed via following REST API call:

PUT https://<vCNS/NSX Manager FQDN>/api/api/2.0/vdn/switches/<switch ID>

with the Body containing the new configuration.

Example using Firefox RESTClient plugin:

  1. Install Firefox RESTClient plugin.
  2. Make sure vCNS/NSX Manager certificate is trusted by Firefox.
  3. In Firefox toolbar click on RESTClient icon.
  4. Create authentication header: Authentication > Basic Authentication > enter vCNS/NSX Manager credentials
  5. Select GET method and in the URL enter https://<vCNS/NSX Manager FQDN>/api/2.0/vdn/switches
    VDS Contexts
  6. This will retrieve all vswitch contexts in vCNS/NSX domain. Find ID of the one you want to change and use it in the following GET call
  7. Select GET method and in the URL enter https://<vCNS/NSX Manager FQDN>/api/api/2.0/vdn/switches/<switch-ID>
    VDS Context
  8. Now copy the Response Body and paste it into the Request Body box. In the XML edit the parameters you want to change. In my case I have changed:
    <mtu>9000</mtu> to <mtu>1600</mtu> and
    <teaming>ETHER_CHANNEL</teaming> to <teaming>FAILOVER_ORDER</teaming>
  9. Change the metod to PUT and add a new header: Content-Type: application/xml.
    PUT Request
  10. Send the request. If everything went successfully we should get Status Code: 200 OK response.
    OK Response

Now we need in vSphere Client change MTU size of all existing hosts to the new value and also change the teaming policy on VTEP portgroup (in my case from Route based on IP hash to Use explicit failover order).

vCloud Network and Security (vShield Manager) supports following teaming policies:

  • FAILOVER_ORDER
  • ETHER_CHANNEL
  • LACP_ACTIVE
  • LACP_PASSIVE
  • LACP_V2

NSX adds following two teaming policies for multiple VTEP vmknics:

  • LOADBALANCE_SRCID
  • LOADBALANCE_SRCMAC

Update 9/22/2014

Existing VXLAN VNI portgroups (virtual wires) will use original teaming policy, therefore they need to be changed to match the new one as well.

When using FAILOVER_ORDER teaming policy there must be also specification of the uplinks in the XML. The uplinks should use the names as defined at the distributed switch level.

<teaming>FAILOVER_ORDER</teaming>
<uplinkPortName>Uplink 2</uplinkPortName>
<uplinkPortName>Uplink 1</uplinkPortName>

Update 4/1/2015

As mentioned in the comments below vCNS and NSX differ slightly in the API call. For NSX the correct call is:

PUT https://nsx01.fojta.com/api/2.0/switches

(without the switch-id at the end).