vCloud Director Portal Access over IPv6

I got an interesting question from a colleague: can the vCloud Director portal be accessed over IPv6? I suspected the answer was yes, so I had a little bit of fun and did a quick test.

With an NSX load balancer in front of my two VCD cells I created IPv6 VIPs for HTTP, HTTPS and VMware Remote Console (TCP 443) traffic and reused the existing IPv4 pools. I also added these IPv6 addresses to my DNS servers so that name resolution and certificates would work, and I was ready to test.

Load Balancer Virtual IPs
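To verify that name resolution really returns the IPv6 VIP, a quick dig query is enough (the hostname below is just an illustrative placeholder, not my actual record):

dig AAAA vcloud.example.com +short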


As I terminate the SSL session on the load balancer and insert the client IP into the X-Forwarded-For HTTP header, I could observe both IPv6 and IPv4 clients in the vCloud Director logs:

Client coming from IPv6 fd13:5905:f858:e502::20:

2015-01-16 19:06:06,431 | SECURITY | pool-eventPublishing-4-thread-1 | SyslogEventPublisher           | Event [id=6869f13c-0643-4afc-b083-982ecc920341, timestamp=1421431566380, type=com/vmware/vcloud/event/session/login, serviceNamespace=com.vmware.vcloud, properties={
...
currentContext.user.clientIpAddress=fd13%3A5905%3Af858%3Ae502%3A%3A20,
entity.name=administrator,
currentContext.user.proxyAddress=10.0.1.1,

Client coming from IPv4 10.0.2.104:


2015-01-16 19:29:46,879 | SECURITY | pool-eventPublishing-4-thread-1 | SyslogEventPublisher | Event [id=6a414e3f-19e7-45c2-83b7-5e0a7d90758b, timestamp=1421432986823, type=com/vmware/vcloud/event/session/login, serviceNamespace=com.vmware.vcloud, properties={
...
currentContext.user.clientIpAddress=10.0.2.104,
entity.name=administrator,
currentContext.user.proxyAddress=10.0.1.1,

Here 10.0.1.1 is the load balancer's internal interface. The Remote Console proxy and OVF Tool also work over IPv6.
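For a quick end-to-end check from an IPv6-enabled client, curl can be forced to use IPv6; the hostname is again a placeholder and the /cloud/ path may differ between vCloud Director versions:

curl -6 -k -I https://vcloud.example.com/cloud/
curl -k -I "https://[fd13:5905:f858:e502::100]/cloud/"

The second form talks directly to a (hypothetical) IPv6 VIP literal and bypasses DNS; -k skips certificate validation, which would otherwise complain about the name mismatch.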

Promiscuous Portgroup Myth

The topic of promiscuous portgroups on a virtual switch came up lately from several directions, so I decided to summarize some information and also debunk one particular myth.

What is a promiscuous port? This is what Wikipedia says:

In computer networking, promiscuous mode or promisc mode is a mode for a wired network interface controller (NIC) or wireless network interface controller (WNIC) that causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. This mode is normally used for packet sniffing that takes place on a router or on a computer connected to a hub (instead of a switch)…

So how does this relate to a virtual environment and a virtual switch? VMware KB article 1002934 sheds some light here:

By default, a guest operating system’s virtual network adapter only receives frames that are meant for it. Placing the guest’s network adapter in promiscuous mode causes it to receive all frames passed on the virtual switch that are allowed under the VLAN policy for the associated portgroup. This can be useful for intrusion detection monitoring or if a sniffer needs to analyze all traffic on the network segment.

So does this mean that enabling a promiscuous port on a vSwitch will make all vSwitch frames visible to a VM connected to such a port? Let's step back and explain how the VMware vSwitch works. The main difference from a physical switch is that it does not learn MAC addresses by observing passing traffic (which is why you sometimes hear networking people say it is not a real switch). Instead it relies on the information the hypervisor (VMkernel) provides about VM vNIC MAC addresses. It basically knows that all vSwitch non-uplink ports are used only by VMs (with known MAC addresses). A frame originating on a VM connected to the vSwitch will therefore either be delivered to the right port on the same host (if it is in the same VLAN and matches the destination MAC) or sent out an uplink (usually trunk) port. There it is either flooded or switched to the right port by the physical switching infrastructure, depending on whether it is unknown or known unicast, and eventually delivered to the right host (if it belongs to a VM) or to a physical device.

If you think about the behavior described above, it should be clear that a VM connected to a promiscuous port or portgroup will not see all the vSwitch traffic, but only the traffic that actually reaches the host where the VM resides.

Let's have a look at the following example with three virtual machines (VM A, VM B, VM C), each on a different ESXi host (Host A, Host B, Host C), connected via a physical switch (Port A, Port B, Port C):

Three Host Diagram


All VMs are in the same portgroup of VMware vSphere Distributed Switch in VLAN 100, while the port of VM A is set as promiscuous.

So which traffic between VM B and VM C will VM A see, and which will it not?

When VM B (with MAC B) tries to talk to VM C, it will send a broadcast ARP request to find VM C's MAC address (MAC C). The physical switch will see the broadcast frame arriving on Port B, will note that MAC B on VLAN 100 is behind this port, and will flood the frame to all other ports. The vSwitch on host A will receive this frame and forward it to all ports on VLAN 100 (it is a broadcast frame), thus also to VM A.
VM C will get the ARP request and reply to it with a unicast reply from MAC C to MAC B. The physical switch will enter MAC C into its MAC table, noting it is behind Port C on VLAN 100, and will switch the frame to the already learned location of MAC B – to Port B. The vSwitch on host B will then deliver the frame to VM B. As you can see, no frame was delivered to host A and therefore VM A will not see the reply.

Now that communication between VM B and VM C has been established, they can start talking to each other, and the physical switch, knowing the locations of MAC B and MAC C, will switch the frames only between Ports B and C of hosts B and C. VM A will see nothing of this unicast communication.

After a while the relevant entries in the MAC table of the physical switch will expire (if its timeout is shorter than the ARP cache of VM B or C). In that case it will forget the location of MAC B or C and will flood frames destined to B (or C) to all VLAN 100 ports; only then will VM A get the frame, as the flooded frame reaches host A as well.

Broadcast and possibly multicast traffic from VM B or C will reach host A and thus VM A as well.

This should debunk the myth that a promiscuous port can be used to sniff all traffic on the network segment. For that you need port mirroring.

There are however valid use cases for a promiscuous port, and these are related to the (non-)learning behavior of the vSwitch. If VM A needs to see traffic for an additional MAC address D which is not hardcoded on its vNIC, a promiscuous port is a requirement. Examples of such use cases are nested VMs (VM A is a virtual ESXi host) or a floating MAC for highly available load balancing VMs (MAC masquerade). As MAC D responds to ARP requests, the physical switch will learn that MAC D is behind Port A and will deliver the frame properly. The vSwitch on host A will then flood the traffic to all promiscuous ports in the VLAN on that host, as it does not otherwise know where to deliver it. Read William Lam's article on how to improve the efficiency of this with a VMware Fling (a VMkernel vib plugin) that gives the vSwitch learning ability.

Nested VM
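Just for illustration – on a standard vSwitch the promiscuous mode security policy can also be toggled from the ESXi shell (vSwitch0 is an example name; for a distributed switch portgroup the equivalent setting lives in the vSphere Web Client):

esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 --allow-promiscuous=true
esxcli network vswitch standard policy security get --vswitch-name=vSwitch0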

Troubleshooting Multicast with Linux

I was looking for a lightweight tool that would help me with troubleshooting multicast on a VXLAN transport network (underlay). While both vCNS and NSX have built-in tools (pings of various sizes and broadcast packets), I needed something more flexible where I could do arbitrary IGMP joins and leaves.

I used a CentOS VM with one interface directly on the transport network and the SMCRoute software. This link contains a binary package that works on RHEL/CentOS. Some other notes:

  • If you have multiple interfaces, make sure multicast is routed through the correct one:
    route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
  • I also had to install the glibc package:
    yum -y install glibc.i686
  • Make sure the kernel supports multicast:
    cat /boot/config-<kernel version> | grep CONFIG_IP_MULTICAST
  • Enable ICMP ECHO replies to broadcast/multicast:
    sysctl net.ipv4.icmp_echo_ignore_broadcasts=0
  • Start the smcroute daemon first:
    smcroute -d

To join and leave a multicast group use the -j and -l commands:
smcroute -j eth0 239.0.0.1
smcroute -l eth0 239.0.0.1
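The joins can also be scripted, for example to subscribe to a whole range of test groups at once (the range below is arbitrary, purely for illustration):

for i in $(seq 1 5); do smcroute -j eth0 239.0.0.$i; done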

To check current memberships use:
netstat -ng

or

ip maddr

The IGMP version can be changed with the following command:

echo "2" > /proc/sys/net/ipv4/conf/eth0/force_igmp_version

Additional useful statistics about IGMP joins are available in:

cat /proc/net/igmp

To see which hosts are members of a particular IGMP group, just ping it and see who replies:

[root@CentOS~]# ping 239.1.0.10
PING 239.1.0.10 (239.1.0.10) 56(84) bytes of data.
64 bytes from 1.1.0.1: icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from 1.1.0.3: icmp_seq=1 ttl=64 time=0.256 ms (DUP!)

Hosts 1.1.0.1 and 1.1.0.3 replied to the ping of the 239.1.0.10 multicast group.
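When more detail is needed, tcpdump on the transport interface shows both the IGMP membership reports and the multicast traffic itself (interface and group are examples from this lab):

tcpdump -i eth0 -n igmp
tcpdump -i eth0 -n host 239.1.0.10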

How To Change VXLAN VTEP MTU Size and Teaming Policy

One of my customers configured VXLAN in a vCloud Director environment, created multiple Provider and Org VDCs, and deployed virtual networks. Then we found out that the MTU and teaming policy configuration was set up incorrectly. Redeployment of the whole environment would take too much time; fortunately, there is a way to fix this without a rip-and-replace approach.

First, a little bit of background. VXLAN VTEPs are configured in vShield Manager or in NSX Manager (via the vSphere Web Client plugin) at the cluster/distributed switch level. vShield/NSX Manager creates one distributed switch port group with the given parameters (VLAN, teaming policy) and then, for each host added to the cluster, creates a VTEP vmknic (with the configured MTU size and DHCP/IP Pool addressing scheme). This means the teaming policy can easily be changed directly at the vSphere level by editing the distributed switch port group, and the MTU size can be changed on each host's VTEP vmknic. However, every new host deployed into the VXLAN prepared cluster would still use the wrong MTU size set in vShield/NSX Manager. Note that as there can be only one VTEP port group per distributed switch, clusters sharing the same vSwitch need to have an identical VTEP teaming policy and VLAN ID.

The actual vCNS/NSX Manager VTEP configuration can be changed via the following REST API call:

PUT https://<vCNS/NSX Manager FQDN>/api/2.0/vdn/switches/<switch-ID>

with the Body containing the new configuration.
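The same workflow can be sketched with curl instead of a browser plugin; the FQDN, credentials, switch ID and file name below are placeholders and the endpoint is the vCNS-style one shown above:

curl -k -u admin:password https://vsm.example.com/api/2.0/vdn/switches
curl -k -u admin:password https://vsm.example.com/api/2.0/vdn/switches/dvs-21 > switch.xml
curl -k -u admin:password -X PUT -H "Content-Type: application/xml" -d @switch.xml https://vsm.example.com/api/2.0/vdn/switches/dvs-21

Edit the MTU and teaming elements in switch.xml before the PUT, exactly as described in the RESTClient steps below.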

Example using Firefox RESTClient plugin:

  1. Install Firefox RESTClient plugin.
  2. Make sure vCNS/NSX Manager certificate is trusted by Firefox.
  3. In Firefox toolbar click on RESTClient icon.
  4. Create authentication header: Authentication > Basic Authentication > enter vCNS/NSX Manager credentials
  5. Select GET method and in the URL enter https://<vCNS/NSX Manager FQDN>/api/2.0/vdn/switches
    VDS Contexts
  6. This will retrieve all vSwitch contexts in the vCNS/NSX domain. Find the ID of the one you want to change and use it in the following GET call.
  7. Select GET method and in the URL enter https://<vCNS/NSX Manager FQDN>/api/2.0/vdn/switches/<switch-ID>
    VDS Context
  8. Now copy the Response Body and paste it into the Request Body box. In the XML, edit the parameters you want to change. In my case I changed:
    <mtu>9000</mtu> to <mtu>1600</mtu> and
    <teaming>ETHER_CHANNEL</teaming> to <teaming>FAILOVER_ORDER</teaming>
  9. Change the method to PUT and add a new header: Content-Type: application/xml.
    PUT Request
  10. Send the request. If everything went well, we should get a Status Code: 200 OK response.
    OK Response

Now we need to change, in the vSphere Client, the MTU size of the VTEP vmknics on all existing hosts to the new value, and also change the teaming policy on the VTEP portgroup (in my case from Route based on IP hash to Use explicit failover order).
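The per-host vmknic MTU change can alternatively be done from the ESXi shell (vmk1 below is just an example; identify the actual VTEP vmknic first):

esxcli network ip interface list
esxcli network ip interface set --interface-name=vmk1 --mtu=1600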

vCloud Networking and Security (vShield Manager) supports the following teaming policies:

  • FAILOVER_ORDER
  • ETHER_CHANNEL
  • LACP_ACTIVE
  • LACP_PASSIVE
  • LACP_V2

NSX adds the following two teaming policies for multiple VTEP vmknics:

  • LOADBALANCE_SRCID
  • LOADBALANCE_SRCMAC

Update 9/22/2014

Existing VXLAN VNI portgroups (virtual wires) will keep the original teaming policy, therefore they need to be changed to match the new one as well.

When using the FAILOVER_ORDER teaming policy, the uplinks must also be specified in the XML. The uplinks should use the names defined at the distributed switch level:

<teaming>FAILOVER_ORDER</teaming>
<uplinkPortName>Uplink 2</uplinkPortName>
<uplinkPortName>Uplink 1</uplinkPortName>

Update 4/1/2015

As mentioned in the comments below, vCNS and NSX differ slightly in the API call. For NSX the correct call is:

PUT https://nsx01.fojta.com/api/2.0/switches

(without the switch-id at the end).

Rate Limiting of External Networks in vCloud Director and Nexus 1000V

There is a new feature in vCloud Director 5.1 which was requested a lot by service providers – configurable rate limits on routed external networks (for example the Internet) for each tenant. Limits can be set in both the incoming and outgoing directions by the vCloud Administrator on the tenant's Edge Gateway.

Edge Rate Limit Configuration

However, this feature only works with the VMware vSphere Distributed Switch – it does not work with the Cisco Nexus 1000V or the VMware standard switch. Why? Although the feature is provided by the Edge Gateway, what actually happens in the background is that vShield Manager instructs vCenter to create a traffic shaping policy on the distributed vSwitch port used by the Edge VM.

vSphere Distributed Switch Traffic Shaping

The standard switch does not allow port-specific traffic shaping, and the Nexus 1000V management plane (Virtual Supervisor Module) is not accessible to vShield Manager/vCenter. The rate limit could be applied manually on the port of the Cisco switch; however, any Edge redeploy operation, which is accessible to the tenant via the GUI, would deploy a new Edge using a different port on the virtual switch, and the tenant could thus easily get rid of the limit.

For a standard switch backed external network the vCloud Director GUI will not even present the option to set the rate limit; for a Nexus backed external network, however, the operation will fail with an error similar to this one:

Cannot update edge gateway "ACME_GW"
java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (10086): Traffic shaping policy can be set only for a Vnic connected to a vmware distributed virtual portgroup configured with static port binding. Invalid portgroup 'dvportgroup-9781'.

Nexus 1000V Error

By the way, when not using vCloud Director, the rate limit can also be set on the Edge via vShield Manager or its API – it is called Traffic Shaping Policy and is configurable in the vSM > Edge > Configure > Interfaces > Actions menu.

vShield Manager Traffic Shaping

Do not forget to consider this when designing vCloud Director environments and choosing the virtual switch technology.