vCloud Networking and Security Upgrade to NSX in vCloud Director Environments

Just a short post to link (Edit 8/2/2016: read here for the updated link) a new whitepaper I wrote about the upgrade of vCloud Networking and Security to NSX in vCloud Director environments.

It discusses:

  • interoperability and upgrade path
  • impact of network virtualization technologies (Nexus 1000V, VCDNI)
  • migration considerations
  • migration scenario with minimal production impact

VMware vCloud Director® relies on VMware vCloud® Networking and Security or VMware NSX® for vSphere® to provide abstraction of the networking services. Until now, both platforms could be used interchangeably because they both provide the same APIs that vCloud Director uses to provide networks and networking services.
The vCloud Networking and Security platform end-of-support (EOS) date is 19 September 2016. Only NSX for vSphere will be supported with vCloud Director after the vCloud Networking and Security end-of-support date.
To secure the highest level of support and compatibility going forward, all service providers should migrate from vCloud Networking and Security to NSX for vSphere. This document provides guidance and considerations to simplify the process and to understand the impact of changes to the environment.
NSX for vSphere provides a smooth, in-place upgrade from vCloud Networking and Security. The upgrade process is documented in the corresponding VMware NSX Upgrade Guides (versions 6.0, 6.1, 6.2). This document is not meant to replace these guides. Instead, it augments them with specific information that applies to the usage of vCloud Director in service provider environments.


Migration from Nexus 1000V to NSX in vCloud Director

A few service providers have asked me about migration possibilities from Nexus 1000V to NSX when using vCloud Director. I will first explain the process at a high level and then offer a step-by-step approach.

Introduction

NSX (more precisely NSX for vSphere) is built on top of the vSphere Distributed Switch, which it extends with a few VMkernel modules and host-level agents. It cannot run on top of Nexus 1000V. Therefore it is necessary to first migrate from Nexus 1000V to the vSphere Distributed Switch (vDS) and then install NSX. The core management component of NSX is NSX Manager, which runs as a virtual appliance. It is very similar in this sense to the vShield/vCloud Networking and Security Manager, and there is actually a direct upgrade path from vCNS Manager to NSX Manager.

NSX is backward compatible with the vCNS API and therefore works without any problems with vCloud Director. However, NSX advanced features (distributed logical router and firewall, dynamic routing protocol support on Edges, etc.) are not available via the vCloud Director GUI or the vCloud API. The only exception is multicast-less VXLAN, which works with vCloud Director out of the box.
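
As a quick sanity check (for example right after the vCNS Manager to NSX Manager upgrade), the manager can be queried directly over the same style of REST API that vCloud Director consumes. Below is a minimal PowerShell sketch; the hostname, credentials, the /api/4.0/edges listing endpoint and the response field names are assumptions to verify against your environment and API reference:

# Minimal sketch: list Edges via the vCNS-compatible REST API on NSX Manager
# (hostname and endpoint are assumptions - verify against your API reference)
$nsxManager = "nsxmgr.example.com"
$cred = Get-Credential   # NSX/vCNS Manager admin account
$pair = "{0}:{1}" -f $cred.UserName, $cred.GetNetworkCredential().Password
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair)) }

# Self-signed certificates are common on the manager appliance
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }

$response = Invoke-RestMethod -Uri "https://$nsxManager/api/4.0/edges" -Headers $headers -Method Get
$response.pagedEdgeList.edgePage.edgeSummary | Select-Object id, name   # field names may differ per version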

Migration from Nexus 1000V to vSphere Distributed Switch

In a pure vSphere environment it is pretty easy to migrate from Nexus 1000V to vDS, and it can even be done without any VM downtime because both distributed switches (vDS and Nexus) can coexist on the same ESXi host. However, this is not the case when vCloud Director with VXLAN technology is involved. VXLAN is configured with per-cluster granularity, and therefore it is not possible to mix two VXLAN providers (vDS and Nexus) in the same cluster. This unfortunately means we cannot live migrate VMs from Nexus to vDS, as the two switches are in different clusters and (live) vMotion does not work across two different distributed switches. Cold migration must be used instead.

Note that if VXLAN is not involved, live migration is possible; this will be leveraged while migrating vCloud external networks.

Another point should be noted: we are going to mix two VXLAN providers in the same Provider VDC, meaning that the VXLAN scope will span both Nexus 1000V and the vSphere Distributed Switch. To my knowledge this is neither recommended nor supported, although it works and both VTEP types can communicate with each other thanks to the multicast control plane. As we will be migrating VMs in powered-off state this is not an issue, and no communication will run over the mixed VXLAN network during the migration.
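
Before touching anything it is worth double-checking which distributed switches each cluster's hosts are actually attached to. A small PowerCLI sketch, using the example cluster names from the steps below; depending on the PowerCLI version the Nexus 1000V may or may not be returned by the vDS cmdlets:

# Sketch: list the distributed switches seen by the hosts of each cluster
foreach ($cluster in Get-Cluster "Cluster1", "Cluster2") {
    foreach ($vmhost in Get-VMHost -Location $cluster) {
        Get-VDSwitch -VMHost $vmhost |
            Select-Object @{N = "Cluster"; E = { $cluster.Name } },
                          @{N = "Host"; E = { $vmhost.Name } },
                          Name, Version
    }
}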

High Level Steps

We will need two clusters: one legacy cluster with Nexus 1000V and one empty cluster for vDS. I will use two vDS switches, although one could be enough. One vDS is used for management, vMotion, NFS and VLAN traffic (vCloud external networks); the other vDS is used purely for VXLAN VM traffic.

We need to first migrate all VLAN based networks from Nexus 1000V to the first vDS (dvSwitch01). Then we prepare the second cluster with vDS based VXLAN and create an elastic Provider VDC which will contain both clusters (resource pools). We will disable the Nexus resource pool and migrate all VMs, templates and Edges off of it to the new one – see my article vCloud Director: Online Migration of Virtual Data Center.

We will detach the old cluster, remove and uninstall Nexus 1000V, and then create a vDS with VXLAN instead and add the cluster back to the Provider VDC. After this we can upgrade vCNS Manager to NSX Manager, prepare all hosts for NSX and also install NSX Controllers. We can optionally change the VXLAN transport mode from Multicast to Unicast/Hybrid. We however cannot upgrade the vCloud Director deployed vShield Edges to NSX Edges, as that would break their compatibility with vCloud Director.
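
As a side note, the basic vDS scaffolding from the plan above could be pre-created with PowerCLI along these lines. This is only a sketch: the datacenter name, uplink counts and the example portgroup are assumptions, and the real design (uplinks, teaming, MTU for VXLAN) needs to follow your environment:

# Sketch: create the two distributed switches and attach the new cluster's hosts
$dc = Get-Datacenter "DC01"    # hypothetical datacenter name
$dvs1 = New-VDSwitch -Name "dvSwitch01" -Location $dc -NumUplinkPorts 2
$dvs2 = New-VDSwitch -Name "dvSwitch02" -Location $dc -NumUplinkPorts 2

# Add all Cluster2 hosts to both switches
$cluster2Hosts = Get-Cluster "Cluster2" | Get-VMHost
Add-VDSwitchVMHost -VDSwitch $dvs1 -VMHost $cluster2Hosts
Add-VDSwitchVMHost -VDSwitch $dvs2 -VMHost $cluster2Hosts

# Example VLAN-backed portgroup for a vCloud external network on dvSwitch01
New-VDPortgroup -VDSwitch $dvs1 -Name "external-internet-vlan100" -VlanId 100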

Step-by-step Procedure

  1. Create new vDS switch. Migrate management networks (vMotion, NFS, management) and vCloud external networks to it (NAT rules on Edge Gateways using external network IPs will need to be removed and re-added).
    Prepare new cluster (Cluster2) and add it to the same vDS. Optionally create a new vDS for VXLAN networks (if the first one will not be used). Prepare VXLAN fabric on Cluster2 in vCNS Manager.
    Distributed switches
    VXLAN fabric before migration
    Nexus1000V – Cluster1 VXLAN only
    dvSwitch01 – Cluster1 and Cluster2 external networks, vMotion, management and NFS
    dvSwitch02 – Cluster2 VXLAN only
  2. Create new Provider VDC on new cluster (GoldVDCnew)
  3. Merge new Provider VDC with the old one. Make sure the new Provider VDC is primary! Disable the secondary (Cluster1) Resource Pool.
    Merging PVDCs
    PVDC Resource Pools
  4. Migrate VMs from Cluster1 to Cluster2 from within vCloud Director. VMs connected to a VXLAN network will need to be powered off, as it is not possible to do a live vMotion between two different distributed switches (see the sketch after this list for a quick way to find the affected VMs).
    VM Migration
  5. Migrate templates (move them to another catalog). As Cluster1 is disabled, the templates will be registered on hosts from Cluster2.
  6. Redeploy Edges from within vCloud Director (not vCNS Manager). Again, because Cluster1 is disabled, the Edges will be redeployed on hosts from Cluster2.
  7. Detach (now empty) Resource Pool
    Detaching Resource Pool
  8. Rename Provider VDC to the original name (GoldVDC)
  9. Unprepare VXLAN fabric on Cluster1. Remove VXLAN vmknics on all hosts from Nexus1000V.
  10. Remove Nexus1000V switch from the (now empty) cluster and extend VXLAN vDS there (or create a new one if no L2 connectivity exists between clusters). Remove Nexus VEM VIBs from all hosts. Prepare VXLAN fabric on the cluster and add it to Provider VDC.
    VXLAN fabric after migration
  11. Upgrade vCNS Manager to NSX Manager with vCNS to NSX 6 Upgrade Bundle
    vCNS Manager upgrade
    NSX Manager
  12. Update hosts in vSphere Web Client > Networking and Security > Installation > Host Preparation. This action will require host reboot.
    Host preparation for NSX – before
    Host preparation for NSX – after
  13. (Optionally) install NSX controllers
    NSX Controller installation
  14. (Optionally) change VXLAN transport mode to Unicast/Hybrid.
    VXLAN transport mode
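
The sketch referenced in step 4 – a quick PowerCLI way to list the VMs on the old cluster together with the portgroup of each NIC, so the ones sitting on Nexus 1000V VXLAN portgroups (and therefore requiring a power-off) can be identified up front. Portgroup naming is environment specific, so treat the output as an inventory rather than an automated decision:

# Sketch: inventory Cluster1 VMs and the portgroup of each network adapter
Get-Cluster "Cluster1" | Get-VM | ForEach-Object {
    $vm = $_
    Get-NetworkAdapter -VM $vm |
        Select-Object @{N = "VM"; E = { $vm.Name } },
                      @{N = "PowerState"; E = { $vm.PowerState } },
                      @{N = "Portgroup"; E = { $_.NetworkName } }
}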

Upgrading ESXi host to vSphere 5.5 with Cisco Nexus 1000V

I have upgraded my vSphere lab cluster from ESXi 5.1 to 5.5. Even though my lab consists of only two hosts, I wanted to use an Update Manager orchestrated upgrade to simulate how it would be done in a big enterprise or service provider environment with as few manual steps as possible.

As I use Cisco Nexus 1000V and vCloud Director, the following procedure was devised:

1. It is not recommended to put a host into maintenance mode without first disabling it in vCloud Director. The reason is that vCloud Director catalog media management can get confused by the inaccessibility of a host due to maintenance mode. However, when using Update Manager it is not possible to orchestrate disabling a host before it enters maintenance mode. Therefore I would recommend doing the whole upgrade operation during a maintenance window when the vCloud Director portal is not accessible to end users.

2. I have a few custom VIBs installed on the hosts: the Cisco Nexus 1000V VEM VIB, the vcloud agent VIB and the VXLAN VIB. Other common ones are the NetApp NFS plugin or EMC PowerPath. This means a custom ESXi 5.5 image must be created first, which can be done quite easily in PowerCLI 5.5. Note that the VXLAN VIB does not need to be included as it is installed automatically when the host exits maintenance mode (similar to the FDM HA VIB).

3. Add the necessary software depots (ESXi online, Cisco Nexus 1000V and vcloud-agent offline). The vCloud Director agent VIB can be downloaded from any cell at the following location: /opt/vmware/vcloud-director/agent/vcloudagent-esx55-5.5.0-1280396.zip

Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Add-EsxSoftwareDepot .\VEM550-201308160108-BG-release.zip

Add-EsxSoftwareDepot .\vcloudagent-esx55-5.5.0-1280396.zip

4. Find the newest profile and clone it:

Get-EsxImageProfile | Sort-Object "ModifiedTime" -Descending | Format-Table -Property Name, CreationTime

New-EsxImageProfile -CloneProfile ESXi-5.5.0-1331820-standard -Name "ESXi-5.5.0-1331820-standard-VEM-vcloud" -Vendor custom

5. Get the names of all VIBs and add the ones needed to the new profile:

Get-EsxSoftwarePackage

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud cisco-vem-v160-esx

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud vcloud-agent

6. Export the profile to an ISO image (this will take a while as about 300 MB of data needs to be downloaded from the internet):

Export-EsxImageProfile -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud -ExportToIso ESXi-5.5.0-1331820-standard-VEM-vcloud.iso

7. Now we can upload the ISO to Update Manager, create an upgrade baseline and attach it to the cluster.

8. When I ran "Scan for Updates" I received the status "Incompatible". The VMware Update Manager release notes mention this:

The Incompatible compliance status is because of the way the FDM (HA) agent is installed on ESXi 5.x hosts. Starting with vSphere 5.0, the FDM agent is installed on ESXi hosts as a VIB. When a VIB is installed or updated on an ESXi host, a flag is set to signify that the bootbank on the host has been updated. Update Manager checks for this flag while performing an upgrade scan or remediation and requires this flag to be cleared before upgrading a host. The flag can be cleared by rebooting the host.

I rebooted the hosts and Scanned for Updates again this time without any issue. I was ready for upgrade.

9. The upgrade of my two hosts took about 50 minutes. It was nicely orchestrated by Update Manager and finished without any issues.

10. I still needed to upgrade the vcloud host agents from vCloud Director, but that could be automated with the vCloud API (the host is put into maintenance mode during this operation).
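
For completeness, here is a rough sketch of what that automation could look like against the vCloud admin extension API. The session handling matches vCloud API 5.5; the hostReferences listing and the per-host upgrade action shown here are assumptions from memory, so verify the exact paths in the vCloud API reference before using anything like this:

# Rough sketch: upgrade vCloud host agents via the vCloud API (endpoints are assumptions - verify first)
$vcd = "vcloud.example.com"
$cred = Get-Credential   # vCloud system administrator, e.g. administrator@System
$auth = "Basic " + [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes(
    "{0}:{1}" -f $cred.UserName, $cred.GetNetworkCredential().Password))

# Log in and keep the session token
$login = Invoke-WebRequest -Uri "https://$vcd/api/sessions" -Method Post `
    -Headers @{ Authorization = $auth; Accept = "application/*+xml;version=5.5" }
$headers = @{ "x-vcloud-authorization" = $login.Headers["x-vcloud-authorization"]
              Accept                   = "application/*+xml;version=5.5" }

# Enumerate the hosts known to vCloud Director and trigger the agent upgrade on each
$hostRefs = Invoke-RestMethod -Uri "https://$vcd/api/admin/extension/hostReferences" -Headers $headers
foreach ($hostRef in $hostRefs.VMWHostReferences.HostReference) {
    Invoke-RestMethod -Uri "$($hostRef.href)/action/upgrade" -Method Post -Headers $headers | Out-Null
}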

Nexus 1000V VEM Upgrade Issue

I was upgrading Nexus 1000V in my lab to the latest version 4.2(1)SV2(2.1). First you need to upgrade Nexus VSM (Virtual Supervisor Module) by uploading kickstart and system files to it and then installing them. This was done without any issues. The next step is to upgrade the VEM modules on every host connected to the Nexus 1000V switch.

I had not used Nexus in a while, so I was surprised how Cisco changed the upgrade process. It has the following steps:

  1. The network admin prepares the Nexus switch for the upgrade by upgrading the VSM and then notifies the vSphere admin that the switch is ready to be upgraded by running the following command on the VSM:
    vmware vem upgrade notify
  2. The vSphere admin sees the notification in the switch summary tab in the Networking inventory view and must click Apply Upgrade. He basically confirms that VMware vSphere Update Manager (VUM) is available and configured properly. (By the way, the message incorrectly says vSphere Distributed Switch instead of the more generic Distributed vSwitch.)
    Are you ready for upgrade?
  3. The message above changes to "An Upgrade … is in progress" and the network admin can proceed with the upgrade.
    Upgrade in process
  4. This is done by running the following command on the VSM:
    vmware vem upgrade proceed
  5. Update Manager now tries to upgrade the VEMs on each host. You can observe the issued vCenter tasks (Enter Maintenance Mode, Check, Install, Exit Maintenance Mode).
  6. The network admin can monitor the status on the VSM by running:
    show vmware vem upgrade status
  7. If all hosts are upgraded he will see:
    Nexus1000V-02# show vmware vem upgrade status

    Upgrade VIBs: System VEM Image

    Upgrade Status: Upgrade Complete in vCenter
    Upgrade Notification Sent Time: Sun Sep 15 19:05:18 2013
    Upgrade Status Time(vCenter): Sun Sep 15 19:06:42 2013
    Upgrade Start Time: Sun Sep 15 19:08:06 2013
    Upgrade End Time(vCenter): Sun Sep 15 19:09:14 2013
    Upgrade Error:
    Upgrade Bundle ID:
    VSM: VEM500-201306160100-BG
    DVS: VEM500-201306160100-BG

    and can finalize the upgrade by issuing

    vmware vem upgrade complete

    which will get rid of the upgrade in progress message from step #3.

However, on my system step #5 failed with the error: "The host returns esxupdate error code:99. An unhandled exception was encountered. Check the Update Manager log files and esxupdate log files for more details."

Examining the esxupdate log files, I could see that there was a problem with the new VEM VIB file:

esxupdate: esxupdate: ERROR: ValueError: Cannot merge VIBs Cisco_bootbank_cisco-vem-v160-esx_4.2.1.2.2.1.0-3.1.1, Cisco_bootbank_cisco-vem-v160-esx_4.2.1.2.2.1.0-3.1.1 with unequal payloads attributes: ([<vmware.esximage.Vib.Payload object at 0x88374ec>], [<vmware.esximage.Vib.Payload object at 0x884632c>])

The problem was caused by a conflict between the same VIB's metadata obtained from the default Cisco repository (https://hostupdate.vmware.com/software/VUM/PRODUCTION/csco-main/csco-depot-index.xml) and directly from the VSM. Unfortunately I do not know a way to remove a patch that is already in the VUM database (other than hacking the database), so instead I did the following:

  1. Disconnect VUM from the internet (so it cannot download Cisco patch metadata when it starts).
  2. Stop VUM service
  3. Purge VUM database (this will get rid of all patches, baselines, etc.). See KB 2043170
  4. Start VUM service and uncheck the custom Cisco download source in the VUM Download Settings configuration.
    VUM Download Settings
  5. Connect VUM to the internet again.

The installation then proceeded without problems.

Note: The actual VIB binary is downloaded by VUM from the VSM and not from the internet, so make sure that VUM has connectivity to the VSM.

Rate Limiting of External Networks in vCloud Director and Nexus 1000V

There is a new feature in vCloud Director 5.1 which was requested a lot by service providers – configurable limits on routed external networks (for example, the Internet) for each tenant. Limits can be set for both incoming and outgoing directions by the vCloud Administrator on the tenant's Edge Gateway.

Edge Rate Limit Configuration

However, this feature only works with the VMware vSphere Distributed Switch – it does not work with the Cisco Nexus 1000V or the VMware standard switch. Why? Although the feature is provided by the Edge Gateway, what actually happens in the background is that vShield Manager instructs vCenter to create a traffic shaping policy on the distributed vSwitch port used by the Edge VM.

vSphere Distributed Switch Traffic Shaping

The standard switch does not allow port-specific traffic shaping, and the Nexus 1000V management plane (Virtual Supervisor Module) is not accessible to vShield Manager/vCenter. The rate limit could be applied manually on the port of the Cisco switch; however, the Edge redeploy operation, which is accessible to the tenant via the GUI, deploys a new Edge that uses a different port on the virtual switch, so the tenant could easily get rid of the limit.
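
To see where the limit actually lands, the per-port override can be inspected through the vSphere API from PowerCLI. A small sketch (the switch name is an example) that lists connected dvPorts with an ingress shaping policy enabled:

# Sketch: find dvPorts with per-port ingress traffic shaping enabled (this is where the Edge rate limit ends up)
$vds = Get-VDSwitch "dvSwitch01"   # example switch name
$criteria = New-Object VMware.Vim.DistributedVirtualSwitchPortCriteria
$criteria.Connected = $true
$vds.ExtensionData.FetchDVPorts($criteria) |
    Where-Object { $_.Config.Setting.InShapingPolicy.Enabled.Value } |
    Select-Object @{N = "PortKey"; E = { $_.Key } },
                  @{N = "ConnectedNic"; E = { $_.Connectee.NicKey } },
                  @{N = "AvgBitsPerSec"; E = { $_.Config.Setting.InShapingPolicy.AverageBandwidth.Value } }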

For a standard switch backed external network, the vCloud Director GUI will not even present the option to set the rate limit; for a Nexus backed external network, however, the operation will fail with an error similar to this:

Cannot update edge gateway “ACME_GW”
java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (10086): Traffic shaping policy can be set only for a Vnic connected to a vmware distributed virtual portgroup configured with static port binding. Invalid portgroup ‘dvportgroup-9781’.

Nexus 1000V Error

By the way, the rate limit can also be set on the Edge (when not using vCloud Director) via vShield Manager or its API – it is called Traffic Shaping Policy and is configurable in the vSM > Edge > Configure > Interfaces > Actions menu.

vShield Manager Traffic Shaping

Do not forget to consider this when designing vCloud Director environments and choosing the virtual switch technology.