Edge Gateway Deployment Speed in vCloud Director 8.10

In vCloud Director 8.10 there is a massive improvement in the deployment (and configuration) speed of Edge Gateways. This is especially noticeable in use cases where a large number of routed vApps are provisioned in as short a time as possible – for example nightly builds for testing, or labs for training purposes. It also matters for customer onboarding, where the SLA is measured as the time from credit card swipe to login to a cloud VM.

Theory

How is the speed improvement achieved? It is actually not really a vCloud Director accomplishment. The deployment and configuration of Edge Gateways have always been done by vShield or NSX Manager. However, there is a big difference in how vShield Manager and NSX Manager communicate with the Edge Gateway to push its configuration (IP addresses, NAT, firewall and other network services configuration).

Because the Edge Gateway can be deployed to a network that is completely isolated from any external traffic, its configuration cannot be done over the network; an out-of-band communication channel must be used instead. vShield Manager always used the VIX API (Guest Operations API), which involves communication with vCenter Server, the hostd process on the ESXi host running the Edge Gateway VM, and finally VMware Tools inside the Edge Gateway VM (see this older post for more detail).

NSX Manager uses a different mechanism. As long as the ESXi host is properly prepared for NSX, a message bus connection is established between the NSX Manager and the vsfwd user space process on the ESXi host. The configuration is then pushed from the host to the Edge Gateway VM over a VMCI channel.

Prerequisites

There are necessary prerequisites for using the faster message bus communication instead of the VIX API. If any of them is not fulfilled, the communication mechanism falls back to the VIX API.

  • The host running the Edge Gateway must be prepared for NSX. If your vCloud Director instance uses solely VLAN (or even VCDNI) backed network pools and you skipped the NSX preparation of the underlying clusters, message bus communication cannot be used, as the host is missing the NSX VIBs and the vsfwd process (see the verification sketch below the list).
  • The Edge Gateway must be version 6.x. It cannot be the legacy Edge version 5.5 deployed by older vCloud Director releases (8.0, 5.6, etc.). vCloud Director 8.10 deploys Edge Gateway version 6.x; however, existing Edges deployed before the upgrade to 8.10 must be redeployed in vCloud Director or upgraded in NSX (read this whitepaper for a script that does it all at once).
  • Obviously, NSX Manager must be used (as opposed to vShield Manager) – in any case, vCloud Networking and Security is no longer supported with vCloud Director 8.10.
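
To quickly verify the first prerequisite, you can check whether the NSX VIBs are present on a host. The following PowerCLI snippet is only an illustrative sketch, not part of my test setup: it assumes an open vCenter connection, a hypothetical host name, and the NSX 6.2-era VIB names esx-vsip and esx-vxlan (esx-vsip delivers the vsfwd agent); adjust these for your environment.

# Rough prerequisite check: does the host have the NSX VIBs installed?
$vmHost = Get-VMHost -Name "esxi01.gcp.local"    # hypothetical host name
$esxcli = Get-EsxCli -VMHost $vmHost

# List installed VIBs and filter for the NSX ones
$nsxVibs = $esxcli.software.vib.list() | Where-Object { $_.Name -eq "esx-vsip" -or $_.Name -eq "esx-vxlan" }

if ($nsxVibs) {
    $nsxVibs | Select-Object Name, Version | Format-Table -AutoSize
}
else {
    Write-Host "No NSX VIBs found - host is not prepared for NSX; Edge configuration will fall back to the VIX API."
}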

Performance Testing

I have done a quick proof of concept test to see the relative improvement between the older and newer deployment mechanisms.

I used three different combinations of the same environment (I was upgrading from one combination to the next):

  • vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
  • vCloud Director 8.0.1 + NSX 6.2.3 (uses legacy Edges)
  • vCloud Director 8.10 + NSX 6.2.3 (uses NSX Edges)

All three combinations used the same hardware and the same vSphere 5.5 environment with nested ESXi hosts, so the point is to look at the relative differences rather than the absolute deployment times.

Using PowerCLI, I measured the sequential deployment speed of 10 vApps with one isolated network and 10 vApps with one routed network, with multiple runs to calculate the average per vApp. The first scenario measures differences in the provisioning speed of VXLAN logical switches to see the impact of the controller-based control plane mode. The second adds the provisioning of an Edge Gateway on top of the logical switch. The vApps were otherwise empty (no VMs).

Note: if you want to do a similar test in your environment, I captured the two empty vApps containing only the routed or isolated network to a catalog with the vCloud API (PowerCLI), as this cannot be done from the vCloud UI.
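
For reference, here is a minimal sketch of how such a measurement could be scripted. It is not the exact script I used: the template and Org VDC names are placeholders and it assumes a Connect-CIServer session is already established.

# Deploy 10 vApps from a captured template and measure the average deployment time
$template = Get-CIVAppTemplate -Name "empty-routed-vapp"     # hypothetical catalog item
$orgVdc   = Get-OrgVdc -Name "MyOrgVdc"                      # hypothetical Org VDC

$times = 1..10 | ForEach-Object {
    $name = "perf-test-$_"
    (Measure-Command {
        New-CIVApp -Name $name -OrgVdc $orgVdc -VAppTemplate $template | Out-Null
    }).TotalSeconds
}

"Average deployment time: {0:N1} s" -f ($times | Measure-Object -Average).Average

# Clean up the test vApps afterwards
1..10 | ForEach-Object { Get-CIVApp -Name "perf-test-$_" | Remove-CIVApp -Confirm:$false }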

Here are the average deployment times of each vApp.

vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4

  • Isolated 5-5.5 seconds
  • Routed 2:17 min

vCloud Director 8.0.1 + NSX 6.2.3

  • Isolated approx. 6.8 seconds (Multicast), 7.5 seconds (Unicast)
  • Routed 2:20 min

vCloud Director 8.10 + NSX 6.2.3

  • Isolated 7.7 seconds (Multicast), 8.1 seconds (Unicast)
  • Routed 1:35 min

While the speed of logical switch provisioning goes down a little with NSX and with the Unicast control plane mode, the Edge Gateway deployment gets a massive boost with NSX and vCloud Director 8.10. While the OVF deployment of the NSX Edge takes a bit longer (up from 20 to 30 s), the faster configuration more than makes up for it (down from well over a minute to about 30 s).

Just for comparison, here are the tasks performed during the deployment of each routed vApp, as reported in the vSphere Client Recent Tasks window.

vCloud Director 5.6.5 + vCloud Networking and Security
vCloud Director 8.10 + NSX 6.2.3

vCloud Networking and Security Upgrade to NSX in vCloud Director Environments – Update

In April I wrote a whitepaper describing all the considerations that need to be taken into account when upgrading vCloud Networking and Security to NSX in vCloud Director environments. I have updated the whitepaper with additional information related to new releases:

  • updates related to the vCloud Director 8.10 release
  • updates related to the VMware NSX 6.2.3 release
  • updates related to the vCenter Chargeback Manager 2.7.1 release
  • NSX Edge Gateway upgrade script example
  • extended upgrade scenario to include vCloud Director 8.10

The whitepaper will be posted later this month on the vCloud Architecture Toolkit for Service Providers website; until then, it can be downloaded from the link below.

Edit 7/17/2016: The new vCAT website is up: http://www.vmware.com/solutions/cloud-computing/vcat-sp.html

VMware vCloud Networking and Security to VMware NSX Upgrade v2.1.pdf … link

vCloud Networking and Security Upgrade to NSX in vCloud Director Environments

Just a short post to link to (Edit 8/2/2016: read here for the updated link) a new whitepaper I wrote about the upgrade of vCloud Networking and Security to NSX in vCloud Director environments.

It discusses:

  • interoperability and upgrade path
  • impact of network virtualization technologies (Nexus 1000V, VCDNI)
  • migration considerations
  • migration scenario with minimal production impact

VMware vCloud Director® relies on VMware vCloud® Networking and Security or VMware NSX® for vSphere® to provide abstraction of the networking services. Until now, both platforms could be used interchangeably because they both provide the same APIs that vCloud Director uses to provide networks and networking services.
The vCloud Networking and Security platform end-of-support (EOS) date is 19 September 2016. Only NSX for vSphere will be supported with vCloud Director after the vCloud Networking and Security end-of-support date.
To secure the highest level of support and compatibility going forward, all service providers should migrate from vCloud Networking and Security to NSX for vSphere. This document provides guidance and considerations to simplify the process and to understand the impact of changes to the environment.
NSX for vSphere provides a smooth, in-place upgrade from vCloud Networking and Security. The upgrade process is documented in the corresponding VMware NSX Upgrade Guides (versions 6.0, 6.1, and 6.2). This document is not meant to replace these guides. Instead, it augments them with specific information that applies to the usage of vCloud Director in service provider environments.


 

Reboot All Hosts in vCloud Director

vCloud Director-based clouds support non-disruptive maintenance of the underlying physical hosts. Hosts can be patched, upgraded, or completely replaced without any impact on customer workloads, all thanks to vMotion and DRS Maintenance Mode, which can evacuate all running, suspended, or powered-off workloads from an ESXi host.

Many service providers are going to be upgrading their networking platform from vCloud Networking and Security (vCNS) to NSX. Besides upgrading the Manager and deploying new NSX Controllers, this upgrade requires installing new NSX VIBs on all hosts, which means every host in the service provider environment must be rebooted.

Depending on the number of hosts, their size, and vMotion network throughput, evacuating each host can take 5-10 minutes, and the reboot can add another 5 minutes. A sequential reboot of 200 hosts at roughly 15 minutes each therefore adds up to about 50 hours – a full weekend-long maintenance window. However, as mentioned, these reboots can be done non-disruptively without any impact on customers, so no maintenance window is necessary and no SLA is breached.

So how do you properly reboot all hosts in a vCloud Director environment?

While vSphere maintenance mode helps, it is important to properly coordinate it with vCloud Director.

  • Before a host is put into vSphere maintenance mode, it should be disabled in vCloud Director, which makes sure vCloud Director does not try to communicate with the host (for example, for image uploads).
  • All workloads (not just running VMs) must be evacuated during the maintenance mode. Otherwise, a customer who decides to power on or clone a VM that is registered to a rebooting (and temporarily unavailable) host would be impacted.

So here is the correct process (omitting the parts that actually lead to the need to reboot the hosts):

  1. Make sure the cluster has enough capacity to temporarily run without one host (it is very common to have at least N+1 HA redundancy)
  2. Disable the host in vCloud Director
  3. Put the host into vSphere maintenance mode while evacuating all running, suspended and powered-off VMs
  4. Reboot the host
  5. When the host comes back up, exit maintenance mode
  6. Enable the host in vCloud Director
  7. Repeat with the other hosts

As a quick proof of concept I am attaching a PowerCLI script that automates this. It needs to talk to both vCloud Director and vCenter Server, so replace the Connect strings at the beginning to match your environment.

## Connect to vCloud Director and all vCenter Servers it manages
Connect-CIServer -Server vcloud.gcp.local -User Administrator -Password VMware1!
Connect-VIServer -Server vcenter.gcp.local -User Administrator -Password VMware1!

## Retrieve all ESXi hosts known to vCloud Director
$ESXiHosts = Search-Cloud -QueryType Host
foreach ($ESXiHost in $ESXiHosts) {
    $CloudHost = Get-CIView -SearchResult $ESXiHost
    Write-Host
    Write-Host "Working on host" $CloudHost.Name

    ## Disable the host in vCloud Director so no new operations are sent to it
    Write-Host "Disabling host in vCloud Director"
    $CloudHost.Disable()

    ## Enter maintenance mode and evacuate all (running, suspended and powered-off) VMs
    Write-Host "Evacuating host"
    Set-VMHost $CloudHost.Name -State Maintenance -Evacuate | Out-Null

    Write-Host "Rebooting host"
    Restart-VMHost $CloudHost.Name -Confirm:$false | Out-Null

    ## Wait for the host to go down (NotResponding) and come back up in maintenance mode
    Write-Host -NoNewline "Waiting for host to come online "
    do {
        sleep 15
        $HostState = (Get-VMHost $CloudHost.Name).ConnectionState
        Write-Host -NoNewline "."
    } while ($HostState -ne "NotResponding")
    do {
        sleep 15
        $HostState = (Get-VMHost $CloudHost.Name).ConnectionState
        Write-Host -NoNewline "."
    } while ($HostState -ne "Maintenance")
    Write-Host
    Write-Host "Host rebooted"

    ## Exit maintenance mode and re-enable the host in vCloud Director
    Set-VMHost $CloudHost.Name -State Connected | Out-Null
    Write-Host "Enabling Host in vCloud Director"
    $CloudHost.Enable()
}

 

PowerCLI output

Edit 9/5/2016

Michael Belohaubek sent me his modified script, which is more user friendly, takes into account VMware Tools installations that could potentially block host maintenance mode, and also handles HA Admission Control set to failover hosts.

if (-not (Get-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue)) {
    Add-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue
}

if (-not (Get-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue)) {
    Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue
}

# Get vCenter information from the user
$ResourceVC = Read-Host "Please enter internal FQDN of vCloud Resource vCenter (e.g. gcp-atx-vc01.gcp.local)"
$ResourceVCcredentials = Get-Credential -Message "Please enter Resource vCenter Admin User (e.g. administrator@vsphere.local)"
$vCenterConnection = Connect-VIServer -Server $ResourceVC -Credential $ResourceVCcredentials -ErrorAction SilentlyContinue

# Check if the vCenter connection is working
if (-not $vCenterConnection.IsConnected) {
    Write-Host "vCenter connection not established. Program exit."
    exit
}

# Ask for the vCloud environment
$vCloudServer = Read-Host "Please enter internal FQDN of vCloud environment (e.g. gcp-atx-vcloud1.gcp.local)"
$vCloudCredential = Get-Credential -Message "Please enter vCloud local Admin (e.g. Administrator)"
$vCloudConnection = Connect-CIServer -Server $vCloudServer -Org "system" -Credential $vCloudCredential

# Check if the vCloud Director connection is working
if (-not $vCloudConnection.IsConnected) {
    Write-Host "vCloud connection not established. Program exit"
    exit
}

$ResourceClusterInput = Read-Host "Please enter cluster name for rebooting all cluster hosts (e.g. for VIB Upgrade)"

$ResourceCluster = Get-Cluster -Name $ResourceClusterInput -ErrorAction SilentlyContinue

# Check if the cluster entered by the user exists
if (-not $ResourceCluster) {
    Write-Host "Cluster specified not found. Exiting script."
    exit
}
else {

    $ConfirmReboots = Read-Host "All hosts of cluster $ResourceClusterInput (one by one) will get disabled in VCD, set to maintenance mode and rebooted. VMware Installers will be disconnected. Continue (y/n)"

    if ($ConfirmReboots -eq "y") {
        # Get all VMHosts
        $ClusterHosts = Get-VMHost -Location $ResourceCluster.Name

        # Get failover hosts
        $ClusterView = $ResourceCluster | Get-View
        $FailoverHostID = $ClusterView.Configuration.DasConfig.AdmissionControlPolicy.FailoverHosts

        # Disable Admission Control to ensure that sufficient resources are available during the reboots
        Set-Cluster -HAAdmissionControlEnabled $false -Cluster $ResourceCluster.Name -Confirm:$false
        Write-Host "Admission Control Disabled"

        # For each host in the cluster
        foreach ($VCHost in $ClusterHosts) {

            # Get vCloud host entity and view
            $CloudHost = Search-Cloud -QueryType Host -Name $VCHost.Name
            $CloudHostView = Get-CIView -SearchResult $CloudHost
            Write-Host
            Write-Host "Working on host" $CloudHost.Name
            Write-Host "Disabling host in vCloud Director"
            # Disable host in VCD
            $CloudHostView.Disable()

            Write-Host "Checking for VMware Tools installations on host" $VCHost.Name

            # Check for VMs with the VMware Tools installer mounted
            $VMsCDconnected = Get-VM -Location $VCHost | Where-Object { $_ | Get-CDDrive | Where-Object { $_.ConnectionState.Connected -eq "true" -and $_.IsoPath -like "*/usr/lib/vmware*" } }

            # If VMs exist with the VMware Tools installer mounted, dismount it
            if ($VMsCDconnected) {
                foreach ($VMumount in $VMsCDconnected) {
                    Dismount-Tools -VM $VMumount
                    Write-Host "Unmounted VMware Installer for VM " $VMumount
                }
            }

            # Enter maintenance mode
            Write-Host "Evacuating host"
            Set-VMHost $VCHost.Name -State Maintenance -Evacuate | Out-Null
            # Reboot host and wait until the reboot is complete
            Write-Host "Rebooting host"
            Restart-VMHost $VCHost.Name -Confirm:$false | Out-Null
            Write-Host -NoNewline "Waiting for host to come online "
            do {
                sleep 15
                $HostState = (Get-VMHost $VCHost.Name).ConnectionState
                Write-Host -NoNewline "."
            } while ($HostState -ne "NotResponding")
            do {
                sleep 15
                $HostState = (Get-VMHost $VCHost.Name).ConnectionState
                Write-Host -NoNewline "."
            } while ($HostState -ne "Maintenance")

            # Exit maintenance mode
            Write-Host "Host rebooted"
            Set-VMHost $VCHost.Name -State Connected | Out-Null
            # Enable host in VCD
            Write-Host "Enabling Host in vCloud Director"
            $CloudHostView.Enable()
        }

        # Get failover hosts
        $FailoverHosts = Get-VMHost | Where-Object { $FailoverHostID.Value -contains $_.Id.Replace("HostSystem-", "") }

        # Enter maintenance mode to ensure that the failover hosts are empty
        foreach ($EvacHost in $FailoverHosts) {
            Set-VMHost $EvacHost.Name -State Maintenance -Evacuate | Out-Null
        }

        # Enable Admission Control again
        Set-Cluster -HAAdmissionControlEnabled $true -Cluster $ResourceCluster.Name -Confirm:$false
        Write-Host "Admission Control Enabled"
        Write-Host "Evacuate Failover Hosts"

        # Exit maintenance mode for the failover hosts
        foreach ($ConHost in $FailoverHosts) {
            Set-VMHost $ConHost.Name -State Connected | Out-Null
        }
    }
}

#Close open connections
Disconnect-VIServer -Server $vCenterConnection -Confirm:$false
Disconnect-CIServer -Server $vCloudConnection -Confirm:$false 

RebootVCDHosts

Source NAT Rule for All Internal Networks in vCloud Director

In order to access external network resources from internal Org VDC networks, a Source NAT (SNAT) rule must be created on the Edge Gateway, which translates the internal IP address to a sub-allocated IP address of a particular external interface.

The internal source IP address can be entered in these formats:

  • Single IP address
  • Range of IP addresses
  • CIDR format

As you can see, it is not possible to enter ‘Any’ as you can in the firewall rule configuration.

After investigating what would be the easiest option to use, this is what I found out:

When the Edge Gateway is deployed by NSX Manager, it is possible to use the CIDR entry 0.0.0.0/0.

SNAT Rule

Unfortunately, this does not work with an Edge Gateway deployed by vShield Manager (vCNS), where the Edge configuration fails with the following error:

…- java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (15012): Invalid IP Address input ‘0.0.0.0/0’ for field ‘rules.natRulesDtos[4].originalAddress’.
– com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (15012): Invalid IP Address input ‘0.0.0.0/0’ for field ‘rules.natRulesDtos[4].originalAddress’.
– VSM response error (15012): Invalid IP Address input ‘0.0.0.0/0’ for field ‘rules.natRulesDtos[4].originalAddress’.

The alternative is to use the following IP range: 0.0.0.1-255.255.255.253.
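
For those who configure NAT programmatically rather than through the UI, here is a rough sketch of how such an SNAT rule could be pushed through the vCloud API configureServices action. It is illustrative only, not the exact payload from my environment: the vCloud host name, Edge Gateway ID, interface href, translated IP and the contents of $Headers (a valid x-vcloud-authorization token plus an Accept header for your API version) are all placeholder assumptions, and configureServices replaces the service configuration you submit, so always start from the Edge Gateway's current configuration and check the vCloud API reference for your release.

# Hypothetical sketch: create an SNAT rule for all internal networks via the vCloud API.
# $Headers is assumed to already contain a valid x-vcloud-authorization token and Accept header.
$EdgeUrl = "https://vcloud.gcp.local/api/admin/edgeGateway/<edge-gateway-id>"

$NatBody = @"
<EdgeGatewayServiceConfiguration xmlns="http://www.vmware.com/vcloud/v1.5">
  <NatService>
    <IsEnabled>true</IsEnabled>
    <NatRule>
      <RuleType>SNAT</RuleType>
      <IsEnabled>true</IsEnabled>
      <GatewayNatRule>
        <Interface href="<external-network-href>"/>
        <OriginalIp>0.0.0.0/0</OriginalIp>          <!-- use 0.0.0.1-255.255.255.253 on vCNS Edges -->
        <TranslatedIp>192.0.2.10</TranslatedIp>     <!-- sub-allocated external IP (placeholder) -->
      </GatewayNatRule>
    </NatRule>
  </NatService>
</EdgeGatewayServiceConfiguration>
"@

Invoke-RestMethod -Method Post -Uri "$EdgeUrl/action/configureServices" -Headers $Headers `
    -ContentType "application/vnd.vmware.admin.edgeGatewayServiceConfiguration+xml" -Body $NatBody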