vCloud Availability – Orchestration with PowerCLI

In my last post I introduced vCloud Availability, a Disaster Recovery extension for vCloud Director, and showed how service providers can monitor the key components of the solution.

Today I will show how tenants can orchestrate failover of their VMs to the cloud with PowerCLI, without needing access to their on-prem vCenter Server, which might be down in case of a disaster.

vCloud Availability extends the vCloud API with new calls that can be used to gather information about ongoing replications and to perform failover, test failover and test failover cleanup actions.

I have created the following PowerCLI cmdlets that demonstrate usage of the new APIs.

Get-VRReplication

The function returns one (selected by name) or all replications configured by the tenant and displays their status together with the vCloud vApp identifier, which is important for additional orchestration, as I will show below.

FailOver-VRReplication

This function takes a replication object (which can be obtained with the previous command) as input and performs a real failover to the cloud. Additional parameters specify whether the recovered VM should be powered on and whether the task should run asynchronously.

TestFailover-VRReplication

This function is identical to the previous one, except that it performs only a test failover to the cloud. This means the replication continues and the recovered VM is connected to a test network in the cloud instead of the production one. The input parameters are identical.

TestCleanup-VRReplication

This command cleans up a replication that is in the Test Recovery state.

Orchestration

As I hinted above, these new cmdlets can be combined with the existing vCloud PowerCLI cmdlets to orchestrate complex workflows that require changes on the recovered VMs. As an example, here is a simple script that recovers one VM and changes its IP address. The code is heavily commented to explain each step.

#Replication name matches the protected VM name
$Name = "VM1"
#New IP addresses(s) - can be array of multiple entries, if VM has multiple NICs
[array] $IPAddress = '192.168.1.150'

#First we need to get the replication object
$Replication = Get-VRReplication $Name

#Now we can perform the failover however we will not power-on the VM yet.
FailOver-VRReplication -Replication $Replication -PowerOn $false

#Now we need to find the recovered VM within the vApp. There is always 1:1 relationship between vApp and recovery VM.
$VM = Get-CIVApp -Id $Replication.VappId | Get-CIVM

#We will go through all VM NICs and change their IP allocation mode from DHCP to manual and set new IP address
$i = 0
foreach ($NetworkAdapter in $VM|Get-CINetworkAdapter) {
 Set-CINetworkAdapter $NetworkAdapter -IPAddressAllocationMode Manual -IPAddress $IPAddress[$i]
 $i++
 }
 
#Guest customization must be enabled so new IP addresses are assigned within the Guest OS upon first boot 
$GuestCustomization = $VM.ExtensionData.Section | Where {$_.GetType() -like "*GuestCustomizationSection"}
$GuestCustomization.Enabled = $True
$Result = $GuestCustomization.UpdateServerData()

#We can finally start the VM
Start-CIVM $VM -RunAsync:$true
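
The same building blocks can be combined for a non-disruptive rehearsal: run a test failover, check the recovered copy, and always clean up afterwards. The following is only a minimal sketch built on the functions listed below; the wrapper name Invoke-VRTestCycle and its -Validation script block are hypothetical and not part of the original cmdlets.

```powershell
# Hypothetical wrapper: test failover -> custom checks -> cleanup.
# Relies on Get-VRReplication, TestFailOver-VRReplication and
# TestCleanup-VRReplication from this post being loaded in the session.
function Invoke-VRTestCycle {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)][string]$Name,
        # Hypothetical hook: put your own checks of the recovered VM here
        [scriptblock]$Validation = { param($Replication) Write-Host "Test copy of $($Replication.Name) is up" }
    )

    $Replication = Get-VRReplication $Name
    if (-not $Replication) { throw "Replication '$Name' not found" }

    # Synchronous test failover; the VM is powered on in the test network
    TestFailOver-VRReplication -Replication $Replication -PowerOn $true
    try {
        & $Validation $Replication
    }
    finally {
        # Clean up even when the validation step throws
        TestCleanup-VRReplication -Replication $Replication
    }
}
```

A call could look like Invoke-VRTestCycle -Name VM1. The try/finally is a design choice so that a failed check does not leave the replication in the Test Recovery state.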

And here are the PowerShell functions:

Function Get-VRReplication {
<#
.SYNOPSIS
Collects specific or all replications in particular organization
.DESCRIPTION
Collects specific or all replications in particular organization, their compliance status and other information
.EXAMPLE
PS C:\> Connect-CIServer -Org ACME
PS C:\> Get-VRReplication
PS C:\> Get-VRReplication VM1
.EXAMPLE
PS C:\> Get-VRReplication -Org ACME -Name VM1
.NOTES
Author: Tomas Fojta
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false,Position=1)]
[String]$Name,
[Parameter(Mandatory=$false,Position=2)]
[String]$Org
)
if (-not $global:DefaultCIServers) {Connect-CIServer}
If ($Org -eq "") {$Org = $global:DefaultCIServers[0].Org}

$VRReplications = @()

$Uri = $global:DefaultCIServers[0].ServiceUri.AbsoluteUri + ((get-org $org).id).Replace("urn:vcloud:org:","org/") + "/replications"
$head = @{"x-vcloud-authorization"=$global:DefaultCIServers[0].SessionSecret} + @{"Accept"="application/*+xml;version=20.0;vr-version=3.0"}
$r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ErrorAction:Stop
[xml]$sxml = $r.Content

foreach ($Replication in $sxml.References.Reference) {
$n = @{} | select Name,href,Rpo,ReplicationState,CurrentRpoViolation,TestRecoveryState,RecoveryState,VappId,VrServerInfo,ReplicaSpaceRequirements, Instance
$n.href = $Replication.href

$r = Invoke-WebRequest -URI $Replication.href -Method Get -Headers $head -ErrorAction:Stop
[xml]$sxml = $r.Content

$n.name = $sxml.ReplicationGroup.name
$n.Rpo = $sxml.ReplicationGroup.Rpo
$n.ReplicationState = $sxml.ReplicationGroup.ReplicationState
$n.CurrentRpoViolation = $sxml.ReplicationGroup.CurrentRpoViolation
$n.TestRecoveryState = $sxml.ReplicationGroup.TestRecoveryState
$n.RecoveryState = $sxml.ReplicationGroup.RecoveryState
$n.VappId = $sxml.ReplicationGroup.PlaceholderVappId
$n.VrServerInfo = $sxml.ReplicationGroup.VrServerInfo.Uuid

$VRReplications += $n
}
if ( $name ) {
 $VRReplications | ? { $_.name -eq $name }
 } else {
 $VRReplications}

}


Function FailOver-VRReplication {
<#
.SYNOPSIS
Fails over replicated VM in the cloud 
.DESCRIPTION
Fails over replicated VM in the cloud. The input must be replication object, power-on (default = true) and RunAsync (default false) booleans 
.EXAMPLE
PS C:\> Connect-CIServer -Org ACME
PS C:\> FailOver-VRReplication (Get-VRReplication VM1)
.EXAMPLE
PS C:\> FailOver-VRReplication -Replication (Get-VRReplication -Org ACME -Name VM1) -PowerOn:$False -RunAsync:$True
.NOTES
Author: Tomas Fojta
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$true,Position=1)]
$Replication,
[Parameter(Mandatory=$false,Position=2)]
[boolean]$PowerOn=$true,
[Parameter(Mandatory=$false,Position=3)]
[boolean]$RunAsync=$false
)

if (-not $global:DefaultCIServers) {Connect-CIServer}
If ($Org -eq "") {$Org = $global:DefaultCIServers[0].Org}



$Uri = $Replication.href + "/action/failover"
$head = @{"x-vcloud-authorization"=$global:DefaultCIServers[0].SessionSecret} + @{"Accept"="application/*+xml;version=20.0;vr-version=4.0"} + @{"Content-Type"="application/vnd.vmware.hcs.failoverParams+xml"}

if ($PowerOn -eq $False) {$body = 
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:FailoverParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns2="http://www.vmware.com/vr/v6.0" xmlns:ns3="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns4="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData"
xmlns:ns5="http://schemas.dmtf.org/wbem/wscim/1/common" xmlns:ns6="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
xmlns:ns7="http://schemas.dmtf.org/ovf/environment/1">
<ns2:PowerOn>false</ns2:PowerOn>
</ns2:FailoverParams>'} else
{$body = 
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:FailoverParams xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:ns2="http://www.vmware.com/vr/v6.0" xmlns:ns3="http://schemas.dmtf.org/ovf/envelope/1" xmlns:ns4="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData"
xmlns:ns5="http://schemas.dmtf.org/wbem/wscim/1/common" xmlns:ns6="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
xmlns:ns7="http://schemas.dmtf.org/ovf/environment/1">
<ns2:PowerOn>true</ns2:PowerOn>
</ns2:FailoverParams>'}

$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -Body $Body -ErrorAction:Stop
[xml]$sxml = $r.Content

$Uri = $sxml.Task.href
if ($RunAsync -eq $false) {
 Do
 {
 $r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ErrorAction:Stop
 [xml]$sxml = $r.Content
 $Progress = $sxml.Task.Progress
 Write-Progress -Activity FailOver -Status 'Progress->' -PercentComplete $Progress
 Start-Sleep -s 5
 } until ($Progress -eq 100)
 }
}

Function TestFailOver-VRReplication {
<#
.SYNOPSIS
Performs test failover of replicated VM in the cloud 
.DESCRIPTION
Performs test failover replicated VM in the cloud. The input must be replication object, power-on (default = true) and RunAsync (default false) booleans
.EXAMPLE
PS C:\> Connect-CIServer -Org ACME
PS C:\> TestFailOver-VRReplication (Get-VRReplication VM1) -PowerOn:$False
.EXAMPLE
PS C:\> TestFailOver-VRReplication -Replication (Get-VRReplication -Org ACME -Name VM1) -RunAsync:$True
.NOTES
Author: Tomas Fojta
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$true,Position=1)]
$Replication,
[Parameter(Mandatory=$false,Position=2)]
[boolean]$PowerOn=$true,
[Parameter(Mandatory=$false,Position=3)]
[boolean]$RunAsync=$false
)

if (-not $global:DefaultCIServers) {Connect-CIServer}
If ($Org -eq "") {$Org = $global:DefaultCIServers[0].Org}



$Uri = $Replication.href + "/action/testFailover"
$head = @{"x-vcloud-authorization"=$global:DefaultCIServers[0].SessionSecret} + @{"Accept"="application/*+xml;version=20.0;vr-version=4.0"} + @{"Content-Type"="application/vnd.vmware.hcs.failoverParams+xml"}

if ($PowerOn -eq $False) {$body = 
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:TestFailoverParams
xmlns="http://www.vmware.com/vcloud/v1.5"
xmlns:ns2="http://www.vmware.com/vr/v6.0"
xmlns:ns3="http://schemas.dmtf.org/ovf/envelope/1"
xmlns:ns4="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData"
xmlns:ns5="http://schemas.dmtf.org/wbem/wscim/1/common"
xmlns:ns6="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
xmlns:ns7="http://schemas.dmtf.org/ovf/environment/1">
<ns2:PowerOn>false</ns2:PowerOn>
<ns2:Synchronize>false</ns2:Synchronize>
</ns2:TestFailoverParams>'} else
{$body = 
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:TestFailoverParams
xmlns="http://www.vmware.com/vcloud/v1.5"
xmlns:ns2="http://www.vmware.com/vr/v6.0"
xmlns:ns3="http://schemas.dmtf.org/ovf/envelope/1"
xmlns:ns4="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData"
xmlns:ns5="http://schemas.dmtf.org/wbem/wscim/1/common"
xmlns:ns6="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
xmlns:ns7="http://schemas.dmtf.org/ovf/environment/1">
<ns2:PowerOn>true</ns2:PowerOn>
<ns2:Synchronize>false</ns2:Synchronize>
</ns2:TestFailoverParams>'}

$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -Body $Body -ErrorAction:Stop
[xml]$sxml = $r.Content

$Uri = $sxml.Task.href
if ($RunAsync -eq $false) {
 Do
 {
 $r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ErrorAction:Stop
 [xml]$sxml = $r.Content
 $Progress = $sxml.Task.Progress
 Write-Progress -Activity 'Test FailOver' -Status 'Progress->' -PercentComplete $Progress
 Start-Sleep -s 5
 } until ($Progress -eq 100)
 }
}

Function TestCleanup-VRReplication {
<#
.SYNOPSIS
Performs test replication cleanup in the cloud.
.DESCRIPTION
Performs test replication cleanup in the cloud. The input must be replication object and optional RunAsync (default false) boolean.
.EXAMPLE
PS C:\> Connect-CIServer -Org ACME
PS C:\> TestCleanup-VRReplication (Get-VRReplication VM1)
.EXAMPLE
PS C:\> TestCleanup-VRReplication -Replication (Get-VRReplication -Org ACME -Name VM1) -RunAsync:$True
.NOTES
Author: Tomas Fojta
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$true,Position=1)]
$Replication,
[Parameter(Mandatory=$false,Position=2)]
[boolean]$RunAsync=$false
)

if (-not $global:DefaultCIServers) {Connect-CIServer}
If ($Org -eq "") {$Org = $global:DefaultCIServers[0].Org}



$Uri = $Replication.href + "/action/testCleanup"
$head = @{"x-vcloud-authorization"=$global:DefaultCIServers[0].SessionSecret} + @{"Accept"="application/*+xml;version=20.0;vr-version=4.0"} + @{"Content-Type"="application/vnd.vmware.hcs.failoverParams+xml"}

$body = 
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns2:SyncParams
xmlns="http://www.vmware.com/vcloud/v1.5"
xmlns:ns2="http://www.vmware.com/vr/v6.0"
xmlns:ns3="http://schemas.dmtf.org/ovf/envelope/1"
xmlns:ns4="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData"
xmlns:ns5="http://schemas.dmtf.org/wbem/wscim/1/common"
xmlns:ns6="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
xmlns:ns7="http://schemas.dmtf.org/ovf/environment/1">
<ns2:RepeatOngoingOnlineSync>false</ns2:RepeatOngoingOnlineSync>
</ns2:SyncParams>'

$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -Body $Body -ErrorAction:Stop
[xml]$sxml = $r.Content

$Uri = $sxml.Task.href
if ($RunAsync -eq $false) {
 Do
 {
 $r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ErrorAction:Stop
 [xml]$sxml = $r.Content
 $Progress = $sxml.Task.Progress
 Write-Progress -Activity 'Test Cleanup' -Status 'Progress->' -PercentComplete $Progress
 Start-Sleep -s 5
 } until ($Progress -eq 100)
 }
}
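
One caveat about the three action functions above: they poll until the task Progress reaches 100, so a task that ends in an error would keep the loop running. A possible refinement, sketched here under the assumption that the returned task is a regular vCloud API Task with a status attribute (values such as running, success, error, canceled, aborted), is to stop on any final status. The helper name Get-VRTaskState is hypothetical.

```powershell
# Hypothetical helper: decide from the task XML whether polling can stop.
# Assumes a vCloud API Task document with a 'status' attribute.
function Get-VRTaskState {
    param(
        [Parameter(Mandatory=$true)][xml]$TaskXml
    )
    $status = $TaskXml.Task.status
    [pscustomobject]@{
        Status   = $status
        # Final states: no further progress will be reported
        Finished = $status -in 'success','error','canceled','aborted'
        # Final states that should be treated as a failure
        Failed   = $status -in 'error','canceled','aborted'
    }
}
```

Inside the polling loops the exit condition could then become until ((Get-VRTaskState -TaskXml $sxml).Finished), followed by a check of the Failed flag.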

Query ESXi Hosts Serial Numbers

I was asked by our IT department to provide the serial numbers of our lab servers. Fortunately, this can be done remotely with the esxcfg-info CLI command and automated with William Lam's (@lamw) PowerCLI function Get-Esxcfginfo. I just had to find the right entry in the XML returned by the function.

Here is the script I used:

<#
.SYNOPSIS Remotely collecting esxcfg-info from an ESXi host using vCenter Server
.NOTES Author: William Lam
.NOTES Site: www.virtuallyghetto.com
.NOTES Reference: http://www.virtuallyghetto.com/2016/06/using-the-vsphere-api-to-remotely-collect-esxi-esxcfg-info.html
.PARAMETER Vmhost
 ESXi host
.EXAMPLE
 PS> Get-VMHost -Name "esxi-1" | Get-Esxcfginfo
#>

Function Get-Esxcfginfo {
 param(
 [Parameter(
 Position=0,
 Mandatory=$true,
 ValueFromPipeline=$true,
 ValueFromPipelineByPropertyName=$true)
 ]
 [VMware.VimAutomation.ViCore.Impl.V1.Inventory.InventoryItemImpl[]]$VMHost
 )

 $sessionManager = Get-View ($global:DefaultVIServer.ExtensionData.Content.sessionManager)

 # URL to the ESXi esxcfg-info info
 $url = "https://" + $vmhost.Name + "/cgi-bin/esxcfg-info.cgi?xml"

 $spec = New-Object VMware.Vim.SessionManagerHttpServiceRequestSpec
 $spec.Method = "httpGet"
 $spec.Url = $url
 $ticket = $sessionManager.AcquireGenericServiceTicket($spec)

 # Append the cookie generated from VC
 $websession = New-Object Microsoft.PowerShell.Commands.WebRequestSession
 $cookie = New-Object System.Net.Cookie
 $cookie.Name = "vmware_cgi_ticket"
 $cookie.Value = $ticket.id
 $cookie.Domain = $vmhost.name
 $websession.Cookies.Add($cookie)

 # Retrieve file
 $result = Invoke-WebRequest -Uri $url -WebSession $websession -ContentType "application/xml"
 
 # cast output as an XML object
 return [xml]$result.content
}

Connect-VIServer -Server xxx.gcp.local -User administrator@vsphere.local -password VMware1! | Out-Null

$hosts = Get-VMHost

foreach ($ESXhost in $hosts)
{
$xmlResult = $ESXhost | Get-Esxcfginfo
Write-Host $ESXhost.name ($xmlResult.host.'hardware-info'.value[3].'#text')
}

Disconnect-VIServer * -Confirm:$false
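
The script above picks the serial number by position (value[3]), which depends on the order of entries in the XML. A slightly more defensive variant is to select the entry by its name attribute. This is only a sketch: the helper name Get-HardwareInfoValue is hypothetical, and the attribute value 'serial-number' is an assumption that should be checked against the XML your hosts actually return.

```powershell
# Hypothetical helper: read one hardware-info entry by its 'name' attribute
# instead of by index. local-name() keeps the XPath independent of namespaces.
function Get-HardwareInfoValue {
    param(
        [Parameter(Mandatory=$true)][xml]$EsxcfgInfo,
        [Parameter(Mandatory=$true)][string]$Name
    )
    $xpath = "//*[local-name()='hardware-info']/*[local-name()='value'][@name='$Name']"
    $node = $EsxcfgInfo.SelectSingleNode($xpath)
    if ($node) { $node.InnerText.Trim() }
}
```

In the loop above it could be used as Get-HardwareInfoValue -EsxcfgInfo $xmlResult -Name 'serial-number'.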

PowerCLI Stops Working After NSX 6.2.4 Upgrade

As of NSX 6.2.3, TLS 1.0 support is deprecated on Edge Services Gateways. So if you are using the load balancer with SSL offload, TLS 1.0 ciphers are no longer supported and clients that rely on them will stop working.

The supported ciphers can be easily checked with nmap. Here is the nmap output for a website behind an NSX Edge 6.2.2 and a 6.2.4 load balancer:

NSX 6.2.2 with TLS 1.0
NSX 6.2.4 without TLS 1.0

In my case PowerCLI stopped working and could not connect anymore to vCloud Director endpoint behind the Edge load balancer. The error was not very descriptive: The underlying connection was closed: An unexpected error occurred on a send.

PowerCLI Error

Fortunately, it is possible to force PowerCLI to use TLS 1.1/1.2 by editing Windows Registry as described in the KB article: Enabling the TLSv1.1 and TLSv1.2 protocols for PowerCLI (2137109).
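
For a quick test without touching the registry, .NET clients can usually be told per session which protocols they may negotiate. The lines below are a minimal sketch of that approach. Whether a particular PowerCLI release honors this setting is an assumption to verify in your environment, and the KB article remains the documented fix.

```powershell
# Add TLS 1.1 and TLS 1.2 to the protocols allowed in this PowerShell session.
# Requires .NET Framework 4.5 or later; affects only the current process.
[Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor
    [Net.SecurityProtocolType]::Tls11 -bor
    [Net.SecurityProtocolType]::Tls12

# Show the resulting protocol list
[Net.ServicePointManager]::SecurityProtocol
```

Run it before Connect-CIServer. The setting is lost when the session is closed.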

Gathering Health Status of vCloud Director Edge Gateways

Some time ago I wrote about how to monitor health of NSX Edge Gateways. In this blog post I will show how to get health and other info about vCloud Director Edge Gateways with PowerCLI.

PowerCLI already includes vCloud Director related cmdlets; unfortunately, none of them deals with Edge Gateways. This can easily be remedied by using the vCloud API, but to get detailed information about Edge health we must use the NSX API. As of vCloud Director 8.0, the service provider can easily get the ID of the NSX Edge backing a particular vCloud Director Edge Gateway, because a new type, GatewayBacking, was added.

What follows is an example of a function that collects as much information as possible (interfaces, network services, size, syslog, default gateway, health of all services, Org, Org VDC and Provider VDC) about all Edge Gateways from PowerCLI, the vCloud API and the NSX API.

Note: there is dependency on the Get-NSXEdgeHealth function.

function Get-CIEdgeGateways {
<#
.SYNOPSIS
Gathers Edge Gateways from vCloud Director and all info through PowerCLI, vCloud API and NSX API
.DESCRIPTION
Will inventory all of your vCloud Director Edge Gateways
.NOTES
Author: Tomas Fojta
#>
	[CmdletBinding()]
	param(
	[Parameter(Mandatory=$true,Position=0)]
	[String]$NSXManager,
	[Parameter(Mandatory=$false,Position=1)]
	[String]$NSXUsername = "admin",
	[Parameter(Mandatory=$true)]
	[String]$NSXPassword
	)

	$output = @();
	$EdgeGWs = Search-Cloud -QueryType EdgeGateway

	Foreach ($Edge in $EdgeGWs) {
		$Edgeview = $Edge | get-ciview
		$Vdc = get-OrgVdc -Id ($Edge.PropertyList.Vdc) -ErrorAction SilentlyContinue
		$webclient = New-Object system.net.webclient
		$webclient.Headers.Add("x-vcloud-authorization",$Edgeview.Client.SessionKey)
		$webclient.Headers.Add("accept",$EdgeView.Type + ";version=9.0")
		[xml]$EGWConfXML = $webclient.DownloadString($EdgeView.href)
		$n = "" | Select Name,Description,EdgeBacking,Interfaces,Firewall,NAT,LoadBalancer,DHCP,VPN,Routing,Syslog,Size,HA,DNSRelay,DefaultGateway,AdvancedNetworking, Org, TenantId, OrgVDC, OrgVDCId, ProviderVDC, ProviderVDCId, Health
		$n.Name = $EGWConfXML.EdgeGateway.Name
		$n.Description = $EGWConfXML.EdgeGateway.Description
		$n.EdgeBacking = $EGWConfXML.EdgeGateway.GatewayBackingRef.gatewayId
		$n.Interfaces = $EGWConfXML.EdgeGateway.Configuration.GatewayInterfaces.GatewayInterface
		$n.Firewall = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.FirewallService.FirewallRule
		$n.NAT = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.NatService.NatRule
		$n.LoadBalancer = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.LoadBalancerService.VirtualServer	
		$n.DHCP = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayDHCPService.Pool
		$n.VPN = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayIpsecVpnService
		$n.Routing = $EGWConfXML.EdgeGateway.Configuration.EdgeGatewayServiceConfiguration.StaticRoutingService
		$n.Syslog = $EGWConfXML.EdgeGateway.Configuration.SyslogServerSettings.TenantSyslogServerSettings.SyslogServerIp
		$n.Size = $EGWConfXML.EdgeGateway.Configuration.GatewayBackingConfig
		$n.HA = $EGWConfXML.EdgeGateway.Configuration.HaEnabled
		$n.DNSRelay = $EGWConfXML.EdgeGateway.Configuration.UseDefaultRouteForDnsRelay
		Foreach ($Interface in $n.Interfaces) {
			if ($Interface.UseForDefaultRoute -eq 'true') {$n.DefaultGateway = $Interface.SubnetParticipation.Gateway}
			}
		$n.AdvancedNetworking = $EGWConfXML.EdgeGateway.Configuration.AdvancedNetworkingEnabled
		$n.Org = $Vdc.Org.Name
		$n.TenantId = $Vdc.Org.Id.Split(':')[3]
		$n.OrgVDC = $Vdc.Name
		$n.OrgVDCId = $Vdc.Id.Split(':')[3]
		$n.ProviderVDC = $Vdc.ProviderVDC.Name
		$n.ProviderVDCId = $Vdc.ProviderVDC.Id.Split(':')[3]
		$n.Health = Get-NSXEdgeHealth -NSXManager $NSXManager -Username $NSXUsername -Password $NSXPassword -EdgeID ($n.EdgeBacking)
		$Output += $n
		}
	return $Output
}

Reboot All Hosts in vCloud Director

vCloud Director based clouds support non-disruptive maintenance of the underlying physical hosts. They can be patched, upgraded or completely exchanged without any impact on customer workloads, all thanks to vMotion and DRS maintenance mode, which can evacuate all running, suspended or powered-off workloads from an ESXi host.

Many service providers are going to be upgrading their networking platform from vCloud Network and Security (vCNS) to NSX. This upgrade besides upgrading the Manager and deploying new NSX Controllers requires upgrade of all hosts with new NSX VIBs. This host upgrade results in the need to reboot every host in the service provider environment.

Depending on the number of hosts, their size and vMotion network throughput, evacuating each host can take 5-10 minutes, and the reboot can add another 5 minutes. So, for example, a sequential reboot of 200 hosts could result in a maintenance window lasting a full weekend. However, as I mentioned, these reboots can be done non-disruptively without any impact on customers, so no maintenance window is necessary and no SLA is breached.

So how do you properly reboot all hosts in vCloud Director environment?

While vSphere maintenance mode helps, it is important to properly coordinate it with vCloud Director.

  • Before a host is put into a vSphere maintenance mode it should be disabled in vCloud Director which will make sure it does not try to communicate with the host for example for image uploads.
  • All workloads (not just running VMs) must be evacuated during the maintenance mode. A customer who decides to power on VM or clone a VM which is registered to a rebooting (and temporarily unavailable) host would be otherwise impacted.

So here is the correct process (omitting the parts that actually lead to the need to reboot the hosts):

  1. Make sure that the cluster has enough capacity to temporarily run without 1 host (it is very common to have at least N+1 HA redundancy)
  2. Disable host in vCloud Director
  3. Put host into vSphere maintenance mode while evacuating all running, suspended and powered-off VMs
  4. Reboot host
  5. When the host comes up, exit maintenance mode
  6. Enable host
  7. Repeat with other hosts

As a quick proof of concept I am attaching a PowerCLI script that automates this. It needs to talk to both vCloud Director and vCenter Server, so replace the connection strings at the beginning to match your environment.

## Connect to vCloud Director and all vCenter Servers it manages
Connect-CIServer -Server vcloud.gcp.local -User Administrator -Password VMware1!
Connect-VIServer -Server vcenter.gcp.local -User Administrator -Password VMware1!

$ESXiHosts = Search-cloud -QueryType Host
foreach ($ESXiHost in $ESXiHosts) {
	$CloudHost = Get-CIView -SearchResult $ESXiHost
	Write-Host
	Write-Host "Working on host" $CloudHost.Name
	Write-Host "Disabling host in vCloud Director"
	$CloudHost.Disable()
	Write-Host "Evacuating host"
	Set-VMHost $CloudHost.Name -State Maintenance -Evacuate | Out-Null
	Write-Host "Rebooting host"
	Restart-VMHost $CloudHost.Name -Confirm:$false | Out-Null
    Write-Host -NoNewline "Waiting for host to come online "
    do {
		sleep 15
		$HostState = (get-vmhost $CloudHost.Name).ConnectionState
		Write-Host -NoNewline "."
    }
    while ($HostState -ne "NotResponding")
    do {
		sleep 15
		$HostState = (get-vmhost $CloudHost.Name).ConnectionState
		Write-Host -NoNewline "."
    }
	while ($HostState -ne "Maintenance")
	Write-Host
	Write-Host "Host rebooted"
	Set-VMHost $CloudHost.Name -State Connected | Out-Null
	Write-Host "Enabling Host in vCloud Director"
	$CloudHost.Enable()
}
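
The two wait loops in the script above run until the expected state is seen, so a host that never comes back would block the whole run. One way to guard against that is a small polling helper with a timeout. This is only a sketch: the function name Wait-ForState and its parameters are hypothetical, and the default timeout is an arbitrary choice.

```powershell
# Hypothetical helper: poll a script block until it returns the expected
# value or the timeout expires. Returns $true on success, $false on timeout.
function Wait-ForState {
    param(
        [Parameter(Mandatory=$true)][scriptblock]$GetState,
        [Parameter(Mandatory=$true)][string]$Expected,
        [int]$TimeoutSeconds = 1800,
        [int]$IntervalSeconds = 15
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        # Compare as strings so enum values such as ConnectionState also match
        if ("$(& $GetState)" -eq $Expected) { return $true }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false
}
```

In the script it could replace the second loop, for example: if (-not (Wait-ForState -GetState { (Get-VMHost $CloudHost.Name).ConnectionState } -Expected 'Maintenance')) { throw "Host did not return in time" }. Stopping instead of moving on to the next host keeps a failed reboot from reducing cluster capacity further.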

 

PowerCLI output

Edit 9/5/2016

Michael Belohaubek sent me his modified script that is more user friendly, takes into account VM Tools installation that could potentially block host maintenance mode and also HA Admission Control set to failover hosts. 

if(-not (Get-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue))
{
 Add-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue
}

if(-not (Get-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue))
{
 Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue
}

#Getting vCenter information from user
$ResourceVC = Read-Host "Please enter internal FQDN of vCloud Resource vCenter (e.g. gcp-atx-vc01.gcp.local)"
$ResourceVCcredentials = Get-Credential -Message "Please enter Resource vCenter Admin User (e.g. administrator@vsphere.local)"
$vCenterConnection = Connect-VIServer -Server $ResourceVC -Credential $ResourceVCcredentials -ErrorAction SilentlyContinue

#check if connection to VC was working
if(-not $vCenterConnection.IsConnected){
 Write-Host "vCenter connection not established. Program exit."
 exit
}

#Ask for vCloud environment
$vCloudServer = Read-Host "Please enter internal FQDN of vCloud environment (e.g. gcp-atx-vcloud1.gcp.local)"
$vCloudCredential = Get-Credential -Message "Please enter vCloud local Admin (e.g. Administrator)"
$vCloudConnection = Connect-CIserver -server $vCloudServer -Org "system" -Credential $vCloudCredential

#check if connection to vCloud Director was working
if(-not $vCloudConnection.IsConnected){
 Write-Host "vCloud connection not established. Program exit"
 exit
}

$ResourceClusterInput = Read-Host "Please enter cluster name for rebooting all cluster hosts (e.g for VIB Upgrade)"

$ResourceCluster = Get-Cluster -Name $ResourceClusterInput -ErrorAction SilentlyContinue

#check if cluster read from user is existent
if(-not $ResourceCluster){
 Write-Host "Cluster specified not found. Exiting script."
 exit
}
else{
 
 $ConfirmReboots = Read-Host "All hosts of cluster " $ResourceClusterInput " (one by one) will get disabled in VCD, set to maintenance mode and rebooted. VMware Installers will be disconnected. Continue (y/n)"
 
 if($ConfirmReboots -eq "y")
 {
 #Get all VMHosts
 $ClusterHosts = Get-VMhost -Location $ResourceCluster.Name

 #Get Failover host
 $ClusterView = $ResourceCluster | Get-View
 $FailoverHostID = $ClusterView.Configuration.DasConfig.AdmissionControlPolicy.FailoverHosts
 
 #Disable Admission Control, to ensure that sufficient resource are available during reboot
 Set-Cluster -HAAdmissionControlEnabled $false -Cluster $ResourceCluster.Name -Confirm:$false
 Write-Host "Admission Control Disabled"

 #Foreach host in cluster
 foreach($VCHost in $ClusterHosts){
 
 #Get vCloud Host Entity and View
 $CloudHost = Search-Cloud -QueryType Host -Name $VCHost.Name
 $CloudHostView = Get-CIView -SearchResult $CloudHost
 Write-Host
 Write-Host "Working on host" $CloudHost.Name
 Write-Host "Disabling host in vCloud Director"
 #Disable host in VCD
 $CloudHostView.Disable()

 Write-Host "Checking for VMware Tools installations on host" $VCHost.Name

 #Check for VMs with VMware Tools installer mounted
 $VMsCDconnected = Get-VM -Location $VCHost | where { $_ | get-cddrive | where { $_.ConnectionState.Connected -eq "true" -and $_.IsoPath -like "*/usr/lib/vmware*"} }

 #If VMs exist with VMware Tools installer mounted, dismount them
 if($VMsCDconnected){
 foreach($VMumount in $VMsCDconnected ){
 Dismount-Tools -VM $VMumount
 Write-Host "Unmounted VMware Installer for VM " $VMumount
 }
 }

 #Enter Maintenance mode
 Write-Host "Evacuating host"
 Set-VMHost $VCHost.Name -State Maintenance -Evacuate | Out-Null
 #Reboot host and wait till reboot is complete
 Write-Host "Rebooting host"
 Restart-VMHost $VCHost.Name -Confirm:$false | Out-Null
 Write-Host -NoNewline "Waiting for host to come online "
 do
 {
 sleep 15 
 $HostState = (get-vmhost $VCHost.Name).ConnectionState
 Write-Host -NoNewline "."
 }
 while ($HostState -ne "NotResponding")
 do
 { 
 sleep 15
 $HostState = (get-vmhost $VCHost.Name).ConnectionState
 Write-Host -NoNewline "."
 }
 while ($HostState -ne "Maintenance")
 
 #Exit maintenance mode
 Write-Host "Host rebooted"
 Set-VMHost $VCHost.Name -State Connected | Out-Null
 #Enable host in VCD
 Write-Host "Enabling Host in vCloud Director"
 $CloudHostView.Enable()
 }

 #Get Failover Hosts
 $FailoverHosts = Get-VMHost | where {$FailoverHostID.Value -contains $_.Id.Replace("HostSystem-","") }
 
 #Enter maintenance mode to ensure that failover hosts are empty
 foreach($EvacHost in $FailoverHosts){
 Set-VMHost $EvacHost.Name -State Maintenance -Evacuate | Out-Null
 }

 #Enable Admission Control again
 Set-Cluster -HAAdmissionControlEnabled $true -Cluster $ResourceCluster.Name -Confirm:$false
 Write-Host "Admission Control Enabled"
 Write-Host "Evacuate Failover Hosts"

 #Exit maintenance mode for failover hosts
 foreach($ConHost in $FailoverHosts){
 Set-VMHost $ConHost.Name -State Connected | Out-Null
 }
 }
}

#Close open connections
Disconnect-VIServer -Server $vCenterConnection -Confirm:$false
Disconnect-CIServer -Server $vCloudConnection -Confirm:$false 
