Reboot All Hosts in vCloud Director

vCloud Director based clouds support non-disruptive maintenance of the underlying physical hosts. They can be patched, upgraded or completely exchanged without any impact on customer workloads, all thanks to vMotion and DRS maintenance mode, which can evacuate all running, suspended or powered-off workloads from an ESXi host.

Many service providers are going to be upgrading their networking platform from vCloud Network and Security (vCNS) to NSX. Besides upgrading the Manager and deploying new NSX Controllers, this upgrade requires installing new NSX VIBs on all hosts, which in turn means rebooting every host in the service provider environment.

Depending on the number of hosts, their size and the vMotion network throughput, evacuating each host can take 5-10 minutes, and the reboot can add another 5 minutes. A sequential reboot of 200 hosts could therefore take a full weekend. However, as I mentioned, these reboots can be done non-disruptively without any impact on customers, so no maintenance window is necessary and no SLA is breached.
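
To make the weekend estimate concrete, here is a minimal sketch of the arithmetic using the figures above. The function name Get-RollingRebootHours is purely illustrative and not part of PowerCLI, and real timings will vary with host size and vMotion throughput.

```powershell
# Hypothetical helper: total hours for rebooting hosts strictly one after another.
function Get-RollingRebootHours {
    param(
        [int]    $HostCount,
        [double] $EvacuateMinutes,
        [double] $RebootMinutes
    )
    # Each host costs one evacuation plus one reboot; convert minutes to hours.
    return $HostCount * ($EvacuateMinutes + $RebootMinutes) / 60
}

# Low and high estimates for 200 hosts (5 or 10 minutes evacuation, 5 minutes reboot)
$Low  = Get-RollingRebootHours -HostCount 200 -EvacuateMinutes 5  -RebootMinutes 5
$High = Get-RollingRebootHours -HostCount 200 -EvacuateMinutes 10 -RebootMinutes 5
Write-Host "Sequential reboot of 200 hosts: roughly $([math]::Round($Low)) to $([math]::Round($High)) hours"
```

With these inputs the range works out to about 33 to 50 hours, which is why a strictly sequential run does not fit into a short maintenance window.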

So how do you properly reboot all hosts in a vCloud Director environment?

While vSphere maintenance mode helps, it is important to properly coordinate it with vCloud Director.

  • Before a host is put into vSphere maintenance mode, it should be disabled in vCloud Director. This makes sure vCloud Director does not try to communicate with the host, for example for image uploads.
  • All workloads (not just running VMs) must be evacuated when entering maintenance mode. Otherwise a customer who decides to power on or clone a VM that is registered to a rebooting (and temporarily unavailable) host would be impacted.

So here is the correct process (omitting the parts that actually lead to the need to reboot the hosts):

  1. Make sure that the cluster has enough capacity to temporarily run without one host (it is very common to have at least N+1 HA redundancy)
  2. Disable host in vCloud Director
  3. Put host into vSphere maintenance mode while evacuating all running, suspended and powered-off VMs
  4. Reboot host
  5. When the host comes back up, exit maintenance mode
  6. Enable host
  7. Repeat with other hosts
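
Step 1 can be roughly sanity-checked before starting. The sketch below is an assumption-laden simplification: it looks at memory only, and it treats the cluster as having enough headroom if current memory usage still fits after removing the largest host. Test-ClusterHeadroom is a hypothetical helper name; it only relies on the MemoryTotalGB and MemoryUsageGB properties that PowerCLI host objects expose.

```powershell
# Hypothetical helper: memory-only N+1 check over a set of host objects.
function Test-ClusterHeadroom {
    param(
        [Parameter(Mandatory)] [object[]] $VMHosts
    )
    $Total   = ($VMHosts | Measure-Object -Property MemoryTotalGB -Sum).Sum
    $Used    = ($VMHosts | Measure-Object -Property MemoryUsageGB -Sum).Sum
    $Largest = ($VMHosts | Measure-Object -Property MemoryTotalGB -Maximum).Maximum
    # Enough headroom if the current usage fits into the cluster minus its largest host
    return ($Used -le ($Total - $Largest))
}

# Example usage against a live cluster (the cluster name is a placeholder):
# Test-ClusterHeadroom -VMHosts (Get-Cluster -Name 'Cluster01' | Get-VMHost)
```

A real check would also consider CPU, reservations and the HA admission control policy, so treat a positive result as a hint rather than a guarantee.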

As a quick proof of concept I am attaching a PowerCLI script that automates this. It needs to talk to both vCloud Director and vCenter Server, so replace the connection strings at the beginning to match your environment.

## Connect to vCloud Director and all vCenter Servers it manages
Connect-CIServer -Server vcloud.gcp.local -User Administrator -Password VMware1!
Connect-VIServer -Server vcenter.gcp.local -User Administrator -Password VMware1!

## Process every host known to vCloud Director, one at a time
$ESXiHosts = Search-Cloud -QueryType Host
foreach ($ESXiHost in $ESXiHosts) {
    $CloudHost = Get-CIView -SearchResult $ESXiHost
    Write-Host
    Write-Host "Working on host" $CloudHost.Name
    Write-Host "Disabling host in vCloud Director"
    $CloudHost.Disable()
    Write-Host "Evacuating host"
    ## -Evacuate asks vCenter to also move suspended and powered-off VMs (DRS cluster)
    Set-VMHost $CloudHost.Name -State Maintenance -Evacuate | Out-Null
    Write-Host "Rebooting host"
    Restart-VMHost $CloudHost.Name -Confirm:$false | Out-Null
    Write-Host -NoNewline "Waiting for host to come online "
    ## First wait until the host goes down ...
    do {
        Start-Sleep -Seconds 15
        $HostState = (Get-VMHost $CloudHost.Name).ConnectionState
        Write-Host -NoNewline "."
    }
    while ($HostState -ne "NotResponding")
    ## ... then wait until it reconnects, still in maintenance mode
    do {
        Start-Sleep -Seconds 15
        $HostState = (Get-VMHost $CloudHost.Name).ConnectionState
        Write-Host -NoNewline "."
    }
    while ($HostState -ne "Maintenance")
    Write-Host
    Write-Host "Host rebooted"
    ## Exit maintenance mode and hand the host back to vCloud Director
    Set-VMHost $CloudHost.Name -State Connected | Out-Null
    Write-Host "Enabling Host in vCloud Director"
    $CloudHost.Enable()
}
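
The two polling loops in the script wait indefinitely, so a host that never comes back would stall the whole run. As a possible refinement, here is a minimal sketch of a polling helper with a timeout. Wait-ForState and its parameters are hypothetical names introduced for illustration, and the 30 minute default is an arbitrary assumption.

```powershell
# Hypothetical helper: poll a state until it matches the target or a timeout expires.
function Wait-ForState {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetState,   # returns the current state
        [Parameter(Mandatory)] [string]      $Target,     # state to wait for
        [int] $TimeoutSeconds = 1800,                     # assumed default: 30 minutes
        [int] $PollSeconds    = 15
    )
    $Deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $Deadline) {
        # Compare as strings so enum values such as ConnectionState work as well
        if ("$(& $GetState)" -eq $Target) { return $true }
        Start-Sleep -Seconds $PollSeconds
    }
    return $false
}

# Example usage inside the loop above (returns $false if the host never comes back):
# Wait-ForState -GetState { (Get-VMHost $CloudHost.Name).ConnectionState } -Target 'Maintenance'
```

The caller can then decide whether to stop the rolling reboot or skip the host when the helper returns $false.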

 

[Screenshot: PowerCLI output]

Edit 9/5/2016

Michael Belohaubek sent me his modified script, which is more user friendly. It also takes into account VMware Tools installations that could block host maintenance mode, as well as HA Admission Control configured with dedicated failover hosts.

if(-not (Get-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue))
{
 Add-PSSnapin VMware.VimAutomation.Cloud -ErrorAction SilentlyContinue
}

if(-not (Get-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue))
{
 Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue
}

#Getting vCenter information from user
$ResourceVC = Read-Host "Please enter internal FQDN of vCloud Resource vCenter (e.g. gcp-atx-vc01.gcp.local)"
$ResourceVCcredentials = Get-Credential -Message "Please enter Resource vCenter Admin User (e.g. administrator@vsphere.local)"
$vCenterConnection = Connect-VIServer -Server $ResourceVC -Credential $ResourceVCcredentials -ErrorAction SilentlyContinue

#check if connection to VC was working
if(-not $vCenterConnection.IsConnected){
 Write-Host "vCenter connection not established. Program exit."
 exit
}

#Ask for vCloud environment
$vCloudServer = Read-Host "Please enter internal FQDN of vCloud environment (e.g. gcp-atx-vcloud1.gcp.local)"
$vCloudCredential = Get-Credential -Message "Please enter vCloud local Admin (e.g. Administrator)"
$vCloudConnection = Connect-CIserver -server $vCloudServer -Org "system" -Credential $vCloudCredential

#check if connection to vCloud Director was working
if(-not $vCloudConnection.IsConnected){
 Write-Host "vCloud connection not established. Program exit"
 exit
}

$ResourceClusterInput = Read-Host "Please enter cluster name for rebooting all cluster hosts (e.g for VIB Upgrade)"

$ResourceCluster = Get-Cluster -Name $ResourceClusterInput -ErrorAction SilentlyContinue

#check if cluster read from user is existent
if(-not $ResourceCluster){
 Write-Host "Cluster specified not found. Exiting script."
 exit
}
else{

    $ConfirmReboots = Read-Host "All hosts of cluster " $ResourceClusterInput " (one by one) will get disabled in VCD, set to maintenance mode and rebooted. VMware Installers will be disconnected. Continue (y/n)"

    if($ConfirmReboots -eq "y")
    {
        #Get all VMHosts
        $ClusterHosts = Get-VMHost -Location $ResourceCluster.Name

        #Get Failover host
        $ClusterView = $ResourceCluster | Get-View
        $FailoverHostID = $ClusterView.Configuration.DasConfig.AdmissionControlPolicy.FailoverHosts

        #Disable Admission Control, to ensure that sufficient resources are available during reboot
        Set-Cluster -HAAdmissionControlEnabled $false -Cluster $ResourceCluster.Name -Confirm:$false
        Write-Host "Admission Control Disabled"

        #Foreach host in cluster
        foreach($VCHost in $ClusterHosts){

            #Get vCloud Host Entity and View
            $CloudHost = Search-Cloud -QueryType Host -Name $VCHost.Name
            $CloudHostView = Get-CIView -SearchResult $CloudHost
            Write-Host
            Write-Host "Working on host" $CloudHost.Name
            Write-Host "Disabling host in vCloud Director"
            #Disable host in VCD
            $CloudHostView.Disable()

            Write-Host "Checking for VMware Tools installations on host" $VCHost.Name

            #Check for VMs with VMware Tools installer mounted
            $VMsCDconnected = Get-VM -Location $VCHost | where { $_ | Get-CDDrive | where { $_.ConnectionState.Connected -eq "true" -and $_.IsoPath -like "*/usr/lib/vmware*"} }

            #If VMs exist with VMware Tools installer mounted, dismount them
            if($VMsCDconnected){
                foreach($VMumount in $VMsCDconnected ){
                    Dismount-Tools -VM $VMumount
                    Write-Host "Unmounted VMware Installer for VM " $VMumount
                }
            }

            #Enter Maintenance mode
            Write-Host "Evacuating host"
            Set-VMHost $VCHost.Name -State Maintenance -Evacuate | Out-Null
            #Reboot host and wait till reboot is complete
            Write-Host "Rebooting host"
            Restart-VMHost $VCHost.Name -Confirm:$false | Out-Null
            Write-Host -NoNewline "Waiting for host to come online "
            #Wait until the host goes down
            do
            {
                Start-Sleep -Seconds 15
                $HostState = (Get-VMHost $VCHost.Name).ConnectionState
                Write-Host -NoNewline "."
            }
            while ($HostState -ne "NotResponding")
            #Wait until the host is back, still in maintenance mode
            do
            {
                Start-Sleep -Seconds 15
                $HostState = (Get-VMHost $VCHost.Name).ConnectionState
                Write-Host -NoNewline "."
            }
            while ($HostState -ne "Maintenance")

            #Exit maintenance mode
            Write-Host "Host rebooted"
            Set-VMHost $VCHost.Name -State Connected | Out-Null
            #Enable host in VCD
            Write-Host "Enabling Host in vCloud Director"
            $CloudHostView.Enable()
        }

        #Get Failover Hosts
        $FailoverHosts = Get-VMHost | where {$FailoverHostID.Value -contains $_.Id.Replace("HostSystem-","") }

        #Enter maintenance mode to ensure that failover hosts are empty
        foreach($EvacHost in $FailoverHosts){
            Set-VMHost $EvacHost.Name -State Maintenance -Evacuate | Out-Null
        }

        #Enable Admission Control again
        Set-Cluster -HAAdmissionControlEnabled $true -Cluster $ResourceCluster.Name -Confirm:$false
        Write-Host "Admission Control Enabled"
        Write-Host "Failover hosts evacuated"

        #Exit maintenance mode for failover hosts
        foreach($ConHost in $FailoverHosts){
            Set-VMHost $ConHost.Name -State Connected | Out-Null
        }
    }
}

#Close open connections
Disconnect-VIServer -Server $vCenterConnection -Confirm:$false
Disconnect-CIServer -Server $vCloudConnection -Confirm:$false 
