vCloud Director 8.20: Orchestrated Upgrade

vCloud Director architecture consist of multiple cells that share common database. The upgrade process involves shutting down services on all cells, upgrading them, upgrading the database and starting the cells. In large environments where there are three or more cells this can be quite labor intensive.

vCloud Director 8.20 brings new feature – an orchestrated upgrade. All cells and vCloud database can be upgraded with a single command from the primary cell VM. This brings two advantages. Simplicity – it is no longer needed to login to each cell VM, upload binaries and execute upgrade process manually. Availability – downtime during the upgrade maintenance window is reduced.

Prerequisites

Set up ssh private key login from the primary cell to all other cells in the vCloud Director instance for user vcloud.

  1. On the primary cell generate private/public key (with no passphrase):

    ssh-keygen -t rsa -f $VCLOUD_HOME/etc/id_rsa
    chown vcloud:vcloud $VCLOUD_HOME/etc/id_rsa
    chmod 600 /opt/vmware/vcloud-director/etc/id_rsa
     

  2. Copy public key to each additional cell in the instance to authorized_keys file. This can be done with one line command ran from the primary cell or with this ssh-copy-id. Use IP/FQDN it is registered with in VCD

    cat $VCLOUD_HOME/etc/id_rsa.pub | ssh root@<cell-IP> “mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys” 

  3. Verify that login with private key works for each secondary cell in the environment

    sudo -u vcloud ssh -i $VCLOUD_HOME/etc/id_rsa root@<cell IP/FQDN>

Multi-cell Installation

Upload vCloud Director binary to the primary cell and make it executable. Execute the file with private-key-path option pointing to the private key.

/root/vmware-vcloud-director-distribution-8.20.0-5070903.bin –private-key-path $VCLOUD_HOME/etc/id_rsa

 

Optionally a maintenance cell can be specified with –maintenance-cell option.

For troubleshooting, the upgrade log is located on the primary cell in  $VCLOUD_HOME/logs/upgrade-<date and time>.log

For no-prompt execution you can add –unattended-upgrade option.

Workflow

This is the workflow that is automatically executed:

  1. Quiesce, shutdown and upgrade of the primary cell. Does not start the cell.
  2. If maintenance cell was specified, it is put into maintenance mode.
  3. Quiescing and shut down of all the other cells.
  4. Upgrade of the vCloud Database (a prompt for backup)
  5. Upgrade and start of all other cells (except the maintenance cell)
  6. If maintenance cell was specified, it is upgraded and started.
  7. Start of the primary cell

What is the difference between a quiesced cell and a cell in the maintenance mode?

Quiesced cell:

  • finishes existing long running operations
  • answers to new requests and queues them
  • does not dequeue any operations (they will stay in the queue)
  • VC lister keeps running
  • Console proxy keeps running

Cell in maintenance mode

  • waits for finish of long running but fails all queued operations
  • answer to most requests with HTTP Error code 504 (unavailable)
  • still issues auth token for /api/sessions login requests
  • No VC listener
  • No Console proxy

Interoperability with vCloud Availability

vCloud Availability uses Cloud Proxies to terminate replication tunnels from the internet. Cloud Proxies are essentially stripped down vCloud Director cells and are therefore treated as regular cells during the orchestrated upgrade.

Quiesced Cloud Proxy has no impact on replication operations and traffic. Cloud Proxy in the maintenance mode still preserves existing replications however new replications cannot be established.

2/27/2017: Multiple edits based on feedback from engineering. Thank you Matthew Frost!

Upgrading ESXi host to vSphere 5.5 with Cisco Nexus 1000V

I have upgraded my vSphere lab cluster from ESXi 5.1 to 5.5. Even though my lab consists only of 2 hosts I wanted to use Update Manager orchestrated upgrade to simulate how it would be done in big enterprise or service provider environment with as little manual steps as possible.

As I use Cisco Nexus 1000V and vCloud Director following procedure was devised:

1. It is not recommended to put a host into maintenance mode without first disabling it in vCloud Director. The reason is that vCloud Director catalog media management can get confused by inaccessibility of a host due maintenance mode. However when using Update Manager it is not possible to orchestrate disabling a host before maintenance mode. Therefore I would recommend to do the whole upgrade operation during maintenance window when vCloud Director portal is not accessible to end-users.

2. I have a few custom vibs installed on the hosts. Cisco 1000V VEM vib, vcloud agent vib, VXLAN vib. Other common are NetApp NFS plugin or EMC PowerPath. This means a custom ESXi 5.5 image must be created first. This can be done quite easily in PowerCLI 5.5 Note VXLAN vib does not need to be included as it is installed automatically when host exits maintenance mode (similar to FDM HA vib).

3. Add necessary software depots (ESXi online, Cisco Nexus 1000V and vcloud-agent offline). vCloud Director agent vib can be downloaded from any cell at following location:/opt/vmware/vcloud-director/agent/vcloudagent-esx55-5.5.0-1280396.zip

Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Add-EsxSoftwareDepot .\VEM550-201308160108-BG-release.zip

Add-EsxSoftwareDepot .\vcloudagent-esx55-5.5.0-1280396.zip

5. Find the newest profile and clone it:

Get-EsxImageProfile | Sort-Object “ModifiedTime” -Descending | format-table -property Name,CreationTime

New-EsxImageProfile -CloneProfile ESXi-5.5.0-1331820-standard “ESXi-5.5.0-1331820-standard-VEM-vcloud” -vendor custom 

6. Get the names of all vibs and add those needed to the new profile

Get-EsxSoftwarePackage

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud cisco-vem-v160-esx

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM vcloud-agent

7. Export profile to an iso image (this will take a while as we need to download about 300 MBs of data from the internet)

Export-EsxImageProfile -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud -ExportToIso ESXi-5.5.0-1331820-standard-VEM-vcloud.iso

8. Now we can upload the iso to Update Manager, create upgrade baseline and attach it to the cluster.

9. When I run “Scan for Updates” I received status “Incompatible”. VMware Update Manager release notes mention this:

The Incompatible compliance status is because of the way the FDM (HA) agent is installed on ESXi 5.x hosts. Starting with vSphere 5.0, the FDM agent is installed on ESXi hosts as a VIB. When a VIB is installed or updated on an ESXi host, a flag is set to signify that the bootbank on the host has been updated. Update Manager checks for this flag while performing an upgrade scan or remediation and requires this flag to be cleared before upgrading a host. The flag can be cleared by rebooting the host.

I rebooted the hosts and Scanned for Updates again this time without any issue. I was ready for upgrade.

10. The upgrade of my two hosts took about 50 minutes. It was nicely orchestrated by Update Manager and finished without any issues.

11. I still needed to upgrade the vcloud host agents from vCloud Director, but that could be automated with vCloud API (host is put into maintenance mode during this operation).

vCloud Director 5.1 Features and their vSphere Dependency

I see more and more customers are migrating from vCloud Director 1.5 to vCloud Director 5.1. One question they have is: “Do we have to migrate to vSphere 5.1 at the same time”? The answer is definite no. vCloud Director 5.1 supports vCenter 5.0 and ESXi 5.0 and even ESX(i) 4.0U2 if managed by vCenter 5.

I always recommend to upgrade vCloud Director in two phases.

Phase 1 (vCloud Director Upgrade)

  • vCloud Director Cell operating system (RHEL). RHEL 5 is still supported but if customer wants to use RHEL 6 he will need to deploy a new cell as RHEL 5 to RHEL 6 upgrade is not possible.
  • vCloud Director runtime upgrade
  • vCloud Director database schema upgrade
  • vShield Manager upgrade
  • vShield Edges upgrade

Phase 2 (vSphere Upgrade)

  • Installation of SSO
  • Installation of Inventory Service
  • Installation/upgrade of Web Client
  • vCenter Server upgrade
  • ESX hosts upgrade
  • distributed virtual switches upgrade

As the phases can be spread out in time this brings the main topic of the article – which new vCloud Director 5.1 features depend on vSphere 5.1 and will not be available during the time between Phase 1 and Phase 2? I have compiled a table which lists the new vCloud Director features and if that feature will be available with vSphere 5.0 (vCenter 5.0 + ESX 5.0. Note: I don’t dare to consider ESX 4).

Feature

vSphere 5.0

Note
VM Snapshots

Storage Profiles

Elastic VDC

Allocation pool Org VDC type can span multiple clusters. Online migrations and merging of Provider VDCs.
Provider Single Sign On vCenter SSO required
Customer Single Sign On

SSPI, SAML2
VXLAN Networks vSphere 5.1 vmkernel module is required
Storage clusters (SDRS)

VM placement engine leverages SDRS. Migration of linked clones supported. Difference in shadow VM handling¹
New Edge Gateway Features

Performance, HA, Load balancing, DNS relay, Rate limits, Multiple interfaces, IP allocations, SNAT and DNAT rules
Virtual Hardware 9 Requires vSphere 5.1 (64 vCPUs)
Additional Guest OS Support

possibly

Depends on ESX version (Windows 8/2012 requires ESXi 5.0 U1), but Virtual Hardware 9 is recommended (KB 2034491)
NFS VAAI Fast Provisioning Requires vSphere 5.1 (hardware accelerated linked clones)
Clustered database support

¹) With vSphere 5.0 vCloud Director does not use SDRS recommendation for linked clone placement (Fast Provisioning). vCloud Director picks individual datastore and optionally deploys shadow VM. With vSphere 5.1 vCloud Director fully leverages SDRS recommendations, shadow VMs are deployed by vSphere SDRS.

Table in PNG format.

Disclaimer: I don’t claim this table is complete and that it is an official VMware document. If you think something is missing, please comment and I will edit the table.

Edit 27 April 2013: Explained difference in linked clone placement.