Upgrading ESXi host to vSphere 5.5 with Cisco Nexus 1000V

I have upgraded my vSphere lab cluster from ESXi 5.1 to 5.5. Even though my lab consists of only two hosts, I wanted to use an Update Manager orchestrated upgrade to simulate how it would be done in a big enterprise or service provider environment, with as few manual steps as possible.

As I use Cisco Nexus 1000V and vCloud Director, I devised the following procedure:

1. It is not recommended to put a host into maintenance mode without first disabling it in vCloud Director. The reason is that vCloud Director catalog media management can get confused by the inaccessibility of a host due to maintenance mode. However, when using Update Manager it is not possible to orchestrate disabling a host before maintenance mode. Therefore I recommend doing the whole upgrade during a maintenance window when the vCloud Director portal is not accessible to end users.
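For completeness, the disable operation itself can be automated against the vCloud API. Below is a rough PowerShell sketch; the cell address and the host reference are placeholders, and the /action/disable endpoint is how I recall the vCloud API extension working, so verify it against the API documentation for your vCloud Director version:

# Log in to the vCloud API as a system administrator; the session token is returned
# in the x-vcloud-authorization response header (self-signed certificates may have
# to be trusted first)
$vcd  = "https://vcloud.example.com"                    # hypothetical cell address
$cred = Get-Credential                                  # e.g. administrator@System
$pair = "{0}:{1}" -f $cred.UserName, $cred.GetNetworkCredential().Password
$auth = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair))

$login = Invoke-WebRequest -Uri "$vcd/api/sessions" -Method Post -Headers @{
    "Accept"        = "application/*+xml;version=5.5"
    "Authorization" = "Basic $auth"
}
$token = $login.Headers["x-vcloud-authorization"]

# Disable the host in vCloud Director before putting it into maintenance mode
# (the host href comes from GET /api/admin/extension/hostReferences)
$hostHref = "$vcd/api/admin/extension/host/1234"        # placeholder host reference
Invoke-WebRequest -Uri "$hostHref/action/disable" -Method Post -Headers @{
    "Accept"                 = "application/*+xml;version=5.5"
    "x-vcloud-authorization" = $token
}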

2. I have a few custom VIBs installed on the hosts: the Cisco Nexus 1000V VEM VIB, the vCloud agent VIB and the VXLAN VIB. Other common ones are the NetApp NFS plugin or EMC PowerPath. This means a custom ESXi 5.5 image must be created first, which can be done quite easily in PowerCLI 5.5. Note that the VXLAN VIB does not need to be included, as it is installed automatically when the host exits maintenance mode (similar to the FDM HA VIB).

3. Add the necessary software depots (ESXi online, Cisco Nexus 1000V and vcloud-agent offline). The vCloud Director agent VIB can be downloaded from any cell at the following location: /opt/vmware/vcloud-director/agent/vcloudagent-esx55-5.5.0-1280396.zip

Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Add-EsxSoftwareDepot .\VEM550-201308160108-BG-release.zip

Add-EsxSoftwareDepot .\vcloudagent-esx55-5.5.0-1280396.zip

4. Find the newest profile and clone it:

Get-EsxImageProfile | Sort-Object "ModifiedTime" -Descending | Format-Table -Property Name,CreationTime

New-EsxImageProfile -CloneProfile ESXi-5.5.0-1331820-standard -Name "ESXi-5.5.0-1331820-standard-VEM-vcloud" -Vendor custom

5. Get the names of all VIBs and add those needed to the new profile:

Get-EsxSoftwarePackage

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud cisco-vem-v160-esx

Add-EsxSoftwarePackage -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud vcloud-agent
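As a quick optional sanity check, the VIB list of the cloned profile can be inspected to confirm that both packages made it in:

# List the VIBs in the customized profile and filter for the two added packages
(Get-EsxImageProfile -Name "ESXi-5.5.0-1331820-standard-VEM-vcloud").VibList |
    Where-Object { $_.Name -match "cisco-vem|vcloud-agent" } |
    Format-Table Name, Version, Vendor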

6. Export the profile to an ISO image (this will take a while as we need to download about 300 MB of data from the internet):

Export-EsxImageProfile -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud -ExportToIso ESXi-5.5.0-1331820-standard-VEM-vcloud.iso
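If you also want an offline bundle (for example for esxcli-based installs), the same profile can be exported to a zip depot as well. A small optional extra:

# Optionally export the same profile as an offline bundle (zip depot)
Export-EsxImageProfile -ImageProfile ESXi-5.5.0-1331820-standard-VEM-vcloud -ExportToBundle -FilePath ESXi-5.5.0-1331820-standard-VEM-vcloud-depot.zip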

7. Now we can upload the ISO to Update Manager, create an upgrade baseline and attach it to the cluster.

8. When I ran “Scan for Updates” I received the status “Incompatible”. The VMware Update Manager release notes mention this:

The Incompatible compliance status is because of the way the FDM (HA) agent is installed on ESXi 5.x hosts. Starting with vSphere 5.0, the FDM agent is installed on ESXi hosts as a VIB. When a VIB is installed or updated on an ESXi host, a flag is set to signify that the bootbank on the host has been updated. Update Manager checks for this flag while performing an upgrade scan or remediation and requires this flag to be cleared before upgrading a host. The flag can be cleared by rebooting the host.

I rebooted the hosts and scanned for updates again, this time without any issue. I was ready for the upgrade.
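With more than a couple of hosts, the reboot and the new scan can also be scripted. A minimal PowerCLI sketch, assuming the Update Manager PowerCLI plug-in is installed and the cluster is called Lab (the name and the polling logic are just an illustration):

# Rolling reboot of all hosts in the cluster to clear the bootbank flag,
# followed by a new Update Manager compliance scan
$cluster = Get-Cluster "Lab"                     # hypothetical cluster name
foreach ($esx in ($cluster | Get-VMHost)) {
    # enter maintenance mode (relies on DRS/vMotion to evacuate the VMs)
    Set-VMHost -VMHost $esx -State Maintenance -Evacuate -Confirm:$false
    Restart-VMHost -VMHost $esx -Confirm:$false
    # wait for the host to go down and come back up in maintenance mode
    do { Start-Sleep -Seconds 30 } until ((Get-VMHost $esx.Name).ConnectionState -eq "NotResponding")
    do { Start-Sleep -Seconds 30 } until ((Get-VMHost $esx.Name).ConnectionState -eq "Maintenance")
    Set-VMHost -VMHost $esx -State Connected -Confirm:$false
}
# re-run the scan against the attached upgrade baseline
Scan-Inventory -Entity $cluster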

9. The upgrade of my two hosts took about 50 minutes. It was nicely orchestrated by Update Manager and finished without any issues.
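For reference, the remediation itself could also be started from PowerCLI instead of the vSphere Client. A sketch, with the baseline name "ESXi 5.5 upgrade" being purely hypothetical:

# Attach the upgrade baseline to the cluster and remediate all hosts in it
$cluster  = Get-Cluster "Lab"
$baseline = Get-Baseline -Name "ESXi 5.5 upgrade"      # hypothetical baseline name
Attach-Baseline -Baseline $baseline -Entity $cluster
Remediate-Inventory -Entity $cluster -Baseline $baseline -Confirm:$false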

10. I still needed to upgrade the vCloud host agents from vCloud Director, but that can be automated with the vCloud API (the host is put into maintenance mode during this operation).
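A sketch of how that could look, reusing the vCloud API session from the sketch under step 1 (the /action/upgrade and /action/enable endpoints are again assumptions to verify against your vCloud Director version):

# Trigger the host agent upgrade; vCloud Director returns a task that can be polled
Invoke-WebRequest -Uri "$hostHref/action/upgrade" -Method Post -Headers @{
    "Accept"                 = "application/*+xml;version=5.5"
    "x-vcloud-authorization" = $token
}
# once the task finishes, re-enable the host for vCloud workloads
Invoke-WebRequest -Uri "$hostHref/action/enable" -Method Post -Headers @{
    "Accept"                 = "application/*+xml;version=5.5"
    "x-vcloud-authorization" = $token
}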

DCUI – High CPU usage on ESXi

Yesterday I noticed that one of my home lab ESXi hosts had unusually high CPU usage. One CPU core had been running at 100% constantly for a week or so. My first reaction was to look for the virtual machine that went crazy and started consuming all its allocated CPU resources. To my surprise, all VMs running on the host were almost idle. Obviously the CPU usage of the VMs did not add up to the CPU usage of the host. For further investigation I ran esxtop on the tech support console with the following result:

The process dcui.345917 had unusually high CPU usage. DCUI is the Direct Console User Interface on ESXi hosts (the yellow BIOS-like page). The host had been rebooted three days before and that did not fix the problem. Google and VMware KB searches did not return anything. Something funny had to be going on directly on the DCUI console. So I went to the monitor attached to the host and immediately noticed the screen was flashing. Why? The two keyboards of my ESXi hosts were stacked on top of each other and one was pushing a key on the other. And that was it. The pressed key (probably Enter) generated so many interrupts that it kept one CPU core busy.

Iomega VMware ESX NFS Datastore Issue

In my VMware vSphere home lab I have been using various hardware and software appliances for shared storage: from Openfiler, FalconStor VSA and HP LeftHand/StorageWorks P4000 VSA to EMC Celerra VSA. Recently I added an Iomega ix4-200d. Its NFS sharing is VMware vSphere certified. Although the Iomega is not very powerful (see my previous blog post about Iomega), I moved all my VMs to it to free up my storage server to play with other storage appliances (I am testing Nexenta now, but that is for another blog post).

My setup is now very simple. I have a diskless ESXi host that runs all the VMs from the NFS datastore served by the Iomega. Today I restarted the ESXi server and was surprised that, due to the inaccessible NFS datastore, no VM was started. The datastore was grayed out in the vSphere Client GUI.

I have a virtual domain controller, internet firewall/router, mail server and some other less important machines. So when the ESX server does not start properly, I have no internet, no email, and I cannot even log in to the Iomega CIFS shares, because it is joined to the domain, which was also not available.
I was very surprised, as I had no idea why the ESX server could not connect to the NFS datastore. A storage rescan did not help, so I unmounted the datastore and tried to reconnect it. I received this error message:

Call “HostDatastoreSystem.CreateNasDatastore” for object “ha-datastoresystem” on ESXi “10.0.4.202” failed.
Operation failed, diagnostics report: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.

The VMkernel log (which on ESXi is stored in /var/log/messages) did not help much:

Jan 15 22:10:25 vmkernel: 0:00:01:30.282 cpu0:4767)WARNING: NFS: 946: MOUNT RPC failed with RPC status 13 (RPC was aborted due to timeout) trying to mount Server (10.0.4.251) Path (/nfs/Iomega)

I was able to connect to the Iomega NFS export from a regular Linux machine. I was also able to connect the ESX server to a regular Linux NFS export. And that helped me find the solution.

Because both of my DNS servers were running in virtual machines and were not accessible, it took the Iomega longer to connect the ESX server to the NFS datastore, and the ESX server meanwhile gave up. The remedy was very simple: I added a line with the ESX server IP address and its hostname to the Iomega /etc/hosts file. This must be done via the Iomega SSH console and not via the web GUI:

root@Iomega:/# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.0.4.251 Iomega.FOJTA.COM Iomega
10.0.2.251 Iomega.FOJTA.COM Iomega
10.0.4.202 esx2.fojta.com esx2

From now on, whenever the ESX server reboots, it mounts the NFS datastore immediately.
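For reference, re-adding the NFS export as a datastore can also be done from PowerCLI instead of the vSphere Client. A small sketch using the addresses from this post (the datastore name is up to you):

# Mount the Iomega NFS export on the ESX host as a datastore
$esx = Get-VMHost "10.0.4.202"
New-Datastore -VMHost $esx -Nfs -Name "Iomega" -NfsHost "10.0.4.251" -Path "/nfs/Iomega"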

My vSphere Home Lab

Building a vSphere home lab is in my opinion essential. It is quite a popular subject and there was even a VMworld session dedicated to it.

My reasons for having a home lab are the following:

  • Run my home IT infrastructure (firewall, DHCP, Active Directory, Mail server, VoIP, …)
  • Try and learn new products
  • Get hands on experience needed for certifications

I tried many approaches (VMware Server, VMware Workstation, a Xeon server, …). My goal was to build a quite powerful lab that would fulfill the reasons stated above, but at the same time be cheap, have low power requirements, not be noisy and not make too much heat. Here is my current config:

I have two servers. One acts as a ‘unified’ storage server (NAS and iSCSI SAN) and the other as the host for all other workloads.

Storage server

To try advanced virtualization features, shared storage is a must. I needed a huge fileserver even before I started with virtualization to store my DVD collection, so I built a multi-terabyte Linux-based (Debian) fileserver with software RAID, which later became an Openfiler NAS with the iSCSI protocol. However, to learn VMware Site Recovery Manager I needed storage that could replicate and was compatible with SRM. The best low-cost choice is a VSA – Virtual Storage Appliance. VSAs come in OVF format and need either VMware Workstation or an ESX server, so I installed ESXi on my storage server. I virtualized the multi-terabyte fileserver with RDM disks and next to it I run HP StorageWorks (LeftHand) VSA, FalconStor VSA or EMC Celerra VSA. For example, for SRM tests I had two copies of the LeftHand VSA replicating to each other.

Hardware

The server was built with the purpose of fitting as many hard drives as possible. I use a Chieftec Smart case with 9 external 5.25“ drive bays that are multiplied with 3-in-2 or 4-in-3 disk backplanes. The motherboard is an ASUS P5B-VM DO with 7 SATA ports; I added 4 more with two PCI SATA controllers (Kouwell). I experimented with a hardware RAID card (Dell PERC 5/i) but it made too much heat and local Raw Device Mapping did not work. The OS boots from a USB flash disk, so I can put 11 drives into this server. Currently I have five 1 TB drives (low-power green RE3 Western Digitals) for NAS in RAID 5 and two 500 GB drives for the VSAs. An Intel Core 2 Duo E6400 CPU with 3 GB RAM is more than enough to run 3 VMs at the moment (RedHat based NAS fileserver, FalconStor and LeftHand). One onboard Intel NIC is coupled with an Intel PRO/1000 PT dual port PCIe adapter cheaply bought off eBay. One NIC is used for management and fileserver traffic (with VLAN separation), two NICs are teamed for iSCSI traffic.

Workload server

The purpose of the ‘workload server’ is to run all the VMs needed for infrastructure purposes and testing. From my experience, the consolidation of non-production servers is usually limited by the available memory, therefore I was looking for the most cost-effective option to build a server with as much memory as possible. In the end I settled on an Asus P5Q-VM, which has 4 DIMM slots and supports up to 16 GB of DDR2 RAM. I bought it cheaply off the local eBay equivalent, added an Intel Core 2 Duo E8400 3 GHz processor (also bought used) and 12 GB of RAM (a brand new 4 GB DIMM costs around 110 EUR). The onboard Realtek NIC is not ESX compatible, so I added Intel PRO/1000 PT dual port PCIe, Intel PRO PCIe and PCI adapters to get 4 network interfaces. The server is diskless, boots from a USB flash disk and is very quiet, housed in a small micro ATX case.

At the moment I run my infrastructure VMs (two firewalls – Endian and Vyatta – Domain Controller, Mail server, vCenter, vMA, TrixBox, Win XP) and vCloud Director test VMs (vCloud Director, Oracle DB, vShield Manager) without memory ballooning, and CPU usage is at less than 25%. The heat and power consumption are acceptable, although the noise of 7 hard drives and notably the backplane fans is substantial. In the future I may go the Iomega StorCenter direction to replace the fileserving functions, as it is very quiet and power efficient, but most likely it will not replace the flexibility of a VSA.

RDM (Raw Device Mapping) with Local Disks

I wanted to virtualize my home storage server with five 1 TB drives in a software RAID 5 configuration (Linux mdadm). The main reason was to use the server not only for CIFS (Samba) shares, but also for block storage (iSCSI) access with HP LeftHand VSA and similar ESX appliances. I have around 3.5 TB of data, so copying it somewhere to rebuild the storage was not an option. So I decided to use RDM. I created the virtual machine (Red Hat Enterprise Linux) and added the 1 TB local disks as RDM disks. Although this is not a supported configuration, it works on both ESX and ESXi. The advantage is that if I decide to go back to physical, I can easily migrate the RDM disks.

To create the local RDM drives, console access is needed. As I am using ESXi, this must be done with the UNSUPPORTED method.

The next step is to find out how the disks are seen and linked by ESX:

ls -la /vmfs/devices/disks

...
---------    0 root     root               1984 Jan  1  1970 vml.0100000000202020202057442d574341534a32333638383631574443205744 -> t10.ATA_____WDC_WD1000FYPS2D01ZKB1________________________WD2DWCASJ2368861
---------    0 root     root               1984 Jan  1  1970 vml.0100000000202020202057442d574341534a32333638383631574443205744:1 -> t10.ATA_____WDC_WD1000FYPS2D01ZKB1________________________WD2DWCASJ2368861:1
...

These are two symbolic links to a drive (we can see its type and serial number) and its only partition. We need to remember the vml.0… long name.

Then we create the RDM links in the VM directory:

vmkfstools -r /vmfs/devices/disks/vml.0100000000202020202057442d574341534a32333638383631574443205744 /vmfs/volumes/datastore1/FileServer/rdm_disk1.vmdk

In my case I created 5 such links.

The last step is to add the RDM links to the virtual machine (Add – Hard Disk – Use an existing virtual disk, and browse to rdm_disk1.vmdk).

Here is the final screenshot.
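As a side note, attaching the five pre-created RDM pointer vmdks can also be scripted with PowerCLI instead of clicking through the wizard. A minimal sketch, assuming the VM is named FileServer to match the directory used above:

# Attach the pre-created RDM pointer vmdks to the virtual machine
$vm = Get-VM "FileServer"                                # hypothetical VM name
foreach ($i in 1..5) {
    New-HardDisk -VM $vm -DiskPath "[datastore1] FileServer/rdm_disk$i.vmdk"
}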