vSAN File Services with vCloud Director

vSphere 7 is now generally available, and with it comes a new vSAN update that introduces vSAN File Service. Cormac Hogan has a good overview of the feature on his blog, so definitely head there first to understand what it is.

I want to dive into the possibility of using vSAN File Service NFS in vCloud Director environments.

Let me start with the current (April 2020) interop status: vSphere 7 is not yet supported with vCloud Director, which means vCenter Server 7 cannot be used as a target for IaaS services. But that is not an issue for the use case I want to discuss today.

A vCloud Director deployment needs NFS storage for its Transfer Share directory. The vCloud Director architecture consists of multiple cells (VM nodes) that scale out horizontally based on the size of the managed environment. The cells need a shared database and a shared Transfer Share directory to function properly. The Transfer Share must be an NFS mount and is used mostly for OVF import/export operations related to vApp template and catalog management; however, the appliance deployment mode of vCloud Director also uses the Transfer Share for storing appliance-related information, SSH keys, the responses.properties file for deployment of additional cells, and embedded database backups.
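As a concrete illustration, the Transfer Share mount on a cell might look like the following /etc/fstab entry. The server address and export name are placeholders (not from my environment), and the mount point is the one the vCloud Director appliance typically uses:

```shell
# Hypothetical /etc/fstab entry on a vCloud Director cell.
# Server address and export name are placeholders - substitute your own.
nfs.example.com:/TransferShare  /opt/vmware/vcloud-director/data/transfer  nfs  vers=4.1,rw  0 0
```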

vCloud Director cell VMs are usually deployed in the management cluster, which can be a separate vSphere 7 environment with vSAN. Can we (or should we) use vSAN NFS for the vCloud Director Transfer Share?

The current practice is either to use an external hardware storage NFS array (NetApp, for example) or to deploy a Linux VM with a large disk that acts as the NFS server. The first approach is not always possible, especially if you use vSAN only and have no external storage available; then you have to go with the Linux VM approach. But not anymore.
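For reference, the Linux VM approach essentially boils down to an /etc/exports entry like the one below (the export path and cell subnet are placeholders). Note the no_root_squash option, which mirrors the root-squash-disabled requirement that also applies to the vSAN FS share later in this post:

```shell
# Hypothetical /etc/exports entry on the Linux NFS server VM.
# Export path and cell subnet are placeholders - substitute your own.
/nfs/transfer  10.0.0.0/24(rw,sync,no_root_squash)
```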


vSAN File Service NFS has the following advantages:

  • no external dependency on hardware storage or Linux VM
  • easy to deploy and manage from UI or programmatically
  • capacity management with quotas and thresholds
  • high availability
  • integrated lifecycle

The whole end-to-end deployment is indeed very simple; let me demonstrate the process:

  1. Start with vSAN FS configuration in vSphere Cluster > Configure > vSAN > Services > File Service
  2. Directly download vSAN File service agent (the lightweight container image OVA)
  3. Configure vSAN domain and networking

  4. Provide a pool of IP addresses for the containers (I used 4, as I have a 4-host management cluster).
  5. After a while you will see the agent containers deployed on each cluster node.
  6. Now we can proceed with the NFS share configuration, in vSphere Cluster > Configure > vSAN > File Service Shares > ADD. We can define the name, vSAN storage policy, and quotas.
  7. Enter the IP addresses of your vCloud Director cells to grant them access to this share. Set the permission to Read/Write and make sure root squash is disabled.
  8. Once the share is created, select its check box and copy its URL. Choose the NFSv4.1 format.
  9. Now use the copied URL string in your vCloud Director cell deployment. I am using the vCloud Director appliance.
  10. Once the cell is started we can see how the transfer share is mounted:

    Notice that while the mount IP address in /etc/fstab is the one you provided, the actual address in use is that of the agent container serving the share. This provides load balancing across all service nodes when more exports are created and the NFSv4.1 mount address is used.

It is important to understand that although we have 4 vSAN FS agents deployed, the Transfer Share will be served by a single container. To find out on which host this particular container is running, go to Cluster > Monitor > vSAN > Skyline Health > File Service > File Service Health.

So what happens if the host esx-03a.corp.local is unavailable? The share will fail over to another host. In my tests this took around 60-90 seconds. During that time the NFS share will not be accessible, but the mount should persist, and once the failover finishes it will become accessible again.

Notice that the share is now served from esx-04.corp.local.

Also note that putting the host into maintenance mode will not vMotion the agent. It will just shut it down (and after a while undeploy it) and rely on the mechanism above to fail the share over to another agent. You should never treat the vSAN FS agents as regular VMs.

I am excited to see vSAN File Services as another piece of VMware technology that removes third-party dependencies from running vCloud Director (as was the case with NSX load balancing and the embedded PostgreSQL database).

vSphere HA and NFS Datastores

Recently, during a vCloud Director project, we were testing how long it takes to recover from an HA event. The test was done on a two-node management cluster: we loaded one host with almost all of the management VMs, then shut it down and measured how long it took all the affected services to recover. This exercise was done to see if we could fulfill the required SLA.

The expectation was that it would take about 20 seconds for the other host to find out the first one was dead and then start to power up all the VMs based on their restart priority. The database server and domain controller had high priority; the rest of the VMs had the default one. To our surprise it took not 20 seconds or so, but 8:40 minutes to register the database server and start the boot procedure. For some reason the particular server was shown with a 95% Power On status. Although there are books written about vSphere HA, this behaviour was not explained.

See the Start and Completed Times:

At first it looked like a bug, so an SR was raised, but then we found out it is like that by design. We were using NFS storage, and NFS locking influences how long it takes to release the locks on a VM's vmdk and vswp files. KB article 1007909 states that the time to recover a lock on NFS storage can be calculated as:

X = (NFS.DiskFileLockUpdateFreq * NFS.LockRenewMaxFailureNumber) + NFS.LockUpdateTimeout

which with default values is

X = (10 * 3) + 5 = 35 seconds.

However, the database server had 12 vmdk disks (2 disks per database) and the restart actually took (12+1)*(35+5) seconds = 8:40 minutes. It means the locks were released sequentially, an additional 5 seconds was added to each, and the VM swap file lock also had to be released. This is expected behavior for vSphere 5.0 and older. The newly released vSphere 5.1 lowers the time to about 2 minutes, as there are 5 threads (the main vmx plus 4 worker threads) working in parallel, so those 13 files can be released in 3 passes.
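The arithmetic above can be sketched as a quick shell calculation (default lock parameters from the KB assumed):

```shell
# Recompute the NFS lock recovery times from KB 1007909 with default values.
freq=10         # NFS.DiskFileLockUpdateFreq
max_fail=3      # NFS.LockRenewMaxFailureNumber
timeout=5       # NFS.LockUpdateTimeout
lock=$(( freq * max_fail + timeout ))           # 35 s to recover one lock
files=13                                        # 12 VMDKs + 1 vswp file
serial=$(( files * (lock + 5) ))                # vSphere 5.0: sequential -> 520 s (8:40)
threads=5                                       # vSphere 5.1: main vmx + 4 workers
rounds=$(( (files + threads - 1) / threads ))   # 13 files / 5 threads -> 3 passes
parallel=$(( rounds * (lock + 5) ))             # -> 120 s (2:00)
echo "${lock}s per lock, ${serial}s serial, ${parallel}s parallel"
```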

KB article 2034935 was written about this behaviour.

If this is by design what can you do to avoid it?

1. Upgrade to ESXi 5.1 to get up to 5 times faster HA restart times

2. Use block storage instead of NFS

3. Tweak NFS advanced parameters (DiskFileLockUpdateFreq, LockRenewMaxFailureNumber, LockUpdateTimeout) – however this is not recommended

4. Do not use that many VMDKs. Either consolidate to a smaller number of disks, or use in-guest disk mapping (iSCSI, NFS)

5. Just accept it when you calculate your SLAs.

Iomega VMware ESX NFS Datastore Issue

In my VMware vSphere home lab I have been using various hardware and software appliances for shared storage: Openfiler, the FalconStor VSA, the HP LeftHand/StorageWorks P4000 VSA, and the EMC Celerra VSA. Recently I added an Iomega ix4-200d, whose NFS sharing is VMware vSphere certified. Although the Iomega is not very powerful (see my previous blog post about it), I moved all my VMs to it to free up my storage server to play with other storage appliances (I am testing Nexenta now, but that is for another blog post).

My setup is now very simple: I have a diskless ESXi host that runs all the VMs from the NFS datastore served by the Iomega. Today I restarted the ESXi server and was surprised that, due to an inaccessible NFS datastore, no VM was started. The datastore was grayed out in the vSphere Client GUI.

I have a virtual domain controller, an internet firewall/router, a mail server, and some other less important machines. So if the ESX server does not start properly I have no internet or email, and I cannot even log in to the Iomega CIFS shares, because it is joined to the domain, which was also unavailable.
I was very surprised, as I had no idea why the ESX server could not connect to the NFS datastore. A storage rescan did not help, so I unmounted the datastore and tried to reconnect it. I received this error message:

Call “HostDatastoreSystem.CreateNasDatastore” for object “ha-datastoresystem” on ESXi “” failed.
Operation failed, diagnostics report: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.

The VMkernel log (which on ESXi is stored in /var/log/messages) did not help much:

Jan 15 22:10:25 vmkernel: 0:00:01:30.282 cpu0:4767)WARNING: NFS: 946: MOUNT RPC failed with RPC status 13 (RPC was aborted due to timeout) trying to mount Server ( Path (/nfs/Iomega)

I was able to connect to the Iomega NFS export from a regular Linux machine. I was also able to connect the ESX server to a regular Linux NFS export. And that helped me find the solution.

Because both of my DNS servers were running in virtual machines and were not accessible, it took the Iomega more time to connect the ESX server to the NFS datastore, and the ESX server meanwhile gave up. The remedy was very simple: I added a line with the ESX server's IP address and hostname to the Iomega's /etc/hosts file. This must be done via the Iomega ssh console and not via the web GUI:

root@Iomega:/# cat /etc/hosts
localhost.localdomain localhost
Iomega.FOJTA.COM Iomega
Iomega.FOJTA.COM Iomega
esx2.fojta.com esx2
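The edit itself is a one-line append over the ssh console. In this sketch the IP address is a placeholder for the ESXi host's real management address, and the command works on a scratch copy of the file so it can be tried safely outside the Iomega:

```shell
# Placeholder IP - substitute the ESXi host's real management address.
# Working on a scratch copy; on the Iomega you would append to /etc/hosts itself.
cp /etc/hosts /tmp/hosts.demo
echo "192.168.1.21  esx2.fojta.com  esx2" >> /tmp/hosts.demo
tail -1 /tmp/hosts.demo
```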

From now on, when the ESX server reboots it mounts the NFS datastore immediately.