Custom vCenter Server Event and Alarm

Following my previous post about monitoring Edge Gateways, a customer asked me whether he could leverage vCenter Server alarms, since they are already integrated with his monitoring and alerting infrastructure.

So basically, is there a way to create a vCenter alarm via a scripted action (for example with PowerCLI)?

The answer is yes, and it is not that difficult. There are two types of vCenter alarms: those based on a condition/state and those based on an event. It is possible to create a custom user event with the LogUserEvent method of the EventManager in the vSphere API.

This is an example of PowerCLI code that creates the user event “Edge Gateway event” at the root datacenter folder.

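# Root folder that contains all datacenters
$DCFolderEntity = Get-Folder -Name datacenters
# EventManager view from the vSphere API
$eventMgr = Get-View EventManager
# Log the custom user event against the root folder
$eventMgr.LogUserEvent($DCFolderEntity.ExtensionData.MoRef,"Edge Gateway event")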

User logged event

Now it is easy to create a custom alarm based on the user-logged event. Create the alarm at the vCenter Server root level, under alarm type choose to monitor vCenter Server, and for the trigger manually enter (type):

Event; vim.event.GeneralUserEvent 

That’s it.

Alarm


Graceful Shutdown of vCloud Director Cell

One of my customers challenged me on how to properly shut down a vCloud Director cell without any disruption of service when multiple cells are used. Although we have KB article 2034994 on this particular subject, it omits some important details.

When vCenter is connected to vCloud Director, a VC Proxy service is started on one of the cells. The service is responsible for monitoring active vCenter tasks and inventory updates, which are then shared with the other cells. Unless there is a network partition between the cells, there is always exactly one VC Proxy service per vCenter. Multiple VC Proxies can run on one cell. You can see which cell is running the VC Proxy service on the vCenter screen in the vCloud Director Admin interface.

vCenter Proxy

The screenshot shows two vCenters connected to vCloud Director, one with its VC Proxy on the vcloud1 cell and the other on the vcloud2 cell.

If the VC Proxy service is not running, most activities in vCloud Director that require vCenter will not work properly. For example, the simple creation of a vApp with one VM will fail with the message:

Folder vApp_system_34 (8ce90b57-da8b-4714-914b-5073457155b0) does not exist in our inventory, but vCenter Server claims that it does.

This is because the inventory listener on the VC Proxy was not running and vCloud Director could not verify the successful creation of the vApp folder in vCenter. When a cell with a VC Proxy service dies, the service fails over to a surviving cell. However, that failover takes 5 minutes, which is governed by the vcloud:vcloud.heartbeat.failoverTimeoutMsecs property (stored in the vCloud Director database). I am not aware whether changing this value is supported.

Anyway, in order to shut down a cell gracefully we need to move the VC Proxy service to another cell. This can be done by simply reconnecting the vCenter; the move is very quick and does not disrupt running tasks.

Reconnect vCenter

I have observed that, when possible, a cell different from the original one and the least loaded (in terms of number of VC Proxy services) is chosen. This is also useful for manually distributing the load when there are multiple vCenters and multiple cells (good practice is to have at least N+1 cells, where N is the number of vCenters).

So what should be the correct graceful cell shutdown procedure?

  1. Make sure the cell is not running any VC Proxy service. No checkmark should be in the vCenter column of the Cloud Cells inventory in the vCloud Director Admin interface. If there is one, reconnect the vCenters that have their VC Proxy running on the cell.
  2. Quiesce the cell with the cell-management-tool:

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --quiesce true

    where <user> is the vCloud administrator username

  3. Monitor the number of outstanding active tasks on the cell and wait until it reaches 0.

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --status
    Job count = 0
    Is Active = false

  4. Shut down the cell. This can also be done with the cell-management-tool. What I noticed is that it takes multiple attempts (usually two), as the first attempt terminates only the watchdog service.

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --shutdown

    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is running

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --shutdown
    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is not running
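
Steps 2 to 4 above could be wrapped into a small shell helper along the following lines. This is only a rough sketch, not a tested or supported procedure. The function names (job_count, cell_is_running, graceful_shutdown), the POLL_SECONDS variable, the cap of five shutdown attempts and the fallback install path /opt/vmware/vcloud-director are my own assumptions for illustration. It relies only on the commands and output lines shown in the steps. Step 1 (moving the VC Proxy away) still has to be done in the Admin interface first, and the cell-management-tool may ask for the password on every call.

```shell
#!/bin/sh
# Sketch only: automates steps 2-4 of the procedure above.
# Assumption: default install path if VCLOUD_HOME is not already set.
VCLOUD_HOME="${VCLOUD_HOME:-/opt/vmware/vcloud-director}"
CMT="${CMT:-$VCLOUD_HOME/bin/cell-management-tool}"

# Read "cell --status" output on stdin and print the number
# from the "Job count = N" line.
job_count() {
    sed -n 's/^[[:space:]]*Job count = \([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Succeeds while "service vmware-vcd status" still reports the cell as running.
cell_is_running() {
    service vmware-vcd status | grep -q 'vmware-vcd-cell is running'
}

# Usage: graceful_shutdown <vCloud administrator username>
graceful_shutdown() {
    user="$1"

    # Step 2: quiesce the cell so it stops accepting new tasks.
    "$CMT" -u "$user" cell --quiesce true || return 1

    # Step 3: poll until no active jobs remain on the cell.
    while :; do
        count=$("$CMT" -u "$user" cell --status | job_count)
        [ -n "$count" ] || return 1   # status output not understood, give up
        [ "$count" = "0" ] && break
        sleep "${POLL_SECONDS:-10}"
    done

    # Step 4: repeat the shutdown until the cell itself is down
    # (the first attempt may only stop the watchdog).
    attempts=0
    while cell_is_running && [ "$attempts" -lt 5 ]; do
        "$CMT" -u "$user" cell --shutdown
        attempts=$((attempts + 1))
        sleep "${POLL_SECONDS:-10}"
    done

    # Report success only if the cell is really stopped.
    ! cell_is_running
}
```

To try it, source the file on the cell as root and run graceful_shutdown with the administrator username. The function returns non-zero if the status output cannot be parsed or the cell is still running after the capped number of attempts.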