How to Monitor Health of NSX Edge Gateways

NSX Edge Service Gateways are virtual machines deployed by NSX Manager that provide network services (routing, bridging, load balancing, VPNs, DNS relay, DHCP, …). This makes them quite a critical component in the infrastructure and thus there might be a need to keep a close eye on their availability.

While NSX Manager reports the status of the Edges in the GUI and logs it might take some time to register a change in the health. If you want up-to-date status of an Edge health you need to query the NSX Manager with NSX API. The NSX Manager then retrieves the current status of the Edge. The mechanism of the communication between NSX Manager and Edge appliance(s) depends on the Edge version and the vSphere cluster status:

VIX Communication

This is the legacy mode of communication. NSX Manager utilizes VIX API to query vCenter Server and the ESXi Host which runs the Edge appliance who then via VM Tools talks to the actual VM. This mode of communication is used for the legacy Edges version 5.5.x (deployed via  the compatibility vShield v2 API) and as failback mode when for some reason Message Bus Communication mode is not possible.

VIX Guest Operations
source: https://www.vmware.com/support/developer/vix-api/guestOps50_technote.pdf

 

Message Bus Communication

This is direct (and faster) communication between NSX Manager and the ESXi host (vsfwd process) running the Edge appliance. During the Edge deployment the cluster where the Edge is deployed to must be prepared for NSX and without any issues.

Message Bus
source: NSXvSphereDesignGuidev2.1.pdf

 

To query the Edge health is an expensive operation – it takes time and creates load on NSX Manager. If there is large number of Edges (for example in service provider scenario) this should be considered.

To test the viability of checking at least once in a given time the status of all Edges health I have created simple Powershell function Get-NSXEdgeHealth:

function Get-NSXEdgeHealth {
<#
.SYNOPSIS Gathers Health Status of a NSX Edge
.DESCRIPTION Will query NSX Manager for the health of a particular NSX Edge
.NOTES Author: Tomas Fojta
.PARAMETER NSXManager
The FQDN or IP of your NSX Manager
.PARAMETER Username
The username to connect with. Defaults to admin if nothing is provided.
.PARAMETER Password
The password to connect with
.PARAMETER EdgeId
ID of the Edge to gather health data for.
.EXAMPLE
PS> Get-NSXEdge -NSXManager nsxmgr.fqdn -Username admin -Password password -EdgeId EdgeId
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$true,Position=0)]
[String]$NSXManager,
[Parameter(Mandatory=$false,Position=1)]
[String]$Username = "admin",
[Parameter(Mandatory=$true)]
[String]$Password,
[Parameter(Mandatory=$true)]
[String]$EdgeId
)
Process {
### Ignore TLS/SSL errors
add-type @"
using System.Net;
using System.Security.Cryptography.X509Certificates;
public class TrustAllCertsPolicy : ICertificatePolicy {
public bool CheckValidationResult(
ServicePoint srvPoint, X509Certificate certificate,
WebRequest request, int certificateProblem) {
return true;
}
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
### Create authorization string and store in $head
$auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($Username + ":" + $Password))
$head = @{"Authorization"="Basic $auth"}
$HealthRequest = "https://$NSXManager/api/3.0/edges"+"/"+$EdgeId+"/status"

$h = @{} | select Health, Detail

$r = Invoke-WebRequest -Uri $HealthRequest -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$rxml = $r.Content
$h.Health = $rxml.edgeStatus.edgeStatus
$Details = @()
foreach ($Feature in $rxml.edgeStatus.featureStatuses.featureStatus)
{
$n = @{} | select Service, Status
$n.Service = $Feature.service
$n.Status = $Feature.status
$Details += $n
}
$h.Detail = $Details

return ,$h

} # End of process

} # End of function

PowerShell 3.0 or higher and (at least audit) credentials (and connectivity) to NSX Manager are needed.

Usage example:

Example

 

As can be seen the function needs the Edge ID parameter and then returns the overall Edge health and also detailed status of all its network services.

Following health states are defined:

  • green – good. This is the only state that guarantees that the Edge is functional.
  • red – no backing appliance is in service state
  • grey – unknown status (for example undeployed Edge)
  • yellow – intermittent health check failures (if more than 5 consecutive health checks fail the status goes to red)

Following function Get-NSXEdges will collect all Edges in the environment:

function Get-NSXEdges {
<#
.SYNOPSIS Gathers NSX Edges from NSX Manager
.DESCRIPTION Will inventory all of your NSX Edges from NSX Manager
.NOTES Author: Tomas Fojta
.PARAMETER NSXManager
The FQDN or IP of your NSX Manager
.PARAMETER Username
The username to connect with. Defaults to admin if nothing is provided.
.PARAMETER Password
The password to connect with
.EXAMPLE
PS> Get-NSXEdges -NSXManager nsxmgr.fqdn -Username admin -Password password
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$true,Position=0)]
[String]$NSXManager,
[Parameter(Mandatory=$false,Position=1)]
[String]$Username = "admin",
[Parameter(Mandatory=$true)]
[String]$Password
)

Process {
### Ignore TLS/SSL errors
add-type @"
using System.Net;
using System.Security.Cryptography.X509Certificates;
public class TrustAllCertsPolicy : ICertificatePolicy {
public bool CheckValidationResult(
ServicePoint srvPoint, X509Certificate certificate,
WebRequest request, int certificateProblem) {
return true;
}
}
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy
### Create authorization string and store in $head
$auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($Username + ":" + $Password))
$head = @{"Authorization"="Basic $auth"}
### Connect to NSX Manager via API
$Request = "https://$NSXManager/api/3.0/edges"
$r = Invoke-WebRequest -Uri ($Request+"?startIndex=0&pageSize=1") -Headers $head -ContentType "application/xml" -ErrorAction:Stop
$TotalCount = ([xml]$r).pagedEdgeList.edgePage.pagingInfo.totalCount

$r = Invoke-WebRequest -Uri ($Request+"?startIndex=0&pageSize="+$TotalCount) -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$rxml = $r.Content

### Return the NSX Edges

$Edges = @()

foreach ($EdgeSummary in $rxml.pagedEdgeList.edgePage.edgeSummary)
{
$n = @{} | select Name,Id
$n.Name = $edgeSummary.Name
$n.Id = $edgeSummary.objectId
$Edges += $n
}
return ,$Edges

} # End of process

} # End of function

And here is a sample script leveraging both functions above that continuously displays health status of all the Edges in the environment and also displays the time needed to go through all of them.

$NSXManager = "nsx01.fojta.com"
$Username = "admin"
$Password = "default"

$AllEdges = Get-NSXEdges -NSXManager $NSXManager -Username $Username -Password $Password

DO
{
$StartTime = get-date
foreach ($Edge in $AllEdges)
{
$Health = Get-NSXEdgeHealth -NSXManager $NSXManager -Username $Username -Password $Password -EdgeId $Edge.Id
Write-Host $Edge.Name $Health.Health
}
$Duration = (get-date) - $StartTime
Write-Host
Write-Host "Duration:" $Duration.Minutes "Minutes" $Duration.Seconds "Seconds"
Write-Host
} While ($true)

In my lab it took at least cca 2 seconds to get status of an Edge (depending on the mode of communication and its actual health). It is obvious that most time NSX Manager spends on communication with the ESXi host – so the task should and can be parallelized. While running 5 sessions at the same time the retrieval of health status of one Edge went up to 3-4 seconds (for a green Edge) while the load on NSX Manager went up 14 % (I run NSX Manager only with 2 vCPUs in my lab).

Monitoring

While the article mentions only NSX the scripts should work also with vShield / vCloud Networking and Security Manager.

5 thoughts on “How to Monitor Health of NSX Edge Gateways

  1. Hey Tom…can you clarify this statement?

    During the Edge deployment the cluster where the Edge is deployed to must be prepared for NSX and without any issues.

    What issues are you referring to?

    1. Basically the cluster must be green in the installation / host preparation section in the NSX GUI.This means all the hosts in the cluster have the NSX vibs installed, have been rebooted (if necessary), etc.

  2. Is it possible to monitor the NSX Edge configuration changes like updating local router interface of NSX edge. If yes , could you please help me how to achieve this.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.