Gathering Health Status of vCloud Director Edge Gateways

Some time ago I wrote about how to monitor health of NSX Edge Gateways. In this blog post I will show how to get health and other info about vCloud Director Edge Gateways with PowerCLI.

PowerCLI already includes vCloud Director related cmdlets, unforunatelly there is none related to Edge Gateways. This can be easily remediated by using vCloud API however to get detailed information about Edge health we must use NSX API. As of vCloud Director 8.0 the service provider can easily get NSX Edge ID which is backing up particular vCloud Director Edge as a new type GatewayBacking was added.

What follows is an example of function that collects as much information as possible (interfaces, network services, size, syslog, default gateway, health of all services, Org, Org VDC and Provider VDC) about all Edge Gateways from PowerCLI, vCloud API and NSX API.

Note: there is dependency on the Get-NSXEdgeHealth function.

function Get-CIEdgeGateways {
<# .SYNOPSIS Gathers Edge Gateways from vCloud Director and all info through PowerCLI, vCloud API and NSX API .DESCRIPTION Will inventory all of your vCloud Director Edge Gateways .NOTES Author: Tomas Fojta #>
	[CmdletBinding()]
	param(
	[Parameter(Mandatory=$true,Position=0)]
	[String]$NSXManager,
	[Parameter(Mandatory=$false,Position=1)]
	[String]$NSXUsername = "admin",
	[Parameter(Mandatory=$true)]
	[String]$NSXPassword
	)

	$output = @();
	$EdgeGWs = Search-Cloud -QueryType EdgeGateway

	Foreach ($Edge in $EdgeGWs) {
		$Edgeview = $Edge | get-ciview
		$Vdc = get-OrgVdc -Id ($Edge.PropertyList.Vdc) -ErrorAction SilentlyContinue
		$webclient = New-Object system.net.webclient
		$webclient.Headers.Add("x-vcloud-authorization",$Edgeview.Client.SessionKey)
		$webclient.Headers.Add("accept",$EdgeView.Type + ";version=9.0")
		[xml]$EGWConfXML = $webclient.DownloadString($EdgeView.href)
		$n = "" | Select Name,Description,EdgeBacking,Interfaces,Firewall,NAT,LoadBalancer,DHCP,VPN,Routing,Syslog,Size,HA,DNSRelay,DefaultGateway,AdvancedNetworking, Org, TenantId, OrgVDC, OrgVDCId, ProviderVDC, ProviderVDCId, Health
		$n.Name = $EGWConfXML.EdgeGateway.Name
		$n.Description = $EGWConfXML.EdgeGateway.Description
		$n.EdgeBacking = $EGWConfXML.EdgeGateway.GatewayBackingRef.gatewayId
		$n.Interfaces = $EGWConfXML.EdgeGateway.Configuration.GatewayInterfaces.GatewayInterface
		$n.Firewall = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.FirewallService.FirewallRule
		$n.NAT = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.NatService.NatRule
		$n.LoadBalancer = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.LoadBalancerService.VirtualServer	
		$n.DHCP = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayDHCPService.Pool
		$n.VPN = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayIpsecVpnService
		$n.Routing = $EGWConfXML.EdgeGateway.Configuration.EdgeGatewayServiceConfiguration.StaticRoutingService
		$n.Syslog = $EGWConfXML.EdgeGateway.Configuration.SyslogServerSettings.TenantSyslogServerSettings.SyslogServerIp
		$n.Size = $EGWConfXML.EdgeGateway.Configuration.GatewayBackingConfig
		$n.HA = $EGWConfXML.EdgeGateway.Configuration.HaEnabled
		$n.DNSRelay = $EGWConfXML.EdgeGateway.Configuration.UseDefaultRouteForDnsRelay
		Foreach ($Interface in $n.Interfaces) {
			if ($Interface.UseForDefaultRoute -eq 'true') {$n.DefaultGateway = $Interface.SubnetParticipation.Gateway}
			}
		$n.AdvancedNetworking= $EGWConfXML.EdgeGateway.Configuration.HaEnabled = $EGWConfXML.EdgeGateway.Configuration.AdvancedNetworkingEnabled
		$n.Org = $Vdc.Org.Name
		$n.TenantId = $Vdc.Org.Id.Split(':')[3]
		$n.OrgVDC = $Vdc.Name
		$n.OrgVDCId = $Vdc.Id.Split(':')[3]
		$n.ProviderVDC = $Vdc.ProviderVDC.Name
		$n.ProviderVDCId = $Vdc.ProviderVDC.Id.Split(':')[3]
		$n.Health = Get-NSXEdgeHealth -NSXManager $NSXManager -Username $NSXUsername -Password $NSXPassword -EdgeID ($n.EdgeBacking)
		$Output += $n
		}
	return $Output
}

Automate Let’s Encrypt Certificate for NSX Edge Load Balancer

NSX LBI needed public certificate for my lab to avoid issues while testing certain libraries that did not allow untrusted connections or importing private Certificate Authority.

Fortunately, there is possibility to issue free public certificates with Let’s Encrypt certificate authority. These certificates are domain validated, which means you need to own the domain for which you issue the certificate. There are three methods how the validation is done but only one can be used in fully automated mode. Why the need for automation? The issued certificates are valid only for 90 days.

To validate the domain ownership you need to publish on publicly accessible web server under the certificate FQDN a specific generated verification string. You do not actually need to publish the service (and the NSX Edge load balancer) to the internet if you do not want to – I just set up a simple webserver with a sole purposed to complete the validation challenge.

So what is the high level process?

  1. Own a domain for which you want to have the certificate.
  2. Set up publicly accessible web server and point to it a DNS record with the certificate FQDN.
  3. Generate challenge string and place it on the web server.
  4. Validate the domain and obtain the certificates.
  5. Upload the certificates to your NSX Edge Load Balancer.
  6. In 60 days repeat from #3.

There are various ways how to automate steps 3-5. I have chosen to do this on Windows with PowerShell but the same could be accomplished on Linux as there are many Let’s Encrypt clients available to chose from.

On a Windows 2012 R2 Server I installed latest Powershell 5, IIS and ACMESharp with PowerShell gallery:

save-module -name ACMESharp
install-module -name ACMESharp

Then I wrote PowerShell script that first goes through the certificate generation and then using NSX API replaces certificate of a specific load balancer.

Note that you need to supply NSX Manager credentials, Edge ID which is running the load balancer, application profile ID which the web server uses (can be easily looked up in NSX UI) and email and domain for the Let’s Encrypt generation process.

Also be aware that Let’s Encrypt has rate limit on how many times a particular certificate can be issued within 7 day period (currently 20).

$Username = "admin"
$Password = "default"
$NSXManager = "nsx01.fojta.com"
$LBEdge = 'edge-1'
$ApplicationProfile = 'applicationProfile-1'
$Email = "mailto:user@example.com"
$Domain = "domain.example.com"


## Generate random alias
$IdentAlias = 'Ident_'+([guid]::NewGuid()).ToString()
$CertAlias = 'Cert_'+([guid]::NewGuid()).ToString()

## Remove and rename old files
If (Test-Path D:\LetsEncrypt\issuer.crt.old) {Remove-Item D:\LetsEncrypt\issuer.crt.old}
If (Test-Path D:\LetsEncrypt\cert.key.old) {Remove-Item D:\LetsEncrypt\cert.key.old}
If (Test-Path D:\LetsEncrypt\cert.crt.old) {Remove-Item D:\LetsEncrypt\cert.crt.old}

If (Test-Path D:\LetsEncrypt\issuer.crt) {Rename-Item D:\LetsEncrypt\issuer.crt D:\LetsEncrypt\issuer.crt.old}
If (Test-Path D:\LetsEncrypt\cert.key) {Rename-Item D:\LetsEncrypt\cert.key D:\LetsEncrypt\cert.key.old}
If (Test-Path D:\LetsEncrypt\cert.crt) {Rename-Item D:\LetsEncrypt\cert.crt D:\LetsEncrypt\cert.crt.old}

## Let's Encrypt specific code from https://github.com/ebekker/ACMESharp/wiki/Quick-Start
Import-Module ACMESharp
Initialize-ACMEVault -ErrorAction SilentlyContinue
New-ACMERegistration -Contacts $Email -AcceptTos
New-ACMEIdentifier -Dns $Domain -alias $IdentAlias
Complete-ACMEChallenge $IdentAlias -ChallengeType http-01 -Handler iis -HandlerParameters @{ WebSiteRef = 'Default Web Site' }
Submit-ACMEChallenge $IdentAlias -ChallengeType http-01

$Status = "pending"
Do {
	Start-Sleep -s 5
	$Status = ((Update-ACMEIdentifier $Alias -ChallengeType http-01).Challenges | Where-Object {$_.Type -eq "http-01"}).Status
	}
Until ($Status = "valid")


New-ACMECertificate $IdentAlias -Generate -Alias $CertAlias
Submit-ACMECertificate $CertAlias
Get-ACMECertificate $CertAlias -ExportCertificatePEM D:\LetsEncrypt\cert.crt
Get-ACMECertificate $CertAlias -ExportKeyPEM D:\LetsEncrypt\cert.key
Update-ACMECertificate $CertAlias
Get-ACMECertificate $CertAlias -ExportIssuerPEM D:\LetsEncrypt\issuer.crt


$IssuerCert = [IO.File]::ReadAllText("D:\LetsEncrypt\issuer.crt")
$PrivateKey = [IO.File]::ReadAllText("D:\LetsEncrypt\cert.key")
$LBCertificate = [IO.File]::ReadAllText("D:\LetsEncrypt\cert.crt")

## Calculate Issuer Cert Thumbprint
$IssuerCertThumbprint = (Get-PfxCertificate -filepath D:\LetsEncrypt\issuer.crt).Thumbprint.ToLower()

## Create authorization string and store in $head
$auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($Username + ":" + $Password))
$head = @{"Authorization"="Basic $auth"}

## Get all Edge certificates
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/scope/" + $LBEdge
$r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$sxml = $r.Content

## Find if Issuer Certificate already exists
$exists = $false
foreach ($Certificate in $sxml.certificates.certificate) {
	$Thumbprint = $Certificate.x509Certificate.sha1Hash -replace '[:]'
	if ($Thumbprint -eq $IssuerCertThumbprint) { $exists = $true }
	}

##Upload issuer certificate if it does not exist
if (-Not $exists) {
	$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $LBEdge
	$Body = "
<trustObject>
 <pemEncoding>" +$IssuerCert+ "</pemEncoding>
 <description>Issuer Certificate</description>
</trustObject>"
	$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -ContentType "application/xml" -Body $Body -ErrorAction:Stop
	$IssuerId = ([xml]$r).certificates.certificate.objectId
	}
	
##Upload certificate
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $LBEdge
$Body = "
<trustObject>
 <pemEncoding>" + $LBCertificate + "</pemEncoding>
 <privateKey>" + $PrivateKey + "</privateKey> 
 <description>vCloud Certificate</description>
</trustObject>"
$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -ContentType "application/xml" -Body $Body -ErrorAction:Stop
$NewCertificateId = ([xml]$r).certificates.certificate.objectId

##Replace certificate in the application profile
$Uri = "https://$NSXManager/api/4.0/edges/" + $LBEdge + "/loadbalancer/config/applicationprofiles/" + $ApplicationProfile
$r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$sxml = $r.Content
$OldCertificateId = $sxml.applicationProfile.clientSsl.serviceCertificate
$sxml.applicationProfile.clientSsl.serviceCertificate = $NewCertificateId
$r = Invoke-WebRequest -Uri $Uri -Method Put -Headers $head -ContentType "application/xml" -Body $sxml.OuterXML -ErrorAction:Stop

##Delete old certificate from the Edge
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $OldCertificateId
$r = Invoke-WebRequest -URI $Uri -Method Delete -Headers $head -ContentType "application/xml" -ErrorAction:Stop


Edge Gateway Deployment Speed in vCloud Director 8.10

Edge GatewayIn vCloud Director 8.10 there is massive improvement in deployment (and configuration) speed of Edge Gateways. This is especially noticeable in use cases where large number of routed vApps are provisioned in as short time as possible – for example nightly builds for testing, or labs for training purposes. But this is also important for customer onboarding – time to login to cloud VM from the swipe of the credit card SLA.

Theory

How is the speed improvement achieved? It is actually not really vCloud Director accomplishment. The deployment and configuration of Edge Gateways were always done by vShield or NSX Manager. However, there is a big difference how vShield Manager and NSX Manager communicate with the Edge Gateway to push its configuration (IP addresses, NAT, firewall and other network services configurations).

As the Edge Gateway can be deployed to any network which can be completely isolated from any external traffic, its configuration cannot be done over the network and instead out-of-band communication channel must be used. vShield Manager always used VIX API (Guest Operations API) which involves communication with vCenter Server, hostd process on ESXi host hosting the Edge Gateway VM and finally VMware Tools running in the Edge Gateway VM (see this older post for more detail).

NSX Manager uses different mechanism. As long as the ESXi host is properly prepared for NSX, message bus communication between the NSX Manager and vsfwd user space process on the ESXi host is established. Additionally the configuration to the Edge Gateway VM is done via VMCI channel.

Prerequisites

There are necessary prerequisites to use the faster message bus communication as opposed to VIX API. If any of these is not fulfilled the communication mechanism fails back to VIX API.

  • The host running the Edge Gateway must be prepared for NSX. So if you are in vCloud Director using solely VLAN (or even VCDNI) backed network pools and you skipped the NSX preparation of underlying clusters, message bus communication cannot be used as the host is missing the NSX VIBs and vsfwd process.
  • The Edge Gateway must be version 6.x. It cannot be the legacy Edge version 5.5 deployed by older vCloud Director releases (8.0, 5.6, etc.). vCloud Director 8.10 deploys Edge Gateway version 6.x however existing Edges deployed before upgrade to 8.10 must be redeployed in vCloud Director or upgraded in NSX (read this whitepaper for a script to do it at once).
  • Obviously NSX Manager must be used (as opposed to vShield Manager) – anyway vCloud Networking and Security is not supported with vCloud Director 8.10 anymore.

Performance Testing

I have done quick proof of concept testing to see what is the relative improvement between the older and newer deployment mechanism.

I used 3 different combinations of the same environment (I was upgrading from one combination to the other).

  • vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
  • vCloud Director 8.0.1 + NSX 6.2.3 (uses legacy Edges)
  • vCloud Director 8.10 + NSX 6.2.3 (uses NSX Edges)

All 3 combinations used the same hardware and the same vSphere environment (5.5) with nested ESXi hosts. So the point is to look at the relative differences as opposed to absolute deployment times.

I measured in PowerCLI sequential deployment speed of 10 vApps with one isolated network and 10 vApps with one routed network with multiple runs to calculate average per one vApp. The first scenario was to measure differences in provisioning speeds of VXLAN logical switches to see impact of controller based control plane mode. The second includes provisioning of an Edge Gateway and logical switch. The vApps were otherwise empty (no VMs).

Note; If you want to do similar test in your environment, I captured the two empty vApps with only the routed or isolated networks to a catalog with vCloud API (PowerCLI) as it cannot be done from vCloud UI.

Here are the average deployment times of each vApp.

vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4

  • Isolated 5-5.5 seconds
  • Routed 2:17 min

vCloud Director 8.0.1 + NSX 6.2.3

  • Isolated cca 6.8 seconds (Multicast), 7.5 seconds (Unicast)
  • Routed 2:20 min

vCloud Director 8.10 + NSX 6.2.3

  • Isolated 7.7 s (Multicast), 8.1 s (Unicast)
  • Routed 1:35 min

While the speed of logical switch provisioning goes little bit down with NSX and with Unicast control plane mode, the Edge Gateway deployment gets massive boost with NSX and VCD 8.10. While the OVF deployment of NSX Edge takes little bit longer (from 20 to 30 s) it is the configuration that makes up for it (from way over a minute down to about 30 s).

Just for comparison here are the tasks done during deployment of each routed vApp as reported by vSphere Client Recent Task window.

vCloud Director 5.6.5 + vCloud Networking and Security

vCloud Director 5.6.5 + vCloud Networking and Security

vCloud Director 8.10 + NSX 6.2.3

vCloud Director 8.10 + NSX 6.2.3

vCloud Networking and Security Upgrade to NSX in vCloud Director Environments – Update

vCNS to NSXIn April I wrote whitepaper describing all considerations that need to be taken when upgrading vCloud Networking And Security to NSX in vCloud Director Environments. I have updated the whitepaper to include additional information related to new releases:

  • updates related to vCloud Director 8.10 release
  • update related to VMware NSX 6.2.3 release
  • updates related to vCenter Chargeback Manager 2.7.1 release
  • NSX Edge Gateway upgrade script example
  • extended upgrade scenario to include vCloud Director 8.10

The whitepaper will be posted later this month on vCloud Architecture Toolkit for Service Providers website, until then it can be downloaded from the link below.

Edit 7/17/2016: New vCAT website is uphttp://www.vmware.com/solutions/cloud-computing/vcat-sp.html

VMware vCloud Networking and Security to VMware NSX Upgrade v2.1.pdf … link

vCenter Chargeback Manager Notes

ChargebackThis blog post summarizes up-to-date (July 2016) information related to vCenter Chargeback Manager.

Chargeback Manager (CBM) is available only for service providers. Although in past its end of support was announced it was extended until end of 2017. The reason for extension is to provide more time to service providers to migrated to its successor – vRealize Business for Cloud Advanced (vRB)

Both CBM and vRB were removed from all but standard vCloud Service Provider Bundles (with the exception for current users) and can be licensed separately. The reason was to give partners more choice which metering tool to use.

  • The latest CBM version is 2.7.1 and is downloadable from here. Note that this version has following support limitations:
    • vSphere 6 is not supported – this is due to new storage APIs introduced in vSphere 6
    • Both vCloud Networking and Security and NSX are supported (in the vCloud Director context)
    • vCloud Director 8.10 is not supported (see the next bullet point)
  • CBM patch for vCloud Director 8.10 was released in the following KB: https://kb.vmware.com/kb/2146041. It replaces one JAR file of the vCloud Director data collector to properly identify new vCloud API versions. It is also backward compatible with older vCloud Director versions.
  • vSphere 6 support is expected in the next release of CBM later this year

In case you are upgrading from CBM 2.7.0 to 2.7.1 here are some notes and considerations:

  • CBM 2.7.0 and older was 32 bit application. CBM 2.7.1 is 64 bit application.
  • There are two ways how to upgrade CBM. You can either do in-place upgrade, or you can uninstall CBM and install fresh binaries while using the same Chargeback database
    • In-place upgrade:
      • Make sure you shut down all services before running the installer (vCenter-CB.exe)
      • Always run the installer with the Run as administrator option.
      • Even though CBM 2.7.1 is 64 bit application it will still be located in the default 32bit Program Files (x86) folder.
    • Fresh install:
      • Uninstall CBM by running Uninstall.bat located in the …\VMware vCenter Chargeback\Uninstall_VMware vCenter Chargeback folder. If non-embedded collectors were used, uninstall them in similar way as well.
      • Reboot and install new binaries while pointing to the same CBM database. Remember to use Run ad administrator option when executing the installer (vCenter-CB.exe)
      • The installer does not remove old collectors from the database. Wait a few minutes until they will be identified as failed (red X in CBM System Health, Data Collectors UI), note their IDs and then manually remove them from CBM database – CB_DC_STATUS table. You can locate them by their ID in the DC_ID column.
      • CBM 2.7.1 will be located by default in the 64 bit Program Files folder.