Collect vCloud Director Cell Logs with Log Insight Agent

While it is possible to redirect vCloud Director cell logs to a remote syslog server by editing the log4j.properties file (see KB 2004564), there is an alternative, agent-based method that uses vRealize Log Insight.

The Log Insight agent is installed on each cell and then managed remotely from the Log Insight server. Here are some advantages of this approach:

  • no manual edits of the log4j file, which gets overwritten with each upgrade
  • as we do not rely on the log4j logger, we can also collect the API request log files generated by Jetty
  • agent uses reliable TCP communication as opposed to unreliable UDP
  • we no longer rely on source IP to identify sender; cells can use source NAT (with single IP) to communicate with Log Insight server and we can still distinguish them
  • we can remotely change which logs we want to monitor (info vs debug)
  • and much more

Here is a quick configuration how-to:

  1. Download the Log Insight Agent from the Log Insight server; the package is already customized for your vRLI server. Go to Administration > Agents, scroll down, click Download Log Insight Agent Version 3.6.0 and pick the rpm package.
  2. Upload the rpm file to each cell and install it with rpm -i VMware-Log-Insight-Agent-3.6.0-4148343.noarch_XXX.rpm
  3. Back in the Agents configuration, create an active agent group from the vCloud Director Cell Server template (copy template icon).
  4. Create a hostname filter (use ? as a wildcard for any character; multiple entries on one line are combined with a logical OR, multiple lines with a logical AND).
  5. Optionally, edit the agent configuration to include additional files or directories.

Agent configuration
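
As an illustration of step 5, additional files are added as filelog sections in the agent configuration. The fragment below is only a sketch: the section name vcd-request is made up, and the directory and file pattern assume a default vCloud Director installation where Jetty writes its request logs next to the cell logs. The template may already cover these files, so adjust or drop the section to match your cells.

```ini
; Hypothetical extra section; the name after "filelog|" is arbitrary
[filelog|vcd-request]
; Assumed default vCloud Director log directory
directory=/opt/vmware/vcloud-director/logs
; Jetty request logs (assumed naming, e.g. 2016_08_01.request.log)
include=*.request.log
```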

PowerCLI Stops Working After NSX 6.2.4 Upgrade

As of NSX 6.2.3, TLS 1.0 support is deprecated on Edge Services Gateways. If you are using a load balancer with SSL offload, TLS 1.0 ciphers are no longer supported, and clients that rely on them will stop working.

The supported ciphers can be easily checked with nmap. Here is the nmap output for a website behind an NSX Edge 6.2.2 and an NSX Edge 6.2.4 load balancer:

NSX 6.2.2 with TLS 1.0
NSX 6.2.4 without TLS 1.0
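
For reference, a scan of this kind can be run with nmap's ssl-enum-ciphers script. The hostname below is a placeholder, and the post does not state the exact options used for the screenshots, so treat this as one plausible invocation:

```powershell
# List the TLS protocol versions and ciphers offered on port 443.
# vcloud.example.com is a placeholder for the site behind the Edge load balancer.
nmap --script ssl-enum-ciphers -p 443 vcloud.example.com
```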

In my case PowerCLI stopped working and could no longer connect to the vCloud Director endpoint behind the Edge load balancer. The error was not very descriptive: The underlying connection was closed: An unexpected error occurred on a send.

PowerCLI Error

Fortunately, it is possible to force PowerCLI to use TLS 1.1/1.2 by editing the Windows Registry as described in the KB article: Enabling the TLSv1.1 and TLSv1.2 protocols for PowerCLI (2137109).
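
If changing the registry is not an option, a per-session workaround may help. This is not the method from the KB article. It is a sketch that assumes PowerCLI runs on .NET Framework 4.5 or later, where the Tls11 and Tls12 values exist, and it has to be repeated in every new PowerShell session before connecting:

```powershell
# Allow TLS 1.1 and 1.2 for outgoing .NET connections in this session only
$protocols = [System.Net.SecurityProtocolType]::Tls11 -bor [System.Net.SecurityProtocolType]::Tls12
[System.Net.ServicePointManager]::SecurityProtocol = $protocols

# Show what is now enabled
[System.Net.ServicePointManager]::SecurityProtocol
```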

Gathering Health Status of vCloud Director Edge Gateways

Some time ago I wrote about how to monitor health of NSX Edge Gateways. In this blog post I will show how to get health and other info about vCloud Director Edge Gateways with PowerCLI.

PowerCLI already includes vCloud Director cmdlets; unfortunately, none of them covers Edge Gateways. This is easily remedied with the vCloud API, but to get detailed information about Edge health we must use the NSX API. As of vCloud Director 8.0, the service provider can easily get the ID of the NSX Edge backing a particular vCloud Director Edge Gateway, because a new type, GatewayBacking, was added.

What follows is an example of a function that collects as much information as possible (interfaces, network services, size, syslog, default gateway, health of all services, Org, Org VDC and Provider VDC) about all Edge Gateways from PowerCLI, the vCloud API and the NSX API.

Note: there is a dependency on the Get-NSXEdgeHealth function.

function Get-CIEdgeGateways {
<#
.SYNOPSIS
Gathers Edge Gateways from vCloud Director and all info through PowerCLI, vCloud API and NSX API
.DESCRIPTION
Will inventory all of your vCloud Director Edge Gateways
.NOTES
Author: Tomas Fojta
#>
	[CmdletBinding()]
	param(
	[Parameter(Mandatory=$true,Position=0)]
	[String]$NSXManager,
	[Parameter(Mandatory=$false,Position=1)]
	[String]$NSXUsername = "admin",
	[Parameter(Mandatory=$true)]
	[String]$NSXPassword
	)

	$output = @();
	$EdgeGWs = Search-Cloud -QueryType EdgeGateway

	Foreach ($Edge in $EdgeGWs) {
		$Edgeview = $Edge | get-ciview
		$Vdc = get-OrgVdc -Id ($Edge.PropertyList.Vdc) -ErrorAction SilentlyContinue
		$webclient = New-Object system.net.webclient
		$webclient.Headers.Add("x-vcloud-authorization",$Edgeview.Client.SessionKey)
		$webclient.Headers.Add("accept",$EdgeView.Type + ";version=9.0")
		[xml]$EGWConfXML = $webclient.DownloadString($EdgeView.href)
		$n = "" | Select Name,Description,EdgeBacking,Interfaces,Firewall,NAT,LoadBalancer,DHCP,VPN,Routing,Syslog,Size,HA,DNSRelay,DefaultGateway,AdvancedNetworking, Org, TenantId, OrgVDC, OrgVDCId, ProviderVDC, ProviderVDCId, Health
		$n.Name = $EGWConfXML.EdgeGateway.Name
		$n.Description = $EGWConfXML.EdgeGateway.Description
		$n.EdgeBacking = $EGWConfXML.EdgeGateway.GatewayBackingRef.gatewayId
		$n.Interfaces = $EGWConfXML.EdgeGateway.Configuration.GatewayInterfaces.GatewayInterface
		$n.Firewall = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.FirewallService.FirewallRule
		$n.NAT = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.NatService.NatRule
		$n.LoadBalancer = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.LoadBalancerService.VirtualServer	
		$n.DHCP = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayDHCPService.Pool
		$n.VPN = $EGWConfXML.EdgeGateway.Configuration.EdgegatewayServiceConfiguration.GatewayIpsecVpnService
		$n.Routing = $EGWConfXML.EdgeGateway.Configuration.EdgeGatewayServiceConfiguration.StaticRoutingService
		$n.Syslog = $EGWConfXML.EdgeGateway.Configuration.SyslogServerSettings.TenantSyslogServerSettings.SyslogServerIp
		$n.Size = $EGWConfXML.EdgeGateway.Configuration.GatewayBackingConfig
		$n.HA = $EGWConfXML.EdgeGateway.Configuration.HaEnabled
		$n.DNSRelay = $EGWConfXML.EdgeGateway.Configuration.UseDefaultRouteForDnsRelay
		Foreach ($Interface in $n.Interfaces) {
			if ($Interface.UseForDefaultRoute -eq 'true') {$n.DefaultGateway = $Interface.SubnetParticipation.Gateway}
			}
		$n.AdvancedNetworking = $EGWConfXML.EdgeGateway.Configuration.AdvancedNetworkingEnabled
		$n.Org = $Vdc.Org.Name
		$n.TenantId = $Vdc.Org.Id.Split(':')[3]
		$n.OrgVDC = $Vdc.Name
		$n.OrgVDCId = $Vdc.Id.Split(':')[3]
		$n.ProviderVDC = $Vdc.ProviderVDC.Name
		$n.ProviderVDCId = $Vdc.ProviderVDC.Id.Split(':')[3]
		$n.Health = Get-NSXEdgeHealth -NSXManager $NSXManager -Username $NSXUsername -Password $NSXPassword -EdgeID ($n.EdgeBacking)
		$Output += $n
		}
	return $Output
}
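
The function turns vCloud IDs into bare UUIDs with .Split(':')[3], which relies on the urn:vcloud:<type>:<uuid> ID format. Because Get-OrgVdc is called with -ErrorAction SilentlyContinue, $Vdc can be empty, and calling .Split on an empty value raises an error. A small helper along these lines would tolerate that case. Get-UrnUuid is a hypothetical name, not part of PowerCLI, and the sample ID is made up:

```powershell
function Get-UrnUuid {
	param([string]$Urn)
	# Expected shape: urn:vcloud:<type>:<uuid>
	$parts = $Urn -split ':'
	if ($parts.Count -ge 4) { return $parts[3] }
	# Missing or malformed ID: return $null instead of throwing
	return $null
}

Get-UrnUuid 'urn:vcloud:org:a93c9db9-7471-3192-8d09-a8f7eeda85f9'
Get-UrnUuid $null
```

Inside the loop it could replace the direct calls, for example $n.TenantId = Get-UrnUuid $Vdc.Org.Id.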

Automate Let’s Encrypt Certificate for NSX Edge Load Balancer

I needed a public certificate for my lab to avoid issues while testing certain libraries that did not allow untrusted connections or importing a private Certificate Authority.

Fortunately, it is possible to issue free public certificates with the Let’s Encrypt certificate authority. These certificates are domain validated, which means you need to own the domain for which you issue the certificate. There are three validation methods, but only one can be used in fully automated mode. Why the need for automation? The issued certificates are valid for only 90 days.

To validate domain ownership, you need to publish a specific generated verification string on a publicly accessible web server under the certificate FQDN. You do not actually need to publish the service (and the NSX Edge load balancer) to the internet if you do not want to; I just set up a simple web server with the sole purpose of completing the validation challenge.

So what is the high level process?

  1. Own a domain for which you want to have the certificate.
  2. Set up a publicly accessible web server and point a DNS record with the certificate FQDN to it.
  3. Generate the challenge string and place it on the web server.
  4. Validate the domain and obtain the certificates.
  5. Upload the certificates to your NSX Edge Load Balancer.
  6. In 60 days repeat from #3.
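
Step 6 boils down to a date comparison: with a 90-day certificate, renewing after 60 days means renewing once 30 or fewer days of validity remain. A scheduled script could make that decision as sketched below. Test-RenewalDue and its parameters are hypothetical names, and the 30-day threshold is simply 90 minus 60 from the steps above:

```powershell
function Test-RenewalDue {
	param(
		[Parameter(Mandatory=$true)][datetime]$NotAfter,   # certificate expiry date
		[int]$ThresholdDays = 30,                          # renew when this many days or fewer remain
		[datetime]$Now = (Get-Date)                        # overridable for testing
	)
	$remaining = ($NotAfter - $Now).TotalDays
	return ($remaining -le $ThresholdDays)
}

# Example with fixed dates: certificate issued on 1 January, valid for 90 days
$issued = [datetime]'2016-01-01'
$expiry = $issued.AddDays(90)
Test-RenewalDue -NotAfter $expiry -Now $issued.AddDays(59)   # 31 days left
Test-RenewalDue -NotAfter $expiry -Now $issued.AddDays(60)   # 30 days left
```

For a real certificate, the expiry date could be read from the NotAfter property of the certificate object loaded from the exported PEM file.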

There are various ways to automate steps 3-5. I have chosen to do this on Windows with PowerShell, but the same could be accomplished on Linux, as there are many Let’s Encrypt clients available to choose from.

On a Windows 2012 R2 server I installed the latest PowerShell 5, IIS and ACMESharp from the PowerShell Gallery:

Install-Module -Name ACMESharp

Then I wrote a PowerShell script that first goes through the certificate generation and then replaces the certificate of a specific load balancer using the NSX API.

Note that you need to supply the NSX Manager credentials, the ID of the Edge running the load balancer, the ID of the application profile the web server uses (it can be easily looked up in the NSX UI), and the email and domain for the Let’s Encrypt generation process.

Also be aware that Let’s Encrypt has a rate limit on how many times a particular certificate can be issued within a 7-day period (currently 20).

$Username = "admin"
$Password = "default"
$NSXManager = "nsx01.fojta.com"
$LBEdge = 'edge-1'
$ApplicationProfile = 'applicationProfile-1'
$Email = "mailto:user@example.com"
$Domain = "domain.example.com"


## Generate random alias
$IdentAlias = 'Ident_'+([guid]::NewGuid()).ToString()
$CertAlias = 'Cert_'+([guid]::NewGuid()).ToString()

## Remove and rename old files
If (Test-Path D:\LetsEncrypt\issuer.crt.old) {Remove-Item D:\LetsEncrypt\issuer.crt.old}
If (Test-Path D:\LetsEncrypt\cert.key.old) {Remove-Item D:\LetsEncrypt\cert.key.old}
If (Test-Path D:\LetsEncrypt\cert.crt.old) {Remove-Item D:\LetsEncrypt\cert.crt.old}

If (Test-Path D:\LetsEncrypt\issuer.crt) {Rename-Item D:\LetsEncrypt\issuer.crt D:\LetsEncrypt\issuer.crt.old}
If (Test-Path D:\LetsEncrypt\cert.key) {Rename-Item D:\LetsEncrypt\cert.key D:\LetsEncrypt\cert.key.old}
If (Test-Path D:\LetsEncrypt\cert.crt) {Rename-Item D:\LetsEncrypt\cert.crt D:\LetsEncrypt\cert.crt.old}

## Let's Encrypt specific code from https://github.com/ebekker/ACMESharp/wiki/Quick-Start
Import-Module ACMESharp
Initialize-ACMEVault -ErrorAction SilentlyContinue
New-ACMERegistration -Contacts $Email -AcceptTos
New-ACMEIdentifier -Dns $Domain -alias $IdentAlias
Complete-ACMEChallenge $IdentAlias -ChallengeType http-01 -Handler iis -HandlerParameters @{ WebSiteRef = 'Default Web Site' }
Submit-ACMEChallenge $IdentAlias -ChallengeType http-01

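$Status = "pending"
Do {
	Start-Sleep -s 5
	$Status = ((Update-ACMEIdentifier $IdentAlias -ChallengeType http-01).Challenges | Where-Object {$_.Type -eq "http-01"}).Status
	## Stop instead of polling forever if the challenge was rejected
	If ($Status -eq "invalid") { Throw "Domain validation failed for $Domain" }
	}
Until ($Status -eq "valid")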


New-ACMECertificate $IdentAlias -Generate -Alias $CertAlias
Submit-ACMECertificate $CertAlias
Get-ACMECertificate $CertAlias -ExportCertificatePEM D:\LetsEncrypt\cert.crt
Get-ACMECertificate $CertAlias -ExportKeyPEM D:\LetsEncrypt\cert.key
Update-ACMECertificate $CertAlias
Get-ACMECertificate $CertAlias -ExportIssuerPEM D:\LetsEncrypt\issuer.crt


$IssuerCert = [IO.File]::ReadAllText("D:\LetsEncrypt\issuer.crt")
$PrivateKey = [IO.File]::ReadAllText("D:\LetsEncrypt\cert.key")
$LBCertificate = [IO.File]::ReadAllText("D:\LetsEncrypt\cert.crt")

## Calculate Issuer Cert Thumbprint
$IssuerCertThumbprint = (Get-PfxCertificate -filepath D:\LetsEncrypt\issuer.crt).Thumbprint.ToLower()

## Create authorization string and store in $head
$auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($Username + ":" + $Password))
$head = @{"Authorization"="Basic $auth"}

## Get all Edge certificates
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/scope/" + $LBEdge
$r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$sxml = $r.Content

## Find if Issuer Certificate already exists
$exists = $false
foreach ($Certificate in $sxml.certificates.certificate) {
	$Thumbprint = $Certificate.x509Certificate.sha1Hash -replace '[:]'
	if ($Thumbprint -eq $IssuerCertThumbprint) { $exists = $true }
	}

##Upload issuer certificate if it does not exist
if (-Not $exists) {
	$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $LBEdge
	$Body = "
<trustObject>
 <pemEncoding>" +$IssuerCert+ "</pemEncoding>
 <description>Issuer Certificate</description>
</trustObject>"
	$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -ContentType "application/xml" -Body $Body -ErrorAction:Stop
	$IssuerId = ([xml]$r.Content).certificates.certificate.objectId
	}
	
##Upload certificate
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $LBEdge
$Body = "
<trustObject>
 <pemEncoding>" + $LBCertificate + "</pemEncoding>
 <privateKey>" + $PrivateKey + "</privateKey> 
 <description>vCloud Certificate</description>
</trustObject>"
$r = Invoke-WebRequest -URI $Uri -Method Post -Headers $head -ContentType "application/xml" -Body $Body -ErrorAction:Stop
$NewCertificateId = ([xml]$r.Content).certificates.certificate.objectId

##Replace certificate in the application profile
$Uri = "https://$NSXManager/api/4.0/edges/" + $LBEdge + "/loadbalancer/config/applicationprofiles/" + $ApplicationProfile
$r = Invoke-WebRequest -URI $Uri -Method Get -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$sxml = $r.Content
$OldCertificateId = $sxml.applicationProfile.clientSsl.serviceCertificate
$sxml.applicationProfile.clientSsl.serviceCertificate = $NewCertificateId
$r = Invoke-WebRequest -Uri $Uri -Method Put -Headers $head -ContentType "application/xml" -Body $sxml.OuterXML -ErrorAction:Stop

##Delete old certificate from the Edge
$Uri = "https://$NSXManager/api/2.0/services/truststore/certificate/" + $OldCertificateId
$r = Invoke-WebRequest -URI $Uri -Method Delete -Headers $head -ContentType "application/xml" -ErrorAction:Stop
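
The script keeps the NSX Manager password in plain text for simplicity. One possible variation, sketched here, builds the same Basic authorization header from a PSCredential object, so that the password can come from Get-Credential or a protected store instead. Get-BasicAuthHeader is a hypothetical helper name, and the credential in the example reuses the placeholder values from the script:

```powershell
function Get-BasicAuthHeader {
	param([Parameter(Mandatory=$true)][System.Management.Automation.PSCredential]$Credential)
	# GetNetworkCredential() exposes the user name and clear-text password
	$net = $Credential.GetNetworkCredential()
	$pair = $net.UserName + ":" + $net.Password
	$auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($pair))
	return @{"Authorization"="Basic $auth"}
}

# In real use: $cred = Get-Credential
$secure = ConvertTo-SecureString "default" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential ("admin", $secure)
$head = Get-BasicAuthHeader -Credential $cred
$head.Authorization
```

The resulting $head hashtable has the same shape as the one built in the script, so the Invoke-WebRequest calls would not need to change.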


Edge Gateway Deployment Speed in vCloud Director 8.10

In vCloud Director 8.10 there is a massive improvement in the deployment (and configuration) speed of Edge Gateways. This is especially noticeable in use cases where a large number of routed vApps must be provisioned in as short a time as possible, for example nightly builds for testing or labs for training purposes. It is also important for customer onboarding, where the SLA is the time from the swipe of the credit card to the first login to a cloud VM.

Theory

How is the speed improvement achieved? It is actually not really a vCloud Director accomplishment. The deployment and configuration of Edge Gateways were always done by vShield or NSX Manager. However, there is a big difference in how vShield Manager and NSX Manager communicate with the Edge Gateway to push its configuration (IP addresses, NAT, firewall and other network services configurations).

As the Edge Gateway can be deployed to any network, which can be completely isolated from any external traffic, its configuration cannot be done over the network and an out-of-band communication channel must be used instead. vShield Manager always used the VIX API (Guest Operations API), which involves communication with vCenter Server, the hostd process on the ESXi host running the Edge Gateway VM and finally VMware Tools running in the Edge Gateway VM (see this older post for more detail).

NSX Manager uses a different mechanism. As long as the ESXi host is properly prepared for NSX, message bus communication is established between the NSX Manager and the vsfwd user space process on the ESXi host. Additionally, the configuration is pushed to the Edge Gateway VM via a VMCI channel.

Prerequisites

There are necessary prerequisites to use the faster message bus communication as opposed to the VIX API. If any of them is not fulfilled, the communication mechanism falls back to the VIX API.

  • The host running the Edge Gateway must be prepared for NSX. If you use solely VLAN (or even VCDNI) backed network pools in vCloud Director and skipped the NSX preparation of the underlying clusters, message bus communication cannot be used because the host is missing the NSX VIBs and the vsfwd process.
  • The Edge Gateway must be version 6.x. It cannot be the legacy Edge version 5.5 deployed by older vCloud Director releases (8.0, 5.6, etc.). vCloud Director 8.10 deploys Edge Gateway version 6.x; however, existing Edges deployed before the upgrade to 8.10 must be redeployed in vCloud Director or upgraded in NSX (read this whitepaper for a script to do it at once).
  • Obviously, NSX Manager must be used (as opposed to vShield Manager). In any case, vCloud Networking and Security is no longer supported with vCloud Director 8.10.

Performance Testing

I have done a quick proof-of-concept test to see the relative improvement between the older and newer deployment mechanisms.

I used 3 different combinations of the same environment (I was upgrading from one combination to the other).

  • vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
  • vCloud Director 8.0.1 + NSX 6.2.3 (uses legacy Edges)
  • vCloud Director 8.10 + NSX 6.2.3 (uses NSX Edges)

All 3 combinations used the same hardware and the same vSphere environment (5.5) with nested ESXi hosts. So the point is to look at the relative differences as opposed to absolute deployment times.

I used PowerCLI to measure the sequential deployment speed of 10 vApps with one isolated network and 10 vApps with one routed network, with multiple runs to calculate the average per vApp. The first scenario measures differences in the provisioning speed of VXLAN logical switches, to see the impact of the controller-based control plane mode. The second includes provisioning of an Edge Gateway and a logical switch. The vApps were otherwise empty (no VMs).

Note: If you want to run a similar test in your environment, I captured the two empty vApps with only the routed or isolated network to a catalog with the vCloud API (PowerCLI), as this cannot be done from the vCloud UI.
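
A measurement like this can be scripted with Measure-Command. The helper below is a generic sketch, not the script used for the numbers in this post: Measure-AverageSeconds is a hypothetical name, and the vApp deployment call in the trailing comment (template, Org VDC and vApp names) is only an assumed example of what the timed action could be.

```powershell
function Measure-AverageSeconds {
	param(
		[Parameter(Mandatory=$true)][scriptblock]$Action,  # code to time; receives the run number
		[int]$Runs = 10
	)
	$total = 0.0
	foreach ($i in 1..$Runs) {
		# Run sequentially and add up the elapsed wall-clock time
		$total += (Measure-Command { & $Action $i }).TotalSeconds
	}
	return $total / $Runs
}

# Self-contained demonstration with a dummy action
$average = Measure-AverageSeconds -Runs 3 -Action { Start-Sleep -Milliseconds 100 }
$average

# Assumed PowerCLI usage (all names are placeholders):
# Measure-AverageSeconds -Runs 10 -Action {
#     param($i)
#     New-CIVApp -Name "routed-$i" -OrgVdc $orgVdc -VAppTemplate $routedTemplate
# }
```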

Here are the average deployment times of each vApp.

vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4

  • Isolated 5-5.5 seconds
  • Routed 2:17 min

vCloud Director 8.0.1 + NSX 6.2.3

  • Isolated approximately 6.8 seconds (Multicast), 7.5 seconds (Unicast)
  • Routed 2:20 min

vCloud Director 8.10 + NSX 6.2.3

  • Isolated 7.7 s (Multicast), 8.1 s (Unicast)
  • Routed 1:35 min

While logical switch provisioning gets slightly slower with NSX and with the Unicast control plane mode, Edge Gateway deployment gets a massive boost with NSX and VCD 8.10. Although the OVF deployment of the NSX Edge takes a little longer (up from 20 to 30 s), the configuration more than makes up for it (down from well over a minute to about 30 s).
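
To put the routed results in relative terms, the averages listed above can be converted to seconds and compared. This is just arithmetic on the numbers from this post; the function name is made up:

```powershell
function Get-ReductionPercent {
	param([double]$BeforeSeconds, [double]$AfterSeconds)
	# Share of the original deployment time that was saved, in percent
	return [math]::Round(($BeforeSeconds - $AfterSeconds) / $BeforeSeconds * 100, 1)
}

$vcd565 = 2*60 + 17   # 2:17 min, vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
$vcd801 = 2*60 + 20   # 2:20 min, vCloud Director 8.0.1 + NSX 6.2.3
$vcd810 = 1*60 + 35   # 1:35 min, vCloud Director 8.10 + NSX 6.2.3

Get-ReductionPercent $vcd565 $vcd810   # 30.7
Get-ReductionPercent $vcd801 $vcd810   # 32.1
```

In other words, a routed vApp deploys in roughly 30% less time with vCloud Director 8.10 than with either of the older combinations.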

Just for comparison here are the tasks done during deployment of each routed vApp as reported by vSphere Client Recent Task window.

vCloud Director 5.6.5 + vCloud Networking and Security
vCloud Director 8.10 + NSX 6.2.3