While it is possible to redirect vCloud Director cell logs to a remote syslog server by editing the log4j.properties file (see KB 2004564), there is an alternative, agent-based method utilizing vRealize Log Insight.
The Log Insight agent is installed on each cell and then remotely managed from the Log Insight server. Here are some advantages of this approach:
no manual edits of the log4j file, which gets overwritten with each upgrade
as we do not rely on the log4j logger, we can also collect the API request log files generated by Jetty
the agent uses reliable TCP communication as opposed to unreliable UDP
we no longer rely on the source IP to identify the sender; cells can use source NAT (with a single IP) to communicate with the Log Insight server and we can still distinguish them
we can remotely change which logs we want to monitor (info vs. debug)
Download the Log Insight agent from the Log Insight server; the installation is already customized for your vRLI server: Administration > Agents > scroll down > Download Log Insight Agent Version 3.6.0 > pick the rpm package.
Upload the rpm file to each cell and install it with rpm -i VMware-Log-Insight-Agent-3.6.0-4148343.noarch_XXX.rpm
Back in the Agents configuration, create an active agent group from the vCloud Director Cell Server template (copy template icon).
Create a hostname filter (use ? to substitute for any single character); you can add multiple entries on one line for a logical OR, or multiple lines for a logical AND.
Optionally edit the agent configuration to include additional files or directories.
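As a sketch, monitoring an extra directory is done with a [filelog] section in the agent configuration (either locally in /var/lib/loginsight-agent/liagent.ini or centrally from the agent group in the server UI). The path below is the default vCloud Director log location and may differ in your environment:

```
; collect all vCloud Director cell logs from the default log directory
[filelog|vcloud]
directory=/opt/vmware/vcloud-director/logs
include=*.log
```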
As of NSX 6.2.3, TLS 1.0 support is deprecated on Edge Services Gateways. So if you are using a load balancer with SSL offload, TLS 1.0 ciphers are no longer supported and clients that rely on them will stop working.
The supported ciphers can easily be checked with nmap. Here is the nmap output for a website behind an NSX Edge 6.2.2 and 6.2.4 load balancer:
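For reference, the enumeration is done with nmap's ssl-enum-ciphers script; the hostname below is a placeholder for your load balancer virtual server address:

```
nmap --script ssl-enum-ciphers -p 443 vcloud.example.com
```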
In my case, PowerCLI stopped working and could no longer connect to the vCloud Director endpoint behind the Edge load balancer. The error was not very descriptive: The underlying connection was closed: An unexpected error occurred on a send.
Some time ago I wrote about how to monitor the health of NSX Edge Gateways. In this blog post I will show how to get health and other information about vCloud Director Edge Gateways with PowerCLI.
PowerCLI already includes vCloud Director related cmdlets; unfortunately, none of them cover Edge Gateways. This can easily be remedied by using the vCloud API; however, to get detailed information about Edge health we must use the NSX API. As of vCloud Director 8.0 the service provider can easily get the ID of the NSX Edge backing a particular vCloud Director Edge, as a new type, GatewayBacking, was added.
What follows is an example of a function that collects as much information as possible (interfaces, network services, size, syslog, default gateway, health of all services, Org, Org VDC and Provider VDC) about all Edge Gateways using PowerCLI, the vCloud API and the NSX API.
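As a minimal sketch of the NSX API part, the health of a single Edge can be retrieved like this (the NSX Manager address, credentials and the edge-5 ID are placeholders; the full function resolves the Edge ID from the GatewayBacking element first):

```
# Placeholder NSX Manager address and credentials
$nsx     = "https://nsxmanager.example.com"
$auth    = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes("admin:password"))
$headers = @{ Authorization = "Basic $auth" }

# GET /api/4.0/edges/<edgeId>/status returns overall and per-feature health
$status = Invoke-RestMethod -Uri "$nsx/api/4.0/edges/edge-5/status" -Headers $headers
$status.edgeStatus.edgeStatus                                              # overall health, e.g. GREEN
$status.edgeStatus.featureStatuses.featureStatus | Select-Object service, status
```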
I needed a public certificate for my lab to avoid issues while testing certain libraries that did not allow untrusted connections or importing a private Certificate Authority.
Fortunately, it is possible to issue free public certificates with the Let’s Encrypt certificate authority. These certificates are domain validated, which means you need to own the domain for which you issue the certificate. There are three methods of validation, but only one of them can be used in fully automated mode. Why the need for automation? The issued certificates are valid for only 90 days.
To validate domain ownership you need to publish a specific generated verification string on a publicly accessible web server under the certificate FQDN. You do not actually need to publish the service (and the NSX Edge load balancer) itself to the internet if you do not want to – I just set up a simple web server with the sole purpose of completing the validation challenge.
So what is the high-level process?
Own a domain for which you want the certificate.
Set up a publicly accessible web server and point a DNS record with the certificate FQDN at it.
Generate the challenge string and place it on the web server.
Validate the domain and obtain the certificates.
Upload the certificates to your NSX Edge Load Balancer.
After 60 days, repeat from step 3.
There are various ways to automate steps 3–5. I chose to do this on Windows with PowerShell, but the same could be accomplished on Linux, as there are many Let’s Encrypt clients available to choose from.
On a Windows 2012 R2 server I installed the latest PowerShell 5, IIS and ACMESharp from the PowerShell Gallery:
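The module installation itself is a one-liner against the PowerShell Gallery (requires PowerShell 5):

```
Install-Module -Name ACMESharp
Import-Module ACMESharp
```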
Then I wrote a PowerShell script that first goes through certificate generation and then, using the NSX API, replaces the certificate of a specific load balancer.
Note that you need to supply NSX Manager credentials, the ID of the Edge running the load balancer, the ID of the application profile the web server uses (it can easily be looked up in the NSX UI), and the email address and domain for the Let’s Encrypt generation process.
Also be aware that Let’s Encrypt rate-limits how many times a particular certificate can be issued within a 7-day period (currently 20).
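The certificate generation part can be sketched with the ACMESharp cmdlets roughly like this (the email address, domain and aliases are placeholders, and the challenge is shown with the manual handler for clarity – the full script automates this step against IIS):

```
Initialize-ACMEVault
New-ACMERegistration -Contacts mailto:admin@example.com -AcceptTos
New-ACMEIdentifier -Dns vcloud.example.com -Alias dns1

# Prints the verification string to publish on the web server
# under /.well-known/acme-challenge/
Complete-ACMEChallenge dns1 -ChallengeType http-01 -Handler manual
Submit-ACMEChallenge dns1 -ChallengeType http-01

New-ACMECertificate dns1 -Generate -Alias cert1
Submit-ACMECertificate cert1
Get-ACMECertificate cert1 -ExportKeyPEM cert1.key -ExportCertificatePEM cert1.pem
```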
In vCloud Director 8.10 there is a massive improvement in the deployment (and configuration) speed of Edge Gateways. This is especially noticeable in use cases where a large number of routed vApps is provisioned in as short a time as possible – for example nightly builds for testing, or labs for training purposes. But it is also important for customer onboarding, where an SLA may be measured as the time from the swipe of the credit card to login to a cloud VM.
How is the speed improvement achieved? It is actually not really a vCloud Director accomplishment. The deployment and configuration of Edge Gateways were always done by vShield Manager or NSX Manager. However, there is a big difference in how vShield Manager and NSX Manager communicate with the Edge Gateway to push its configuration (IP addresses, NAT, firewall and other network services configurations).
As the Edge Gateway can be deployed to a network that is completely isolated from any external traffic, its configuration cannot be done over the network; instead, an out-of-band communication channel must be used. vShield Manager always used the VIX API (Guest Operations API), which involves communication with vCenter Server, the hostd process on the ESXi host hosting the Edge Gateway VM and finally VMware Tools running in the Edge Gateway VM (see this older post for more detail).
NSX Manager uses a different mechanism. As long as the ESXi host is properly prepared for NSX, message bus communication is established between the NSX Manager and the vsfwd user space process on the ESXi host. The configuration is then pushed to the Edge Gateway VM via a VMCI channel.
There are prerequisites for using the faster message bus communication as opposed to the VIX API. If any of them is not fulfilled, the communication mechanism falls back to the VIX API.
The host running the Edge Gateway must be prepared for NSX. So if in vCloud Director you are using solely VLAN (or even VCDNI) backed network pools and you skipped the NSX preparation of the underlying clusters, message bus communication cannot be used, as the host is missing the NSX VIBs and the vsfwd process.
The Edge Gateway must be version 6.x. It cannot be the legacy Edge version 5.5 deployed by older vCloud Director releases (8.0, 5.6, etc.). vCloud Director 8.10 deploys Edge Gateway version 6.x; however, existing Edges deployed before the upgrade to 8.10 must be redeployed in vCloud Director or upgraded in NSX (read this whitepaper for a script that does them all at once).
Obviously NSX Manager must be used (as opposed to vShield Manager); vCloud Networking and Security is not supported with vCloud Director 8.10 anymore anyway.
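The host preparation prerequisite can be checked via the NSX API, for example like this (the NSX Manager address, credentials and the domain-c7 cluster MoRef are placeholders):

```
$nsx     = "https://nsxmanager.example.com"
$auth    = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes("admin:password"))
$headers = @{ Authorization = "Basic $auth" }

# GET /api/2.0/nwfabric/status reports per-cluster installation status of the
# NSX VIBs and the message bus infrastructure (messagingInfra feature)
Invoke-RestMethod -Uri "$nsx/api/2.0/nwfabric/status?resource=domain-c7" -Headers $headers
```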
I did a quick proof-of-concept test to see the relative improvement between the older and newer deployment mechanisms.
I used 3 different combinations of the same environment (I was upgrading from one combination to the next):
vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
vCloud Director 8.0.1 + NSX 6.2.3 (uses legacy Edges)
vCloud Director 8.10 + NSX 6.2.3 (uses NSX Edges)
All 3 combinations used the same hardware and the same vSphere environment (5.5) with nested ESXi hosts, so the point is to look at the relative differences as opposed to absolute deployment times.
I measured with PowerCLI the sequential deployment speed of 10 vApps with one isolated network and 10 vApps with one routed network, with multiple runs, to calculate an average per vApp. The first scenario measures differences in the provisioning speed of VXLAN logical switches to see the impact of the controller-based control plane mode. The second includes provisioning of an Edge Gateway and a logical switch. The vApps were otherwise empty (no VMs).
Note: if you want to do a similar test in your environment, I captured the two empty vApps with only the routed or isolated networks to a catalog with the vCloud API (PowerCLI), as it cannot be done from the vCloud UI.
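The measurement loop can be sketched in PowerCLI roughly like this (assumes an existing Connect-CIServer session; the Org VDC and template names are placeholders):

```
$template = Get-CIVAppTemplate -Name "routed-vapp-template"
$orgVdc   = Get-OrgVdc -Name "MyOrgVdc"

$times = foreach ($i in 1..10) {
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    $vapp = New-CIVApp -Name "test$i" -OrgVdc $orgVdc -VAppTemplate $template
    $sw.Stop()
    $vapp | Remove-CIVApp -Confirm:$false   # clean-up is not included in the timing
    $sw.Elapsed.TotalSeconds
}
($times | Measure-Object -Average).Average  # average deployment time per vApp
```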
Here are the average deployment times of each vApp.
vCloud Director 5.6.5 + vCloud Networking and Security 5.5.4
Isolated: 5–5.5 s
Routed: 2:17 min
vCloud Director 8.0.1 + NSX 6.2.3
Isolated: approx. 6.8 s (Multicast), 7.5 s (Unicast)
Routed: 2:20 min
vCloud Director 8.10 + NSX 6.2.3
Isolated: 7.7 s (Multicast), 8.1 s (Unicast)
Routed: 1:35 min
While the speed of logical switch provisioning goes down a little with NSX and with the Unicast control plane mode, Edge Gateway deployment gets a massive boost with NSX and VCD 8.10. While the OVF deployment of the NSX Edge takes a little longer (30 s instead of 20 s), the configuration phase more than makes up for it (down from well over a minute to about 30 s).
Just for comparison, here are the tasks performed during the deployment of each routed vApp, as reported in the vSphere Client Recent Tasks window.