vCloud Usage Meter with Signed SSL Certificates

vCloud Usage Meter (VCUM) is a small virtual appliance used by service providers to measure their VMware product consumption for VSPP (VMware Service Provider Program) type licensing.

I needed to replace the self-signed certificate of the web user interface. While there is a KB article 2047572 and also a chapter in the user guide dedicated to the subject, neither was correct for my version 3.3.1 installation.

The web interface is provided by tc Server, which stores its certificate keystore in the following location:

/usr/local/tcserver/vfabric-tc-server-standard/um/conf/tcserver.jks

The keystore password is silverpen and the certificate alias is um. The location and password can be changed by editing server.xml in the same directory.
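
These values are standard Tomcat connector attributes, so a quick way to confirm them on your own appliance (assuming the default path above) is:

grep -i keystore /usr/local/tcserver/vfabric-tc-server-standard/um/conf/server.xml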

Here is a quick guide on how to generate and sign a new certificate with the Java keytool. Note: if you need to generate the private key externally, use the steps described in my older article here.

  1. Modify default path to include java keytool location:
    export PATH=$PATH:/usr/java/latest/bin 
  2. Go to the tc Server conf folder
    cd /usr/local/tcserver/vfabric-tc-server-standard/um/conf/ 
  3. Backup current keystore
    mv tcserver.jks tcserver.jks.backup 
  4. Generate a private key. When asked, always use the password silverpen
    keytool -genkey -alias um -keyalg RSA -keysize 2048 -keystore tcserver.jks 
  5. Modify ownership of the keystore file:
    chown usgmtr tcserver.jks 
  6. Create certificate signing request
    keytool -certreq -alias um -keyalg RSA -file vcum.csr -keystore tcserver.jks 
  7. Sign CSR with your CA (save certificate as vcum.crt)
  8. Import root (and optionally intermediate) certificates if needed
    keytool -import -trustcacerts -alias root -file fojta-dc-CA.cer -keystore tcserver.jks 
  9. Import the signed certificate
    keytool -import -alias um -file vcum.crt -keystore tcserver.jks 
  10. Verify certificates were successfully imported into keystore
    keytool -list -keystore tcserver.jks

    Keystore type: JKS
    Keystore provider: SUN
    Your keystore contains 2 entries

    root, Aug 1, 2014, trustedCertEntry,
    Certificate fingerprint (MD5): E3:EE:7F:47:1A:3E:76:07:8F:27:5D:87:54:94:A4:E7
    um, Aug 2, 2014, PrivateKeyEntry,
    Certificate fingerprint (MD5): 26:3C:96:08:63:86:2B:E8:CA:2C:7F:53:6A:B2:EE:FA

  11. Restart tc service
    service tomcat restart
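
To verify the result without opening a browser, you can pull the served certificate with openssl; a quick sketch, assuming the Usage Meter UI listens on port 8443 (adjust the port to your installation):

echo | openssl s_client -connect localhost:8443 2>/dev/null | openssl x509 -noout -subject -issuer -dates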

 

Load Balancing vCloud Director Cells with NSX Edge Gateway

About two years ago I wrote an article on how to use vShield Edge to load balance vCloud Director cells. I am revisiting the subject, this time with the NSX Edge load balancer. One of the main improvements of the NSX Edge load balancer is SSL termination.

Why SSL Termination?

All vCloud GUI or API requests are made over HTTPS. When vShield (vCNS) Edge was used for load balancing, it was just passing the traffic through untouched. There was no way to inspect the request: the load balancer saw only the source and destination IPs and encrypted data. If we want to inspect the HTTP request, we need to terminate the SSL session on the load balancer and then create a new SSL session towards the cell pool.

SSL Termination

 

This way we can filter URLs, modify headers or do even more advanced inspection. I will demonstrate how we can easily block portal access for a given organization and how to add the X-Forwarded-For header so vCloud Director can log the actual end-user's IP address and not only the load balancer's.

Basic Configuration

I am going to use exactly the same setup as in my vShield article. Two vCloud Director cells (IP addresses 10.0.1.60-61 and 10.0.1.62-63) behind Virtual IPs – 10.0.2.80 (portal/API) and 10.0.2.81 (VMRC).

vCloud Director Design

While the NSX Edge load balancer is very similar to the vShield load balancer, the UI and the configuration workflow have changed quite a bit. I will only briefly describe the steps to set up basic load balancing:

  1. Create Application Profiles for VCD HTTP (port 80), VCD HTTPS (port 443) and VCD VMRC (port 443). We will use HTTP, HTTPS and TCP types respectively. For HTTPS we will for now enable SSL passthrough.
  2. Create new Service Monitoring (type HTTPS, method GET, URL /cloud/server_status)
    Service Monitor
  3. Create server Pools (VCD_HTTP with members 10.0.1.60 and .62, port 80, monitor port 443; VCD_HTTPS with members 10.0.1.60 and .62, port 443, monitor port 443; and VCD_VMRC with members 10.0.1.61 and .63, port 443, monitor port 443). Always use the monitor created in the previous step. I used the Round Robin algorithm.
    Pools
  4. Create Virtual Servers for respective pools, application profiles and external IP/port (10.0.2.80:80 for VCD_HTTP, 10.0.2.80:443 for VCD_HTTPS and 10.0.2.81:443 for VCD_VMRC).
  5. Enable load balancer in its Global Configuration.
    Global LB Config

Now we should have load balanced access to vCloud Director with functionality identical to the vShield Edge case.
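
Before and after enabling the load balancer it is worth hitting the monitor URL directly; a quick sketch using the cell and VIP addresses above (a healthy cell returns HTTP 200):

curl -k -s -o /dev/null -w "%{http_code}\n" https://10.0.1.60/cloud/server_status
curl -k -s -o /dev/null -w "%{http_code}\n" https://10.0.1.62/cloud/server_status
curl -k -s -o /dev/null -w "%{http_code}\n" https://10.0.2.80/cloud/server_status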

Advanced Configuration

Now comes the fun part. To terminate the SSL session at the Edge, we need to create the vCloud HTTP SSL certificate and upload it to the load balancer. Note that it is not possible to terminate the VMRC proxy as it is a pure socket SSL connection. As I have vCloud Director 5.5, I had to use a certificate identical to the one on the cells, otherwise catalog OVF/ISO upload would fail with an SSL thumbprint mismatch (see KB 2070908 for more details). The actual creation and import of the private key, certificate signing request and certificate was not straightforward, so I am listing the exact commands I used, followed by a quick key/certificate sanity check (do not create the CSR on the load balancer as you will not be able to export the key to later import it into the cells):

  1. Create private key with pass phrase encryption with openssl:
    openssl genrsa -aes128 -passout pass:passwd -out http.key 2048
  2. Create certificate signing request with openssl:
    openssl req -new -key http.key -out http.csr
  3. Sign CSR (http.csr) with your or public Certificate Authority to get http.crt.
  4. Upload the certificate and key to both cells. See how to import private key in my older article.
  5. Import your root CA and http certificate to the NSX Edge (Manager > Settings > Certificates).
    Certificates
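
Before uploading anything it is worth confirming that the signed certificate really matches the private key; a quick check with openssl (using the file names and passphrase from the steps above), both digests must be identical:

openssl rsa -noout -modulus -in http.key -passin pass:passwd | openssl md5
openssl x509 -noout -modulus -in http.crt | openssl md5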

Now we will create a simple Application Rule that will block vCloud portal access to organization ACME (filter /cloud/org/acme URL).

acl block_ACME path_beg -i /cloud/org/acme
block if block_ACME

Application Rule

Now we will change the previously created VCD_HTTPS Application Profile. We will disable SSL Passthrough, check Insert X-Forwarded-For HTTP header (which will pass the original client IP address to vCloud Director) and enable Pool Side SSL. Select the previously imported Service Certificate.

Application Profiles

And finally we will assign the Application Rule to the VCD_HTTPS Virtual Server.

Virtual Servers

Now we can test: we should be able to access the vCloud Director portal and see the new certificate, we should not be able to access the portal for the ACME organization, and we should see both the client and the proxy IP addresses in the logs.

ACME Forbidden

Client IP
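
A quick command-line check of the blocking rule, using the VIP above (the second URL is just a placeholder for any organization that is not filtered); the filtered request should be answered with 403 Forbidden by HAProxy's block directive:

curl -k -I https://10.0.2.80/cloud/org/acme
curl -k -I https://10.0.2.80/cloud/org/another-org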

For more advanced application rules check HAProxy documentation.

vCloud Connector in Multisite Environment with Single VC

I received an interesting question. A customer has a single vCenter Server environment with a large number of sites. Can they use vCloud Connector Content Sync to keep templates synced among the sites?

The answer is yes, but the setup is not so straightforward. The vCloud Connector client does not allow registration of multiple nodes pointing to the same endpoint (vCenter Server). The workaround is to fool vCloud Connector by using different FQDNs for the same endpoint (IP address). Example: vcenter01.fojta.com, vcenter01a.fojta.com and vcenter01b.fojta.com all pointing to the same IP address. This will obviously impact SSL certificate verification, which must either be turned off, or the vCenter Server certificate must include all those alternative names in its SAN (Subject Alternative Name) attribute.
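
As a sketch of the SAN option (all file names here are examples), a CSR that covers all three aliases can be generated with openssl and a small config file:

cat > vcenter01-san.cnf <<'EOF'
[req]
distinguished_name = dn
req_extensions = san
prompt = no
[dn]
CN = vcenter01.fojta.com
[san]
subjectAltName = DNS:vcenter01.fojta.com, DNS:vcenter01a.fojta.com, DNS:vcenter01b.fojta.com
EOF
openssl req -new -newkey rsa:2048 -nodes -keyout vcenter01.key -out vcenter01.csr -config vcenter01-san.cnf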

So the customer would deploy a vCloud Connector node in each site and register it with the central vCenter Server, always using a different FQDN. Then in the vCloud Connector client each site would look like a different vCenter Server and Content Sync could be set up.

Availability Zone Design in vCloud Director

Introduction

Service providers with multiple datacenters often want to offer their customers the choice of having multiple virtual datacenters from different availability zones. Failure of one availability zone should not impact another availability zone. The customer can then deploy his application resiliently across both virtual datacenters, leveraging load balancing and application clustering.

Depending on the distance between the datacenters and the network latency between them, it is possible to have multiple availability zones accessible from within a single vCloud Director instance, which means one single GUI or API endpoint and very easy consumption from the customer's perspective. Read the vCloud Architecture Toolkit – Architecting vCloud for more detail on latency and supportability considerations.

Multiple vCenter Server Design

The typical approach in a single-instance vCloud Director deployment is to have a dedicated vCenter Server and vCNS Manager for each availability zone. vCloud Director version 5.5 can connect up to 20 vCenter Servers.

The following diagram shows how the management and resource clusters are typically placed between two sites.

Multi vC Design

Each site has a management cluster. The shared cloud management VMs (vCloud Director cells, databases, Chargeback, AMQP, etc.) run primarily from site 1 with failover to site 2. Provider VDC management resources (vCenter Servers, vCNS/NSX Managers, databases) are distributed to each site. There is no sharing of resource group components, which makes for a very clean availability zone design.

One problem for the customers is that they cannot stretch organization VDC networks between the sites. The reason is that although VXLAN networks could be stretched over routed Layer 3 networks between sites, they cannot be stretched between different vCenter Servers. A single vCNS/NSX Manager is the boundary for a VXLAN network, and there is a 1:1 relationship between vCenter Server and vCNS/NSX Manager. This means that if the customer wants to achieve communication between VMs in his VDCs from different availability zones, he has to create an Edge Gateway IPsec VPN or provide external network connectivity between them. All that results in quite a complicated routing configuration. The following diagram shows a typical example of such a setup.

VDC design in multi vCenter Provider VDC

Single vCenter Server Design with Stretched Org Networks

I have come up with an alternative approach. The goal is to achieve a stretched Org VDC network between two sites and have only one highly available Edge Gateway to manage. The desirable target state is shown in the following diagram.

VDC design with stretched network

To accomplish this we need only one resource group vCenter Server instance, and thus one VXLAN domain, while still having the separation of resources into two availability zones. The vCenter Server can be made resilient with vSphere HA (stretched cluster), vCenter Server Heartbeat or Site Recovery Manager.

Could we have the same cluster design as in the multi-vCenter scenario, with each Provider VDC having its own set of clusters based on site? To answer this question I first need to describe the VXLAN transport zone (VXLAN scope) concept. VXLAN network pools created by vCloud Director have only Provider VDC scope. This means that any Org VDC network created from such a VXLAN network pool will be able to span the clusters that are used by the Provider VDC. When a cluster is added to or removed from a Provider VDC, the VXLAN transport zone scope is expanded or reduced by that cluster. This can be viewed in vCNS Manager or in the NSX Transport Zones menu.

VXLAN - Transport zone
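
If you prefer to check this outside the UI, the transport zone (network scope) membership can also be listed via the vCNS/NSX REST API; a minimal sketch with placeholder manager address and credentials, returning each network scope together with the clusters it currently spans:

curl -k -s -u admin:password https://nsx-manager.fojta.com/api/2.0/vdn/scopes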

There are two ways to expand the VXLAN transport zone.

Manual VXLAN Scope Expanding

The first one is simple enough and involves manually extending the VXLAN transport zone scope in vCNS or NSX Manager. The drawback is that any reconfiguration of Provider VDC clusters or resource pools will remove this manual change. As Provider VDC reconfiguration does not happen too often, this is a viable option.

Stretching Provider VDCs

The second solution involves stretching at least one Provider VDC into the other site so its VXLAN scope covers both sites. The resulting network pool (which created the VXLAN transport zone) can then be assigned to Org VDCs that need to span networks between sites. This can be achieved by using multiple Resource Pools inside the clusters and assigning those to Provider VDCs. As we want to stretch only the VXLAN networks and not the actual compute resources (we do not want vCloud Director deploying VMs into the wrong site), we will have site-specific storage policies. Although a Provider VDC will have access to a Resource Pool from the other site, it will not have access to the storage, as only storage from the first site is assigned to it.

Hopefully the following diagram better describes the second solution:

Stretched PVDC Design

 

The advantage of the second approach is that it is a much cleaner solution from a support perspective, although the actual setup is more complex.

Highly Available Edge Gateway

Now that we have successfully stretched the Org VDC network between both sites, we also need to solve Edge Gateway site resiliency. Resilient applications without connectivity to the external world are useless. The Edge Gateway (and the actual Org VDC) is created inside one (let's call it primary) Provider VDC. The Org VDC network is marked as shared so other Org VDCs can use it as well. The Edge Gateways are deployed by the service provider. He will deploy the Edge Gateway in a high availability configuration, which will result in two Edge Gateway VMs deployed in the same primary Provider VDC (in the System VDC sub-resource pool). The VMs will use an internal Org VDC network for heartbeat communication. The trick to make it site resilient is to go into the vCNS/NSX Edge configuration and change the Resource Pool (the System VDC RP with the same name but in the other cluster) and Datastore of the 2nd instance to the other site. vCNS/NSX Manager then immediately redeploys the reconfigured Edge instance to the other site. This change survives Edge Gateway redeploys from within vCloud Director without any problems.

HA Edge
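
For those who want to script or audit this change, the Edge appliance placement is also exposed through the vCNS/NSX REST API; a sketch only, with placeholder manager address, credentials and edge ID: get the current appliance configuration, adjust the resourcePoolId and datastoreId of the second appliance in the returned XML, and PUT the modified body back to the same URL.

curl -k -s -u admin:password https://nsx-manager.fojta.com/api/4.0/edges/edge-1/appliances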

 

Migration from Nexus 1000V to NSX in vCloud Director

A few service providers have asked me about migration possibilities from Nexus 1000V to NSX when using vCloud Director. I will first try to explain the process at a high level and then offer a step-by-step approach.

Introduction

NSX (actually NSX for vSphere) is built on top of the vSphere Distributed Switch, which it extends with a few VMkernel modules and host-level agents. It cannot run on top of Nexus 1000V. Therefore it is necessary to first migrate from Nexus 1000V to the vSphere Distributed Switch (vDS) and then install NSX. The core management component of NSX is NSX Manager, which runs as a virtual appliance. It is in a sense very similar to the vShield or vCloud Networking and Security Manager, and there is actually a direct upgrade path from vCNS Manager to NSX Manager.

NSX is backward compatible with the vCNS API and therefore works without any problems with vCloud Director. However, NSX advanced features (distributed logical router and firewall, dynamic routing protocol support on Edges, etc.) are not available via the vCloud Director GUI or vCloud API. The only exception is multicast-less VXLAN, which works with vCloud Director out of the box.

Migration from Nexus 1000V to vSphere Distributed Switch

In a pure vSphere environment it is pretty easy to migrate from Nexus 1000V to vDS, and it can even be done without any VM downtime as it is possible to have both distributed switches (vDS and Nexus) on the same ESXi host. However, this is not the case when vCloud Director with VXLAN technology is involved. VXLAN can be configured with per-cluster granularity and therefore it is not possible to mix two VXLAN providers (vDS and Nexus) on the same cluster. This unfortunately means we cannot do live VM migration from Nexus to vDS, as they are on different clusters and (live) vMotion does not work across two different distributed switches. Cold migration must be used instead.

Note that if VXLAN is not involved, live migration is possible, which will be leveraged while migrating vCloud external networks.

Another point should be noted. We are going to mix two VXLAN providers in the same Provider VDC, meaning that the VXLAN scope will span both Nexus 1000V and vSphere Distributed Switch. To my knowledge this is neither recommended nor supported, although it works and both VTEP types can communicate with each other thanks to the multicast control plane. As we will be migrating VMs in powered-off state, this is not an issue and no communication will run over the mixed VXLAN network during the migration.

High Level Steps

We will need two clusters: one legacy cluster with Nexus 1000V and one empty cluster for vDS. I will use two vDS switches, although one could be enough. One vDS is used for management, vMotion, NFS and VLAN traffic (vCloud external networks); the other vDS is used purely for VXLAN VM traffic.

We first need to migrate all VLAN-based networks from Nexus 1000V to vDS1. Then we prepare the second cluster with vDS-based VXLAN and create an elastic Provider VDC which will contain both clusters (resource pools). We will disable the Nexus resource pool and migrate all VMs, templates and Edges off of it to the new one (see my article vCloud Director: Online Migration of Virtual Data Center).

We will detach the old cluster, remove and uninstall Nexus 1000V, and then create a vDS with VXLAN instead and add it back to the Provider VDC. After this we can upgrade the vCNS Manager to NSX Manager, prepare all hosts for NSX and also install NSX Controllers. We can optionally change the VXLAN transport mode from Multicast to Unicast/Hybrid. We however cannot upgrade vCloud Director deployed vShield Edges to NSX Edges as that would break their compatibility with vCloud Director.

Step-by-step Procedure

  1. Create new vDS switch. Migrate management networks (vMotion, NFS, management) and vCloud external networks to it (NAT rules on Edge Gateways using external network IPs will need to be removed and re-added).
    Prepare new cluster (Cluster2) and add it to the same vDS. Optionally create a new vDS for VXLAN networks (if the first one will not be used). Prepare VXLAN fabric on Cluster2 in vCNS Manager.
    Distributed switches
    VXLAN fabric before migration
    Nexus1000V – Cluster1 VXLAN only
    dvSwitch01 – Cluster1 and Cluster2 external networks, vMotion, management and NFS
    dvSwitch02 – Cluster2 VXLAN only
  2. Create new Provider VDC on new cluster (GoldVDCnew)
  3. Merge new Provider VDC with the old one. Make sure the new Provider VDC is primary! Disable the secondary (Cluster1) Resource Pool.
    Merging PVDCs
    PVDC Resource Pools
  4. Migrate VMs from Cluster1 to Cluster2 from within vCloud Director. VMs connected to a VXLAN network will need to be powered off as it is not possible to do live vMotion between two different distributed switches.
    VM Migration
  5. Migrate templates (move them to another catalog). As the Cluster1 is disabled the templates will be registered on hosts from Cluster2.
  6. Redeploy Edges from within vCloud Director (not vCNS Manager). Again the disabled Cluster1 will mean that the Edges will be registered on hosts from Cluster2.
  7. Detach (now empty) Resource Pool
    Detaching Resource Pool
  8. Rename Provider VDC to the original name (GoldVDC)
  9. Unprepare VXLAN fabric on Cluster1. Remove VXLAN vmknics on all hosts from Nexus1000V.
  10. Remove the Nexus 1000V switch from the (now empty) cluster and extend the VXLAN vDS there (or create a new one if no L2 connectivity exists between clusters). Remove the Nexus VEM VIBs from all hosts (see the esxcli sketch after this list). Prepare the VXLAN fabric on the cluster and add it to the Provider VDC.
    VXLAN fabric after migration
  11. Upgrade vCNS Manager to NSX Manager with vCNS to NSX 6 Upgrade Bundle
    vCNS Manager upgrade
    NSX Manager
  12. Update hosts in vSphere Web Client > Networking and Security > Installation > Host Preparation. This action will require host reboot.
    Host preparation for NSX - beforeHost preparation for NSX - after
  13. (Optionally) install NSX controllers
    NSX Controller installation
  14. (Optionally) change VXLAN transport mode to Unicast/Hybrid.
    VXLAN transport mode
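
A sketch of the VEM clean-up mentioned in step 10, run on each host (for example via SSH); the VIB name below is only an example, the exact name depends on the Nexus 1000V release, so list it first:

esxcli software vib list | grep cisco-vem
esxcli software vib remove -n cisco-vem-v164-esx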