Monitoring vSphere Replication RPO Compliance

Just a quick post to show how you can monitor Recovery Point Objective (RPO) compliance of a virtual machines protected with vSphere Replication.

Option 1: vCenter Server Alarm

When vSphere Replication Appliance is registered to vCenter Server multiple new vSphere Replication Event Types become available and can be used for creation of custom alarms.

List of all these event types can be queried with the following one-line PowerCLI command:

(get-view eventManager).get_Description()| select -expand Eventinfo |where FullFormat -like “*Hms*”

The following example will show how to set alarm for event “RPO violated”

Key:ExtendedEvent
Description: RPO violated
Category: error
FullFormat: com.vmware.vcHms.rpoViolatedEvent|Virtual machine vSphere Replication RPO is violated by [data.currentRpoViolation] minute(s)

  1. In vCenter Server go to Manager, Alarm Definitions and add new alarm
  2. Set alarm name, monitor VMs and specific events.
    new-alarm
  3. Enter the trigger (com.vmware.vcHms.rpoViolatedEvent)
    alarm-trigger
  4. Add Alarm actions (email, SNMP trap, run command etc.) as necessary.

Triggered alarm:

triggered-alarm

Note that this alarm applies only to VMs replicated from the particular vCenter Server. So it will not be triggered on VMs replicated to this vCenter Server.

Option 2: vCloud API

This options applies only for VM replications to or from a cloud provider who uses vCloud Availability add-on. The vCloud Director tenant APIs are extended with replication APIs. The state of each replication can be retrieved with:

GET /api/vr/replications/<replication-id>

and

GET /api/vr/failbackreplications/<replication-id>

Where list of all replications and their replication-ids is retrieved at org level with these two API calls:

GET /api/org/<org-id>/replications

and

GET /api/org/<org-id>/failbackreplications

An example of VM1 replication state (RPO 15 mins, not active with 16 min RPO violation):

replication-api

The following tables describes all the elements of the API response:

replication-details

vCloud Availability: Replication Traffic Deep Dive

VMware this week released vCloud Availability 1.0.1. It is a disaster recovery as a service solution that extends vCloud Director and enables VM replications between vSphere environments and a multitenant public cloud.

One of the unique features it offers is that there is no need for private networks for replication traffic between the tenant on-prem environment and the public cloud. The bi-directional replication can securely traverse the Internet and in this post I will dive deeper into how this is technically achieved.

On-Premises Components

The tenant on-prem environment can run almost any version of vSphere together with vSphere Replication (VR) appliance. The appliance must have access to the internet but does not need to have an external internet routable IP address. It can be behind a firewall with Source NATing and only one port open – TCP 443.

The appliance runs various services:

  • vSphere Replication Manager Service (vRMS) – the brain of the on-prem VR solution with an internal or optionally external database. It also provides the vSphere Web Client plugin extension for managing the replication.
  • vSphere Replication Server (vRS) – staging point for incoming (from-the-cloud) replications before they are de-staged via ESXi hosts to target datastores. This component can scale-out if needed by deploying additional appliances (cca 200 incoming replications for each vRS). It is not in the path for outgoing (to-the-cloud) replications.
  • vCloud Tunneling Agent (vCTA) – component that provides secure tunnel connections to the cloud. It also keeps control connection open so reverse replication can be initiated as well. More on that later.

That is it. There is nothing else needed to be installed by the tenant. This is because the actual replication engine (the vSphere Replication agent and vSCSI filter) is already present in the hypervisor – ESXi VMkernel. This also means no dependency what so ever on the storage hardware.

To-the-cloud replication flow:

ESXi host (VR Agent) > vSphere Replication Appliance (vCTA) > Internet > vCloud Availability public endpoint (Cloud Proxy load balanced VIP)

From-the-cloud replication flow:

vCloud Availability public endpoint (Cloud Proxy) > Internet > vSphere Replication Appliance (vCTA) > vSphere Replication Server (either embedded on VR appliance or standalone) > ESXi host

Public Cloud Components

In the cloud we need to have supported version of vCloud Director. It consists of multiple load balanced vCloud Director cells, database and resource vSphere environments.

For vCloud Availability we need vSphere Replication components – vRMS appliances for each resource vCenter Server and vRS appliances for replication staging that scale-out based on number of replications.

Additionally we need:

  • vSphere Replication Cloud Service (vRCS) – highly available appliances. The brain of the solution with extended vCloud VR APIs. It needs external Cassandra database and RabbitMQ to communicate with vCloud Director.
  • Cloud Proxies – load balanced vCloud Director cell like components with all vCloud services disabled with only the Cloud Proxy service running (multitenanted vCTA)
  • vCloud Availability Portal appliances – load balanced stateless components that provide portal to manage replications in the cloud when the on-prem vSphere Replication UI is not available (in disaster situations).

The provider can serve hundreds of customers with thousands of concurrent tunnels. To achieve such level of scalability, the Cloud Proxies are deployed in scale-out fashion with load balancer in front. The load balancer provides single endpoint for the on-prem vCTA control connection as well as for the to-the-cloud replication traffic.

To-the-cloud replication flow:

Tenant on-prem VR Appliance > Internet > Load balancer > Cloud Proxy (tunnel termination and decryption) > vRS > ESXi host.

From-the-cloud replication flow:

In order not to require public visibility of the on-prem tenant VR Appliance, the from-the-cloud replication is set up in a quite clever way. There are two options of doing this – one with load balancer with L7 application rules and the other without. The seconds approach is more scalable and recommended so let me describe it.

The following diagram shows the workflow:

from-the-cloud-replication

As was said before the connection is always initiated by the on-prem environment. That’s why we have the control connection (1) that is load balanced to one of the Cloud Proxies (in our example Cloud Proxy 2). The replicated traffic is coming from in the cloud resource ESXi host that sends it (2) via internal load balancer to one of the Cloud Proxies – in our case Cloud Proxy 1 (3). Through the control connection the on-prem vCTA endpoint is notified which Cloud Proxy is used for the particular replication (4-7). Now the on-prem vCTA can establish new connection to the correct Cloud Proxy 1 – it does not use its load balanced address, but instead a direct IP/FQDN that is DNATed 1:1 to the Cloud Proxy (9). Finally the two connections (3) and (9) can be stitched together (10) and we can start sending from-the-cloud replication traffic all the way to the on-prem environment.

To summarize, we need:

  • Cloud Proxy load balancer with CloudProxyBase VIP that is used for the Control connection and to-the-cloud replications.
  • Internal load balancer for resource ESXi to Cloud Proxy (from-the-cloud) traffic.
  • Additional public IP/FQDN for each Cloud Proxy for from-the-cloud traffic. This FQDN is configured on the Cloud Proxy cell in global.properties file (cloudproxy.reverseconnection.fqdn=FQDN:443).
  • As a consequence of using the same Cloud Proxy under different FQDNs (CloudProxyBase VIP and Cloud Proxy reverse connection) we need to take care that Cloud Proxy cell http certificate is set for both FQDNs. Probably the easiest way to achieve this is to use wild card certificate on cloud proxies (CN *.cloudproxy.example.com).

Setup Site-to-Site VPN between Azure and vCloud Director

My previous blog post was about setting up IPSec VPN tunnel between AWS VPC and vCloud Director Org VDC. This time I will describe how to achieve the same with Microsoft Azure.

vCloud Director is not among Azure list of supported IPSec VPN endpoints however it is possible to set up such VPN although it is not straightforward.

I will describe the setup of both Azure and VCD endpoints very briefly as it is very similar to the one I described in my previous article.

Azure Configuration

  • Resource Group (logical container object) – in my example RG UK
  • Virtual network (large address space similar to AWS VPN subnet) – 172.30.0.0/16
  • Subnets – at least one for VMs (172.30.0.0/24) and one for Gateway (172.30.255.0/29)
  • Virtual Network Gateway – Azure VPN endpoint with public IP address associated with the Gateway subnet above. Gateway type is VPN, VPN type is Policy-based (this is because Route-based type uses IKE2 which is not supported by NSX platform used by vCloud Director).
  • Local Network Gateway – vCloud VPN endpoint definition with its public IP address and subnets that should be reachable behind the vCloud VPN endpoint (81.x.x.x, 192.168.100.0/24)
  • Connection – definition of the tunnel:
    • Connection type: Site-to-site (IPSec)
    • Virtual network gateway and local network gateway are straightforward (those created previously)
    • Connection name: whatever
    • Shared Key (PSK): create your own 32+ character key using upper and lower case characters and numbers
  • Test VM connected to the VM subnet (IP 172.30.0.4)

azure-resources

vCloud Configuration

As explained above we created Policy Based VPN endpoint in Azure. Policy Based VPN uses IKE version 1, Diffie-Hellman Group 2 and no Perfect Forward Secrecy.

However selection of DH group and PFS is not available to tenant in vCloud Director on the legacy Org VDC Edge Gateway. Therefore the following workaround is proposed:

Tenant configures VPN on his Org VDC Edge Gateway with the following:

  • Name: Azure
  • Enable this VPN configuration
  • Establisth VPN to: a remote network
  • Local Networks: 192.168.100.0/24 (Org VDC network(s))
  • Peer Networks: 172.30.0.0/24
  • Local Endpoint: Internet (interface facing internet)
  • Local ID: 10.0.2.121 (Org VDC Edge GW internet interface)
  • Peer ID: 51.x.x.x (public IP of the Azure Virtual network gateway)
  • Peer IP: 51.x.x.x (same as previous)
  • Encryption protocol: AES256
  • Shared Key: the same as in Azure Connection definition

Now we need to ask the service provider to directly in NSX in the Edge VPN configuration disable PFS and change DH Group to DH2.

nsx-vpn

Note that this workaround is not necessary on Org VDC Edge Gateway that has been enabled with Advanced Networking services. This feature is at the moment only in vCloud Air, however soon will be available to all vCloud Air Network service providers.

If all firewall rules are properly set up we should be able to ping between Azure and vCloud VMs.

ping

Setup Site-to-Site VPN between AWS and vCloud Director

In today’s reality of multi cloud world customers are asking how to set up connection between them. In this article I am going to demonstrate how to set up IPsec VPN tunnel between AWS VPC and vCloud Director Org VDC.

IPSec is standard protocol suite which works at OSI Layer 3 and allows encrypting IP packet communication. It is supported by many software, hardware and cloud vendor implementations, however it is also quite complex to set up due to large sets of different settings which both tunnel endpoints must support. Additionally as it does not rely on TCP L4 protocol NAT traversal can be a challenge.

In my example I am using my home lab vCloud Director instance running behind NATed internet connection. So what could go wrong 🙂

The diagram below shows the set up.

 

AWS Virtual Private Cloud on the left is created with large subnet 172.31.0.0/16, a few instances, and Internet and VPN gateways.

On the right is vCloud Director Org VDC with a network 192.168.100.0/24 behind an Org VDC Edge Gateway which is connected to the Internet via my home ADSL router.

    1. We start by taking care of IPSec NAT traversal over the ADSL router. As I have dd-wrt OS on the router, I am showing how I enabled port forwarding of UDP ports 500 and 4500 to the Edge GW IP 10.0.2.121 and added DNAT for protocols 50 (AH) and 51 (ESP) to the router startup script.
      udp-port-forwardingiptables -t nat -A PREROUTING -p 50 -j DNAT –to 10.0.2.121
      iptables -t nat -A PREROUTING -p 51 -j DNAT –to 10.0.2.121
    2. Now we can proceed with the AWS VPN configuration. In AWS console, we go to VPC, VPN Connections – Customer Gateways and create Customer Gateway – the definition of the vCloud Director Org VDC Edge Gateway endpoint. We give it a name, set it to static routing and provide its public IP address (in my case the public address of the ADSL router).customer-gateway
    3. Next we define the other end of the tunnel – Virtual Private Gatway – in menu below. We give it a name and right after it is created, associate it with the VPC by right clicking on it.virtual-private-gateway
    4. Now we can create VPN Connection in the next menu below (VPN Connections). We give it a descriptive name and associate Virtual Private Gateway from step #3 with Customer Gateway from step #2. We select static routing and provide the subnet at the other end of the tunnel, which is in our case 192.168.100.0/24. This step might take some time to finish.
    5. When the VPN Connection is created we need to download its configuration. AWS will provide the configuration in various formats customized for the appliance on the other side of the tunnel. Generic format will do for our purposes. Needless to say, AWS does not allow custom setting of any of the given parameters – it is take it or leave it. download-configuration
    6. Before leaving AWS console we need to make sure that the subnet at the other side of the tunnel is propagated to the VPC routing table. This can be done in the Route Table menu, select the existing Route Table, in the Route Propagation tab find the Virtual Private Gateway from step #3 and check Propagate check box.route-table
    7. To configure the other side of the VPN endpoint – the Org VDC Edge Gateway we need to collect the following information from the configuration file obtained in the step #5.
      Virtual Private Gateway IP: 52.x.y.z
      Encryption Algorithm: AES-128
      Perfect Forward Secrecy: Diffie-Hellman Group 2
      Pre-Shared Key (PSK): 32 random characters
      MTU: 1436.Note: As was said before, none of these parameters can be changed on AWS side. So the router on the other side must support all of them. And here we hit a little issue. AWS pre-shared key is generated with number and letter (upper and lower case) characters and a special character – like dot, underscore, etc. Unfortunately vShield Edge does not support PSK with special character. NSX Edge does, but the legacy vCloud Director UI/API will not allow us to create IPsec VPN configuration with PSK containing special character. There are various ways how to solve it. One is not to use the native AWS VPN Gateway and instead use software VPN option, another is to create/edit the VPN configuration directly in NSX Manager (only Service Provider can do this) and lastly convert the Edge Gateway to Advanced Gateway and take advantage of the new networking UI and API that does not have this limitation (this functionality is currently available only on vCloud Air, but will soon be available to all vCloud Air Network providers).
    8. In vCloud Director UI go to Administration, select your Virtual Datacenter, Edge Gateways tab and right click on the correct Edge GW to select its Edge Gateway Services.edge-gw-services
    9. In The VPN tab Enable VPN by clicking the checkbox. In my NATed example I also had to configure public IP for the Edge GW (which is the address of the ADSL router).enable-vpn
    10. Finally we can create the VPN tunnel by clicking the Add button and selecting Establish VPN to a remote network pulldown option. Select local network(s) (192.168.110.0/24), in peer networks enter AWS VPC subnet (172.31.0.0/24), select internet interface of the Edge in the Local Endpoint, enter its IP address (10.0.2.121). For Peer ID and Peer IP use public address of Virtual Private Gateway from step #7. Change Encryption algorithm to AES and paste Shared Key (see the note in #7). Finally modify MTU size (1436).

If everything was set correctly then back in AWS console, under VPN Connections, Tunnel details we should see the tunnel status change to UP.

AWS offers two tunnel endpoints for redundancy, however in our case we are using only Tunnel 1.

tunnel-status-in-aws

If the firewall in Org VDC and Security Groups in AWS are properly set, we should be able to prove tunnel communication with pings from AWS instance to the Org VDC VM.

ping-test

Configure Active Directory Federation for vCloud Director Organization

Configure Active Directory Federation for vCloud Director Organization

vCloud Director tenants can federate their on-premises identity source with vCloud Director to simplify user management in their vCloud organization. I have already wrote blog post about this topic in the past; this time I will provide step by step instructions how to federate Active Directory with Active Directory Federation Services (AD FS).

Installation and configuration of AD FS is out of scope for this article as there are already very good and detailed guides on the internet.

  1. Download AD FS SAML2 metadata from your AD FS server (https://adfs.acme.com/FederationMetadata/2007-06/FederationMetadata.xml)
  2. As Org Administrator log into vCloud Director and in Administration > Settings > Federation check Use SAML Identity Provider checkbox and upload FederationMetadata file from #1.
  3. Still on the same page fill in Entity ID (must be unique for given IdP) and regenerate certificate. It will be valid for 1 year after which it must be regenerated again.
    Note: Entity ID is available only in vCloud Director 8.10 and newer.
  4. Click Apply.
  5. Download SAML metadata file from https://<vcd_url>/cloud/org/<org-name>/saml/medatadata/alias/vcd.
    Note: vCloud Director 8.10 provides link to the file just above the Certificate Expiration button.
    vcd-config-saml
  6. In Administration > Memebers > Users import SSO users from the SAML source. As user name use email address. Assign a vCloud role.
  7. It is also possible to import SSO groups from the SAML source. However, Groups menu might be missing under Users until you refresh the page (Groups menu got enabled by the step #2). For group import simply use AD group name (without domain).
    Note: If you get log out before setting up AD FS you can always go back to native authentication dialog by entering the following URL: https://<vcd_url>/cloud/org/<org-name>/login.jsp
    groups
  8. That’s all for vCloud Director configuration. The rest must be done in AD FS. Open AD FS Management Console and in Actions menu cling on Add Relying Party Trust. Go through the wizzard and import the SAML metadata file from step #5.
  9. Next we need to Edit claim rules. Under Issuance Transform Rules, using Send LDAP Atributes as Claims template create LDAP claim rule that uses Active Directory as Attribute Store and maps E-Mail-Addresses to E-Mail Address and Token-Groups – Unqualified Names to Groups.
    claim1
  10. Add another transform rule from Transform an Incoming Claim template that transforms E-Mail Address to Name ID.claim2
  11. Last thing that needs to be done is changing Relying Party Trust hashing algorithm from SHA-256 to SHA-1 (can be found in Properties > Advanced).

That’s all, now depending how your browser is configured, when you enter vCloud Director Organization URL you will be redirected to authentication dialog (either native browser or at the AD FS website). Enter AD credentials and if the user had been properly imported into vCloud Director (as individual user or member of a group) you should be logged into vCloud Director UI.

 

Edit 11/24/2016: My faithful readers (thank you Julius) have provided additional links for federations with other IdP solutions:

SafeNet Authentication Service: Integration Guide

RSA SecureID: blog post