vCloud Director on AWS

My colleague Lyubomir Lyubenov from the VMware OneCloud team (OneCloud is a huge internal vCloud Director based cloud used for field enablement) recently published the VCDonAWS CloudFormation templates, with which you can deploy the vCloud Director management components on AWS in under 30 minutes.

I have seen customers (Service Providers) asking what it is for and what it means for the future of vCloud Director. Let me give you my own view.

What is it?

vCloud Director is one of the few VMware products that is not provided in the form of a virtual appliance. The vCloud Director binaries can be installed on any compatible Linux virtual or physical machine, which means it can be installed anywhere – even on EC2 instances running on AWS. The VCDonAWS project in a clever way uses AWS resources (not VMC on AWS!) to deploy the vCloud Director management stack from a single CloudFormation template. It leverages a VPC (optionally stretched across two availability zones) for networking, EC2 instances for the jumphosts and vCloud Director cells, a PostgreSQL RDS instance for the vCloud Director database, S3 (via S3FS) for the vCloud Director transfer share (although this will be replaced in the future with Elastic File System for better performance), Elastic Load Balancers (for the UI/API and ConsoleProxy cells) and even Auto Scaling Groups to automatically deploy additional VCD cells. Certificates are provided with AWS Certificate Manager.
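Purely for illustration, launching such a stack from the AWS CLI could look roughly like the sketch below; the template URL and parameter names are hypothetical placeholders, the real ones are documented in the VCDonAWS project:

# Launch the VCDonAWS stack (hypothetical template URL and parameter names)
aws cloudformation create-stack \
  --stack-name vcd-on-aws \
  --template-url https://s3.amazonaws.com/example-bucket/vcd-on-aws-main.template \
  --parameters ParameterKey=KeyPairName,ParameterValue=vcd-keypair \
               ParameterKey=VcdBinaryS3Url,ParameterValue=s3://example-bucket/vmware-vcloud-director-9.1.bin \
  --capabilities CAPABILITY_IAM

# Watch the deployment progress
aws cloudformation describe-stacks --stack-name vcd-on-aws --query 'Stacks[0].StackStatus'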

The following picture taken from the VCDonAWS website shows the overall architecture.

What is it not?

As you can see above, it only deploys the vCloud Director management components. You will still need to attach resource vCenter Server/NSX Manager pairs, and these obviously cannot be running on (native) AWS. You cannot even use VMC on AWS instances (at least not yet), as they have RBAC and VC/NSX inventory access limitations which prevent vCloud Director from working properly.

The optional Cassandra VM metric datastore and RabbitMQ messaging bus components are not deployed either, although I see no reason why they could not run on AWS.

Is it supported?

No. The deployment uses an unsupported OS – Amazon Linux (the CentOS deployment option is not working at the time of writing).

Why?

Besides the OneCloud team use case, which I cannot speak about here, I see it as a very nice proof of concept of how VCD deployment can be automated and how simply it can be done with an infrastructure-as-code approach. And obviously, once the VMC on AWS restrictions are resolved, these two can be used together to provide a multitenant VMware platform IaaS.

Try it yourself!

If you have an AWS account, try it yourself – in about an hour you can have a deployed vCloud Director instance.

Here are some tips:

  • Use only US regions as the provided templates do not have AMI mappings for other regions.
  • Use Amazon Linux HVM as the base operating system for the Bastion and cell hosts (the CentOS option is not working).
  • For the VCD installation ID do not use 7-9 due to a bug in the verification regex.
  • You will need the VCD binary uploaded to an S3 bucket. I used the VCD 9.1 GA bits. You will also need a working license key.
  • You will need a certificate (even self-signed) uploaded to AWS Certificate Manager.
  • And lastly, generate a key pair for accessing the bastion hosts and cells (see the example commands after this list).
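For illustration, the preparation steps could look roughly like this with the AWS CLI (bucket, file and key names are just placeholders):

# Upload the vCloud Director binary to an S3 bucket (placeholder names)
aws s3 cp vmware-vcloud-director-distribution-9.1.0.bin s3://my-vcd-bucket/

# Import a (self-signed) certificate into AWS Certificate Manager
aws acm import-certificate \
  --certificate fileb://vcd-cert.pem \
  --private-key fileb://vcd-key.pem

# Generate a key pair for accessing the bastion hosts and cells
aws ec2 create-key-pair --key-name vcd-keypair --query 'KeyMaterial' --output text > vcd-keypair.pem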
Screenshots: CloudFormation Input Dialog, Stack Deployment, vCloud Director Cells


Limit Maximum vCPU/RAM Configuration of vCloud Director VM

Some time ago I wrote about an undocumented feature that allows limiting the maximum disk size of a VM in vCloud Director. I was asked numerous times if there is a similar setting for vCPU and RAM maximums. Today I discovered there is, however it should be considered an experimental feature. I still find it useful, as a misconfigured VM with an extremely large number of vCPUs or huge RAM will impact the host it is running on and cause excessive swapping or high CPU ready times, so it is in the best interest of the vCloud Director system administrator to prevent it. The other option is to use blocking tasks as described here: CPU and Memory Limit enforcement for vCloud Director and in a blog here.

The limit is set with the cell-management-tool command on any cell. A restart of the cell is not necessary.

$VCLOUD_HOME/bin/cell-management-tool manage-config -n vmlimits.memory.limit -v 65536
$VCLOUD_HOME/bin/cell-management-tool manage-config -n vmlimits.cpu.numcpus -v 16

The settings in the example above will limit maximum size of a VM to 16 vCPUs and 64 GB RAM.

Some observations:

  • The limit is vCloud Director instance wide and also applies to system administrators.
  • A VM with resources set above the limit will fail to power on with an error:
    The operation could not be performed, because there are no more CPU resources
    or
    The operation could not be performed, because there are no more Memory resources
  • It can be circumvented by using CPU or memory hot add and adding resources beyond the limits to an already powered-on VM.

Again, consider it an experimental feature and use at your own risk.

What’s New in vCloud Director 9.1

vCloud Director version 9.1 has been released. It has been just 6 months since the previous release (9.0), so VMware is delivering on its promise of multiple yearly releases in a 6-month cadence.

In this whitepaper you can find a high level overview of some of the new features. Let me summarize them and also provide additional ones below.

H5 UI Enhancements

In an iterative process, the HTML 5 UI is slowly replacing the legacy Flex UI. The tenant portion now includes vApp, Catalog and Networking management functionality, OVF/ISO downloads/uploads without the need for the Client Integration Plugin (hooray!) and support for the standalone VMware Remote Console.

Associated organizations from multiple or a single (new in 9.1) vCloud Director instance now have an aggregated view of all Org VDCs, with seamless UI redirection between instances.

An SDK for UI Extensibility has been released, which means the service provider can extend the UI with additional sections to provide access to new services. The SDK includes a very simple example of a static page extension (e.g. terms of service, links to other services or price lists) and an upcoming vCAT-SP whitepaper will show how to do more complex ones.

The H5 UI is now also used in the provider context, but only for the new features related to vRealize Orchestrator extensibility configuration.

Both legacy UIs (provider and tenant) are still available until full feature parity is achieved.

vRealize Orchestrator Integration

An updated vRealize Orchestrator plugin has been released. This means both providers and tenants can automate and orchestrate repetitive tasks in vCloud Director.

What is completely new is the ability to integrate any vRealize Orchestrator workflow into the vCloud Director UI and essentially provide XaaS (anything as a service), similar to vRealize Automation XaaS.

External Tools

Not specifically tied to vCloud Director 9.1, but fully supported now, are:

  • vcd-cli: a Linux command line tool to easily trigger or script common vCloud Director tasks (both for provider and tenant) – a minimal usage sketch follows this list.
  • Container Service Extension: the ability to extend vCloud Director to be a target for deployment of Kubernetes clusters for tenants, with simple management through the CLI.
  • Object Extensibility SDK
  • Security Hardening Guide (a long overdue update): PDF HTML
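For illustration, a minimal vcd-cli session could look roughly like this (host, organization, user and password are placeholders):

# Log in to a vCloud Director instance (placeholder host/org/user)
vcd login vcd.example.com myorg administrator --password 'secret'

# List vApps and catalogs in the current organization
vcd vapp list
vcd catalog list

# Log out when done
vcd logout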

Other Features and Changes

  • Org VDC network disconnection and connection from the Org VDC Edge Gateway. This allows movement of networks between Edges without impact on the connected VMs.
  • NFV features (SR-IOV, Large Page VMs, and guaranteed latency sensitivity support).
  • FIPS mode can be enabled on Org VDC Edge Gateways
  • vCloud Director 9.1 maximums now include support for 150 ms RTT latency between management components (vCloud cells, vSphere, NSX) and 2000 Edges in a single vCloud Director instance.
  • The VM monitoring metrics API now also provides roll-up metrics for vApp, Org VDC and Org objects (see the sketch after this list).
  • Multisite enhancements: associations of orgs within the same vCloud Director instance, multisite API queries and rollup calls.
  • Removed support for old vCloud API versions (1.5, 5.1): make sure you update your scripts or applications to use at least version 5.5 of the API (e.g. Usage Meter).
  • vCloud Director no longer registers itself as an extension to resource vCenter Servers (upgraded instances will not delete the extension registration).
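For context, VM-level metrics have been retrievable via the vCloud API for a while; a minimal sketch of such a call is below (host, token and VM id are placeholders, and I am assuming the API version 30.0 header used by vCloud Director 9.1). The new roll-up calls for vApp, Org VDC and Org objects are described in the 9.1 API reference.

# Retrieve current metrics of a VM via the vCloud API (placeholder host, token and VM id)
curl -k -H 'Accept: application/*+xml;version=30.0' \
     -H "x-vcloud-authorization: $TOKEN" \
     https://vcd.example.com/api/vApp/vm-00000000-0000-0000-0000-000000000000/metrics/current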

What’s New in vCloud Availability 2.0.1

A minor patch, vCloud Availability 2.0.1, was released last week. Besides many bug fixes, improved documentation and support for Cassandra version 3.x, I want to highlight two undocumented features and add an upgrade comment.

Provider vSphere Web Client Plugin

This is the return of an experimental feature from version 1.0, where the provider can monitor the state of the vSphere Replication Manager Server, vSphere Replication Servers and all incoming and outgoing replications from inside the vSphere Web Client plugin in the particular (provider side) vCenter Server. This is especially useful for quick troubleshooting.

Screenshots: vRMS Status, vRS Status, Replication Status

Complex vSphere SSO Domain Support

Although it is not recommended to have multiple vCloud Director / vCloud Availability instances sharing the same vSphere SSO domain, it is now possible to accommodate such a scenario. The reason it is not recommended is that it creates an unnecessary dependency between the instances and limits the upgradability and scale of each instance.

Upon startup, the vSphere Replication Cloud Service (vRCS) queries the SSO Lookup Service for Cassandra nodes and resource vCenter Servers. In order to limit the scope of such a query to only those that belong to the particular vCloud Availability instance, create a text file /opt/vmware/hms/conf/sites on all vRCS nodes with the SSO site names that should be queried (one line per site).
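Creating the file could look roughly like this on a vRCS node (the site names are placeholders):

# On each vRCS node, list the SSO site names to query, one per line (placeholder names)
cat > /opt/vmware/hms/conf/sites << EOF
site-a
site-b
EOF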

Upgrade Options

You can upgrade to vCloud Availability 2.0.1 both from version 1.0.x and 2.0, however you need to use different upgrade ISO images for upgrading the replication components (vRMS, vRCS and vRS). The installer and UI appliances are redeployed fresh as they are all stateless.

 

Layer 2 VPN to the Cloud – Part II

Almost 3 years ago I published an article on how to set up a layer 2 VPN between an on-prem vSphere environment and a vCloud Director Org VDC.

As both vCloud Director and NSX have evolved quite a bit since then and simplified the whole setup, here comes part II.

Let me first summarize the use case:

The tenant has an application that resides on 3 different VLAN based networks running in its own (vSphere) datacenter. The networks are routed with an existing physical router. The tenant wants to extend 2 of these networks to the cloud for cloud bursting or DR purposes, but not the 3rd one (for example because a physical database server runs there).

The following diagram shows the setup.

The main advancements are:

  • vCloud Director natively supports NSX L2 VPN (VCD 8.20 or newer is needed).
  • NSX now (since 6.2) supports configuration of unstretched networks directly (no static routes are necessary anymore).
  • This means the full setup can be done by the tenant in a self-service fashion.

Here are the steps:

  • The tenant deploys the freely available NSX Standalone Edge in its datacenter, connected to a trunk port with 2 VLANs mapped (10 and 11). Additional network configuration is necessary (forged transmits and promiscuous mode or sink port creation – see the link).
  • In the cloud Org VDC the tenant deploys two routed Org VDC networks with identical subnets and gateways as networks A and B. These networks must be connected to the Org VDC Edge GW via subinterfaces (there can be up to 200 such networks on a single Edge). The Org VDC Edge must have advanced networking enabled.
  • The tenant enables and configures the L2VPN server on its Org VDC Edge GW. Note that this is a premium feature that the service provider must enable in the Organization first (see this blog post).
  • Before the L2VPN tunnel is established, the following must be taken into account:
    • The Org VDC Edge GW IP addresses are identical with those of the existing on-prem physical router. Therefore Egress Optimization Gateway addresses must be entered in the Peer Site configuration. That will prevent the Org VDC Edge GW from sending ARP replies over the tunnel.
    • The same must be performed on the Standalone NSX Edge via the CLI (see the egress-optimize command here).
    • The non-stretched network (subnet C) must be configured on the Org VDC Edge GW so it knows that the subnet is reachable through the tunnel and not via its upstream interface(s). This option however is not in the vCloud UI; instead the vCloud networking API must be used.
      Edit 3/26/2018: This does not work for standalone NSX Edges. See the end of the article for more details.
      Alternatively the provider could configure the non-stretched network directly in the NSX UI.
  • Finally, the tunnel can be established by configuring the L2VPN server details on the on-prem Standalone NSX Edge L2VPN client (endpoint IP, port, credentials, encryption) and providing the VLAN to tunnel mappings.
  • Note that to find the Org VDC network subinterface tunnel mapping the vCloud API must be used again (a sketch of such a query follows this list).
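A minimal sketch of such a query via the vCloud Director API for NSX (proxy API) is below; the host, token and Edge GW id are placeholders and the exact endpoint path is my assumption, so verify it against the vCloud API for NSX documentation:

# Query the Org VDC Edge GW L2VPN configuration (including subinterface mappings)
# Host, token and edge id are placeholders; the endpoint path is an assumption based on the vCloud API for NSX
curl -k -H 'Accept: application/*+xml;version=29.0' \
     -H "x-vcloud-authorization: $TOKEN" \
     https://vcd.example.com/network/edges/<edge-gateway-id>/l2vpn/config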

Edit 3/26/2018:

After multiple questions regarding unstretched networks and some testing I need to make some clarifications.

The routing of unstretched networks through the tunnel is achieved via static routes configured on the Edge GW. So in principle it still works the same way as described in the original article; the difference when doing it via the UI/API is that the setting of the IPs and routes is automatic.

The server Edge routing table looks like this:


show ip route

S 0.0.0.0/0 [1/0] via 10.0.2.254
C 10.0.2.0/24 [0/0] via 10.0.2.121
C 169.254.64.192/26 [0/0] via 169.254.64.193
C 169.254.255.248/30 [0/0] via 169.254.255.249
C 192.168.100.0/24 [0/0] via 192.168.100.1
C 192.168.101.0/24 [0/0] via 192.168.101.1
S 192.168.102.0/24 [1/0] via 169.254.64.194

show ip address

...

17: vNic_4094@br-sub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:50:56:88:31:21 brd ff:ff:ff:ff:ff:ff
inet 169.254.64.193/26 brd 169.254.64.255 scope global vNic_4094
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe88:3121/64 scope link
valid_lft forever preferred_lft forever

You can see that the 169.254.64.193 IP address was autoassigned to the vNic_4094 tunnel interface and a static route was set to route the unstretched network to the other side via IP 169.254.64.194. The assignment of the .194 address on the other Edge will happen only if that Edge is managed by NSX and is actually performing the routing! This is in fact not true for the use case above (with a standalone Edge and an existing physical router). Therefore the following manual approach must be taken:

  1. Create an Org VDC transit network with an arbitrary small subnet (e.g. 169.254.200.0/29) in the cloud. Assign IP .1 as the gateway on the Org VDC Edge. This network will not be used for workloads; it is used just for routing to the unstretched network.
  2. Create a corresponding VLAN transit network on-prem. Assign IP .2 as its gateway interface on the existing router (note the IP addresses of the routing interfaces in #1 and #2 are different).
  3. Create the L2 VPN tunnel as before, however also stretch the transit network but do not optimize its GW (no need, as on-prem and cloud are using different IPs).
  4. Create static routes on the Org VDC Edge GW to route to the on-prem unstretched networks via the 169.254.200.2 transit network router IP (see the example below).
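Assuming subnet C from the diagram is 192.168.102.0/24 (as in the routing table above) and the transit network is 169.254.200.0/29, the relevant entries on the Org VDC Edge GW routing table would then look roughly like this:

show ip route

C 169.254.200.0/29 [0/0] via 169.254.200.1
S 192.168.102.0/24 [1/0] via 169.254.200.2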

Note that this approach is very similar to the original blog post. The only difference is that we must create a separate transit network, as vCloud Director does not support multiple subnets on the same Edge GW interface.