vCloud Director on AWS

My colleague Lyubomir Lyubenov from the VMware OneCloud team (OneCloud is a large internal vCloud Director based cloud for field enablement) recently published VCDonAWS CloudFormation templates with which you can deploy the vCloud Director management components on AWS in under 30 minutes.

I have seen customers (service providers) asking what it is for and what it means for the future of vCloud Director. Let me give you my own view.

What is it?

vCloud Director is one of the few VMware products that is not provided in the form of a virtual appliance. The vCloud Director binaries can be installed on any compatible Linux virtual or physical machine, which means they can be installed anywhere, even on EC2 instances running on AWS. The VCDonAWS project cleverly uses native AWS resources (not VMC on AWS!) to deploy the vCloud Director management stack from a single CloudFormation template. It leverages a VPC (optionally stretched across 2 availability zones) for networking, EC2 instances for jump hosts and vCloud Director cells, PostgreSQL RDS for the vCloud Director database, S3 (S3FS) for the vCloud Director transfer share (although this will be replaced in the future with Elastic File System for better performance), Elastic Load Balancers (for UI/API and ConsoleProxy cells) and even Auto Scaling Groups to automatically deploy additional VCD cells. The certificates are provided by AWS Certificate Manager.

The following picture taken from the VCDonAWS website shows the overall architecture.

What is it not?

As you can see above, it only deploys the vCloud Director management components. You will still need to attach resource vCenter Server/NSX Manager pairs, and these obviously cannot run on (native) AWS. You cannot even use VMC on AWS instances (at least not yet), as they have RBAC and VC/NSX inventory access limitations which prevent vCloud Director from working properly.

The optional components, the Cassandra VM metric datastore and the RabbitMQ messaging bus, are not deployed either, although I see no reason why they should not run on AWS.

Is it supported?

No. The deployment uses an unsupported OS, Amazon Linux (the CentOS deployment option is not working at the time of writing).


Besides the OneCloud team use case, which I cannot talk about here, I see it as a very nice proof of concept of how a VCD deployment can be automated and how simply it can be done with an infrastructure as code approach. And obviously, once the VMC on AWS restrictions are resolved, the two can be used together to provide a multitenant VMware platform IaaS.

Try it yourself!

If you have an AWS account, try it yourself: in about an hour you can have a deployed vCloud Director instance.

Here are some tips:

  • Use only US regions, as the provided templates do not have AMI mappings for other regions.
  • Use Amazon Linux HVM as the base operating system for bastion and cell hosts (the CentOS option is not working).
  • For the VCD installation ID, do not use 7-9 due to a bug in the verification regex.
  • You will need the VCD binary uploaded to an S3 bucket. I used the VCD 9.1 GA bits. You will also need a working license key.
  • You will need a certificate (even a self-signed one) uploaded to AWS Certificate Manager.
  • And lastly, generate a key pair for accessing the bastion hosts and cells.
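The first three tips can be turned into a small pre-flight check that you run before launching the stack. This is only a sketch: the function name, the parameter names and the exact operating system string are hypothetical and do not come from the VCDonAWS templates. It also assumes that "7-9" means the installation ID values 7, 8 and 9, and that US regions can be recognised by a "us-" prefix.

```python
def check_vcdonaws_inputs(region, base_os, installation_id):
    """Return a list of problems found in the planned stack inputs.

    All parameter names are hypothetical; they only mirror the tips above.
    """
    problems = []

    # Tip: the templates only carry AMI mappings for US regions.
    # Assumption: a US region name starts with "us-" (e.g. us-west-2).
    if not region.startswith("us-"):
        problems.append(f"region {region!r} has no AMI mapping, use a US region")

    # Tip: the CentOS option does not work, so only Amazon Linux HVM is accepted.
    if base_os != "Amazon Linux HVM":
        problems.append(f"base OS {base_os!r} is not expected to work")

    # Tip: installation IDs 7-9 trip a bug in the verification regex.
    # Assumption: this refers to the ID values 7, 8 and 9.
    if installation_id in (7, 8, 9):
        problems.append(f"installation ID {installation_id} hits the regex bug")

    return problems


if __name__ == "__main__":
    for problem in check_vcdonaws_inputs("eu-west-1", "CentOS", 8):
        print("WARNING:", problem)
```

An empty list means none of the known pitfalls apply; it says nothing about the remaining prerequisites (S3 bucket, license key, certificate, key pair), which still have to be prepared by hand.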
Cloud Formation Input Dialog



Stack Deployment


vCloud Director Cells


Limit Maximum vCPU/RAM Configuration of vCloud Director VM

Some time ago I wrote about an undocumented feature that allows limiting the maximum disk size of a VM in vCloud Director. I have been asked numerous times whether there is a similar setting for vCPU and RAM maximums. Today I discovered there is; however, it should be considered an experimental feature. I still find it useful, as a misconfigured VM with an extremely large number of vCPUs or a huge amount of RAM will impact the host it is running on and cause excessive swapping or high CPU ready times, so it is in the best interest of the vCloud Director system administrator to prevent it. The other option is to use blocking tasks as described here: CPU and Memory Limit enforcement for vCloud Director and in a blog here.

The limit is set with the cell-management-tool command on any cell. A restart of the cell is not necessary.

$VCLOUD_HOME/bin/cell-management-tool manage-config -n vmlimits.memory.limit -v 65536
$VCLOUD_HOME/bin/cell-management-tool manage-config -n vmlimits.cpu.numcpus -v 16

The settings in the example above limit the maximum size of a VM to 16 vCPUs and 64 GB RAM (the memory limit is given in MB, 65536 MB = 64 GB).
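If you set these limits on several instances, the GB to MB conversion is easy to get wrong. The following minimal sketch generates the two command lines from friendlier inputs. The helper name and its parameters are hypothetical; the config keys and the command syntax are the ones shown above.

```python
def vm_limit_commands(max_vcpus, max_ram_gb, vcloud_home="$VCLOUD_HOME"):
    """Build the two cell-management-tool command lines for the given limits.

    The memory limit is passed in MB, so GB is multiplied by 1024.
    """
    ram_mb = max_ram_gb * 1024
    tool = f"{vcloud_home}/bin/cell-management-tool"
    return [
        f"{tool} manage-config -n vmlimits.memory.limit -v {ram_mb}",
        f"{tool} manage-config -n vmlimits.cpu.numcpus -v {max_vcpus}",
    ]


if __name__ == "__main__":
    # 16 vCPUs and 64 GB RAM reproduce the example above.
    for command in vm_limit_commands(16, 64):
        print(command)
```

The script only prints the commands; they still have to be run on a cell by an administrator.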

Some observations:

  • The limit is vCloud Director instance wide and also applies to system administrators.
  • A VM with resources set above the limit will fail to power on with one of these errors:
    The operation could not be performed, because there are no more CPU resources
    The operation could not be performed, because there are no more Memory resources
  • It can be circumvented by using CPU or memory hot add to add resources beyond the limits to an already powered-on VM.

Again, consider it an experimental feature and use at your own risk.

What’s New in vCloud Director 9.1

vCloud Director version 9.1 has been released. It has been just 6 months since the previous release (9.0), so VMware is delivering on its promise of multiple releases per year in a 6-month cadence.

In this whitepaper you can find a high-level overview of some of the new features. Let me summarize them and also add a few more below.

H5 UI Enhancements

In an iterative process, the HTML5 UI is slowly replacing the legacy Flex UI. The tenant portion now includes vApp, Catalog and Networking management functionality, OVF/ISO downloads and uploads without the need for the Client Integration Plugin (hooray!) and support for the standalone VMware Remote Console.

Associated organizations from multiple vCloud Director instances or from a single one (new in 9.1) now have an aggregated view of all Org VDCs with seamless UI redirections between instances.

An SDK for UI Extensibility has been released, which means the service provider can extend the UI with additional sections to provide access to new services. The SDK includes a very simple example of a static page extension (e.g. terms of service, links to other services or price lists), and an upcoming vCAT-SP whitepaper will show how to do more complex ones.

The H5 UI is now also used in the provider context, but only for new features related to vRealize Orchestrator extensibility configuration.

Both legacy UIs (provider and tenant) remain available until full feature parity is achieved.

vRealize Orchestrator Integration

An updated vRealize Orchestrator plugin has been released. This means both providers and tenants can automate and orchestrate repetitive tasks in vCloud Director.

What is completely new is the ability to integrate any vRealize Orchestrator workflow into the vCloud Director UI and essentially provide XaaS (anything as a service), similar to vRealize Automation XaaS.

External Tools

Not specifically tied with vCloud Director 9.1 but fully supported now are:

  • vcd-cli: Linux command line tool to easily trigger or script common vCloud Director tasks (both for provider and tenant).
  • Container Service Extension: the ability to extend vCloud Director to be a target for deployment of Kubernetes clusters for tenants, with simple management through CLI.
  • Object Extensibility SDK
  • Security Hardening Guide (long overdue update): PDF HTML

Other Features and Changes

  • Org VDC network disconnection from and connection to an Org VDC Edge Gateway. This allows moving a network between Edges without impact on the connected VMs.
  • NFV features (SR-IOV, Large Page VMs, and guaranteed sensitivity support).
  • FIPS mode can be enabled on Org VDC Edge Gateways.
  • vCloud Director 9.1 maximums now include support for 150 ms RTT latency between management components (vCloud cells, vSphere, NSX) and 2000 Edges in a single vCloud Director instance.
  • VM monitoring metrics available via the API now also provide rollup metrics for vApp, Org VDC and Org objects.
  • Multisite enhancements: associations of orgs within the same vCloud Director instance, multisite API queries and rollup calls.
  • Removed support for old vCloud API versions (1.5, 5.1): make sure you update your scripts or applications to use at least version 5.5 APIs (e.g. Usage Meter).
  • vCloud Director no longer registers itself as an extension to resource vCenter Servers (upgraded instances will not delete the extension registration).
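To find out which scripts are affected by the removal of the old API versions, you can compare the version each client is pinned to with the list the instance advertises (to my knowledge the unauthenticated GET /api/versions call returns it). The sketch below only parses such a document. The sample XML, its namespace and the client names are made-up placeholders, not real output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response; namespace and versions are placeholders.
SAMPLE_VERSIONS_XML = """<SupportedVersions xmlns="urn:example:versions">
  <VersionInfo><Version>5.5</Version></VersionInfo>
  <VersionInfo><Version>29.0</Version></VersionInfo>
  <VersionInfo><Version>30.0</Version></VersionInfo>
</SupportedVersions>"""


def supported_versions(versions_xml):
    """Return the text of every <Version> element, ignoring XML namespaces."""
    root = ET.fromstring(versions_xml)
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "Version" and element.text
    ]


def unsupported_clients(versions_xml, client_versions):
    """Return the names of clients pinned to a version the server does not list."""
    supported = set(supported_versions(versions_xml))
    return [
        name
        for name, version in client_versions.items()
        if version not in supported
    ]


if __name__ == "__main__":
    # Hypothetical inventory of scripts and the API version each one requests.
    clients = {"legacy-report-script": "5.1", "usage-meter": "5.5"}
    print(unsupported_clients(SAMPLE_VERSIONS_XML, clients))
```

Matching on the local element name keeps the parser independent of the exact namespace, which I did not want to hardcode here.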

Layer 2 VPN to the Cloud – Part II

Almost 3 years ago I published an article on how to set up a layer 2 VPN between an on-prem vSphere environment and a vCloud Director Org VDC.

As both vCloud Director and NSX have evolved quite a bit since then and simplified the whole setup, here comes part II.

Let me first summarize the use case:

The tenant has an application that resides on 3 different VLAN based networks running in its own (vSphere) datacenter. The networks are routed by an existing physical router. The tenant wants to extend 2 of these networks to the cloud for cloud bursting or DR purposes, but not the 3rd one (for example because a physical database server runs there).

The following diagram shows the setup.

The main advancements are:

  • vCloud Director natively supports NSX L2 VPN (VCD 8.20 or newer needed).
  • NSX now (since 6.2) supports configuration of unstretched networks directly (no static routes are necessary anymore)
  • This means the full setup can be done by the tenant in self-service fashion

Here are the steps:

  • The tenant deploys the freely available NSX Standalone Edge in its datacenter, connected to a trunk port with 2 VLANs mapped (10 and 11). Additional network configuration is necessary (forged transmits and promiscuous mode, or sink port creation; see the link).
  • In the cloud Org VDC the tenant deploys two routed Org VDC networks with subnets and gateways identical to networks A and B. These networks must be connected to the Org VDC Edge GW via a subinterface (there can be up to 200 such networks on a single Edge). The Org VDC Edge must have advanced networking enabled.
  • The tenant enables and configures the L2VPN server on its Org VDC Edge GW. Note that this is a premium feature that the service provider must first enable in the Organization (see this blog post).
  • Before the L2VPN tunnel is established, the following must be taken into account:
    • The Org VDC Edge GW IP addresses are identical to those of the existing on-prem physical router. Therefore the Egress Optimization Gateway addresses must be entered in the Peer Site configuration. That will prevent the Org VDC Edge GW from sending ARP replies over the tunnel.
    • The same must be performed on the Standalone NSX Edge via CLI (see the egress-optimize command here).
    • The non-stretched network (subnet C) must be configured on the Org VDC Edge GW so that it knows the subnet is reachable through the tunnel and not via its upstream interface(s). This option, however, is not available in the vCloud UI; the vCloud networking API must be used instead.
      Edit 3/26/2018: This does not work for standalone NSX Edges. See the end of the article for more details.
      Alternatively the provider could configure the non-stretched network directly in the NSX UI:
    • Finally, the tunnel can be established by configuring the L2VPN server details on the on-prem Standalone NSX Edge L2VPN client (endpoint IP, port, credentials, encryption) and providing the VLAN to tunnel mappings.
    • Note that to find the Org VDC network subinterface tunnel mapping, the vCloud API must be used again:

Edit 3/26/2018:

After multiple questions regarding unstretched networks and some testing, I need to make some clarifications.

The routing of unstretched networks through the tunnel is achieved via static routes configured on the Edge GW. So in principle it still works the same way as described in the original article; the difference when doing it via UI/API is that the IPs and routes are set automatically.

The server Edge routing table looks like this:

show ip route

S [1/0] via
C [0/0] via
C [0/0] via
C [0/0] via
C [0/0] via
C [0/0] via
S [1/0] via

show ip address


17: vNic_4094@br-sub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:50:56:88:31:21 brd ff:ff:ff:ff:ff:ff
inet brd scope global vNic_4094
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe88:3121/64 scope link
valid_lft forever preferred_lft forever

You can see that an IP address was automatically assigned to the tunnel interface (vNic_4094) and a static route was set to route the unstretched network to the other side via IP. The assignment of the .194 address on the other Edge will happen only if that Edge is managed by NSX and is actually performing routing! This is in fact not the case in the use case above (with a standalone Edge and an existing physical router). Therefore the following manual approach must be taken:

  1. Create an Org VDC transit network with an arbitrary small subnet in the cloud. Assign IP .1 as the gateway on the Org VDC Edge. This network will not be used for workloads; it is used just for routing to the unstretched network.
  2. Create a corresponding VLAN transit network on-prem. Assign IP .2 as its gateway interface on the existing router (note that the IP addresses of the routing interfaces in #1 and #2 are different).
  3. Create the L2 VPN tunnel as before, but also stretch the transit network. Do not optimize its GW (there is no need, as on-prem and cloud are using different IPs).
  4. Create static routes on the Org VDC Edge GW to route to the on-prem unstretched networks via the transit network router IP.

Note that this approach is very similar to the original blog post. The only difference is that we must create a separate transit network, as vCloud Director does not support multiple subnets on the same Edge GW interface.
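To make the effect of the four manual steps easier to follow, here is a small model of the resulting routing decision on the Org VDC Edge GW. It is only an illustration: every subnet and address in it is a hypothetical placeholder that I picked for the example, and the lookup is a plain longest-prefix match, not the actual Edge implementation.

```python
import ipaddress


def next_hop(routes, destination):
    """Return the description of the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in routes if address in network]
    if not matches:
        return None
    # Longest prefix wins, as in any IP routing table.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical addressing: A and B are stretched, C stays on-prem only.
EDGE_ROUTES = [
    (ipaddress.ip_network("192.168.10.0/24"), "connected: stretched network A"),
    (ipaddress.ip_network("192.168.11.0/24"), "connected: stretched network B"),
    (ipaddress.ip_network("10.255.255.0/29"), "connected: stretched transit network"),
    # Step 4: static route to unstretched subnet C via the on-prem router (.2).
    (ipaddress.ip_network("192.168.12.0/24"), "static: via 10.255.255.2 over the tunnel"),
    (ipaddress.ip_network("0.0.0.0/0"), "default: upstream interface"),
]


if __name__ == "__main__":
    for target in ("192.168.10.20", "192.168.12.50", "198.51.100.7"):
        print(target, "->", next_hop(EDGE_ROUTES, target))
```

Without the static route from step 4, traffic for subnet C would fall through to the default route and leave via the upstream interface instead of the tunnel.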

vCloud Director vApp Runtime Lease Expiration Action

In vCloud Director it is possible to configure vApp leases. The maximums are set by the system admin at the Organization level (in Policies); they can be lowered by the Org Admin (at the org level) and set by the vApp owner at the vApp level. A vApp has a runtime lease (how long it may stay in the running state) and a storage lease (how long it may consume storage once it is not running).

vApp leases are very useful in test & dev or lab environments to make sure abandoned, unused VMs are not running and consuming resources.

When a vApp lease is coming to an end, its owner gets a reminder via email (how many days before expiration can be configured in User Preferences) and can optionally reset the vApp lease to avoid the vApp being stopped or deleted.

By default an expired running vApp is put into the suspended state, which means its memory content is saved to the datastores. This ensures a fully consistent state upon the subsequent power on of the vApp. This, however, may not always be needed, especially in dev/lab situations: the memory content can take up a lot of storage space, and saving for example a 16 GB RAM VM to a datastore can also have an IO performance impact. As of vCloud Director 8.20 the Organization Administrator can instead change the default runtime expiry action to power off. The setting is made at the Org level and must be done via the API by setting the element <PowerOffOnRuntimeLeaseExpiration> of OrgLeaseSettingsType to true. The API version must be at least 25.0.

PUT /api/admin/org/eea1f10c-3fee-43d7-bd8e-be63453d6e34/settings/vAppLeaseSettings


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<VAppLeaseSettings xmlns="">


When the vApp expiry action is set to power off, whether each VM is actually powered off (hard) or shut down (graceful) depends on the vApp's stop action configuration for that VM (tab Starting and Stopping VMs).

Also note that a subsequent edit of Org policies in the UI will reset the Org PowerOffOnRuntimeLeaseExpiration setting back to its default (false).
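Because the UI can silently reset the flag, it is handy to have a small helper that takes the lease settings document returned by a GET on the same URL, flips the element, and gives back the body for the PUT. The sketch below does only the XML part. The sample document and its namespace are placeholders, the helper name is hypothetical, and appending a missing element at the end is my assumption about where the schema expects it.

```python
import xml.etree.ElementTree as ET

# Placeholder document; a real one comes from a GET on the vAppLeaseSettings URL.
SAMPLE_SETTINGS_XML = """<VAppLeaseSettings xmlns="urn:example:vcloud">
  <StorageLeaseSeconds>0</StorageLeaseSeconds>
</VAppLeaseSettings>"""


def set_power_off_on_expiry(settings_xml, enabled=True):
    """Return settings_xml with PowerOffOnRuntimeLeaseExpiration set as requested."""
    root = ET.fromstring(settings_xml)

    # Reuse whatever namespace the server returned instead of hardcoding one.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    qualified = "{%s}" % namespace if namespace else ""
    name = qualified + "PowerOffOnRuntimeLeaseExpiration"

    element = root.find(name)
    if element is None:
        # Assumption: the element may be appended as the last child.
        element = ET.SubElement(root, name)
    element.text = "true" if enabled else "false"

    if namespace:
        # Serialize with a default namespace instead of an ns0: prefix.
        ET.register_namespace("", namespace)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_power_off_on_expiry(SAMPLE_SETTINGS_XML))
```

Working from the current document rather than a hand-written body keeps the other lease values untouched, so the same helper can be rerun whenever someone has edited the Org policies in the UI.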