Availability Zone Design in vCloud Director


Service providers with multiple datacenters often want to offer their customers the choice of having virtual datacenters in different availability zones. A failure of one availability zone should not impact another. The customer can then deploy his application resiliently across both virtual datacenters, leveraging load balancing and application clustering.

Depending on the distance between the datacenters and the network latency between them, it is possible to have multiple availability zones accessible from within a single vCloud Director instance, which means one GUI or API endpoint and very easy consumption from the customer's perspective. Read the vCloud Architecture Toolkit – Architecting a vCloud document for more detail on latency and supportability considerations.

Multiple vCenter Server Design

The typical approach in a single-instance vCloud Director deployment is to give each availability zone its own vCenter Server and vCNS Manager. vCloud Director 5.5 can connect to up to 20 vCenter Servers.

The following diagram shows how the management and resource clusters are typically placed across two sites.

Multi vC Design

Each site has its own management cluster. The shared cloud management VMs (vCloud Director cells, databases, Chargeback, AMQP, etc.) run primarily in site 1 with failover to site 2. Provider VDC management resources (vCenter Server, vCNS/NSX Managers, databases) are distributed to each site. There is no sharing of resource group components, which makes for a very clean availability zone design.

One problem for customers is that they cannot stretch organization VDC networks between the sites. Although VXLAN networks can be stretched over routed Layer 3 networks between sites, they cannot be stretched between different vCenter Servers: a single vCNS/NSX Manager is the boundary of a VXLAN network, and there is a 1:1 relationship between a vCenter Server and a vCNS/NSX Manager. This means that if a customer wants to achieve communication between VMs in his VDCs in different availability zones, he has to create an Edge Gateway IPsec VPN or provide external network connectivity between them, all of which results in quite a complicated routing configuration. The following diagram shows a typical example of such a setup.

VDC design in multi vCenter Provider VDC

Single vCenter Server Design with Stretched Org Networks

I have come up with an alternative approach. The goal is to achieve a stretched Org VDC network between two sites while having only one highly available Edge Gateway to manage. The desired target state is shown in the following diagram.

VDC design with stretched network

To accomplish this we need only one Resource Group vCenter Server instance, and thus one VXLAN domain, while still separating resources into two availability zones. The vCenter Server can be made resilient with vSphere HA (stretched cluster), vCenter Server Heartbeat, or Site Recovery Manager.

Could we keep the same cluster design as in the multi-vCenter scenario, with each Provider VDC having its own set of site-based clusters? To answer this question I first need to describe the VXLAN transport zone (VXLAN scope) concept. VXLAN network pools created by vCloud Director have only Provider VDC scope. This means that any Org VDC network created from such a VXLAN network pool can span only the clusters used by that Provider VDC. When a cluster is added to or removed from a Provider VDC, the VXLAN transport zone scope is expanded or reduced accordingly. This can be viewed in vCNS Manager or in the NSX – Transport Zone menu.
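Transport zones can also be inspected over the vCNS/NSX-v REST API rather than the UI. The sketch below only builds the request URL and the Basic-auth header for the scope-listing endpoint from the vShield/NSX-v API; the manager host name is a placeholder, and you should verify the endpoint against the API guide for your version.

```python
import base64


def vdn_scopes_url(nsx_manager: str) -> str:
    # Lists the VXLAN transport zones ("network scopes") known to a
    # vCNS/NSX-v manager; endpoint from the vShield/NSX-v REST API.
    return f"https://{nsx_manager}/api/2.0/vdn/scopes"


def basic_auth_header(user: str, password: str) -> str:
    # vCNS/NSX-v accepts HTTP Basic authentication on its REST API.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


# Usage against a live manager (hypothetical host name), e.g. with
# urllib.request: build a Request for vdn_scopes_url("nsx.example.com"),
# add the Authorization header, and inspect the returned XML for each
# <vdnScope> and the clusters it currently spans.
```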

VXLAN – Transport Zone

There are two ways to expand the VXLAN transport zone.

Manual VXLAN Scope Expanding

The first is simple enough and involves manually extending the VXLAN transport zone scope in vCNS or NSX Manager. The drawback is that any reconfiguration of Provider VDC clusters or resource pools will undo this manual change. As Provider VDC reconfiguration does not happen often, this is a viable option.
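The same manual expansion can be scripted. NSX-v exposes an expand action on a transport zone (`POST /api/2.0/vdn/scopes/{scope-id}?action=expand`); the sketch below only builds the URL and XML body for that call. The scope ID and cluster managed object reference are example values, and the payload shape should be checked against the NSX-v API guide for your release.

```python
def expand_scope_url(nsx_manager: str, scope_id: str) -> str:
    # "expand" action on a transport zone (vdnscope) in the NSX-v API.
    return f"https://{nsx_manager}/api/2.0/vdn/scopes/{scope_id}?action=expand"


def expand_scope_body(cluster_moref: str) -> str:
    # XML body naming the vCenter cluster (managed object reference,
    # e.g. "domain-c42") to add to the transport zone. The doubled
    # <cluster> element mirrors the vdnScope schema.
    return (
        "<vdnScope><clusters><cluster><cluster>"
        f"<objectId>{cluster_moref}</objectId>"
        "</cluster></cluster></clusters></vdnScope>"
    )


# Usage: POST expand_scope_body("domain-c42") to
# expand_scope_url("nsx.example.com", "vdnscope-1") with Basic auth and
# Content-Type: application/xml. Remember that a later Provider VDC
# reconfiguration can still undo the change, as noted above.
```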

Stretching Provider VDCs

The second solution involves stretching at least one Provider VDC into the other site so that its VXLAN scope covers both sites. The resulting network pool (which created the VXLAN transport zone) can then be assigned to Org VDCs that need to span networks between sites. This is achieved by using multiple resource pools inside the clusters and assigning those to Provider VDCs. As we want to stretch only the VXLAN networks and not the actual compute resources (we do not want vCloud Director deploying VMs into the wrong site), we will use site-specific storage policies. Although a Provider VDC will have access to a resource pool in the other site, it will not have access to that site's storage, as only storage from the first site is assigned to it.

Hopefully the following diagram describes the second solution better:

Stretched PVDC Design


The advantage of the second approach is that it is a much cleaner solution from a support perspective, although the actual setup is more complex.

Highly Available Edge Gateway

Now that we have successfully stretched the Org VDC network between both sites, we also need to solve Edge Gateway site resiliency; resilient applications without connectivity to the external world are useless. The Edge Gateway (and the actual Org VDC) is created inside one Provider VDC (let's call it primary). The Org VDC network is marked as shared so that other Org VDCs can use it as well. The Edge Gateways are deployed by the service provider, who deploys the Edge Gateway in high availability configuration; this results in two Edge Gateway VMs deployed in the same primary Provider VDC (in the System VDC sub-resource pool). The VMs use an internal Org VDC network for heartbeat communication. The trick to making it site resilient is to go into the vCNS/NSX Edge configuration and change the resource pool (a System VDC RP with the same name but in a different cluster) and datastore of the 2nd instance to the other site. The vCNS/NSX Manager then immediately redeploys the reconfigured Edge instance to the other site. This change survives Edge Gateway redeploys from within vCloud Director without any problems.
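The appliance placement change described above can also be made through the NSX-v REST API, where per-appliance placement lives under `/api/4.0/edges/{edge-id}/appliances`. The sketch below only constructs the URL and one `<appliance>` element for the second HA instance; the IDs shown are placeholders, and the full payload (both appliances) should be taken from a GET on the same endpoint before PUTting it back.

```python
def edge_appliances_url(nsx_manager: str, edge_id: str) -> str:
    # Per-appliance placement (resource pool, datastore) of an
    # NSX-v Edge, e.g. edge_id "edge-1".
    return f"https://{nsx_manager}/api/4.0/edges/{edge_id}/appliances"


def appliance_xml(resource_pool_id: str, datastore_id: str) -> str:
    # One <appliance> element. In an HA pair, this is the second
    # instance repointed at the other site's System VDC resource pool
    # and a datastore in that site (both IDs are vCenter morefs,
    # example values here).
    return (
        "<appliance>"
        f"<resourcePoolId>{resource_pool_id}</resourcePoolId>"
        f"<datastoreId>{datastore_id}</datastoreId>"
        "</appliance>"
    )


# Usage: GET edge_appliances_url(...), replace the second appliance's
# placement with appliance_xml("resgroup-77", "datastore-12") inside
# the returned <appliances> document, and PUT it back; the manager
# then redeploys that Edge instance in the other site.
```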

HA Edge



21 thoughts on "Availability Zone Design in vCloud Director"

  1. Hi,

    Great writeup of how vCloud works with stretched VXLAN. We are currently in the process of designing a stretch to a second site. Do you know if it's possible to stretch the VXLAN within the same vCenter but across different Datacenter objects, or must both clusters reside within the same Datacenter object?

    Kind regards,

    1. You can stretch VXLAN over multiple vSphere Distributed Switches (vDS) where each vDS is in a different Datacenter object. However, you cannot stretch a Provider VDC over multiple Datacenter objects (clusters residing in different Datacenters) – so this might have an impact on the VXLAN network scope / transport zone created automatically by the Provider VDC.

      As a vSphere Datacenter is just a logical separation, I would advise not using multiple ones if you do not need to.

  2. Hi Tom,

    I'm having a hard time getting this configured. We have two pVDCs, configured on two separate Datacenter objects, each with its own vDS (in the same vCenter). There are two VXLAN network scopes automatically set up by vCloud, both bound to their cluster and vDS. I'm not able to create a VXLAN network scope across the two clusters, nor to reconfigure the existing ones to stretch across both clusters.

    Creating two VDCs, one on each pVDC, and sharing one's network to the vOrg does not get it populated in the second VDC on the second pVDC.

    It's hard to find any reference to confirm whether the setup above is possible. We prefer to have two separate Datacenter objects in vCenter. Can you confirm whether the above is possible? Any other thoughts?

    Really appreciate your help!


  3. Yes, as mentioned in your first reply, we are not spanning a pVDC over the two Datacenter objects. We have two pVDCs, one for each cluster, each within a separate Datacenter object.

  4. You have two options for spanning: stretch (at least one) PVDC across all clusters – which is not an option if you use multiple Datacenter objects, as a PVDC cannot span them. The other option is to manually extend the transport zone, which is possible across vDSes and Datacenters.

    The first option is my preferred one and is verified in production.

    1. Hi Tom,

      Thanks for this. I can confirm we have a good working setup with multiple Datacenter objects, one vDS per Datacenter, one pVDC per Datacenter/cluster, and the default-created VXLAN object expanded across the two Datacenter clusters. The "trick" is to go to the Datacenter object in VSM, choose the opposite VXLAN object, and press the Expand button to stretch the VXLAN to this cluster.

      Again, appreciate your help.


  5. The article is very interesting, thank you!
    But I have a question about redundancy. In the case of the "Single vCenter Server Design", what happens if Site A goes down? How can customers' VMs located in "Provider VDC1" restart on "Site B"?

      1. OK, I understand, but this is only valid for applications developed to be redundant.
        In vCloud Director, do we have the ability to create an architecture in which a customer could choose on which site his VM is created, but in case of a site crash it switches to the site still alive?
        Because we could create a single cluster with the 4 ESXi hosts (both sites are close) and activate HA/DRS, but in that case, how do we let a customer manage the placement of his VMs on one site or the other?

    1. Hi Tomas,
      I have run several tests using the metadata, but without success …
      The idea: I gave a metadata entry (key = system.service.vdc.placement) to my 2 ESXi hosts of site 1 (value = DC1) and to my 2 ESXi hosts of site 2 (value = DC2).
      On the vCloud side I defined 2 VDCs (VDC1 & VDC2), and then via the vCloud API I configured a metadata entry (system.service.vdc.placement) for VDC1 with value DC1 and one for VDC2 with value DC2.
      Unfortunately, when I deploy a VM in one of the VDCs, this key seems to be ignored (despite the domain visibility being configured to "VCENTER").
      Could this work?
      Thanks a lot

  6. I created the system.service.vdc.placement attribute with the type "Global" and configured the value just on my 4 ESXi hosts.

      1. OK, I created my attribute in PowerCLI for Host and VM, but no difference…
        Did you test the same way, by putting your variable directly on the customer VDC?

    1. OK, so as a service provider we do not have the ability to set in advance, and automatically, that a VM will be placed on the hosts in Datacenter1 (part of a cluster stretched over the 2 DCs)?
      This modification from an API point of view is only valid for a client who creates his own VM via the APIs, right?
      Thanks again.
