Tag Archives: vCloud Director

Allocation Pool Organization VDC Changes in vCloud Director 5.1.2

This is a follow-up to the original article, Allocation Pool Organization vDC Changes in vCloud Director 5.1, and reflects what has changed on the subject in the recently released vCloud Director 5.1.2.

One of the new features of vCloud Director 5.1 was the elastic Allocation Pool VDC. Elastic means that the VDC can span multiple clusters, which simplifies the provider's capacity management.

The feature required some changes to how the now-elastic VDC maps to vSphere resource pools, and these changes were disruptive for some customers upgrading from vCloud Director 1.5. Therefore both vCloud Director 5.1.1 and 5.1.2 tweaked the feature to make those customers happy.

For a deep dive into how Org VDC allocation types relate to vSphere resource management, see Massimo Re Ferre's post: vCloud Director 5.1(.1) Changes in Resource Entitlements (Updated).

I will just concentrate on the Allocation Pool VDC differences.

vCloud Director 5.1.0

An Allocation Pool VDC requires a new parameter, vCPU speed, which defines how much CPU reservation and limit is applied to the Org VDC resource pools, which can now span multiple clusters. Each such resource pool gets a reservation and limit based on the sum of all vCPUs of the vApps deployed in that particular resource pool.

Example: If the vCPU speed is set to 1 GHz and I deploy 3 VMs with 2 vCPUs each, one placed in one resource pool and the other two in a second one, the first resource pool will get a 2 GHz limit (2 vCPUs x 1 GHz) and the second a 4 GHz limit (4 vCPUs x 1 GHz). The reservation is set as a percentage of the limit.

This means that you cannot overallocate an Org VDC in terms of vCPUs (max number of vCPUs x vCPU speed = Org VDC CPU allocation), in much the same way that memory could not be overallocated in vCloud Director 1.5.
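The 5.1.0 rule can be illustrated with a minimal sketch. The function names and the resource pool labels (rp-a, rp-b) are made up for illustration and are not part of any vCloud Director API. The sketch only does the arithmetic described above for three 2-vCPU VMs and a 1 GHz vCPU speed.

```python
from collections import defaultdict


def pool_cpu_limits_ghz(vcpu_speed_ghz, placements):
    """Return {resource_pool: CPU limit in GHz} under the 5.1.0 rule.

    placements is a list of (resource_pool, vcpu_count) pairs, one per VM.
    """
    vcpus = defaultdict(int)
    for pool, count in placements:
        vcpus[pool] += count
    # Each pool's limit is the sum of its deployed vCPUs times the vCPU speed.
    return {pool: n * vcpu_speed_ghz for pool, n in vcpus.items()}


def max_vcpus(org_vdc_cpu_allocation_ghz, vcpu_speed_ghz):
    """Largest number of vCPUs that fits into the Org VDC CPU allocation."""
    return int(org_vdc_cpu_allocation_ghz // vcpu_speed_ghz)


# One VM in the first pool, two VMs in the second, 2 vCPUs each, 1 GHz per vCPU.
limits = pool_cpu_limits_ghz(1.0, [("rp-a", 2), ("rp-b", 2), ("rp-b", 2)])
print(limits)               # {'rp-a': 2.0, 'rp-b': 4.0}
print(max_vcpus(10, 1.0))   # 10
```

Lowering the vCPU speed raises the vCPU ceiling (a 10 GHz allocation at 0.25 GHz per vCPU allows 40 vCPUs), but it also shrinks the limit of a sparsely populated resource pool.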

vCloud Director 5.1.1

As mentioned above, some customers complained that vCloud Director tenants were now constrained in how many vCPUs they could deploy into their Org VDC. Providers tried to fight this by setting very low vCPU speeds, but the problem was that with only a few VMs deployed the resource pool limit was very low compared to the allocated Org VDC CPU GHz.

vCloud Director 5.1.1 came with a quick fix. The CPU limit of Allocation Pool resource pools was no longer based on the number of vCPUs deployed in the resource pool as in 5.1.0, but was set to the whole Org VDC CPU allocation instead. This means that even the first (and only) deployed vCPU can utilize the full Org VDC CPU allocation (obviously limited by the physical speed of the core). The downside is that if the Org VDC spans multiple resource pools, the tenant will get more CPU resources than he is entitled to. However, as long as the provider designed all his Provider VDCs to be backed by only one cluster/resource pool and set a low vCPU speed, the behavior was very similar to vCloud Director 1.5.
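A small sketch of the 5.1.1 rule shows where the extra CPU comes from. The function names and pool labels are hypothetical and the numbers are only an illustration. Every resource pool backing the Org VDC gets the full allocation as its limit, so the combined limit grows with the number of pools.

```python
def pool_cpu_limits_511_ghz(org_vdc_cpu_allocation_ghz, pools):
    """5.1.1 rule: every backing resource pool is limited to the full allocation."""
    return {pool: org_vdc_cpu_allocation_ghz for pool in pools}


def worst_case_total_ghz(org_vdc_cpu_allocation_ghz, pools):
    """Sum of all pool limits, i.e. what the tenant could consume at most."""
    limits = pool_cpu_limits_511_ghz(org_vdc_cpu_allocation_ghz, pools)
    return sum(limits.values())


# A 10 GHz Org VDC backed by one pool stays at 10 GHz, backed by two it can reach 20 GHz.
print(worst_case_total_ghz(10.0, ["rp-a"]))          # 10.0
print(worst_case_total_ghz(10.0, ["rp-a", "rp-b"]))  # 20.0
```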

vCloud Director 5.1.2

The problem with the previous approach was that if you upgraded to 5.1.1 you could not revert to 5.1.0 with the truly elastic VDCs if you wanted. That has changed now with 5.1.2.

There is a new “Make Allocation pool Org VDCs elastic” configuration option in System Settings > General > Miscellaneous which gives you the possibility to choose the Allocation Pool behavior.

Allocation Pool Elasticity

When upgrading from vCloud Director 5.1.0 that used Allocation Pool Org VDC spanning multiple clusters this option will be enabled, otherwise it will always be disabled by default.

If it is disabled then the Allocation Pool Org VDCs behave exactly as in vCloud Director 1.5. That means no vCPU speed setting, no spanning of multiple clusters and easy vCPU overallocation.

If the option is enabled, the Allocation Pool Org VDCs behave exactly as in vCloud Director 5.1.1! So beware: it does not revert to the 5.1.0 way of setting the resource pool CPU limit, but uses the 5.1.1 way, which means a tenant may be able to use more CPU resources than his Org VDC CPU allocation.

Personally, I had hoped that the elastic behavior would be exactly as in 5.1.0. That is not the case, but it could happen in a future release.

vCloud Director 5.1 Features and their vSphere Dependency

I see more and more customers migrating from vCloud Director 1.5 to vCloud Director 5.1. One question they have is: “Do we have to migrate to vSphere 5.1 at the same time?” The answer is a definite no. vCloud Director 5.1 supports vCenter 5.0 and ESXi 5.0, and even ESX(i) 4.0U2 if managed by vCenter 5.

I always recommend upgrading vCloud Director in two phases.

Phase 1 (vCloud Director Upgrade)

  • vCloud Director Cell operating system (RHEL). RHEL 5 is still supported, but a customer who wants to use RHEL 6 will need to deploy a new cell, as a RHEL 5 to RHEL 6 upgrade is not possible.
  • vCloud Director runtime upgrade
  • vCloud Director database schema upgrade
  • vShield Manager upgrade
  • vShield Edges upgrade

Phase 2 (vSphere Upgrade)

  • Installation of SSO
  • Installation of Inventory Service
  • Installation/upgrade of Web Client
  • vCenter Server upgrade
  • ESX hosts upgrade
  • distributed virtual switches upgrade

As the phases can be spread out in time, this brings us to the main topic of the article: which new vCloud Director 5.1 features depend on vSphere 5.1 and will not be available in the time between Phase 1 and Phase 2? I have compiled a table which lists the new vCloud Director features and whether each feature will be available with vSphere 5.0 (vCenter 5.0 + ESX 5.0). Note: I don’t dare to consider ESX 4.

Feature                     | vSphere 5.0 | Note
VM Snapshots                |             |
Storage Profiles            |             |
Elastic VDC                 |             | Allocation pool Org VDC type can span multiple clusters. Online migrations and merging of Provider VDCs.
Provider Single Sign On     |             | vCenter SSO required
Customer Single Sign On     |             | SSPI, SAML2
VXLAN Networks              |             | vSphere 5.1 vmkernel module is required
Storage clusters (SDRS)     |             | VM placement engine leverages SDRS. Migration of linked clones supported. Difference in shadow VM handling¹
New Edge Gateway Features   |             | Performance, HA, Load balancing, DNS relay, Rate limits, Multiple interfaces, IP allocations, SNAT and DNAT rules
Virtual Hardware 9          |             | Requires vSphere 5.1 (64 vCPUs)
Additional Guest OS Support | possibly    | Depends on ESX version (Windows 8/2012 requires ESXi 5.0 U1), but Virtual Hardware 9 is recommended (KB 2034491)
NFS VAAI Fast Provisioning  |             | Requires vSphere 5.1 (hardware accelerated linked clones)
Clustered database support  |             |

¹) With vSphere 5.0, vCloud Director does not use SDRS recommendations for linked clone placement (Fast Provisioning). vCloud Director picks an individual datastore and optionally deploys a shadow VM. With vSphere 5.1, vCloud Director fully leverages SDRS recommendations and shadow VMs are deployed by vSphere SDRS.

Table in PNG format.

Disclaimer: I don’t claim this table is complete and that it is an official VMware document. If you think something is missing, please comment and I will edit the table.

Edit 27 April 2013: Explained difference in linked clone placement.

vCloud Connector: Public Cloud Transfers with no Private vSphere Environment

I have already blogged quite extensively about vCloud Connector, a tool for transfers of VMs, vApps and catalogs between public and private vSphere or vCloud Director based clouds. This post is dedicated to one particular use case; let’s call it the ‘Developer Use Case’. Here the user is not a VI admin but a developer who wants to deploy his apps to public clouds. He has no access to a private vSphere environment and therefore none to vCenter Server.

vCloud Connector (vCC) consists of vCloud Connector Server, vCloud Connector Nodes and vCloud Connector Client. In the past the vCloud Connector was accessible either through vCenter Server plugin or through web interface at http://vcloud.vmware.com/connector. However as of April 2nd 2013 VMware discontinued the web interface access which means that vCenter Server plugin is currently the only option to use vCloud Connector.

The web portal was great for the Developer Use Case. The developer did not need to have access to vSphere, vCenter, vSphere client or even a Windows machine and could use vCloud Connector from his Mac or Linux browser.

What follows in this article is a description of how the Developer Use Case can still be fulfilled after April 2nd. The idea is to deploy vCenter Server Appliance to the cloud together with vCloud Connector Server. This vCenter Server Appliance will not manage any ESX hosts and will be used only to instantiate the vCloud Connector Client interface. The thick (.NET) vSphere Client is still needed, as at the moment there is no vCC plugin for the vSphere Web Client. This also means a Windows OS, so the developer will need either a physical Windows desktop, or a virtual one running on his PC or in the cloud and accessible via RDP.

How to deploy vCenter Server to the public vCloud Director cloud

  1. Download vCenter Server Appliance from the VMware website. I used version 5.1.0b, which comes as one large OVA file.
  2. As we cannot import an OVA file into vCloud Director, we first need to extract it (an OVA is a tar archive) to get the OVF format. This can be done easily by adding a .zip extension to the downloaded file and opening it with WinZip or a similar utility.
  3. Import the OVF file into your organization catalog.
  4. As I also had vCC Server and vCC Node in the catalog I deployed them together into one vApp.
    vCloud Connector vApp
  5. After accepting the EULA, selecting the Storage Profile and setting hostnames, it is important to put the VMs on one internet routable network and manually assign IP addresses from the static pool of the organization VDC network. We cannot just rely on Guest Customization, as the IP address assignment is part of the vApp properties, which are applied when the vApp is deployed. So in my case I used (the default) 192.168.1.0/24 subnet for the org VDC network, where IP 192.168.1.1 was used for the internal interface of the Edge Gateway and IPs 192.168.1.2-192.168.1.4 were used for vCenter Server, vCloud Connector Server and vCloud Connector Node.
    IP Assignment Static - Manual
  6. On the next page we are presented with the vApp properties. Here we again have to manually assign the correct IP addresses as specified in the previous step. As I am deploying 3 imported vApps at once, it looks quite confusing because the properties of each are merged into one screen. The default gateway address is the Edge Gateway internal interface and the subnet mask is 255.255.255.0.
    Imported vApp Network Properties
  7. After the deployment is finished we will get a vCloud Connector vApp which looks like this:
    vCloud Connector vApp Networking Diagram
  8. We will need to access all three VMs from the internet to configure their Virtual Appliance Management Interface (VAMI), which runs on TCP port 5480. If you have 3 external IP addresses you can set up destination NAT (DNAT) rules for each VM on the Edge Gateway. vCC Node and vCC Server will also need to access the internet, therefore a source NAT (SNAT) rule must be created for them. We could actually get away with just one external IP address: we could use port forwarding for the VAMI interface of each VM running on port 5480 (or we could even configure them over the console from another VM with a supported browser deployed in the cloud). Please refer to my other post linked at the beginning of the article for the advanced networking information. In my lab I have the luxury of 3 external IP addresses, represented by the 10.0.2.151-10.0.2.153 range.
    NAT rules
  9. Next we need to create firewall rules. As I already mentioned we need TCP port 5480 for VAMI interface. We also need TCP port 443 for vSphere Client connectivity to vCenter and TCP port 443 for incoming and outgoing traffic for vCloud Connector Node.
    Firewall Rules
  10. Now we can start the vApp and start configuring the VMs. I will skip the vCC Server and Node configuration and will focus on the vCenter Appliance part.
  11. The initial configuration of vCenter Appliance is done via a browser pointing to port 5480. In my case I am accessing the external NATed IP: https://10.0.2.151:5480. The default login is ‘root’ with password ‘vmware’.
  12. After accepting the EULA, on the Configure Options screen I set custom configuration.
    vCenter Configure Options
  13. Then I chose the embedded database and embedded SSO and did not enable Active Directory.
  14. After the vCenter Server service (together with the database and SSO) has started (which takes a while), do not forget to change the default password and optionally disable unneeded services.
    vCenter Services
  15. Now we can register vCC Server with this newly deployed vCenter Server. As they are on the same org VDC network I am using vCenter internal IP address (192.168.1.2).
    vCC Server vCenter Server registration
  16. That’s it. Now we can download vSphere Client and connect to the external IP address of vCenter Server and access the vCloud Connector Plugin in the Solutions and Applications section of the vCenter home page.
    vCC Plugin
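The extraction in step 2 can also be scripted, since an OVA file is a plain tar archive. The following is a minimal sketch using Python's standard tarfile module. The function name and the file names in the usage note are assumptions for illustration, and the archive is assumed to come from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_ova(ova_path, dest_dir):
    """Unpack an OVA (a tar archive) and return the path of its .ovf descriptor."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(ova_path) as archive:
        # Extracts the descriptor, manifest and disk files side by side.
        archive.extractall(dest)
    descriptors = sorted(dest.glob("*.ovf"))
    if not descriptors:
        raise ValueError(f"no .ovf descriptor found in {ova_path}")
    return descriptors[0]
```

Calling extract_ova("vcsa.ova", "vcsa-ovf") (both names hypothetical) leaves the .ovf descriptor and the disk files in one folder, which is the layout needed for the catalog import in step 3.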

Licensing

There is one catch. Unlike the standard edition of vCloud Connector, vCenter Server is a licensed product. We can use the evaluation version for 60 days, but what happens then? It turns out that even with an expired vCenter Server license you can still access the vCloud Connector plugin. So from a technical standpoint it is possible to use vCenter Server without a license with vCloud Connector. I am not sure what the legal implications are, though (IANAL).

Expired vCenter Server License

VCAP-Cloud Infrastructure Administration Exam Experience

VMware has just now released the second vCloud VCAP (which stands for VMware Certified Advanced Professional) Exam - Cloud Infrastructure Administration (VCAP-CIA). I have blogged about the other one - Cloud Infrastructure Design (VCAP-CID) 7 months ago so I thought it would be good to write about my experience from the beta exam I took 2 months ago as well.

If you are familiar with the vSphere equivalent exam VCAP Datacenter Administration (VCAP-DCA) then you will find it very similar. It is 100% hands on exam with a series of tasks that have to be done on live vSphere/vCloud/Chargeback infrastructure. You start at a control desktop and then from there you get to the lab environment using any client of your choice (SSH, RDP, web browser, vSphere Client).

It is important to note that the lab environment is based on vCloud Director 5.1 (contrary to VCAP-CID, where all the questions were based on vCloud Director 1.5). The questions follow the exam blueprint quite well (as usual), so I would recommend reviewing the blueprint before the exam. As I do vCloud for a living, it is hard for me to recommend training or study resources. I think very good hands-on experience with vCloud Director, vShield Manager, Chargeback and RHEL is necessary to be successful. The only hard prerequisite is to be a VCP, and it does not matter whether it is VCP-DCV (the former vSphere VCP-DV), VCP-DT (View) or VCP-Cloud. If you are not VCP-Cloud, passing VCAP-CIA (or VCAP-CID) will give you the VCP-Cloud certification by default.

The 29 questions are supposed to be solved in sequence, as they are sometimes related, and it is not possible to easily jump from one to another. You have to go forward or backward one question at a time. Some questions might be skipped, though. Standard VMware documentation in PDF form is accessible during the exam; the VMware KB is not.

I had a major problem with latency, which reared its ugly head when typing the “vmware1!” password that was used universally. The last character ‘!’ requires two keys pressed at the same time, which did not work most of the time. That was pretty annoying during tasks that took a few minutes and then failed because of a wrong password. In the end I used copy and paste for the password; knowing this upfront would have saved me quite a lot of time. Besides the confusing wording of one question, I had no real issues with the lab environment.

I have to say I really enjoyed the exam tasks and was impressed by how the lab was set up. It obviously required some effort from the exam makers and I applaud this. I commented on my issues, and as it was a beta, the final release might be different.

I am now eager for the result in order to take the next step to VCDX-Cloud.

vCloud Director and Single-Sign-On (SAML)

In December last year I wrote a blog post about vCloud Director and SSPI Authentication. In that post I stated that besides using SSPI, which is a Microsoft proprietary interface on top of Active Directory, tenants can use the Security Assertion Markup Language (SAML) standard to integrate with their identity provider. VMware has tested SAML2 integration with OpenAM (described in detail in vCloud Architecture Toolkit Implementation Examples) and Active Directory Federation Services (ADFS). However, another supported identity provider has appeared just recently: our own VMware Horizon Workspace. The following whitepaper describes the integration in detail: Using VMware Horizon Workspace to Enable SSO in VMware vCloud Director 5.1.

In this post I will provide short step-by-step description of all the necessary steps that you as the vCloud Organization Administrator must take. The assumption is that you have on premise Horizon Workspace integrated with company Active Directory and want to use it for connecting private or public vCloud Director organizations.

  1. Download Horizon Identity provider metadata XML file from: https://<horizon_workspace_URL>/SAAS/API/1.0/GET/metadata/idp.xml
  2. In the target cloud go to Administration > Settings > Federation menu and check Use SAML Identity Provider and upload the idp.xml file
  3. Still on the same page regenerate the certificate and click apply
  4. Download the certificate from the url: https://<vcloud_URL>/cloud/org/<orgname>/saml/metadata/alias/vcd
  5. Log out from the cloud
  6. Log back in, you will need to change the URL to go directly to the local authentication: https://<vcloud_URL>/cloud/org/<orgname>/login.jsp
  7. In the Administration > Members > Users (or Groups) import Users (or Groups) by clicking the icon with arrow. Change the Source to SAML and type the user names or group names.
  8. Back in Horizon Workspace admin interface create a new Web Application in the catalog
  9. Fill in the following data:
    • Authentication Profile: SAML 2.0 POST profile
    • Login Redirection URL: https://<vcloud_URL>/cloud/org/<orgname>/
    • Check: Include Destination
    • Check: Sign Response
    • Check: Sign the Assertion
    • Configure via Metadata XML
    • Paste the certificate from point 4 into the Meta-data XML box
    • Add Attribute Mapping as seen in the screenshot
      Attribute Mapping
    • Save the page
  10. Edit the newly created Web Application and assign Entitlements (either specific users or a group). These should be the same users as in step 7.
  11. Now log into the Horizon as the entitled user and click the application icon. You should now get direct access into the vCloud Director.
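The URLs used in steps 1, 4, 6 and 9 all follow fixed patterns, so they can be generated from the host names and the organization name. This is a small convenience sketch. The function names and the example hosts (workspace.example.com, vcloud.example.com, org acme) are made up, and only the path templates come from the steps above.

```python
def horizon_idp_metadata_url(horizon_host):
    """Step 1: location of the Horizon identity provider metadata."""
    return f"https://{horizon_host}/SAAS/API/1.0/GET/metadata/idp.xml"


def vcloud_saml_urls(vcloud_host, org_name):
    """Steps 4, 6 and 9: per-organization vCloud Director URLs."""
    base = f"https://{vcloud_host}/cloud/org/{org_name}"
    return {
        "metadata": f"{base}/saml/metadata/alias/vcd",  # step 4
        "local_login": f"{base}/login.jsp",             # step 6
        "login_redirection": f"{base}/",                # step 9
    }


print(horizon_idp_metadata_url("workspace.example.com"))
for name, url in vcloud_saml_urls("vcloud.example.com", "acme").items():
    print(name, url)
```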

Horizon Workspace

vCloud Connector 2.0 Observations

I have been playing with vCloud Connector 2.0 which is already a third release of the tool that enables VM (vApp) transfers between various VMware clouds based on vSphere or vCloud Director. Here follows a bunch of notes that I came up with that might help others. Note I expect that the reader is familiar with basic functionality and architecture of vCC.

Compatibility

vCloud Connector 2.0 (vCC) is backward compatible with vCloud Director 1.5; however, some advanced features will work in such deployments only with limitations, like Content Sync (modified templates are not removed, but added instead with a timestamp in the name), or will not work at all (Stretch Deploy). vCloud Connector 1.5 does not work with vCloud Director 5.1.

Architecture and Network Flows

The architecture is similar to vCC 1.5, but the ports have changed (port 8443 not needed anymore). Here is the picture from the Installing and Configuring vCC 2.0 guide and let me dive deeper into some of the network flows (black circled numbers).

vCC Data Flows

Flow 1

Although vCC Server has a web interface (VAMI) available at port 5480, the interface can be used only for appliance configuration and for setting up connectivity to vCC Nodes. I recommend using Internet Explorer here, as Firefox and Chrome did not display some parts of the VAMI interface properly. The vCC Advanced Edition license key is entered here, and SSL certificates related to the vCC Client to vCC Server communication can be enabled or generated and uploaded here as well. The VAMI interface always uses https (on port 5480), by default with a self-signed certificate that cannot be replaced via the GUI. If required, it can be replaced in the following file on the appliance:

/opt/vmware/etc/lighttpd/server.pem

vCC end-users will use either vSphere client (the .NET version, as web client is not supported yet) or vcloud.vmware.com portal for actual management of the vApp transfers and other vCC features.

Although the picture shows port 80, if SSL is enabled, 443 will be used instead. If the replaced certificates do not use intermediate CA, the web interface cannot be used for their import and java keytool command must be used instead from appliance CLI as described in the installation guide. For some reason I was not able to create the private key with the GUI, but java keytool did the job.

Flow 2

vCC Server needs to be able to reach all vCC Nodes so the arrow should be also between vCC Server and the vCC Node in the destination cloud. Again the communication can go over port 80 or 443 depending on SSL configuration – this time on the vCC Nodes. The same caveat as with Server applies when replacing the certificates.

Note: Enabling SSL is recommended here as discussed below.

Flows 3 and 6/7

Prior to attaching vCC Nodes to vCC Server, the Nodes need to be configured to connect to a cloud (vCenter or vCloud Director). SSL (port 443) is always used, but if you do not want to enable the Ignore SSL Cert checkbox, vCenter or vCloud Director needs to have CA signed certificates. If you are using an enterprise CA, you have to import the CA root certificate to a different keystore than the one available in the GUI, as described in KB 2045007.

Flow 5

For the inter node communication, one node is designated as the Controller. The Controller initiates the connection. Which node is picked as the Controller depends on the Public checkbox setting in the vCC Node registration at vCC Server.

Public vCC Node

It is expected that the non Public vCC node is unreachable from outside and therefore has to be the initiator of the communication between nodes and is therefore the Controller. So the Controller Node works either in push or pull mode. If the other Node has SSL enabled (see Flow 2), https port 443 is used for the transfer between nodes. If the other Node does not have SSL enabled the transfer will fail.

Shared Node

This is a new feature for vCloud Director deployments where the provider can deploy a shared (multitenant) node and connect it to the provider’s cloud. The tenants then do not have to deploy their own nodes inside their organization VDCs, with all the trouble of securing connectivity and appropriate transfer storage.

In highly secure public clouds it gets tricky where and how to deploy the shared node. It cannot be load balanced, but provider can deploy multiple shared nodes and give their IP addresses only to specific groups of tenants. The node should be as close as possible to the vCloud API somewhere in DMZ accessible from the internet but if Web Application Firewall is used it should still be in-between the node and API as it could be used as an attack vector to other organizations. The node does not keep any vCloud credentials for communication with vCloud API. Those are transferred by vCC Server so again SSL should be enabled (see Flow 2).

Content Sync

This is a new feature, great for maintenance of catalogs in various clouds. It might also be used by the provider to manage a public catalog directly from vSphere, where a vSphere folder is automatically synced with a vCloud Director catalog. Note however that a vCC Advanced Edition license is needed, which currently cannot be obtained with the provider VSPP license, but only through vCloud Suite bundles.

The default polling interval for synchronization is 6 hours. It can be changed but the change is unsupported. Following file can be edited on the vCC Server:

/usr/local/tcserver/vfabric-tc-server-standard/server/webapps/agent/WEB-INF/spring/appServlet/task.xml

Look for <property name="jobExecutionIntervalInMinutes" value="360" />.
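As a quick sanity check of the value, the property can be parsed and converted to hours. This sketch works on the single property element written with plain ASCII quotes. It makes no assumption about the rest of task.xml, and the helper name is made up.

```python
import xml.etree.ElementTree as ET


def interval_hours(property_xml):
    """Return the polling interval in hours from a single <property> element."""
    element = ET.fromstring(property_xml)
    if element.get("name") != "jobExecutionIntervalInMinutes":
        raise ValueError("unexpected property name")
    return int(element.get("value")) / 60


line = '<property name="jobExecutionIntervalInMinutes" value="360" />'
print(interval_hours(line))  # 6.0
```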

Although the documentation states that ports 8443 and 8080 are used for Content Sync, my understanding is that they are used only internally on the vCC Server.

Stretch Deploy

This is again a new but licensed feature which enables migration of VMs between clouds without needing to change their IPs or MAC addresses thanks to a VPN connection (not VXLAN as often confused) which is established between the original and destination cloud. The SSL VPN is established between Edges on the vApp networks so this is quite different from the Site-to-Site IPSec VPN between Edge Gateways even though it shows up in vShield Manager in the IPSec VPN section. Its configuration is not exposed in the vCloud Director GUI at all.

Stretch Deploy Site-to-Site SSL VPN

There is quite a long list of prerequisites to get this working, and some of them are out of the tenant’s control as they relate to the provider’s vCloud Director architecture, e.g. vSphere and vCloud Networking and Security versions and the type of distributed virtual switch used. Stretch Deploy will not work with the Cisco Nexus 1000V switch. The tenant most likely will not know which switch the provider is using until he hits the following error:

Unable to update network "Stretched_VM_network".
java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (100): A specified parameter was not correct.
selectionSet.dvsUuid

The source and transferred destination VMs need to be connected to vSphere Distributed Switch 5.1 as the unclaimed traffic needs to be sent over the SSL tunnel and this is not currently supported with other virtual switches.

The Stretch Deploy process is relatively complicated, with many actions happening in the background. This is all abstracted from the user, as he can start the process easily from the vCC GUI. However, if you want to end the stretch deploy by removing the remote VM, or bring it back home, this must be done manually.

Stretch Deploy activities as seen by the vCenter (note both source and destination clouds were managed by the same vCenter)

Delete Stretch Deployed VM

  • stop the remote vApp, this will terminate the VPN connection and destroy the vApp Edge
  • delete the remote vApp
  • delete the IPSec configuration in the vSphere cloud Edge in vShield Manager
  • delete vCenter custom attributes of the VM which was stretch deployed (DatacenterExtendedEntityId, DatacenterExtensionRole)

Bring Home the Stretch Deployed VM

This must be done by running a script from the vCC Node managing the private cloud. So access to the Node is needed, the script needs to be untarred, and quite a lot of information must be typed in when it is executed; its correctness is not verified until everything has been entered. This is definitely not a task for an average user and I expect this part might be improved in later releases.

Graceful Shutdown of vCloud Director Cell

One of my customers challenged me on how to properly shut down a vCloud Director cell without any disruption of the service when multiple cells are used. Although we have KB article 2034994 about this particular subject, it omits some important details.

When vCenter is connected to vCloud Director a VC Proxy service is started on one of the cells. The service is responsible for monitoring of active vCenter tasks and inventory updates which are then shared with other cells. Unless there is a network partition between the cells there is always one vCenter proxy service for one vCenter. Multiple VC Proxies can run on one cell. You can see which cell is running the VC Proxy service at the vCenter screen in the vCloud Director Admin interface.

vCenter Proxy

The screenshot shows two vCenters connected to vCloud Director with one having its vCenter proxy on vcloud1 cell and the second on vcloud2 cell.

If the VC Proxy service is not running most of the activities in the vCloud Director that require vCenter will not work properly. For example simple creation of a vApp with one VM will fail with message:

Folder vApp_system_34 (8ce90b57-da8b-4714-914b-5073457155b0) does not exist in our inventory, but vCenter Server claims that it does.

This is because the inventory listener on the VC Proxy was not running and vCloud Director could not verify the successful creation of the vApp folder in vCenter. When a cell with the VC Proxy service dies, the service fails over to a surviving cell. However, that failover takes 5 minutes, which is governed by the vcloud:vcloud.heartbeat.failoverTimeoutMsecs property (stored in the vCloud Director database). I am not aware whether changing this value is supported.

Anyway, in order to shut down a cell gracefully we need to move the VC Proxy service to another cell. This can be done by a simple reconnect of vCenter; the move is very quick and does not disrupt the running tasks.

Reconnect vCenter

I have observed that, if possible, a cell different from the original one and the least loaded (in terms of number of VC Proxy services) is chosen. This is also good for manually distributing the load if there are multiple vCenters and multiple cells (good practice is to have at least N+1 cells, where N is the number of vCenters).

So what should be the correct graceful cell shutdown procedure?

  1. Make sure the cell is not running any VC Proxy service. No checkmark should be in the vCenter column of the Cloud Cells inventory in the vCloud Director Admin interface. If there is one, reconnect the vCenters that have their VC Proxy running on the cell.
    Cloud Cells
  2. Quiesce the cell with the cell-management-tool:

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --quiesce true

    where <user> is vCloud administrator username

  3. Monitor the number of outstanding active tasks on the cell and wait until it reaches 0.

    $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --status
    Job count = 0
    Is Active = false

  4. Shut down the cell. This can also be done with cell-management-tool. What I noticed is that it takes multiple attempts (usually two), as the first time only the watchdog service is terminated.

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --shutdown

    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is running

    # $VCLOUD_HOME/bin/cell-management-tool -u <user> cell --shutdown
    # service vmware-vcd status

    vmware-vcd-watchdog is not running
    vmware-vcd-cell is not running
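Steps 2 and 3 of the procedure (quiesce, then wait for the job count to drop to zero) can be wrapped in a small script. This is only a sketch under several assumptions: the default VCLOUD_HOME path, the helper names and the polling interval are mine, and the parsing relies solely on the two status lines shown above. Password prompting by the tool is not handled, and the run function is injectable so the loop can be tried without a real cell.

```python
import os
import re
import subprocess
import time

# Assumption: fall back to a typical install path if VCLOUD_HOME is not set.
VCLOUD_HOME = os.environ.get("VCLOUD_HOME", "/opt/vmware/vcloud-director")
TOOL = os.path.join(VCLOUD_HOME, "bin", "cell-management-tool")


def cell_command(user, *cell_args):
    """Build the argument list for a 'cell' sub-command of the tool."""
    return [TOOL, "-u", user, "cell", *cell_args]


def parse_status(output):
    """Extract (job_count, is_active) from the --status output."""
    jobs = re.search(r"Job count\s*=\s*(\d+)", output)
    active = re.search(r"Is Active\s*=\s*(true|false)", output)
    if not jobs or not active:
        raise ValueError("unrecognised status output")
    return int(jobs.group(1)), active.group(1) == "true"


def run_tool(command):
    """Run a command and return its standard output (password prompt not handled)."""
    return subprocess.run(command, check=True, capture_output=True, text=True).stdout


def quiesce_and_wait(user, run=run_tool, poll_seconds=10, sleep=time.sleep):
    """Quiesce the cell, then poll its status until no active jobs remain."""
    run(cell_command(user, "--quiesce", "true"))
    while True:
        job_count, _is_active = parse_status(run(cell_command(user, "--status")))
        if job_count == 0:
            return
        sleep(poll_seconds)
```

The shutdown itself (step 4) is left out on purpose, since it may need to be repeated and checked with service vmware-vcd status as shown above.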

Rate Limiting of External Networks in vCloud Director and Nexus 1000V

There is a new feature in vCloud Director 5.1 which was requested a lot by service providers – configurable limits on routed external networks (for example Internet) for each tenant. Limits can be set both for incoming and outgoing directions by vCloud Administrator on tenant’s Edge Gateway.

Edge Rate Limit Configuration

However this feature only works with VMware vSphere distributed switch – it does not work with Cisco Nexus 1000V or VMware standard switch. Why? Although the feature is provided by the Edge Gateway, what is actually happening in the background is that vShield Manager instructs vCenter to create a traffic shaping policy on the distributed vswitch port used by the Edge VM.

vSphere Distributed Switch Traffic Shaping


The standard switch does not allow port-specific traffic shaping, and the Nexus 1000V management plane (Virtual Supervisor Module) is not accessible to vShield Manager or vCenter. The rate limit could be applied manually on the Cisco switch port; however, any Edge redeploy operation, which the tenant can trigger from the GUI, deploys a new Edge on a different virtual switch port, so the tenant could easily remove the limit.

For an external network backed by a standard switch, the vCloud Director GUI does not even present the option to set the rate limit. For a Nexus-backed external network, however, the operation fails with an error similar to this:

Cannot update edge gateway “ACME_GW”
java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (10086): Traffic shaping policy can be set only for a Vnic connected to a vmware distributed virtual portgroup configured with static port binding. Invalid portgroup ‘dvportgroup-9781’.

Nexus 1000V Error


Btw, the rate limit can also be set on the Edge (when not using vCloud Director) via vShield Manager or its API. It is called Traffic Shaping Policy and is configurable in the vSM > Edge > Configure > Interfaces > Actions menu.

vShield Manager Traffic Shaping


Do not forget to consider this when designing vCloud Director environments and choosing the virtual switch technology.
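
As a design-time aid, the behavior described in this post can be captured in a small lookup. This is only a sketch in Python: the switch labels ("vds", "standard", "nexus1000v") and the function name are made up for illustration, and the static port binding condition is taken from the error message quoted above.

```python
# Behavior per external network backing, as described in this post.
# The keys are labels invented for this sketch, not vCloud Director identifiers.
RATE_LIMIT_BEHAVIOR = {
    "vds": "supported",
    "standard": "option not offered in the GUI",
    "nexus1000v": "operation fails with a VSM error",
}


def edge_rate_limit_supported(switch_type, port_binding="static"):
    """True only for a distributed switch portgroup with static port binding."""
    if switch_type not in RATE_LIMIT_BEHAVIOR:
        raise ValueError("unknown switch type: %s" % switch_type)
    return switch_type == "vds" and port_binding == "static"
```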

How to Change vCloud Director GUI Language

Just a brief reminder, mostly for myself, of how to change the vCloud Director GUI language without changing the web browser language.

Just append /?locale=xx_XX to the end of the URL, where xx_XX is one of the following supported languages:

  • English: /?locale=en_US
  • German: /?locale=de_DE
  • French: /?locale=fr_FR
  • Japanese: /?locale=ja_JP
  • Korean: /?locale=ko_KR
  • Chinese: /?locale=zh_CN

Example: https://vcloud.fojta.com/cloud/org/tom/?locale=ja_JP
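
If you want to generate such links programmatically, a minimal Python sketch could look like this. The helper name with_locale is hypothetical, and the locale set is simply the list above; the function also replaces a locale parameter that is already present in the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Locales listed in this post.
SUPPORTED_LOCALES = {"en_US", "de_DE", "fr_FR", "ja_JP", "ko_KR", "zh_CN"}


def with_locale(url, locale):
    """Return the URL with its 'locale' query parameter set to the given value."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError("unsupported locale: %s" % locale)
    parts = urlsplit(url)
    # Keep all other query parameters, drop any existing locale.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "locale"]
    query.append(("locale", locale))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_locale("https://vcloud.fojta.com/cloud/org/tom/", "ja_JP"))
# prints https://vcloud.fojta.com/cloud/org/tom/?locale=ja_JP
```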

vCloud Director: Online Migration of Virtual Data Center

vCloud Director completely abstracts the underlying virtual resources from the consumers, who get compute and storage resources represented by virtual datacenters (VDCs) with a given tier profile (e.g. gold, silver, bronze). The provider, however, must care about the actual physical hardware and from time to time faces upgrades and migrations.

Fortunately, it is possible to migrate whole Provider VDCs non-disruptively from old hardware to new, with no or minimal impact on the vCloud customers and no downtime for their VMs running in the cloud.

vCloud Director 5.1 has two features that help to accomplish this: elastic VDC (VDCs spanning multiple vSphere clusters) and merging of Provider VDCs. I already wrote about elastic VDCs in the post about Allocation Pool changes so please read that article first if that concept is new to you.

At a high level, the online migration process from the old to the new hardware works like this:

  • Let’s say that the Provider VDC (PVDC) called GoldVDC is backed by Cluster1 consisting of old hardware.
  • A new Cluster2 is created with new hardware.
  • A new PVDC, let us call it GoldVDCnew, is created from Cluster2 and then merged with the original GoldVDC (backed by Cluster1). Although we could add the new Cluster2 directly to GoldVDC, this would not allow us to retire the old hardware, as it is not possible to detach the primary resource pool from a PVDC.
    Merge PVDCs
  • We can now rename the PVDC GoldVDCnew to GoldVDC and disable Cluster1. This has no impact on running VMs; however, any newly deployed or powered-on VMs are now placed in Cluster2.
    Disable Resource Pool
  • Now we have to migrate all the workloads from Cluster1 to Cluster2 and then detach the Cluster1 from the PVDC.
    Detach Resource Pool

The actual migration between clusters (or resource pools) has to take into account the following five resource types that exist in the VDCs: vApps, catalog templates, catalog media images, Edge Gateways and vApp Edges.

vApps

vCloud Director does not actually use vSphere vApp objects. From the point of view of the vSphere infrastructure, vCloud vApps are just a collection of VMs, so we only need to migrate the VMs. This cannot be done from within vSphere because vCloud Director keeps track of which resource pool each VM is placed in. Additionally, vCloud Director needs to apply the proper resource pool reservations and limits based on the Org VDC allocation type. There is, however, a migrate option in vCloud Director that can be used, either from the GUI or with the API (see the end of this article). Note: the migration leverages vMotion with shared storage. It is not possible to migrate this way between clusters without shared storage, even though vSphere 5.1 has the so-called Enhanced vMotion (aka shared-nothing vMotion).

Migrate VMs

Catalog Templates

Migration of catalog templates is more difficult. Again, the vCloud template is quite different from the vSphere template: vCloud templates are basically powered-off VMs. Migration at the vSphere level seems less harmful than in the previous case, because no resource pool settings need to be configured (catalog VMs are never powered on). However, we would still run into a problem when trying to detach Cluster1, as vCloud Director keeps track of VM to resource pool associations.

Unfortunately, the GUI migration process described above for vApps cannot be used here. The GUI workaround is to open each catalog and move each catalog VM to the same catalog. This basically creates a clone of the VM, which gets registered to a Cluster2 host, and the original VM is deleted. This is, however, a very expensive operation from the storage perspective: the cloning temporarily needs extra space and generates a lot of storage I/O. Fast provisioning (linked clones) can help here.

The second alternative is to use the same API call as in the first case. Although this is not documented it works (see the example at the end of the article).

Catalog Media

ISO or FLP images are stored on vSphere datastores in a special folder: <VCD Instance Name>/media/<org ID>/<org VDC ID>/media-<media ID>.<ISO|FLP>. vSphere (vCenter) does not keep track of these objects in any way. vCloud Director stores the datastore moref and the media folders in its database. The media upload is done by the vCloud Director cells with NFC (the VMware Network File Copy protocol) via any ESX host that is connected to the datastore. Therefore, as long as Cluster2 has access to the media datastores, nothing needs to be migrated.

vShield Edges

Gateway and vApp Edges are always-running VMs placed in the System vDC resource pool in Cluster1. If a vApp with a routed vApp network is powered off, the particular vApp Edge is destroyed. When the vApp is started again, a new vApp Edge is deployed with an identical configuration and is placed into the new Cluster2. A simple vMotion between clusters seems to work at first but is definitely not recommended: vShield Manager keeps track of which cluster each Edge is deployed to, and any major Edge configuration change (upgrade to a new version or upgrade to full configuration) would try to deploy the Edge to the original cluster.

Edge redeploy is an operation with minimal impact on the network flows going through the virtual router. A new Edge VM is deployed by vShield Manager, the identical configuration is pushed to it, and then the networks are disconnected from the old Edge and connected to the new one. This might cause the loss of an IPsec VPN connection or a load-balanced session; otherwise the disruption is minimal. The Edge redeploy, however, cannot be done directly from vShield Manager (too bad, as there is a nice script for this: see KB 2035939), because vShield Manager knows nothing about the restrictions made in vCloud Director on the PVDC (the disabled Cluster1). Edge Gateway redeploy and routed/fenced network reset must be done from vCloud Director, either from the GUI (although it is not trivial to find all the running vApp Edges) or with the vCloud API.

Other Considerations

There are some limitations or considerations that need to be taken into account:

  • VDC elasticity currently (version 5.1) works only within a single vCenter datacenter, and all clusters need to use the same distributed switch for external networks and the network pool.
  • Reservation allocation type Org VDCs do not currently support elasticity of VDC (those workloads cannot be migrated).
  • Both clusters should have access to the same storage. If storage migration is required, do it as an independent second step.
  • vSphere vMotion restrictions apply: if the new hardware has a newer-generation CPU, leverage EVC and lower the compatibility of the new cluster to that of the old hardware. Once the old hardware is retired, the EVC mode can be raised, and any restarted (full power cycle required) or new VMs can take advantage of it. Obviously, migration between different CPU architectures (AMD vs Intel) is not possible.
  • 1 GHz of an old CPU is not equal to 1 GHz of a newer-generation CPU. Therefore, do not mix them in elastic VDCs except for the migration purposes described above. This could also impact Chargeback: the customer will get different (higher) performance for the same cost.

vCloud API Examples

As mentioned above, vCloud VMs can, and vCloud templates should, be migrated with an API call. The request looks like this:
POST API-URL/admin/extension/resourcePool/id/action/migrateVms with Request body containing MigrateParams.

VM migration example:

Migrate VM API

Template migration example

Migrate Template API
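
The migrateVms request can also be sketched in Python. The snippet below only builds the URL, header and MigrateParams body; it does not send anything. Several things here are assumptions to verify against the vCloud API schema reference: the element names (VmRef, ResourcePoolRef), the extension namespace, the content type, and the idea that the resource pool in the URL is the one the VMs are leaving while ResourcePoolRef names the destination. The function name and all hrefs and IDs are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the vCloud API admin extension schema.
VMEXT_NS = "http://www.vmware.com/vcloud/extension/v1.5"
# Assumption: media type of MigrateParams; check the API schema reference.
MIGRATE_CONTENT_TYPE = "application/vnd.vmware.admin.migrateVmParams+xml"


def build_migrate_request(api_url, source_pool_id, vm_hrefs, target_pool_href):
    """Return (url, headers, body) for a migrateVms POST without sending it."""
    url = "{}/admin/extension/resourcePool/{}/action/migrateVms".format(
        api_url.rstrip("/"), source_pool_id)
    root = ET.Element("{%s}MigrateParams" % VMEXT_NS)
    # One VmRef per VM (or template VM) to move.
    for href in vm_hrefs:
        ET.SubElement(root, "{%s}VmRef" % VMEXT_NS, {"href": href})
    # Assumed to be the destination resource pool.
    ET.SubElement(root, "{%s}ResourcePoolRef" % VMEXT_NS,
                  {"href": target_pool_href})
    body = ET.tostring(root, encoding="unicode")
    return url, {"Content-Type": MIGRATE_CONTENT_TYPE}, body
```

For catalog templates the same builder would be used with the href of the template's VM in place of a vApp VM href, which mirrors the undocumented use of the call mentioned earlier.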