vCloud Director Object Storage Extension Reference Design

Just a quick announcement that a vCloud Director Object Storage Extension Reference Design that I wrote is now available at this link.

It deep dives into the use cases, architecture, includes recommended deployment options and description of the new feature of 1.0.1 related to multisite deployments. There are also results of performance tests of the overhead the extension adds over the direct use of the native storage platform.

vCloud Director H5 UI Error: 431 Request Header Fields Too Large

This is just a short blog post to describe an issue you might get with the tenant or portal HTML UI in vCloud Director where you will see errors in the browser related to request header fields too large.

You will see it more likely with Chrome browser and if your cloud domain is shared with other services. The root cause is that the browser API calls will stop working once the request header gets larger than 8 KBs. While 8 KBs seems like big enough size especially as the request headers vCloud Director uses contain only session ID, JWT token and possibly load balancers headers it also includes all the browser cookies applicable to the vCloud Director domain stored by other web services.

The temporary fix is for the end-user to delete her browser cookies. But is there something the provider could do?

In our case we saw the situation where the vCloud Director instance was on *.vmware.com domain and the browser contained lots of large OAM cookies related to VMware Single Sign-On solution. While those cookies are essential for multiple VMware internal applications, there is no reason for vCloud Director to receive them in every API request. One way how to block the cookies and thus decrease the request header size is to remove them at the load balancer. With NSX-V load balancer this can be accomplished by utilizing SSL L7 termination and an application rule (see my older blog post how to configure NSX-V Edge Load balancer).

In my case the application rule I use is:

Update 2019/10/24: The initial rule would remove all Cookies. I have now amended it with another rule that removes all but vcloud_session_id and vcloud_jwt cookies if they are present.

reqirep ^Cookie:\s.*(vcloud_session_id=[^;]*)|(vcloud_jwt=[^;]*) Cookie:\ \1;\ \2
reqidel ^Cookie:.*OAM*

which deletes all cookies from the request header starting with OAM string

 

vCloud Director 10: NSX-T Integration

Intro

vCloud Director relies on NSX network virtualization platform to provide on-demand creation and management of networks and networking services. NSX for vSphere has been supported for long time and vCloud Director allows most of its feature to be used by its tenants. However as VMware slowly shifts away from NSX for vSphere and pushes forward modern, fully rewritten NSX-T networking platform, I want to focus in this article on its integration with vCloud Director.

History

Let me start with highlighting that NSX-T is evolving very quickly. It means each release (now at version 2.5) adds major new functionality. Contrast that with NSX-V which is essentially feature complete in a sense that no major functionality change is happening there. The fast pace of NSX-T development is a challenge for any cloud management platforms as they have to play the catch up game.

The first release of vCloud Director that supported NSX-T was 9.5. It supported only NSX-T version 2.3 and the integration was very basic. All vCloud Director could do was to import NSX-T overlay logical segments (virtual networks) created manually by system administrator. These networks were imported into a specific tenant Org VDC as Org VDC networks.

The next version of vCloud Director – 9.7 supported only NSX-T 2.4 and from the feature perspective not much had changed. You could still only import networks. Under the hood the integration however used completely new set of NSX-T policy based APIs and there were some minor UI improvements in registering NSX-T Manager.

The current vCloud Director version 10 for the first time brings on-demand creation of NSX-T based networks and network services. NSX-T version 2.5 is required.

NSX-T Primer

While I do not want to go too deep into the actual NSX-T architecture I fully expect that not all readers of this blog are fully familiar with NSX-T and how it differs from NSX-V. Let me quickly highlight major points that are relevant for topic of this blog post.

  • NSX-T is vCenter Server independent, which means it scales independently from vCenter domain. NSX-T essentially communicates with ESXi hosts directly (they are called host transport nodes). The hosts must be prepared with NSX-T vibs that are incompatible with NSX-V which means a particular host cannot be used by NSX-V and NSX-T at the same time.
  • Overlay virtual networks use Geneve encapsulation protocol which is incompatible with VXLAN. The concept of Controller cluster that keeps state and transport zone is very similar to NSX-V. The independence from VC mentioned in the previous point means vSphere distributed switch cannot be used, instead NSX-T brings its own N-VDS switch. It also means that there is concept of underlay (VLAN) networks managed by NSX-T. All overlay and underlay networks managed by NSX-T are called logical segments.
  • Networking services (such as routing, NATing, firewalling, DNS, DHCP, VPN, load balancing) are provided by Tier-0 or Tier-1 Gateways that are functionally similar to NSX-V ESGs but are not instantiated in dedicated VMs. Instead they are services running on shared Edge Cluster. The meaning of Edge Cluster is very different from the usage in NSX-V context. Edge Cluster is not a vSphere cluster, instead it is cluster of Edge Transport Nodes where each Edge Node is VM or bare metal host.
  • While T0 and T1 Gateways are similar they are not identical, and each has specific purpose or set of services it can offer. Distributed routing is implicitly provided by the platform unless a stateful networking service requires routing through single point. T1 GWs are usually connected to single T0 GW and that connection is managed automatically by NSX-T.
  • Typically you would have one or small number of T0 GWs in ECMP mode providing North-south routing (concept of Provider Edge) and large number of T1 GWs connected to T0 GW, each for a different tenant to provide tenant networking (concept of Tenant Edge).

vCloud Director Integration

As mentioned above since NSX-T is not vCenter Server dependent, it is attached to vCloud Director independently from VC.

(Geneve) network pool creation is the same as with VXLAN – you provide mapping to an existing NSX-T overlay transport zone.

Now you can create Provider VDC (PVDC) which is as usual mapped to a vSphere cluster or resource pool. A particular cluster used by PVDC must be prepared for NSX-V or NSX-T and all clusters must share the same NSX flavor. It means you cannot mix NSX-V clusters with NSX-T in the same PVDC. However you can easily share NSX-V and NSX-T in the same vCenter Server, you will then just have to create multiple PVDCs. Although NSX-T can span VCs, PVDC cannot – that limitation still remains. When creating NSX-T backed PVDC you will have to specify the Geneve Network Pool created in the previous step.

Within PVDC you can start creating Org VDCs for your tenants – no difference there.

Org VDCs without routable networks are not very useful. To remedy this we must create external networks and Org VDC Edge Gateways. Here the concept quite differs from NSX-V. Although you could deploy provider ECMP Edges with NSX-V as well (and I described here how to do so), it is mandatory with NSX-T. You will have to pre-create T0 GW in NSX-T Manager (ECMP active – active is recommended). This T0 GW will provide external networking access for your tenants and should be routable from the internet. Instead of just importing external network port group how you would do with NSX-V you will import the whole T0 GW in vCloud Director.

During the import you will also have to specify IP subnets and pools that the T0 GW can use for IP sub-allocation to tenants.

Once the external network exist you can create tenant Org VDC Edge Gateways. These will be T1 GWs instantiations into the same NSX-T Edge Cluster as the T0 GW they connect to. Currently you cannot chose different NSX-T Edge Cluster for their placement. T1 GWs are always deployed in Active x Standby configuration, the placement of active node is automated by NSX-T. The router interlink between T0 and T1 GWs is also created automatically by NSX-T.

During the Org VDC Edge Gateway the service providers also sub-allocates range of IPs from the external network. Whereas with NSX-V these would actually be assigned to the Org VDC Edge Gateway uplink, this is not the case with NSX-T. Once they are actually used in a specific T1 NAT rule, NSX-T will automatically create static route on the T0 GW and start routing to the correct T1 GW.

Tenant Networks

There are four types of NSX-T based Org VDC networks and three of them are available to be created via UI:

  • Isolated: Layer 2 segment not connected to T1 GW. DHCP service is not available on this network (contrary to NSX-V implementation).
  • Routed: Network that is connected to T1 GW. Note however that its subnet is not announced to upstream T0 GW which means only way to route to it is to use NAT.
  • Imported: Existing NSX-T overlay logical segment can be imported (same as in VCD 9.7 or 9.5). Its routing/external connectivity must be managed outside of vCloud Director.
  • In OpenAPI (POST /1.0.0/OrgVdcNetwork) you will find one more network type:  DIRECT_UPLINK. This is for a specific NFV use case. Such network is connected directly to T0 GW with external interface. Note this feature is not officially supported!

Note that only Isolated and NAT-routed networks can be created by tenants.

As you can see it is not possible today to create routed advertised Org VDC network (for example for direct connect use case when tenant wants to route from on-prem networks to the cloud without using NAT). These routed networks would require dedicated T0 GW for each tenant which would not scale well but might be possible in the future with VRF support on T0 GWs.

Tenant Networking Services

Currently the following T1 GW networking services are available to tenants:

  • Firewall
  • NAT
  • DHCP (without binding and relay)
  • DNS forwarding
  • IPSec VPN: No UI, OpenAPI only. Policy and route based with pre share key is supported. (Thanks Abhi for the correction).

All other services are currently not supported. This might be due to NSX-T not having them implemented yet, or vCloud Director not catching up yet. Expect big progress here with each new vCloud Director and NSX-T release.

Networking API

All NSX-T related features are available in the vCloud Director OpenAPI (CloudAPI). The pass through API approach that you might be familiar with from the Advanced Networking NSX-V implementation is not used!

Feature Comparison

I have summarized all vCloud Director networking features in the following table for quick comparison between NSX-V and NSX-T.

Update 2019/10/09: Added two entries related to external network metering and rate limiting.

What’s New in vCloud Director 10

With clockwork efficiency after less than 6 months there a is new major release of vCloud Director – version 10. As usual, I will try to summarize all the new functionality compared to the previous release 9.7. I have similar posts about 9.7, 9.5 and 9.1 so you can get quickly up to speed if you are not familiar with them as well.

User Interface

From the tenant UI perspective the HTML5 UI (/tenant) has been evolving to add missing legacy (Flex) UI functionality. You can now customize VM network adapter during VM creation, change user password and user settings.

The top ribbon bar now provides more information and new search option.

New universal tenant login page (/login) was added:

Tenant UI also provides new functionality such as NSX-T network management.

The provider HTML5 UI now contains all the actions the cloud service provider needs to do (various Settings screens, tenant migration, …), so the legacy Flex UI is actually disabled by default. There are still however some missing features like direct VM import from vCenter Server, Org VDC template creation or edit of VM guest properties.

If necessary, you can enable Flex UI with this command (run on any cell and reboot them all):

cell-management-tool manage-config -n flex.ui.enabled -v true

Among some of the new Provider UI features are:

  • compute policy management (VM Sizing Policies and Provider VDC specific VM Placement Policies).

  • NSX-T provider actions such as Geneve network pool creation, import of T0 for external networks and Org VDC Edge Gateway management including quite useful quick external IP addresses sub-allocation (available for NSX-V Edge Gateways in API as well).
  • SDDC Proxy and token management (CPOM feature)

NSX-T Support

As hinted above, NSX-T integration has been improved massively. I am going to deep dive into the topic in a separate article, so let me cover it here very quickly.

In the previous vCloud Director releases the system administrator could only import NSX-T based networks (overlay logical segments) as tenant Org VDC networks and that was it. In the current release the tenants now can create NAT-routed and isolated networks with firewalling, DHCP and DNS forwarding services provided by NSX-T T1 Gateways. The vCloud Director networking objects did not change much which means there should not be major difference between NSX-V backed and NSX-T backed Org VDC from the usability perspective. However, there is not full feature parity between NSX-V and NSX-T functionality; sometimes it is due to NSX-T not providing these features (SSL VPN), sometimes due to vCloud Director not yet caught up. Expect more in the future as this is a journey.

Note: Only NSX-T version 2.5 is supported by vCloud Director 10.0.

API

  • API version has been bumped up to 33.0, while versions 27.0-32.0 are still supported but 27.0 and 28.0 are marked for deprecation.
  • There is a new API authentication mechanism. The OpenAPI provides two different authentication endpoints (one for provider: /cloudapi/1.0.0/sessions/provider the other for tenants /cloudapi/1.0.0/sessions). You can disable for API version 33.0 the old authentication mechanism (/api/sessions) with the following command:cell-management-tool manage-config -n vcloud.api.legacy.nonprovideronly -v trueThis means it is now quite easy with Web Application Firewall to protect the provider API authentication from the internet.
  • OpenAPI provides new (faster) way to collect audit events from vCloud Director via AuditTrail API call. Note that vCloud Director now stores audit events only for limited time in order to keep the database size and query speed manageable.
  • The NSX-T related networking APIs are not pass-through as was the case with NSX-V and instead use the OpenAPI calls.
  • vCloud Director Appliance API: each appliance node now provides its own appliance API to get database state provide by replication manager. It is also possible to remotely execute database standby node promotion  and thus automate database failover with external tooling or load balance to the active database node for 3rd party database usage.
    GET https://<appliance IP>:5480/api/1.0.0/is_primary
    GET https://<appliance IP>:5480/api/1.0.0/nodes

    POST https://<appliance IP>:5480/api/1.0.0/nodes/<node name>/promote

Other Features

  • Improved vRealize Orchestrator (vRO) integration. Two more custom properties vcd_sessionToken and _vcd_apiEndpoint can be passed from vCloud Director to vRO workflow so the workflow during its execution can connect in the particular user context via the vCloud Director Plugin to vCloud Director and provide access only to those objects the user has access to.
    The spelling of two other custom properties was fixed from _vdc_userName and _vdc_isAdmin to _vcd_userName and _vcd_isAdmin (but is still backwards compatible).
    The new vRO vCloud Director Plugin now also supports vRO Clustering so the vCloud Director connection is automatically shared across vRO nodes.
  • RBAC support for NSX-V Edge ECMP and DNS features. The former was asked by many providers in order to keep NSX-V licensing at Advanced edition and not to get accidentally bumped to Enterprise edition if tenant enabled ECMP on its Org VDC Edge Gateway.
  • Legacy Org VDC allocation models can now be changed to flex allocation model which allows for switching allocation models of existing Org VDCs.
  • When system administrator enables Distributed Firewall via UI it is possible to choose if the new tenant firewall section should be created at the bottom (and not on top by default). This was before possible only via API.

  • MS SQL is no longer supported as vCloud Director database. To use vCloud Director version 10.0 you must either use the appliance form factor with its embedded PostgreSQL database or an external PostgreSQL. Migration is supported.
  • Compatible VCD-CLI version 22.0 and pycloud 21.0 SDK were released as well.

Custom Links in the H5 vCloud Director Portal

Just a quick post to elaborate on the feature to add custom links into the H5 vCloud Director Portal that I very briefly mentioned in my What’s New in vCloud Director 9.7 blog post.

The custom links are visible in the drop down under the user name in top right corner. They are specified as part of the branding customization that can be system wide or tenant specific. I have already talked about branding here, so this is just evolution of this feature since version 9.0.

Additionally default links to help, about (under the questionmark icon) and VMRC download (in VM UI element) can be replaced as well through a single API call.

Custom links can contain dynamic elements such as ${TENANT_NAME}${TENANT_ID} and ${SESSION_TOKEN} which enables for example easy redirection to legacy (Flex) portal as is show in my example.

You can also see Section elements and Separators to make the links more organized.

The actual API call that I used in my example:

PUT /cloudapi/branding

Headers:

Content-Type: application/json
x-vcloud-authorization: …

Payload:

{

    "portalName": "vCloud Director 9.7",
    "portalColor": "#323843",
    "selectedTheme": {
        "themeType": "BUILT_IN",
        "name": "Default"
    },
    "customLinks": [
        {
            "name": "Legacy Portal",
            "menuItemType": "link",
            "url": "https://vcloud.fojta.com/cloud/org/${TENANT_NAME}"
        },
        {
            "menuItemType": "separator"
        },
        {
            "name": "Additional Services",
            "menuItemType": "section",
            "url": null
        },
        {
            "name": "Backup",
            "menuItemType": "link",
            "url": "https://fojta.wordpress.com/backup"
        },
        {
            "name": "Monitoring",
            "menuItemType": "link",
            "url": "https://fojta.wordpress.com/monitor"
        },
        {
            "name": "Billing",
            "menuItemType": "link",
            "url": "https://fojta.wordpress.com/bill"
        },
        {
            "menuItemType": "separator"
        },
        {
            "name": "Tools",
            "menuItemType": "section",
            "url": null
        },
        {    "name": "VCD CLI",
            "menuItemType": "link",
            "url": "http://vmware.github.io/vcd-cli/"
        },
        {    "name": "API Documentation",
            "menuItemType": "link",
            "url": "https://code.vmware.com/apis/553"
        },
        {
            "name": "help",
            "menuItemType": "override",
            "url": null
        },
        {
            "name": "about",
            "menuItemType": "override",
            "url": null
        },
        {
            "name": "vmrc",
            "menuItemType": "override",
            "url": null
        }
    ]
}

Tenant specific branding is achieved with the following call:

PUT /cloudapi/branding/tenant/acme