vCloud Director 8.20 introduces the ability to create granular roles at the tenant and system level. This is important for service providers who want to differentiate which tenants have access to specific features (for example advanced networking services). It also gives tenants the opportunity to create their own roles that correspond to their team structure (e.g. network administrator). And lastly, the system administrator can create additional roles in the system context with access to a subset of features.
A role is a set of rights which can be assigned to a user or a group. There are many new rights in vCloud Director 8.20. A few examples:
Access to Distributed Firewall:
Enable / Disable Distributed Firewall
Gateway Advanced Services
Configure IPSEC VPN
Configure Load Balancer
Configure BGP Routing
Configure OSPF Routing
Configure SSL VPN
Configure L2 VPN
Configure Static Routing
Or system level rights like:
Migrate Host VMs
Open a Host in vSphere
Enable / Disable a Host
Prepare / Unprepare a Host
Prior to vCloud Director 8.20
Only global roles could be created by the system administrator, alongside a handful of predefined roles (vApp Author, Organization Administrator, …).
Every organization would have access to the global and predefined roles.
The organization administrator could assign the roles to organization users.
The service provider could not differentiate access to features among different tenants.
There was only one system administrator role with access to everything.
vCloud Director 8.20
Roles are no longer global, but instead are organization specific.
Former global and predefined roles become role templates.
The service provider can create new role templates.
Role templates are used to instantiate organization-specific roles.
Service provider can selectively grant rights to specific organizations.
The organization administrator can create their own organization-specific roles from the subset of granted rights.
New roles can be created in the system context from a subset of system administrator rights.
The transition from pre-vCloud Director 8.20 role management happens during the upgrade to 8.20. Existing roles are transferred to role templates, and each organization gets its own role instantiations based on those templates. The UI has changed and now includes an Organization column and filter. A new System organization is added with a default System Administrator role.
Tenant Rights and Role Management
When a new organization is created, it has access to all rights that are used in the role templates. The system administrator can grant additional rights to the organization via the vCloud API only:
GET /api/admin … get references to all rights in the VCD instance
GET /api/admin/org/<org-id>/rights … get references to all rights in the organization
PUT /api/admin/org/<org-id>/rights … edit the rights of the organization
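A minimal sketch of the PUT body, assuming vCloud API version 27.0. The payload shape (OrgRights wrapping RightReference elements) follows the vCloud API schema, but the right name and href below are placeholders; in practice they come from the two GET calls above. Nothing is sent here, the snippet only assembles the XML.

```python
# Build the body for PUT /api/admin/org/<org-id>/rights. Hrefs are
# placeholders; real values come from GET /api/admin (all rights) and
# GET /api/admin/org/<org-id>/rights (currently granted rights).
import xml.etree.ElementTree as ET

NS = "http://www.vmware.com/vcloud/v1.5"
ET.register_namespace("", NS)

def build_org_rights(right_refs):
    root = ET.Element(f"{{{NS}}}OrgRights")
    for name, href in right_refs:
        ET.SubElement(root, f"{{{NS}}}RightReference", {
            "name": name,
            "href": href,
            "type": "application/vnd.vmware.admin.right+xml",
        })
    return ET.tostring(root, encoding="unicode")

# The full current rights list plus the one being granted (hrefs hypothetical).
payload = build_org_rights([
    ("Configure BGP Routing",
     "https://vcd.example.com/api/admin/right/RIGHT-1"),
])
print(payload)
```

Note that the PUT replaces the whole rights list, so the body must contain the existing rights as well, not only the new one.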
The system administrator or an Organization Administrator can create new roles in the organization, via the vCloud API only.
Note: While the system administrator can edit tenant roles in the UI, editing a role based on a role template would change the role template itself and thus change it for all organizations (more below).
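As a sketch, the body of such a role-creation request could look like the following. The Role element layout follows the vCloud API role schema, but the role name, description, and right href are illustrative placeholders; verify the exact endpoint and content type against your API version before use.

```python
# Build a hypothetical <Role> body for creating an organization-specific role.
import xml.etree.ElementTree as ET

NS = "http://www.vmware.com/vcloud/v1.5"
ET.register_namespace("", NS)

def build_role(name, description, right_refs):
    role = ET.Element(f"{{{NS}}}Role", {"name": name})
    ET.SubElement(role, f"{{{NS}}}Description").text = description
    refs = ET.SubElement(role, f"{{{NS}}}RightReferences")
    for right_name, href in right_refs:
        ET.SubElement(refs, f"{{{NS}}}RightReference", {
            "name": right_name,
            "href": href,
            "type": "application/vnd.vmware.admin.right+xml",
        })
    return ET.tostring(role, encoding="unicode")

# Example: a tenant 'Network Administrator' role with one networking right
# (the href is a placeholder).
body = build_role(
    "Network Administrator",
    "Tenant role limited to networking services",
    [("Configure Load Balancer",
      "https://vcd.example.com/api/admin/right/RIGHT-1")],
)
print(body)
```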
How to Create Global Role
The UI no longer allows the creation of global roles; only organization-specific roles can be created there.
However, there is a way to create a global role (actually a role template) with the legacy API (e.g. version 9.0 or 20.0, but not 27.0). Here is an example:
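A hedged sketch of such a legacy call: a POST to /api/admin/roles with the Accept header pinning an older API version (e.g. 20.0). The host, token, role name, and right href are all placeholders; nothing is sent, the snippet only assembles the request pieces.

```python
# Request pieces for the legacy global-role (role template) creation call.
url = "https://vcd.example.com/api/admin/roles"
headers = {
    "Accept": "application/*+xml;version=20.0",   # legacy version, not 27.0
    "Content-Type": "application/vnd.vmware.admin.role+xml",
    "x-vcloud-authorization": "SESSION-TOKEN",    # placeholder auth token
}
body = """<?xml version="1.0" encoding="UTF-8"?>
<Role xmlns="http://www.vmware.com/vcloud/v1.5" name="Network Administrator">
  <Description>Role template created via the legacy API</Description>
  <RightReferences>
    <RightReference name="Configure Static Routing"
        href="https://vcd.example.com/api/admin/right/RIGHT-1"
        type="application/vnd.vmware.admin.right+xml"/>
  </RightReferences>
</Role>"""
print("POST", url)
```

Because the call goes through the legacy endpoint, the created role lands in the role-template set and is instantiated into every organization, which is exactly the pre-8.20 global behavior.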
Addition and removal of rights from a role template:
In the UI, add or remove the right to/from a role based on the role template, in any organization.
To add a new right, the organization needs access to that right. If it does not have it, grant it first with the API calls mentioned above.
Adding or removing rights in a role based on a role template affects all other organizations:
Adding a right: other organizations will see the new right only if they have been granted the right. If an organization does not have access to the right, the right will not be added there!
Removing a right: in all other organizations the right will be removed from the role based on the role template.
The post was written with kind support of John Kilroy.
NSX Distributed Firewall (DFW) is the most popular feature of NSX; it enables microsegmentation of networks with vNIC-level firewalls in the hypervisor. For a real technical deep dive into the feature, I recommend reading Wade Holmes' free e-book, available here.
vCloud Director 8.20 provides this feature to tenants with a brand new HTML5 UI and API. It is managed at the Org VDC level from the Manage Firewall link, which opens a new tab with the new user interface.
vCloud Director now offers three different firewall types for tenants, which might be confusing. So let me quickly compare them.
The picture above shows two Org VDCs, each with a different network topology. Org VDC 1 uses an Org VDC Edge Gateway that provides firewalling as well as other networking services (load balancing, VPNs, NAT, routing, etc.). It also has the brand new UI and Network API. Firewalling at this level is enforced only on packets routed through the Edge Gateway.
One level below we see vApps with vApp Edges. These provide routing, firewalling, and NAT between a routed vApp network and an Org VDC network. There is no change in the firewall capability of the vApp Edge in vCloud Director 8.20; the old Flash UI and vCloud API can be used for its configuration. Firewalling at the vApp Edge level is enforced only on packets routed between Org VDC and vApp networks.
The distributed firewall is applied at the vNIC level of virtual machines. This means it can inspect every packet and frame entering and leaving a VM; it is therefore completely independent of the network topology and can be used for microsegmentation of a layer 2 network. Both layer 3 and layer 2 rules can be created.
Obviously all three firewall types can be combined and used together.
Managing Access to Distributed Firewall
There are new access rights related to DFW in vCloud Director:
Configure Distributed Firewall Rules
View Distributed Firewall Rules
Enable / Disable Distributed Firewall
The last right is by default available only to system administrators, so the provider can control which tenants can and cannot use DFW; it can thus be offered as a value-added service. The provider can either enable DFW selectively for specific Org VDCs, or grant the Enable/Disable Distributed Firewall right to a specific organization via the API so the tenant can enable DFW themselves.
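The grant itself is a read-modify-write of the organization's rights list (GET, append, PUT, as described in the role management section). A minimal sketch of the data manipulation, with a hypothetical helper and placeholder hrefs:

```python
# Append a right to an organization's rights list if it is missing.
# `current` stands for the (name, href) pairs parsed from
# GET /api/admin/org/<org-id>/rights; the result would be PUT back.
def grant_right(current, name, href):
    if any(existing == name for existing, _ in current):
        return list(current)            # already granted, nothing to do
    return list(current) + [(name, href)]

current = [("vApp: Copy", "https://vcd.example.com/api/admin/right/RIGHT-1")]
updated = grant_right(
    current,
    "Enable / Disable Distributed Firewall",
    "https://vcd.example.com/api/admin/right/RIGHT-2",
)
print([name for name, _ in updated])
```

The duplicate check matters because the PUT replaces the whole list, so the safest pattern is always GET first, modify locally, then PUT.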
Distributed Firewall under the Hood
Each tenant is given a section in the NSX firewall table and can only apply rules to VMs and Edge Gateways in their domain. There is one section for each Org VDC that has DFW enabled, and it is always created at the top.
Edit 3/14/2017: In fact, it is possible to create the section at the bottom, just above the default section. This allows the provider to create its own section at the top, which will always be enforced first. A use case for this could be a service network.
To force creation of the section at the bottom, the firewall must be enabled with an API call with ?append=true at the end.
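As a sketch, the only difference from a normal enablement call is the query string. The endpoint path below is an assumption (the post does not spell it out); only the ?append=true part is the point:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical DFW enablement URL for an Org VDC; only the query matters here.
base = "https://vcd.example.com/network/firewall/vdc/ORG-VDC-UUID"
url = base + "?append=true"           # forces the new section to the bottom
print(url)
print(parse_qs(urlsplit(url).query))
```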
As tenants could have overlapping IPs, all rules in the section are scoped to a security group with dynamic membership of the tenant's Org VDC resource pools, and are thus applied only to VMs in that Org VDC.
Tenants can create layer 3 (IP-based) or layer 2 (MAC-based) rules, using the following objects when defining them:
IP address, IP/MAC sets
Org VDC Network
Note that using L3 non-IP-based rules requires NSX to learn the IP address(es) of the guest VM. One of the following mechanisms must be enabled:
VMware Tools installed in guest VM
DHCP Snooping IP Detection Type
ARP Snooping IP Detection Type
The IP Detection Type is configured in NSX at the cluster level in the Host Preparation tab.
The scope of each rule can be defined in the Applied To column. As mentioned before, it is set to the Org VDC by default; however, the tenant can further limit the scope of the rule to a particular VM or Org VDC network (note that a vApp network cannot be used). It is also possible to apply the rule to an Org VDC Edge Gateway, in which case the rule is actually created and enforced on the Edge Gateway as a pre-rule, which takes precedence over all other firewall rules defined on that Edge Gateway.
Tenants can enable logging of a specific firewall rule via the API by editing the <rule … logged="true|false"> element. NSX then logs the first packet of each session matching the rule to the ESXi host log, with a tenant-specific tag (a substring of the Org VDC UUID). The provider can then filter such logs and forward them to the tenant with its own syslog solution.
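A minimal sketch of the edit, using a simplified, hypothetical rule fragment (not the full NSX rule schema): fetch the rule XML, flip the logged attribute, and PUT it back.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a DFW rule as fetched via the API, logging disabled.
rule = ET.fromstring(
    '<rule disabled="false" logged="false"><name>web-to-db</name></rule>'
)
rule.set("logged", "true")   # log the first session packet matching the rule
print(ET.tostring(rule, encoding="unicode"))
```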
The vCloud Director architecture consists of multiple cells that share a common database. The upgrade process involves shutting down services on all cells, upgrading them, upgrading the database, and starting the cells again. In large environments with three or more cells this can be quite labor intensive.
vCloud Director 8.20 brings a new feature: an orchestrated upgrade. All cells and the vCloud database can be upgraded with a single command from the primary cell VM. This brings two advantages. Simplicity: it is no longer necessary to log in to each cell VM, upload binaries, and execute the upgrade process manually. Availability: downtime during the upgrade maintenance window is reduced.
Set up SSH private key login from the primary cell to all other cells in the vCloud Director instance for the vcloud user.
On the primary cell, generate a private/public key pair with no passphrase (e.g. with ssh-keygen -t rsa).
Copy the public key to the authorized_keys file of each additional cell in the instance. This can be done with a one-line command run from the primary cell, or with ssh-copy-id. Use the IP/FQDN with which each cell is registered in VCD.
Optionally, a maintenance cell can be specified with the --maintenance-cell option.
For troubleshooting, the upgrade log is located on the primary cell in $VCLOUD_HOME/logs/upgrade-<date and time>.log
For no-prompt execution you can add the --unattended-upgrade option.
This is the workflow that is automatically executed:
Quiesce, shut down, and upgrade the primary cell (the cell is not started yet).
If a maintenance cell was specified, it is put into maintenance mode.
Quiesce and shut down all other cells.
Upgrade the vCloud database (with a prompt for a backup).
Upgrade and start all other cells (except the maintenance cell).
If a maintenance cell was specified, it is upgraded and started.
Start the primary cell.
What is the difference between a quiesced cell and a cell in maintenance mode?
Quiesced cell
finishes existing long running operations
answers new requests and queues them
does not dequeue any operations (they stay in the queue)
VC listener keeps running
Console proxy keeps running
Cell in maintenance mode
waits for long running operations to finish, but fails all queued operations
answers most requests with HTTP error code 504 (unavailable)
still issues auth tokens for /api/sessions login requests
No VC listener
No Console proxy
Interoperability with vCloud Availability
vCloud Availability uses Cloud Proxies to terminate replication tunnels from the internet. Cloud Proxies are essentially stripped down vCloud Director cells and are therefore treated as regular cells during the orchestrated upgrade.
A quiesced Cloud Proxy has no impact on replication operations and traffic. A Cloud Proxy in maintenance mode still preserves existing replications; however, new replications cannot be established.
2/27/2017: Multiple edits based on feedback from engineering. Thank you Matthew Frost!
vCloud Network Isolation (VCDNI or VCNI) is a legacy mechanism to create overlay logical networks independently of the physical networking underlay. It was originally used in VMware vCenter Lab Manager (where it was known as Cross Host Fencing). vCloud Director offers it as one of several mechanisms for creating logical networks (alongside VXLAN, VLAN, and port group backings). VCDNI uses a VMware-proprietary MAC-in-MAC encapsulation performed by the vCloud Agent running in the ESXi host vmkernel.
It has for some time been superseded by VXLAN, which is much more scalable, provides better performance, and is an industry standard technology. VXLAN network pools have been available in vCloud Director since version 5.1.
VCDNI is consumed through the manual creation of a vCloud Network Isolation-backed network pool mapped to an underlay VLAN, with up to 1000 logical networks for each pool (VLAN).
As a deprecated and obsolete technology, it is no longer supported in vSphere 6.5, and vCloud Director 8.20 is the last release that supports such network pools. vCloud Director 8.20 also provides a simple mechanism for low-disruption migration of Org VDC and vApp networks to VXLAN-backed networks. The migration must be done before upgrading to vSphere 6.5 (see more in KB 2148381).
The migration can be performed via the UI or the API by the system administrator, with Org VDC granularity.
Migration via UI
For an Org VDC using a VCDNI network pool, open its properties from the System tab under Manage & Monitor (note that doing the same from the Org tab will not work).
Go to the Network Pool & Services tab, change the VCDNI-backed network pool to a VXLAN-backed one, and click OK.
Open the Network Pool & Services tab of the Org VDC again. A Migrate to VXLAN button will now appear.
Click the button, confirm the message, and start the migration.
After a while the Org VDC status will change from busy to ready and the migration is finished. Details (and possible errors) can be reviewed in Recent Tasks and the Audit Log.
Migration with vCloud API
The Org VDC network migration is triggered by a single API POST call at the Org VDC level.
POST /api/admin/vdc/<org VDC UUID>/migrateVcdniToVxlan
Content-Type: application/vnd.vmware.admin.vdcnitovxlanmigration+xml
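Assembled as a request, the call looks like the sketch below; nothing is sent here, and the host, Org VDC UUID, and token are placeholders:

```python
# Pieces of the VCDNI-to-VXLAN migration POST for one Org VDC.
vdc_id = "ORG-VDC-UUID"    # placeholder for the real Org VDC UUID
url = f"https://vcd.example.com/api/admin/vdc/{vdc_id}/migrateVcdniToVxlan"
headers = {
    "Content-Type": "application/vnd.vmware.admin.vdcnitovxlanmigration+xml",
    "Accept": "application/*+xml;version=27.0",
    "x-vcloud-authorization": "SESSION-TOKEN",   # placeholder auth token
}
print("POST", url)
```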
The following happens in the background for each VCDNI-backed network in the Org VDC when the migration is triggered:
A 'dummy' VXLAN logical switch is created
All VMs connected to the VCDNI network are reconnected to the new VXLAN logical switch
Edge Gateways connected to the VCDNI network are connected to the new VXLAN logical switch
The Org VDC/vApp network backing is changed in the vCloud database to use the new VXLAN logical switch
The original VCDNI port group is deleted
A small network disruption is expected during the VM and Edge Gateway reconnections. The following Recent Tasks picture from the vSphere Client shows what happens at the vCenter Server level and how much time each task can take. In the example, one Org VDC network and one vApp network were migrated, involving VM1 and Edge Gateway ACME-GW2.