Collect vCloud Director Cell Logs with Log Insight Agent

While it is possible to redirect vCloud Director cell logs to a remote syslog server by editing the log4j.properties file (see KB 2004564), there is an alternative agent-based method utilizing vRealize Log Insight.

The Log Insight agent is installed on each cell and then managed remotely from the Log Insight server. Here are some advantages of this approach:

  • no manual edits of the log4j file, which gets overwritten with each upgrade
  • as we do not rely on the log4j logger, we can also collect the API request log files, which are generated by Jetty
  • the agent uses reliable TCP communication as opposed to unreliable UDP
  • we no longer rely on source IP to identify sender; cells can use source NAT (with single IP) to communicate with Log Insight server and we can still distinguish them
  • we can remotely change which logs we want to monitor (info vs debug)
  • and much more

Here is a quick configuration how-to:

  1. Download the Log Insight Agent from the Log Insight server. The installation package is already customized for your vRLI server. Go to Administration > Agents, scroll down, click Download Log Insight Agent Version 3.6.0 and pick the rpm package.
  2. Upload the rpm file to each cell and install it with rpm -i VMware-Log-Insight-Agent-3.6.0-4148343.noarch_XXX.rpm
  3. Back in the Agents configuration, create an active agent group from the vCloud Director Cell Server template (copy template icon).
  4. Create a hostname filter (use ? as a substitute for any character; you can add multiple entries in one line for ‘logical or’, or multiple lines for ‘logical and’).
  5. Optionally edit agent configuration to include additional files or directories
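The filter semantics from step 4 can be sketched in Python. This is only an illustration of the matching logic described above, not how Log Insight implements it: the function name matches_filter, the comma as the separator between entries on one line, the treatment of ? as exactly one character, and the sample hostnames are all assumptions.

```python
from fnmatch import fnmatchcase


def matches_filter(hostname, filter_lines):
    """Return True when hostname satisfies every filter line.

    Lines are combined with logical AND; the comma-separated patterns
    inside one line are combined with logical OR.
    """
    name = hostname.lower()
    for line in filter_lines:
        patterns = [p.strip().lower() for p in line.split(",") if p.strip()]
        # An empty line places no restriction on the hostname.
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            return False
    return True


# Hypothetical cell hostnames and patterns, for illustration only.
one_line = ["vcd-cell?, vcloud-cell?"]
print(matches_filter("vcd-cell1", one_line))   # True
print(matches_filter("vcd-cell10", one_line))  # False, ? stands for one character here
```

Adding a second line such as "*.lab.local" would narrow the match further, because every line has to match.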

agent-config
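For step 5, an additional file collection section might look like the output of the following sketch. It assumes the agent configuration uses INI-style [filelog|name] sections with directory and include keys, that the cell logs live in /opt/vmware/vcloud-director/logs, and that the Jetty request logs match *.request.log; the section name vcd-requests and the helper render_filelog_section are made up for illustration. Please verify all of this against your Log Insight and vCloud Director versions.

```python
import configparser
import io


def render_filelog_section(name, directory, include):
    """Render one [filelog|<name>] section as INI text."""
    # interpolation=None keeps characters such as % in values untouched.
    config = configparser.ConfigParser(interpolation=None)
    config[f"filelog|{name}"] = {"directory": directory, "include": include}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical section that picks up the API request logs of a cell.
print(render_filelog_section(
    "vcd-requests",
    "/opt/vmware/vcloud-director/logs",
    "*.request.log",
))
```

The printed text can then be pasted into the agent configuration field of the agent group.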

Multitenant Service Network in vCloud Director

Service providers often have to provide additional services to their cloud tenants. Examples are licensing services (KMS) for Windows VMs deployed from a provider-managed catalog, or RHEL Satellite servers for licensing and patching Red Hat VMs. The question then is where to deploy these shared services virtual machines so that they are securely available in a multitenant environment.

In my older blog post Centralized Logging in vCloud Director Environments I described how a shared vCloud Director external logging network can be used to collect logs from Edge Gateways. The idea is to use the same network for the connection to the shared services VMs (KMS/Satellite) running in the Admin Organization. The Edge Gateway can have only 10 interfaces, so it is good that we do not waste another one. Let’s have a look at the following diagram:

Edge GW Logging and admin services

We have 3 organizations with one Org VDC in each: Customer 1 and Customer 2 (the tenants) and the Admin Organization (managed by the provider). The tenants connect their vApps to the shared internet network (yellow) via the Edge Gateways, using sub-allocated public addresses (8.8.8.x) with source or destination NAT of their Org VDC network. Each Edge Gateway is also connected to another vCloud external network (black) that is used both for Edge logging and for access to the shared services running in the Admin Organization.

Notice that there are two IP subnet ranges assigned to the service external network. The 10.0.5.0/24 range is used solely for Edge logging. The syslog server sits in this network (10.0.5.254) and a firewall in front of it ensures that only Edge logs get there. The Edge Gateway IPs from this network (10.0.5.1 and 10.0.5.2) are not sub-allocated for tenant use, so the tenants cannot create NAT rules with them. They could only route (one way) from their Org VDC networks and send UDP packets, but the syslog firewall denies such traffic as it is coming from internal IPs (e.g. 192.168.1.2).

The second IP subnet range of the service network (172.16.254.0/24) is used for the communication to the service VMs running in Admin Organization. So how is this achieved securely?

  1. The provider sub-allocates the Edge IP to the tenant so that the tenant can create NAT rules. So 172.16.254.1 is sub-allocated to Customer 1 and 172.16.254.2 is sub-allocated to Customer 2.
  2. The provider pre-creates an SNAT rule for each deployed Edge Gateway. The rule must be applied on the Service network; the original IP range is everything (0.0.0.0/0) and the translated IP is the sub-allocated IP of the Edge.
    SNAT rule
    The tenant has to be told not to delete or alter the rule, otherwise the tenant's access to the shared services will stop working.
  3. The provider creates a destination NAT rule for the service VMs running in the Admin Organization. To do this the provider first needs sub-allocated IP addresses (in my example 172.16.254.3 and 172.16.254.4) and then DNATs them to the internal VM IPs 192.168.1.2 and 192.168.1.3. Obviously, port forwarding could be used as well to save some IPs, as long as the port numbers of the services are not the same.

That’s it. Any traffic from the tenant’s VM to the external IP address of the service VM (e.g. 172.16.254.3) will be SNATed by the tenant Edge GW, DNATed by the Admin Edge GW and securely delivered, without the tenants being able to contact each other (unless they create DNAT rules as well, which could be prevented by MAC ACLs on the external network).
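The two translations can be traced with a small Python model. It is only a sketch of the rules listed above, not of how the Edge Gateways process packets: the function name trace, the dictionaries and the tenant VM address 192.168.1.10 are assumptions, while the service network addresses are taken from the example.

```python
from ipaddress import ip_address, ip_network

SERVICE_NET = ip_network("172.16.254.0/24")

# SNAT on each tenant Edge: any source (0.0.0.0/0) becomes the sub-allocated Edge IP.
TENANT_SNAT = {
    "Customer 1": ip_address("172.16.254.1"),
    "Customer 2": ip_address("172.16.254.2"),
}

# DNAT on the Admin Edge: external service IP becomes the internal VM IP.
ADMIN_DNAT = {
    ip_address("172.16.254.3"): ip_address("192.168.1.2"),
    ip_address("172.16.254.4"): ip_address("192.168.1.3"),
}


def trace(tenant, src, dst):
    """Return (source, destination) as seen by the service VM, or None if undelivered."""
    dst = ip_address(dst)
    if dst not in SERVICE_NET:
        raise ValueError(f"{dst} is not on the service network")
    ip_address(src)  # validate only; the SNAT rule matches any original source
    translated_src = TENANT_SNAT[tenant]
    if dst not in ADMIN_DNAT:
        return None  # no DNAT rule answers for this address
    return str(translated_src), str(ADMIN_DNAT[dst])


# Hypothetical tenant VM 192.168.1.10 talking to the first service VM.
print(trace("Customer 1", "192.168.1.10", "172.16.254.3"))
# The same VM trying to reach the Edge IP of Customer 2.
print(trace("Customer 1", "192.168.1.10", "172.16.254.2"))
```

The second call returns None because no DNAT rule exists for the other tenant's Edge IP, which is the isolation property described above.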

I would also advise using some obscure IP ranges for the service network so that they do not overlap with customer-defined Org VDC network ranges.
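A quick way to check candidate ranges for overlaps is Python's ipaddress module. This is a minimal sketch: the function name overlapping and the tenant ranges are hypothetical, and only the two service ranges come from the example.

```python
from ipaddress import ip_network


def overlapping(service_ranges, tenant_ranges):
    """Return every (service, tenant) pair of ranges that overlap."""
    services = [ip_network(r) for r in service_ranges]
    tenants = [ip_network(r) for r in tenant_ranges]
    return [(str(s), str(t)) for s in services for t in tenants if s.overlaps(t)]


service = ["10.0.5.0/24", "172.16.254.0/24"]
# Hypothetical Org VDC networks chosen by tenants.
tenant = ["192.168.1.0/24", "172.16.0.0/16"]
print(overlapping(service, tenant))
```

Here the tenant range 172.16.0.0/16 would clash with the 172.16.254.0/24 service range, so a less common range would be a safer choice for the service network.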

Centralized Logging in vCloud Director Environments

I have been researching the options for centralized logging in vCloud Director environments. Such deployments consist of a large number of systems, and having all their logs in one place is the best practice for their additional processing, archiving and troubleshooting. If the logs are scattered in different locations, it is very hard to correlate them and apply consistent policies.

I am going to describe the logging configuration options and other considerations for the following systems from the VMware vCloud stack:

  • vSphere ESXi hosts
  • vCenter Server
  • vCloud Director (cells)
  • vShield (vCloud Network and Security) Edge Gateways
  • vShield (vCloud Network and Security) Manager
  • vCenter Orchestrator

This article is not about what should be used as the central system that collects those logs, or how to mine useful information out of them. A syslog-compatible system is the essential requirement here; however, other enterprise features such as high availability, log indexing, load balancing, reporting, archiving and extensibility for non-syslog sources should be considered as well.

vSphere ESXi Hosts

Log collection options on ESXi hosts are the most advanced of all the considered components. ESXi logs are by default stored locally, either on a local datastore or in memory. They can be redirected to any other datastore; however, the only useful option in large deployments is to redirect them to a syslog server. This is done by changing the Syslog.global.logHost variable. Here we can put multiple destination syslog servers. We can also choose the protocol (TCP/UDP/SSL) and port number. The logs are then sent over the vmkernel management network.
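As an illustration, the value for Syslog.global.logHost can be assembled from a list of targets. The sketch below assumes the value is a comma-separated list of protocol://host:port entries; the helper name build_loghost and the second server name are hypothetical.

```python
def build_loghost(targets):
    """Build a Syslog.global.logHost value from (protocol, host, port) tuples."""
    allowed = {"udp", "tcp", "ssl"}
    parts = []
    for protocol, host, port in targets:
        protocol = protocol.lower()
        if protocol not in allowed:
            raise ValueError(f"unsupported protocol: {protocol}")
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        parts.append(f"{protocol}://{host}:{port}")
    return ",".join(parts)


# 10.0.5.254 is the syslog server from the diagrams; the second target is made up.
print(build_loghost([("tcp", "10.0.5.254", 514), ("udp", "syslog2.example.com", 514)]))
```

The resulting string can be set as the value of the advanced setting on each host.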

Note: ESXi produces a large number of logs (vmkernel, hostd, vpxa, fdm, …) which are stored in separate files but combined into one syslog stream. Hostd and vpxa logs are set by default to verbose level and are quite chatty, so I usually change them to info level. This can be done by changing the Vpx.Vpxa.config.log.level and Config.HostAgent.log.level advanced variables.

vCenter Server

The Windows installable vCenter Server unfortunately cannot be easily configured to send its logs to an external syslog server, so an additional agent must be installed. The vCenter Server Appliance (vCSA) edition, which runs on Linux, on the other hand utilizes syslog-ng, which can be easily reconfigured to forward its logs to a remote syslog server. William Lam wrote an excellent post on how to do that here.

vCSA 5.1 when used with the internal PostgreSQL DB is currently supported while managing only up to 4 ESX hosts which might be enough for management cluster. For the vCloud resource clusters either vCSA with external Oracle database or Windows installable vCenter Server must be used.

vCloud Director

vCloud Director cells produce a number of logs. There is an audit log which persists in the database for 90 days. It can be forwarded to an external syslog server by changing the $VCLOUD_HOME/etc/response.properties file on each vCloud Director cell. However, only one syslog target can be defined, using the UDP/514 port. Therefore it is recommended to redirect this log to the local RHEL syslog (127.0.0.1) and then use its forwarding features. A detailed description of how to do this is in the vCAT 2.0 Public VMware vCloud Implementation Example, which can be downloaded from here.
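To check that the local syslog daemon accepts UDP messages and forwards them, a test message can be sent from the cell with Python's standard logging module. This is a minimal sketch under the assumption that the daemon listens on 127.0.0.1, UDP port 514; the function name send_test_message and the message text are made up.

```python
import logging
import logging.handlers
import socket


def send_test_message(host, port=514, text="vcd audit syslog test"):
    """Send one syslog message over UDP to host:port."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    logger = logging.getLogger("syslog-test")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test message away from other handlers
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        logger.removeHandler(handler)
        handler.close()


# Example call on a cell (assumes the local syslog daemon listens on UDP/514):
# send_test_message("127.0.0.1")
```

If the forwarding is configured correctly, the message should show up on the central syslog server shortly afterwards. UDP gives no delivery confirmation, so a missing message does not tell you at which hop it was lost.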

Additional troubleshooting and API access logs are stored locally in the $VCLOUD_HOME/logs directory. These logs are produced by the Apache log4j logging utility and can be forwarded by creating an additional appender in the $VCLOUD_HOME/etc/log4j.properties file. This is described in the KB article 2004519.

EDIT 1/10/2013: A new KB article has been posted about editing log4j.properties file: KB article 2004564.

vShield Edge Gateways

A vShield Edge Gateway is a multi-interface vShield Edge virtual device that connects vCloud Director organization VDC networks to external networks. These Edge devices are deployed as needed by vCloud Director through vShield Manager. vCloud Director holds all their configuration details, which means that if their properties are edited directly through vShield Manager, vCloud Director is not aware of those changes and could revert them. Why am I mentioning this? Each Edge device can be configured to send its logs to two remote syslog servers, and the protocol (UDP/TCP) can be specified as well. However, vCloud Director only allows setting up two default syslog servers (System > Administration > System Settings > General > Networking), and these are applied to every Edge device deployed (both Edge Gateways and vApp Edges). So no per-tenant or per-network configuration is possible out of the box (unless orchestrated with blocking tasks and vCenter Orchestrator).

The multi-interface Edge Gateway allows us to create a dedicated logging external network on which the syslog server (or a forwarder to the syslog server) can be placed. If we also wanted to collect logs from vApp Edges, the tenant would have to make sure that the vApp Edge can route the syslog traffic through the Edge Gateway to the syslog server. If it does not (or cannot, because it is isolated from the Edge Gateway), the vApp Edge logs are lost.

The provider could decide to collect only Edge Gateway logs and prohibit any other devices (vApp Edges or any other VM) from sending logs to the provider’s syslog server. This can be accomplished with the design shown in the following picture:

Edge GW Logging

Each Edge Gateway has two external networks. One is connected to the internet and has a Sub-allocate IP Pool with external IP addresses, which the customer can use for Network Address Translation (NAT) to enable internet access to or from his VMs. The second external interface is connected to the logging external network. The customer cannot create any NAT rules there because the Sub-allocate IP Pool on that network is not populated. To block routed traffic from VMs behind the Edge Gateway, a firewall rule “allow only logging external network” is created (allow 10.0.5.0/24).

vShield Manager

vShield Manager (vSM) produces audit and system event logs. These can be forwarded to only one remote syslog server, which can be configured in the vSM Settings & Reports > Configuration > General menu. The port can be configured as well; however, only the UDP protocol is used.

vCenter Orchestrator

Similarly to vCenter Server, vCenter Orchestrator comes in two flavors – Windows installable or virtual appliance. KB article 1010956 describes the location of the files in the Windows installable edition. The same logging redirection can be used as described in the vCenter Server section above.

Note: This article was written for the following versions: vSphere 5.1, vCloud Director 5.1, vCloud Networking and Security 5.1 and vCenter Orchestrator 5.1.