Tag Archives: vSphere

vCloud Director 5.1 Features and their vSphere Dependency

I see more and more customers migrating from vCloud Director 1.5 to vCloud Director 5.1. One question they have is: “Do we have to migrate to vSphere 5.1 at the same time?” The answer is a definite no. vCloud Director 5.1 supports vCenter 5.0 with ESXi 5.0, and even ESX(i) 4.0 U2 hosts if they are managed by vCenter 5.

I always recommend upgrading vCloud Director in two phases.

Phase 1 (vCloud Director Upgrade)

  • vCloud Director cell operating system (RHEL). RHEL 5 is still supported, but a customer who wants to use RHEL 6 will need to deploy a new cell, as an upgrade from RHEL 5 to RHEL 6 is not possible.
  • vCloud Director runtime upgrade
  • vCloud Director database schema upgrade
  • vShield Manager upgrade
  • vShield Edges upgrade

Phase 2 (vSphere Upgrade)

  • Installation of SSO
  • Installation of Inventory Service
  • Installation/upgrade of Web Client
  • vCenter Server upgrade
  • ESX hosts upgrade
  • distributed virtual switches upgrade

As the phases can be spread out in time, this brings us to the main topic of the article: which new vCloud Director 5.1 features depend on vSphere 5.1 and will not be available during the time between Phase 1 and Phase 2? I have compiled a table which lists the new vCloud Director features and whether each feature will be available with vSphere 5.0 (vCenter 5.0 + ESXi 5.0. Note: I don’t dare to consider ESX 4).

Feature | vSphere 5.0 | Note
VM Snapshots | |
Storage Profiles | |
Elastic VDC | | Allocation pool Org VDC type can span multiple clusters. Online migrations and merging of Provider VDCs.
Provider Single Sign On | no | vCenter SSO required
Customer Single Sign On | | SSPI, SAML2
VXLAN Networks | no | vSphere 5.1 vmkernel module is required
Storage clusters (SDRS) | | VM placement engine leverages SDRS. Migration of linked clones supported. Difference in shadow VM handling¹
New Edge Gateway Features | | Performance, HA, Load balancing, DNS relay, Rate limits, Multiple interfaces, IP allocations, SNAT and DNAT rules
Virtual Hardware 9 | no | Requires vSphere 5.1 (64 vCPUs)
Additional Guest OS Support | possibly | Depends on ESX version (Windows 8/2012 requires ESXi 5.0 U1), but Virtual Hardware 9 is recommended (KB 2034491)
NFS VAAI Fast Provisioning | no | Requires vSphere 5.1 (hardware accelerated linked clones)
Clustered database support | |

¹) With vSphere 5.0, vCloud Director does not use SDRS recommendations for linked clone placement (Fast Provisioning). It picks an individual datastore and optionally deploys a shadow VM. With vSphere 5.1, vCloud Director fully leverages SDRS recommendations, and shadow VMs are deployed by vSphere SDRS.
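
For planning the gap between Phase 1 and Phase 2, the rows whose note names an explicit vSphere 5.1 dependency can be turned into a small lookup. This is only a sketch: the dictionary and function names are made up for illustration, the list covers just the four rows with such a note, and it is no more official than the table itself.

```python
# Features whose note in the table names a vSphere 5.1 dependency.
# Names of the dict and functions are illustrative, not a VMware API.
REQUIRES_VSPHERE_51 = {
    "Provider Single Sign On": "vCenter SSO required",
    "VXLAN Networks": "vSphere 5.1 vmkernel module is required",
    "Virtual Hardware 9": "Requires vSphere 5.1 (64 vCPUs)",
    "NFS VAAI Fast Provisioning": "Requires vSphere 5.1 (hardware accelerated linked clones)",
}


def parse_version(text):
    """Turn a version string such as '5.0' into a comparable tuple (5, 0)."""
    return tuple(int(part) for part in text.split("."))


def unavailable_features(vsphere_version):
    """Return the vCloud Director 5.1 features to postpone on this vSphere version."""
    if parse_version(vsphere_version) >= (5, 1):
        return []
    return sorted(REQUIRES_VSPHERE_51)


if __name__ == "__main__":
    for feature in unavailable_features("5.0"):
        print(f"{feature}: {REQUIRES_VSPHERE_51[feature]}")
```

Calling unavailable_features("5.0") lists the four features to hold back until Phase 2 is finished; with "5.1" the list is empty.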

Table in PNG format.

Disclaimer: I don’t claim that this table is complete or that it is an official VMware document. If you think something is missing, please comment and I will edit the table.

Edit 27 April 2013: Explained difference in linked clone placement.

Hypervisor Overhead – Reservable vs Raw Compute Resources

While working on capacity planning for one of my clients, I encountered a not very well documented fact about the reservable resources of a vSphere cluster. The common practice when calculating available compute resources (CPU and RAM) is to take the physical CPU and RAM of one host, multiply them by the number of hosts in the cluster, and subtract the HA failover capacity. However, this is not correct, as it does not take into account the resources that vmkernel processes reserve for themselves and that therefore cannot be reserved by user workloads. This is important in a service provider cloud environment, where tenants pay for their allocated and reserved resources. Resources that cannot be reserved cannot be sold, which means lower ROI.

Real life example: an 8 host cluster where each host has two 8 core 2.899 GHz CPUs and 384 GB RAM. Theoretically this should result in 371072 MHz of CPU capacity and 3072 GB of RAM; however, the Resource Allocation tab of the cluster in vSphere Client shows a total cluster capacity of only 330424 MHz and 2988 GB RAM.

Cluster capacity

There is a KB article, 1033443, describing the behavior, with a title almost as long as the whole article: “Cluster level memory capacity on Resource Allocation tab is less than the sum of the memory available for virtual machines for ESX hosts in the cluster”. Unfortunately, it does not explain why resources are missing or how much.

As already hinted above, vmkernel processes reserve some resources for themselves. If you select a single host in vSphere Client and go to Configuration > System Resource Allocation, you will see the value of System Resource Reservation: by default 301 MHz for CPU and 0 MB for RAM.

System Resource Allocation

However this view does not show the whole story – if you change from Simple to Advanced view (top right) you will be presented with a tree of resource pools each with their own reservations.

System Resource Allocation - Advanced

At the root of the tree there is a host resource pool whose reservation equals all the theoretical physical resources of the host, and this is also its limit. Below it there are 4 child resource pools at the same level:

  • idle, which is always empty
  • system (this is the one we could edit on the previous page) containing low level kernel, driver and similar sub-resource pools for each process
  • vim, which contains sub-resource pools for host management processes (hostd, vpxa, DCUI, …) which used to run in Console OS in the ESX 4 classic times.
  • user, which is available for the VMs deployed on the host

All these 2nd level resource pools and all their child resource pools have expandable reservations, which means that if one of the children requests more resources than are available in its resource pool, the resource pool will try to get more resources from its parent. The top parent is the host resource pool. The system processes and management (VIM) processes are started immediately when the host boots up, before any VM workloads are placed on the host, and therefore take their part of the available host resources for themselves.
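
To make the expandable reservation mechanism more concrete, here is a toy model of the host resource pool tree. It is only a sketch of the idea and not how the vmkernel accounts for resources. The 301 MHz for system is the default mentioned above; the 4780 MHz for vim is a made-up figure, chosen so that the total matches the per-host share of the missing CPU in the example cluster ((371072 - 330424) / 8 = 5081 MHz).

```python
class ResourcePool:
    """Toy model of a resource pool with an expandable reservation (CPU in MHz)."""

    def __init__(self, name, reservation=0, parent=None, expandable=True):
        self.name = name
        self.reservation = reservation  # MHz currently reserved for this pool
        self.used = 0                   # MHz already handed out from this pool
        self.parent = parent
        self.expandable = expandable

    def free(self):
        return self.reservation - self.used

    def reserve(self, mhz):
        """Try to reserve mhz; grow the pool from its parent if it is too small."""
        shortfall = mhz - self.free()
        if shortfall > 0:
            if not self.expandable or self.parent is None:
                return False
            if not self.parent.reserve(shortfall):
                return False
            self.reservation += shortfall
        self.used += mhz
        return True


# One host from the example cluster: two 8 core CPUs at 2899 MHz.
host = ResourcePool("host", reservation=16 * 2899, expandable=False)
system = ResourcePool("system", parent=host)
vim = ResourcePool("vim", parent=host)
user = ResourcePool("user", parent=host)

# System and management processes start at boot, before any VM is placed.
system.reserve(301)   # default System Resource Reservation
vim.reserve(4780)     # hypothetical total for hostd, vpxa, DCUI, ...

# What is left in the host pool is all that the user pool can still obtain.
reservable_for_vms = host.free()
print(reservable_for_vms)
```

Whatever system and vim have claimed is gone from the root pool, so a later request from the user pool can only expand into the remainder.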

You can easily see that some processes, like hostd or vpxa, reserve a relatively significant amount of resources. How significant depends on the size of the host: in my small lab environment, 36% of CPU and 20% of RAM resources were not available to be reserved for VMs. In big environments, like the one in the example above, the CPU overhead is about 11% but the memory overhead only about 3%.

It should also be noted that as the hypervisor becomes more and more intelligent (VXLAN VTEP, vApp firewalling, antivirus inspection, vSAN, etc.), the overhead will go up, and capacity planning should include it.
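
The percentages for the big cluster can be recomputed from the numbers in the real life example. A short sketch, with variable names of my own choosing:

```python
# Raw capacity of the example cluster: 8 hosts, two 8 core 2899 MHz CPUs, 384 GB RAM each.
hosts = 8
cores_per_host = 2 * 8
mhz_per_core = 2899
ram_gb_per_host = 384

raw_cpu_mhz = hosts * cores_per_host * mhz_per_core
raw_ram_gb = hosts * ram_gb_per_host

# Values shown on the Resource Allocation tab of the cluster.
reservable_cpu_mhz = 330424
reservable_ram_gb = 2988

cpu_overhead_pct = 100 * (raw_cpu_mhz - reservable_cpu_mhz) / raw_cpu_mhz
ram_overhead_pct = 100 * (raw_ram_gb - reservable_ram_gb) / raw_ram_gb

print(f"raw: {raw_cpu_mhz} MHz, {raw_ram_gb} GB")
print(f"overhead: CPU {cpu_overhead_pct:.1f}%, RAM {ram_overhead_pct:.1f}%")
```

Note that this is the overhead before any HA failover capacity is subtracted, so the HA reservation still has to be taken off the reservable figures afterwards.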

Rate Limiting of External Networks in vCloud Director and Nexus 1000V

There is a new feature in vCloud Director 5.1 which was requested a lot by service providers – configurable limits on routed external networks (for example Internet) for each tenant. Limits can be set both for incoming and outgoing directions by vCloud Administrator on tenant’s Edge Gateway.

Edge Rate Limit Configuration

However this feature only works with VMware vSphere distributed switch – it does not work with Cisco Nexus 1000V or VMware standard switch. Why? Although the feature is provided by the Edge Gateway, what is actually happening in the background is that vShield Manager instructs vCenter to create a traffic shaping policy on the distributed vswitch port used by the Edge VM.

vSphere Distributed Switch Traffic Shaping

The standard switch does not allow port specific traffic shaping, and the Nexus 1000V management plane (Virtual Supervisor Module) is not accessible by vShield Manager/vCenter. The rate limit could be applied manually on the port of the Cisco switch. However, any Edge redeploy operation, which the tenant can trigger from the GUI, would deploy a new Edge that uses a different port on the virtual switch, so the tenant could easily disable the limit.

For an external network backed by a standard switch, the vCloud Director GUI will not even present the option to set the rate limit. For a Nexus backed external network, however, the operation will fail with an error similar to this one:

Cannot update edge gateway “ACME_GW”
java.util.concurrent.ExecutionException: com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (10086): Traffic shaping policy can be set only for a Vnic connected to a vmware distributed virtual portgroup configured with static port binding. Invalid portgroup ‘dvportgroup-9781′.
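
The behavior per switch type can be summarized in a few lines. This is just a model of the outcomes described above, with enum and function names of my own; it does not call any vCloud Director or vShield Manager API.

```python
from enum import Enum


class SwitchType(Enum):
    STANDARD = "VMware standard switch"
    DISTRIBUTED = "vSphere distributed switch"
    NEXUS_1000V = "Cisco Nexus 1000V"


def rate_limit_outcome(switch_type):
    """Describe what happens when a rate limit is set on an Edge Gateway
    whose external network is backed by the given switch type."""
    if switch_type is SwitchType.DISTRIBUTED:
        # vShield Manager asks vCenter for a traffic shaping policy on the Edge VM port.
        return "applied"
    if switch_type is SwitchType.STANDARD:
        # The vCloud Director GUI does not present the option at all.
        return "not offered"
    # Nexus 1000V: the option is presented, but the operation fails with the VSM error.
    return "fails"


if __name__ == "__main__":
    for switch in SwitchType:
        print(f"{switch.value}: {rate_limit_outcome(switch)}")
```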

Nexus 1000V Error

Btw, the rate limit can also be set on the Edge when not using vCloud Director, via vShield Manager or its API. It is called Traffic Shaping Policy and is configurable in the vSM > Edge > Configure > Interfaces > Actions menu.

vShield Manager Traffic Shaping

Do not forget to consider this when designing vCloud Director environments and choosing the virtual switch technology.

VCAP-DCD 5 Exam

On Monday I sat the VCAP-DCD 5 beta exam. I thought I would write about my experience of preparing for and taking the exam, but there is actually not much to write about that is different from my VCAP-DCD 4 experience.

So just some bullet points:

  • For version 4 I prepared for about a month, including the Design course. This time I just reviewed the vSphere 5 Design course manual during two evenings.
  • It was a beta exam, which usually means long and with errors: 131 questions, 4+ hours. A couple of questions were missing some words, and in one particular question the missing words seemed important. I also had a crash after two hours, but the examination lady restarted the program and fortunately all of my 80+ answers were still there, intact.
  • I rushed through the exam as quickly as possible, did not read the lengthy scenarios, concentrated on the important points that were asked for and left the 5 Visio type questions for the end, and I still had only an hour to do those. The Visios seemed to be the same as or very similar to those in the previous version, so they did not take me that much time and I finished about 10 minutes early.
  • The exam questions seemed much the same as in version 4, only a few related to the new vSphere 5 features.
  • Based on that my recommended study path would be: read all the whitepapers about vSphere 5 (What’s new and best practices) and the design methodology from the Design course and you should ace the exam.
  • If you pass the exam and you are not VCP5 you get VCP5 automatically.

I obviously do not know my result yet. I am still waiting (more than 2 months) for the result of my other beta exam (VCP5-DT), so I do not expect it to arrive any time soon. I feel that spending 4+ hours on something that is less than 10% different from the previous exam is not very effective. I wish there were a shorter web only delta exam or, better, some kind of continuous training requirement (online webex courses) similar to the PMI Project Management Professional certification renewal, which requires collecting a certain number of points during a 3 year period to keep the certification. The points are awarded for taking official courses or online courses, for writing a blog, or for actual on the job experience.

Another way to go in the future would be to extend the design certification beyond vSphere and include SRM, vCD and possibly vCOps – designing a datacenter nowadays requires more than just vSphere…

My VCAP-DCD Preparation and Exam Experience

Today I took and passed the VMware Certified Advanced Professional – Datacenter Design (VCAP-DCD) exam with a score of 427 out of 500. I want to describe not only my experience of the exam but also how I prepared for it. When studying for or taking an exam, I always look for blog posts by people who have already taken it to learn from their experiences, so this is my way of helping others.

The Preparation

VMware publishes an exam blueprint guide that describes the exam, recommends training courses and lists the exam topics with relevant documentation. For VCAP-DCD there are two recommended courses. The first is the VMware vSphere Design Workshop, which I attended in January and described here: http://fojta.wordpress.com/2011/01/29/vmware-vsphere-design-workshop/. In my opinion the student manual from this course is very valuable for the exam. Although the course is based on vSphere 4.0 and the exam on 4.1, many questions were related to topics from the course (design methodology, terms, best practices). The other recommended course is the e-learning training DRBC Design – Disaster Recovery and Business Continuity Fundamentals. It is free for VMware partners. It takes about 3 hours and discusses disaster recovery concepts in general terms and how they map to VMware products. It is an interesting course, but it is not essential for the exam.

As I already said, the exam is about vSphere 4.1, so I recommend reading all the What’s New in 4.1 documents from the curriculum. NetIOC, SIOC, HA, DRS and FT improvements and best practices are tested concepts. For a better understanding of these new features I recommend the following two books:

Scott Lowe’s book VMware vSphere Design gives a very good overview of all design best practices and also the reasons behind them. Duncan Epping and Frank Denneman’s HA and DRS Technical Deepdive is a brief but very informative book about HA and DRS. It goes deeper than the exam requires, but on the other hand it helps to understand the reasons behind recommendations and best practices. There were a few questions related to HA and DRS, and anybody who has read the book can get these right easily.

The Exam

As a non native English speaker I had an additional 30 minutes for the exam, which brought the total to 255 minutes: 4 hours and 15 minutes! That is a very long time. However, I used almost all of it. At the end I had only 10 minutes left to review the marked questions, and I took only one 5 minute break after 2 hours. The reasons why it takes so much time to go through the 113 questions are:

  • Most of the questions are situational and have a very long description with too much information, most of which is not needed. In this respect it is very similar to the PMI exam (my review here: http://fojta.wordpress.com/2010/12/10/my-pmi-project-management-professional-certification-lessons-learned/)
  • There are 5 questions that require you to draw the design with a Visio-like tool. These have very long descriptions and it takes a lot of time just to place all the elements on the drawing board and then to connect them. I wonder how these questions are scored. Each can easily take 15 minutes of your time.
  • There are many drag and drop questions. These also take more time to answer than regular multiple choice questions.

Because of all the different types of questions it is very difficult to plan and manage your time. I definitely recommend taking a break during the exam. However, a strategy of skipping the time consuming questions is hard to recommend, as it is not known how they are scored.

My overall impression of this exam is that it is a difficult one. It helps to study for it, but just reading books or white papers is not enough. One has to have good experience in making decisions based on given design requirements. It helps to read virtualization blogs and to have experience with networking and storage. The exam format makes it impossible to cheat with braindumps, so my guess is that there will not be as many VCAP-DCD certified people as there are VCPs.

Now to study for VCAP-DCA…

VMware vSphere: Design Workshop

I set a goal for myself to become VCAP-DCD (VMware Certified Advanced Professional – Datacenter Design) certified. There are two recommended training courses for this certification: DRBC Design – Disaster Recovery and Business Continuity Fundamentals (an e-learning course available to partners free of charge on Partner Central) and VMware vSphere: Design Workshop. The workshop used to be obligatory for VMware partners who wanted to obtain Enterprise level partnership, but it can now be substituted by the VCAP-DCD certification. This week I used the opportunity and took part in the 3 day VMware vSphere Design Workshop.

There are many reviews of this course on the Internet. Here are some links:

http://www.yellow-bricks.com/2010/03/24/vmware-design-workshop/

http://www.boche.net/blog/index.php/2010/10/27/vmware-vsphere-design-workshop-day-1/

http://www.seancrookston.com/2011/01/19/vmware-vsphere-design-workshop-review/

All the reviews are very similar. They are all positive, praising the interactive workshop format. I was therefore very much looking forward to it, as there can be nothing wrong with talking VMware for three straight days.

However, after finishing the course my opinion about it is mixed. The biggest issue is that the course is meant to be interactive and therefore depends very much on the instructor’s ability to facilitate discussion and on the experience and knowledge of the participants. In my case it failed on both of these points. We went very quickly through the theory. The labs were two real world examples: a small company and a big enterprise. The instructor also distributed PDFs with supposed solutions to these labs, but these documents were very poorly written, brief and did not match the given labs.

The theory was based on vSphere version 4.0. Version 4.1 has been out for almost half a year and has many new features that have an impact on the design (VAAI, SIOC, NIOC, higher limits)!

As expected, the workshop focused only on vSphere features. Some important parts of design were skipped or mentioned only briefly. Some examples: disaster recovery, backups, deployment, monitoring, how to virtualize tier 1 applications. Hardware specific info was also missing (blades vs rack servers, 10 Gb NICs, FCoE). No recommendation was given on where to get that info, nor were any of the best VMware blogs mentioned (Yellow-Bricks, Frank Denneman).

Among the good points are the design methodology and the textbook. It will be useful while studying for the certification and I will go through it in more detail.

Although I mentioned many negative things, I am still glad that I participated. It is always good to listen to other people’s experiences. Also, there is no other way to obtain a good textbook or e-learning on the subject.

vSphere on Celeron CPU

I am building a new ESX whitebox for my home lab. I will write a blog post about it sometime later. This post is about my Celeron experience.

I have all the parts for the whitebox except the CPU, which should arrive soon. I wanted to test whether the motherboard, NICs and other parts are OK, as they are all from eBay or similar shops. I have a spare Intel Celeron 420 CPU lying around, which is a single core 1.6 GHz processor with 512 kB cache. No VT extensions. I expected that the system would just POST to BIOS and then the ESXi boot from the flash disk would complain about an incompatible CPU. To my surprise, ESXi 4.1 booted all the way.

I was even able to add the host to a DRS cluster with the Enhanced vMotion Compatibility mode Intel Xeon Core 2, and soon two 32bit Linux virtual machines were automatically migrated to it from another host with a Xeon processor. I tried to vMotion a Windows XP machine and it ran there as well. The Celeron host cannot run 64bit virtual machines, though, but for some testing or fooling around this info could be useful.

vSphere: Cannot remove empty virtual switch

A few days ago I was trying to migrate running virtual machines from one virtual switch to another one without any downtime. When I thought I was done I tried to remove the vacated virtual switch, but instead was greeted with the following error:

Error: A specified parameter was not correct.

Well, after scratching my head for a while I discovered a nasty bug in vSphere. If you rename a port group, the configuration files of VMs using this port group are not updated. If you then create a new port group with the old name, vSphere Client shows the VMs in the new port group; in reality, however, they are still residing in the old one and using its connectivity.

To reproduce the steps that led to the above image:

  1. I created a ‘Test’ virtual machine port group on the new vSwitch2 and placed a running VM1 there.
  2. I renamed the port group to ‘Test2’ and VM1 disappeared.
  3. I created a new virtual machine port group ‘Test’ on vSwitch0. VM1 immediately jumped into this new port group.
  4. I tried to delete vSwitch2 and got the error.

Running esxcfg-vswitch -l I received this output:

The supposedly empty port group ‘Test2’ is using 1 port and the new ‘Test’ port group shows 0 used ports, even though vSphere Client shows the running VM1 in it.

So what is really happening? Actually, this is not a bug of vSphere Client, as it relies on info provided by the SDK of the ESX server. Running the commands

vmware-vim-cmd vmsvc/get.networks <vmid>

vmware-vim-cmd hostsvc/net/vswitch_info

gives wrong info about the port group names. vSphere Client incorrectly assumes that the VM was migrated to the other switch, but in fact it still resides on the old switch. The only way out is to open the VM settings, change the network connection to a different port group and then back again. The network adapter info gets updated, the VM is now finally migrated to the new switch, and the old one can be removed.
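
The mismatch can be illustrated with a toy model in which the VM configuration stores only the port group name, while the actual connection stays with the port group the VM was attached to. This is my simplified interpretation of the observed behavior, not ESX internals, and all class and method names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    name: str
    vswitch: str
    connected: set = field(default_factory=set)  # VMs really attached to this port group


@dataclass
class Host:
    portgroups: list = field(default_factory=list)
    vm_config: dict = field(default_factory=dict)  # VM -> port group name in its config

    def add_portgroup(self, name, vswitch):
        pg = PortGroup(name, vswitch)
        self.portgroups.append(pg)
        return pg

    def connect(self, vm, pg):
        """Editing the VM network connection updates both the port and the config."""
        for other in self.portgroups:
            other.connected.discard(vm)
        pg.connected.add(vm)
        self.vm_config[vm] = pg.name

    def rename(self, pg, new_name):
        pg.name = new_name  # the VM config is NOT updated, as observed

    def displayed_portgroup(self, vm):
        """What the client shows: a lookup by the name stored in the VM config."""
        name = self.vm_config[vm]
        return next((pg for pg in self.portgroups if pg.name == name), None)

    def actual_portgroup(self, vm):
        return next((pg for pg in self.portgroups if vm in pg.connected), None)

    def can_remove_vswitch(self, vswitch):
        return all(not pg.connected for pg in self.portgroups if pg.vswitch == vswitch)


host = Host()
old = host.add_portgroup("Test", "vSwitch2")   # step 1
host.connect("VM1", old)
host.rename(old, "Test2")                      # step 2: VM1 "disappears"
new = host.add_portgroup("Test", "vSwitch0")   # step 3: VM1 "jumps" to vSwitch0

print(host.displayed_portgroup("VM1").vswitch)  # what the client shows
print(host.actual_portgroup("VM1").vswitch)     # where the port is really used
print(host.can_remove_vswitch("vSwitch2"))      # step 4: removal is refused
```

In this model a single host.connect("VM1", new) plays the role of the workaround: it moves the real connection to vSwitch0, after which vSwitch2 is empty and can be removed.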

This was tested on ESX build 4.0.0,236512.