iSCSI and ESXi: multipathing and jumbo frames

For my home lab I have decided to run ESXi from a 2 GB USB flash disk without any local storage. The main reason is that the ESXi host now does not produce as much noise and heat as before. VMware has stated that it will retire classic ESX in the future, so I take it as an opportunity to learn and be prepared for the switch to a service-console-less hypervisor.

For the shared storage I am currently using Openfiler with the iSCSI protocol. I am dedicating a pair of 1 Gb NICs on the ESXi host and on the Openfiler server to the iSCSI traffic. Taking advantage of all the available bandwidth requires a not-so-trivial setup, which I am going to describe below.

iSCSI multipathing

In order to use iSCSI multipathing in vSphere, we need to create two VMkernel ports, bind each of them to a different uplink and attach them to the software iSCSI HBA.
My ESXi host has 4 NICs. Two are assigned to vSwitch0, which has the Management VM port group and three VMkernel ports: one for management and two for iSCSI. The following picture shows vSwitch0 in the Networking tab of the vSphere Client:


The management traffic is untagged; the iSCSI traffic is on VLAN 1000. As I also wanted to use jumbo frames (yes, ESXi supports jumbo frames, despite the official documentation claiming otherwise for a long time), I had to create the VMkernel ports from the CLI. The binding of the iSCSI VMkernel ports to the software iSCSI HBA must also be done from the CLI. ESXi does not have a service console, therefore the first step is to install vMA (VMware Management Assistant), which replaces the service console.

 VMware Management Assistant (vMA)

The VMware approach to a service-console-less hypervisor is based on the following rationale: if we have many ESX hosts in a datacenter, each has its own service console, which needs to be maintained and patched and which consumes host resources. The patching often requires host restarts, which means we have to vMotion the workloads away or accept downtime. vMA basically offers the same functionality as the service console, is detached from the host (it can in fact run virtually or physically) and can control many hosts. vMA comes as a 560 MB OVF package that can be downloaded from the VMware website. I deployed it in VMware Workstation running on my laptop. It is a Red Hat Enterprise Linux 5 VM which takes about 2.5 GB of hard drive space. The setup is quite straightforward, with network and password questions.

There are various ways to connect vMA to an ESXi host. I decided to use vSphere FastPass. First we define the connection and then we can initialize it anytime with one command.

sudo vifp addserver esxi.fojta.com --username root --password <password>
vifpinit esxi.fojta.com

The prompt now shows which ESX host we are connected to.  
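
As a quick sanity check, the hosts registered for FastPass can also be listed from vMA:

sudo vifp listservers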

VMkernel ports with jumbo frames

The following commands create the iSCSI1 and iSCSI2 port groups on vSwitch0, assign them to the appropriate VLAN and create the vmknic ports with the right IP addresses and MTU 9000 for jumbo frames.

esxcfg-vswitch --add-pg iSCSI1 vSwitch0
esxcfg-vswitch --add-pg iSCSI2 vSwitch0
esxcfg-vswitch --vlan 1000 -p iSCSI1 vSwitch0
esxcfg-vswitch --vlan 1000 -p iSCSI2 vSwitch0
esxcfg-vmknic --add vmk1 --ip 10.0.100.200 --netmask 255.255.255.0 --portgroup iSCSI1 --mtu 9000 vSwitch0
esxcfg-vmknic --add vmk2 --ip 10.0.100.220 --netmask 255.255.255.0 --portgroup iSCSI2 --mtu 9000 vSwitch0

The result:

esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family  IP Address    Netmask        MAC Address        MTU   Type
vmk0       Management Network  IPv4       10.0.4.200    255.255.255.0  00:30:48:79:7a:59  1500  STATIC
vmk1       iSCSI1              IPv4       10.0.100.200  255.255.255.0  00:50:56:74:56:90  9000  STATIC
vmk2       iSCSI2              IPv4       10.0.100.220  255.255.255.0  00:50:56:7a:51:69  9000  STATIC
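
Note that on vSphere 4 the MTU of an existing vmknic cannot be changed in place. If a VMkernel port with the default MTU of 1500 already exists on one of these port groups, delete it first and recreate it with the commands above, along these lines (assuming the usual esxcfg-vmknic delete syntax; check the help on your vMA version):

esxcfg-vmknic --del iSCSI1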
  

VMkernel port bindings

Now we have to assign a different uplink to each iSCSI VMkernel port. This can be done through the vSphere Client: go to the ESXi host Configuration tab, Networking, vSwitch0 Properties, select the iSCSI1 or iSCSI2 port, click Edit and in the NIC Teaming tab make only one vmnic active and the other unused. The following pictures show the result:


Next we have to bind the vmk ports to the software iSCSI HBA. First find the name of your software iSCSI adapter in the host Configuration tab, Storage Adapters; in my case it was vmhba39 (it can also be listed from the CLI, see the note below the commands). Then run in vMA:

esxcli swiscsi nic add -n vmk1 -d vmhba39
esxcli swiscsi nic add -n vmk2 -d vmhba39
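
If you prefer to stay in vMA, the list of storage adapters (and thus the vmhba name of the software iSCSI initiator) can also be pulled from the CLI:

esxcfg-scsidevs -a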
 

 
Reboot the ESXi server and check the result:
 

esxcli swiscsi nic list -d vmhba39  

vmk1
  pNic name: vmnic0
  ipv4 address: 10.0.100.200
  ipv4 net mask: 255.255.255.0
  ipv6 addresses:
  mac address: 00:02:a5:4c:c4:2c
  mtu: 9000
 

  toe: false
  tso: true
  tcp checksum: false
  vlan: true
  link connected: true
  ethernet speed: 1000
  packets received: 1924
  packets sent: 2088
  NIC driver: e1000
  driver version: 8.0.3.1-NAPI
  firmware version: N/A
 

vmk2
  pNic name: vmnic3
  ipv4 address: 10.0.100.220
  ipv4 net mask: 255.255.255.0
  ipv6 addresses:
  mac address: 00:30:48:79:7a:59
  mtu: 9000
  toe: false 
  tso: true
  tcp checksum: false
  vlan: true
  link connected: true
  packets received: 2919
  packets sent: 7106
  NIC driver: e1000e
  driver version: 0.4.1.7-NAPI
  firmware version: 2.1-12 
 

 

In order to successfully use jumbo frames, the whole network path from the vmk port to the Openfiler NIC port must support them. This means all the virtual and physical NICs as well as all the virtual and physical switches in between must support jumbo frames. Therefore we also have to enable jumbo frames on vSwitch0 by running:

esxcfg-vswitch --mtu 9000 vSwitch0
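
The vSwitch configuration, including the new MTU and the VLANs of the port groups, can be verified with:

esxcfg-vswitch -l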

  

To check whether jumbo frames work, we can send a vmkping of the right size. This must be run from the busybox console of the ESXi host. To get there, press Alt-F1 on the console of the physical server, type unsupported and log in.

vmkping -s 9000 <openfiler IP address>
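
Keep in mind that a 9000 byte ICMP payload plus the 28 bytes of IP and ICMP headers does not fit into a single 9000 byte frame, so the ping above may simply get fragmented and succeed even on a misconfigured path. A stricter test, assuming your vmkping build supports the don't-fragment flag, is:

vmkping -d -s 8972 <openfiler IP address>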

 

Multipathing

Now we can set up the iSCSI HBA on the ESXi host. Discover the targets and see how the host finds all the available paths. In my case I had two targets, each presenting two LUNs with two paths, so altogether we can see eight different paths.
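
I did the target discovery and rescan through the vSphere Client, but the paths can also be inspected from vMA; a compact per-device listing (just another way to see the same eight paths) is:

esxcfg-mpath -b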


In order to load balance between them, select the Round Robin path selection policy for each LUN. This can be done this way: Configuration / Storage / select the datastore LUN / Properties / Manage Paths / Path Selection: Round Robin (VMware).
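
The same policy can also be set from vMA. A hedged sketch, assuming the vSphere 4.x esxcli nmp namespace, with naa.xxxxxxxx standing in for the device ID of your LUN (esxcli nmp device list shows the device IDs and their current policy):

esxcli nmp device setpolicy --device naa.xxxxxxxx --psp VMW_PSP_RR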


We can check whether the load balancing works correctly by generating storage traffic and monitoring the vmk port usage. This can be done for example by creating a Fault Tolerance compatible virtual hard drive (this creates an eagerzeroedthick disk which writes zeros to all disk blocks) and running resxtop (the remote equivalent of esxtop) from vMA. Press the 'n' key (network view) and see if you get similar numbers in the vmk1 and vmk2 rows.
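
resxtop connects to the host over the network, so it has to be pointed at it explicitly, for example (the host name and user below are placeholders for your own):

resxtop --server esxi.fojta.com --username root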


Openfiler setup

I also created a NIC team with jumbo frames on my Openfiler server. For some reason I was not able to do it from the GUI, so here are the commands I ran. The NIC team (bond0) is created from eth1 and eth2. The link aggregation mode is 802.3ad with the xmit_hash_policy layer2+3. The NICs are connected to a switch port group with LACP enabled.

nano ifcfg-eth1
 

DEVICE=eth1
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
SLAVE=yes
MASTER=bond0
 

nano ifcfg-eth2 

DEVICE=eth2
BOOTPROTO=none
USERCTL=no
ONBOOT=yes
SLAVE=yes
MASTER=bond0
 

nano ifcfg-bond0
 

DEVICE=bond0
MTU=9000
USERCTL=no
ONBOOT=yes
BOOTPROTO=none
BROADCAST=10.0.100.255
NETWORK=10.0.100.0
IPADDR=10.0.100.90
NETMASK=255.255.255.0
BONDING_MASTER="yes"
BONDING_SLAVE0="eth1"
BONDING_SLAVE1="eth2"
 

nano /etc/modprobe.conf 

alias scsi_hostadapter mptbase
alias scsi_hostadapter2 ata_piix
alias usb-controller uhci-hcd
alias usb-controller1 ehci-hcd
alias eth0 e1000e
alias eth2 e1000e
alias eth1 e1000e
alias scsi_hostadapter1 sata_sil
alias bond0 bonding
options bond0 max_bonds=8 mode=802.3ad xmit_hash_policy=layer2+3 miimon=100 downdelay=0 updelay=0
 

service network restart
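
After the restart, the bonding mode, the slave status and the MTU can be checked with standard Linux tools:

cat /proc/net/bonding/bond0
ifconfig bond0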

13 thoughts on “iSCSI and ESXi: multipathing and jumbo frames”

    1. I think I was using version 2.3 at the time of the post. I see that the current Openfiler version is 2.99 but I have not tested it. I am not using Openfiler any more as I prefer VSA appliances from Falconstor or HP LeftHand.

      1. I have also only used Openfiler 2.3, so it would be interesting to see what is new in 2.9. I will assume that, for example, HP Lefthand would be much better in a production environment of course, however for a test or lab Openfiler performs well. Do HP or Falconstor have any “free for lab” options or only short term evaluation licenses?

        Here is an article I wrote earlier about setting up Openfiler running inside VMware Workstation for a lab environment: http://rickardnobel.se/archives/26

        1. Not only do HP and Falconstor offer trial versions (with replication, snapshots and other enterprise features), but they also provide free limited editions of their virtual storage appliances. HP Lefthand VSA (now called Storageworks V4000) is the only free VSA I know of that provides VAAI integration with VMware vSphere (see my other post: http://wp.me/pG062-2D). It can also be used to create a VMware Site Recovery Manager lab environment in a box.

  1. Just thought I’d say thanks, this is by far the most concise and straightforward article on getting ESXi to talk mpio with jumbo frames I’ve seen. Maybe that’s because I’ve managed to wrap my mind around it now or something, but… either way, I appreciate the thorough explanation.

    I did one minor change to this: I created two virtual switches instead and put one vmnic per vswitch. Apparently there are no best practices for choosing between that and having just one vswitch with one vmnic assigned to each vmk; it just seems clearer to me: vswitch1 has vmk1 and vmnic2, and vswitch2 has vmk2 and vmnic3, round-robining away.

    One thing you don’t cover is how much traffic the round robin thing sends across each line – that’s tunable, as I found in another very useful post elsewhere:

    http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

    Specifically, question 3 in that post covers tuning how frequently the round robin part uses the separate links. I’ve set mine down quite a bit from the standard of 1000 commands.

    Thanks again. Great post for those of us still wrapping our heads around vmware’s thought processes… 😉

  2. This is a great post and was very helpful in understanding the multipathing setup for vSphere. I have been using Openfiler for a couple of years now and continue to learn its subtleties and ways to make it more efficient. I was wondering though if you could point out where the free versions of both the HP software and Falconstor can be found. I have located the trial versions but not the free versions, and I would like to put them through the paces and implement them if possible in my test lab.

    1. Maybe I just found the answer…
      Am I correct in understanding that the demo version will continue to work after the 60 day demo period, just without the advanced features?

  3. Thanks for writing this up. I followed these instructions (although I’m using OF 2.99.2 and had no problem creating the bond interface via the GUI), and everything *appears* to be set up correctly — at least ESXi shows two paths to the iSCSI target — but I’m noticing something rather odd:

    Both the OF server *and* the ESXi server are showing traffic being evenly-divided between interfaces when data is TRANSMITTED from itself, but is showing all of the RECEIVED data on one interface (the first).

    For example, after booting up two Ubuntu guests on ESXi, I see the following on OF:

    eth0 […]
    RX packets:202315 errors:0 dropped:0 overruns:0 frame:0
    TX packets:131035 errors:0 dropped:0 overruns:0 carrier:0

    eth1 […]
    RX packets:812 errors:0 dropped:0 overruns:0 frame:0
    TX packets:131059 errors:0 dropped:0 overruns:0 carrier:0

    …and on ESXi:

    vmk0
    packets received: 167925
    packets sent: 63127

    vmk1
    packets received: 3254
    packets sent: 67427

    …so ESXi interleaves its iSCSI write requests between the two interfaces, but the OF box only sees them come in on the first interface. Likewise, when ESXi requests data to be read from the volume, OF sends the data back to ESXi that it reads interleaved over its two interfaces, but the ESXi box only sees all of that data come into it via the first interface.

    One thought that I had was that perhaps it would be better to not bond the interfaces on the OF server, but give them two separate IP addresses as well? The iSCSI target name advertised by OF would be identical, so hopefully ESXi would recognize that it is the same target even though it is being seen from two separate IP addresses.

    Another possibility that crossed my mind is that perhaps this is an “optical illusion” of sorts, and that it is working just fine. I’m not sure if there is a good way to benchmark this, though.

    Both servers are plugged into a Dell 5324. The two OF interfaces are in a LACP channel-group together, and the two ESXi interfaces are in a static (non-LACP) channel-group together.

    Thanks for any suggestions you might have.

  4. What if we do multipathing across two different switches, like one iSCSI uplink on one switch and one on the other? Can we do that, and if yes, please suggest how.

  5. That’s because NIC bonding/pairing only works for failover not for loadbalancing ISCSI, period.
    You need MPIO (Multipath I/O) on both sides if you want to use multiple NICs for loadbalancing.

    ISCSI isn’t a random data generator and expects a certain order to sessions/streams/commands received/sent.
    Anything other than MPIO will always only use max bandwidth of 1 NIC because ISCSI first chooses a network path and then starts a session over it..
    The session can not be split up over multiple NICs otherwise ISCSI might receive commands/data out of order because it is not aware the data might be coming in over different NICs with different latencies etc..
    MPIO sits below the network and thus takes care ordering/loadbalancing before it gets put out on the wire..

    This setup might be of some benefit if multiple initiators (vmware machines) connect to the target where each initiator might end up being routed to target over a different NIC.

    Imho this article was probably glued together from pieces of info out there on the internet without the author actually understanding what he is doing..
    The fact that no thruput tests were done kinda says enough.

    Just so u know..
