Wednesday, 11 February 2009

Configuring vSwitch0 for Service Console and VMkernel portgroups

Following on from a previous post on how to configure vSwitch0 with active/passive network interfaces using vmware-vim-cmd for the two port groups, I have included some diagrams explaining how it all works.

Link Aggregation gives no performance benefit when the Service Console and the VMkernel share two active uplinks on a virtual switch configured with the Route based on IP hash load balancing policy. The reasons can be summarised as follows.

802.3ad/LACP aggregates physical links, but the mechanisms used to determine whether a given flow of information follows one link or another are critical.

You’ll note several key things in this document that are useful in understanding 802.3ad/LACP.
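  • All frames associated with a given “conversation” are transmitted on the same link to prevent mis-ordering of frames. So what is a “conversation”? It depends on the distribution algorithm: with Route based on IP hash, a “conversation” is the traffic between one source IP address and one destination IP address.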
  • The link selection for a conversation is usually done by doing a hash on the MAC addresses (Route based on source MAC hash) or IP address (Route based on IP hash).
  • Link Aggregation achieves high utilisation across multiple links when carrying multiple conversations, and is less efficient with a small number of conversations (and gives no improved bandwidth with just one).
Link Aggregation applies between two network devices only. It can load balance efficiently, but it is neither efficient nor predictable when there are few conversations, so it is of little use when there are only two traffic sources: the Service Console IP address and the VMkernel IP address. Using Link Aggregation with VMware ESX and Cisco networking requires Route based on IP hash on the vSwitch and static 802.3ad link aggregation on the physical switch.

So how does the VMkernel distribute traffic over the Link Aggregation Group (LAG)?
The VMkernel distributes the load across the Link Aggregation Group by selecting an uplink to the physical network based on the source and destination IP addresses together. Each source / destination IP conversation gets treated as a unique route, and is distributed across the LAG accordingly. Using an IP-based load balancing method allows for a single-NIC virtual machine to possibly utilise more than 1 physical NIC. Returning traffic may come in on a different NIC, so Link Aggregation must be supported on the physical switch.
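To make the idea concrete, here is a minimal sketch of an IP hash style uplink choice. It assumes the hash is simply the source IP XOR the destination IP, modulo the number of uplinks. That is a simplification for illustration, and the function names ip_to_int and pick_uplink are my own, not part of any VMware tool.

```shell
#!/bin/sh
# Illustrative only: approximates an IP-hash uplink choice as
# (source IP XOR destination IP) modulo the number of uplinks.
# ip_to_int and pick_uplink are hypothetical helper names.

# Convert a dotted-quad IPv4 address to an integer.
# The subshell body keeps the IFS change local to this function.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the 0-based uplink index for a source/destination pair.
# $1 = source IP, $2 = destination IP, $3 = number of uplinks
pick_uplink() (
    src=$(ip_to_int "$1")
    dst=$(ip_to_int "$2")
    echo $(( ($src ^ $dst) % $3 ))
)

pick_uplink 10.0.0.10 10.0.0.20 2   # prints 0
pick_uplink 10.0.0.10 10.0.0.21 2   # prints 1
```

The same source/destination pair always lands on the same uplink, which is why a host with only a couple of IP pairs in play sees little benefit from the extra link.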

The diagram below shows what you would want to achieve with vSwitch0 when using a single virtual switch with two network interfaces with two port groups configured for the Service Console networking and the VMkernel networking.

The Service Console uses vmnic0 as the active uplink and vmnic2 as the standby uplink. The VMkernel uses vmnic2 as the active uplink and vmnic0 as the standby uplink. If a network interface, its physical switch or a cable fails, the port group fails over to its standby network interface.

The configuration settings seen within the VI Client are shown in the figures below.

The vSwitch0 configuration
The vSwitch uses Route based on IP hash with both adapters set as active.

The Service Console portgroup configuration

The Service Console port group uses vmnic0 as the active adapter and vmnic2 as the standby adapter, and the load balancing policy is set to Route based on virtual port ID. This overrides the configuration inherited from the vSwitch.

The VMkernel portgroup configuration
The VMkernel port group uses vmnic2 as the active adapter and vmnic0 as the standby adapter, and the load balancing policy is set to Route based on virtual port ID. This overrides the configuration inherited from the vSwitch.

Network Utilisation
The graph below shows the current utilisation of the network interfaces assigned to vSwitch0. Under normal operations only vmnic0 is used, because the Service Console traffic uses vmnic0.

Let's see what happens when I initiate a VMotion to the same server.

The VMotion traffic uses the active network interface vmnic2 as expected.

Using override active/standby uplinks for the Service Console and VMkernel port groups clearly has significant advantages. It stops VMotion traffic from flooding the Service Console uplink adapter, which can happen with an active/active configuration.

Monday, 2 February 2009

What to do when an ESX host shows not responding?

Steps to work through, in order:

1) Log in to the affected ESX server using PuTTY

2) service mgmt-vmware restart

If this doesn't work then the vmware-hostd daemon has to be killed.

3) ps -e | grep vmware-hostd
Look for the process_id associated with vmware-hostd

4) kill process_id
For example, if 3) returned:
32470 ? 00:01:12 vmware-hostd
the command would be:
kill 32470

5) service mgmt-vmware status
If the service is running, use:
service mgmt-vmware restart
If it's stopped, use:
service mgmt-vmware start
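Steps 3) and 4) can be sketched as a small helper that picks the process ID out of the ps output. The function name hostd_pid is my own, and the sample line is the one shown above. The sketch only prints the kill command, so nothing is actually killed.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -e" style output on stdin and print
# the PID of any line whose command column is vmware-hostd.
hostd_pid() {
    awk '$NF == "vmware-hostd" { print $1 }'
}

# Demonstrate on the sample line from step 4) rather than a live host.
sample='32470 ? 00:01:12 vmware-hostd'
pid=$(printf '%s\n' "$sample" | hostd_pid)
echo "kill $pid"    # prints: kill 32470
```

On the host itself, the equivalent would be to pipe ps -e into hostd_pid and check the result by eye before running kill.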

Using ESX 3.5 vmware-vim-cmd instead of vimsh


For those of you who are familiar with vimsh and have used it to configure a scripted install of ESX 3.5: have you noticed that the following error occurs when launching commands using /usr/bin/vimsh?

/usr/bin/vimsh -n -e "hostsvc/maintenance_mode_enter"

Alternatively, by using the wrapper developed for ESX 3.5, vmware-vim-cmd, you would get the following:

/usr/bin/vmware-vim-cmd hostsvc/maintenance_mode_enter

The two commands are detailed in the Xtravirt whitepapers, vimsh and vimsh for ESX 3.5. I would recommend at least having a quick browse to see what can be achieved with these commands. Using vmware-vim-cmd in conjunction with esxcfg- can achieve some very interesting results, especially if you love to create the perfect KickStart build script.

If only it were possible to launch vmware-vim-cmd commands using the RCLI, just as esxcfg- commands can be launched using vicfg-. Does anyone have an idea?

A few more examples

Refreshing the network settings
/usr/bin/vmware-vim-cmd hostsvc/net/refresh

Refreshing the storage
/usr/bin/vmware-vim-cmd hostsvc/storage/refresh

The all-important enabling of VMotion
/usr/bin/vmware-vim-cmd hostsvc/vmotion/vnic_set vmk0

And how about setting vSwitch1 to use Route Based on IP Hash?
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy --nicteaming-policy=loadbalance_ip vSwitch1

And setting vSwitch0 to use Route based on the originating virtual port ID. (vSwitch0 has two port groups using VLAN tagging, one for the Service Console and one for VMotion, and we wish to use an active/standby NIC teaming policy.)

Set active vmnic0 and standby vmnic2 for Service Console
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic0 vSwitch0 'Service Console'
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-standby=vmnic2 vSwitch0 'Service Console'

Set active vmnic2 and standby vmnic0 for VMkernel network
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic2 vSwitch0 VMkernel
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-standby=vmnic0 vSwitch0 VMkernel

Set the port group load balancing policy to override the vSwitch policy
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_srcid vSwitch0 'Service Console'
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_srcid vSwitch0 VMkernel

Let's not forget to refresh our network settings
/usr/bin/vmware-vim-cmd hostsvc/net/refresh
/usr/bin/vmware-vim-cmd internalsvc/refresh_network
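For a KickStart build script, the per-port-group commands above could be wrapped in a small function. This is only a sketch: run, set_failover and the DRY_RUN variable are names I have made up, and the vmware-vim-cmd arguments are exactly those used above. With DRY_RUN left at 1 the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: wraps the portgroup_set commands shown above.
# run, set_failover and DRY_RUN are hypothetical names.
VIMCMD=/usr/bin/vmware-vim-cmd
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# $1 = vSwitch, $2 = port group, $3 = active NIC, $4 = standby NIC
set_failover() {
    run "$VIMCMD" hostsvc/net/portgroup_set --nicorderpolicy-active="$3" "$1" "$2"
    run "$VIMCMD" hostsvc/net/portgroup_set --nicorderpolicy-standby="$4" "$1" "$2"
    run "$VIMCMD" hostsvc/net/portgroup_set --nicteaming-policy=loadbalance_srcid "$1" "$2"
}

set_failover vSwitch0 'Service Console' vmnic0 vmnic2
set_failover vSwitch0 VMkernel vmnic2 vmnic0
run "$VIMCMD" hostsvc/net/refresh
```

Note that the dry-run output drops the quotes around 'Service Console', so it is meant for checking the order of the commands, not for copying and pasting.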