Saturday, 23 February 2008

Er... problem with HS21 XM (7995) and ESX 3.5

This is a bit of an issue. I've just test-installed ESX 3.5 onto an HS21 XM (7995) blade with BIOS v1.07. The server boots fine and runs stably, but every time I reboot from the console or restart using the VI Client I get a purple screen of death.

Now I know that there is an issue with quad-core Xeons and HS21 blades, but wasn't this fixed with the latest BIOS versions? I believe it was fixed with BIOS 1.06 on the normal HS21, but was the same fix applied to v1.07 for the HS21 XM (7995)?

IBM and VMware support tickets have been opened, but are there any working fixes out there?

8 comments:

Aaron Delp said...

Make sure the processor stepping levels are matched on the 2 CPUs. Check the VPD in the BIOS for the stepping levels. There is a known problem with 1.07b (the one with a December release date) and stepping levels with Red Hat. It would make sense if it bleeds over into VMware since there is some Red Hat in there.

Check my site for more information on the bug in the BIOS.

www.bladevault.info.

Aaron

Hugo said...

Aaron,

The customer has BIOS 1.07, dated 31/01/2008.

I will ask the customer to check CPU settings on Monday.

I have a theory which I will test this week: would this problem go away if ESX 3i is used, since there is no Service Console?

I will let you know.

If it works I'll have to persuade them to use 3i on the blades.

The other thought I had was to shuffle the CPUs between the 8 blades: pull the Socket 2 CPUs from all of them, then move the Socket 1 CPUs from the second set of blades into Socket 2 of the first set. The CPUs that originally sat in Socket 2 would then all be installed into the second set of blades. That would hopefully alleviate the processor mismatch...

Hugo

hai said...

I have the same exact problem. Did you ever find the solution? I tried BIOS versions 1.04, 1.05, and 1.07 with the same result.

Hugo said...

Hai,

Installing ESX 3i gets around the problem, as I suspected in my previous post.

I'm still pushing IBM to release a fix for ESX 3.5 though...

Hugo

Hugo said...

ESXi is just one of the solutions to this problem. It appears that the problems I was experiencing were not due to the 2 x quad-core CPUs but to the USB KVM module that is loaded by ESX.

To get around this issue on ESX 3.5 build 64607, run

chkconfig gpm off

then reboot.
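
For anyone who wants to script the workaround above, here is a minimal sketch. It assumes the Red Hat-style `service` and `chkconfig` tools are available in the ESX 3.5 Service Console; the `run` and `disable_gpm` helpers and the `DRY_RUN` variable are names made up for this example. By default it only prints what it would do; set `DRY_RUN=0` to actually apply the change, then reboot as described.

```shell
#!/bin/sh
# Sketch only: turn off the gpm (console mouse) service.
# DRY_RUN is a hypothetical switch for this example; it defaults to 1,
# so nothing is changed unless you explicitly set DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

disable_gpm() {
    # Stop the running daemon (assumes a gpm init script is installed).
    run service gpm stop
    # Keep gpm from starting at boot; this is the command from the post.
    run chkconfig gpm off
    # Show the resulting runlevel settings so the change can be confirmed.
    run chkconfig --list gpm
}

disable_gpm
```

Stopping the service first is optional since a reboot follows anyway; the `chkconfig gpm off` line is the part that matters.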

If you are running ESX 3.5 Update 1 or ESXi, then you will not get this issue.

Anonymous said...

We're running a completely up-to-date build of ESX 3.5 on our SuperMicro barebones servers and still get a purple screen error on shutdown. We believe the problem is caused by an interaction between gpm and the IPMI card with USB KVM functionality.

As with you, our fix was simply to turn off gpm.

Anonymous said...

ESX 3.5 on our IBM HS21 blades also crashed at shutdown, but ESX 3.5 Update 3 fixed it and we stopped getting the purple screen. Now everything works fine on the IBM BladeCenter.

Anonymous said...

I had the same problem, BIOS v1.13 solved it.