On November 12th, VMware issued a Support Alert regarding an issue that affects users running ESX/ESXi 4.1 on IBM hardware.

The symptoms are mentioned below.

When using IBM x3650 M3 or BladeCenter HS22V servers, you may experience these symptoms:

  • HBAs stop responding
  • Other PCI devices may also stop responding
  • You see an illegal vector shortly before an HBA stops responding to the driver. For example: vmkernel: 6:01:34:46.970 cpu0:4120)ALERT: APIC: 1823: APICID 0x00000000 – ESR = 0x40
  • The HBA stops responding to commands. For example: vmkernel: 6:01:42:36.189 cpu15:4274)<6>qla2xxx 0000:1a:00.0: qla2x00_abort_isp: **** FAILED **** vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: Failed mailbox send register test
  • The HBA card gets marked offline. For example: vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: ISP error recovery failed – board disabled
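The log messages quoted above can be searched for directly. The following is a minimal sketch, not an official VMware tool: the function name `scan_hba_symptoms` is hypothetical, and the default log path is an assumption that may differ between ESX and ESXi, so pass the log file explicitly if unsure.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that contain any of the symptom
# messages quoted in the alert. The default path is an assumption;
# pass the log file to check as the first argument.
scan_hba_symptoms() {
    log_file="${1:-/var/log/vmkernel}"
    # -F treats each pattern as a fixed string, so no regex escaping is needed.
    grep -F \
        -e 'ALERT: APIC:' \
        -e 'qla2x00_abort_isp' \
        -e 'Failed mailbox send register test' \
        -e 'ISP error recovery failed' \
        "$log_file"
}
```

For example, `scan_hba_symptoms /var/log/vmkernel` prints any matching lines; no output means none of the quoted messages were found in that file.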

The issue is currently under investigation by VMware engineering. At this time, downgrading to ESX/ESXi 4.0 by performing a fresh install is the only resolution.

VMware has published Knowledge Base article 1030265, "HBAs and other PCI devices may stop responding in ESX 4.1 when using IBM servers". The article may be updated as new information becomes available. Bookmark the KB or subscribe to its RSS feed to receive updates.

Update 17-11-2010: VMware found a workaround for this problem.

ESX 4.1 introduces interrupt remapping code that is enabled by default. This code is incompatible with some IBM servers.
You can work around this issue by manually disabling interrupt remapping on these two IBM server models:
  • IBM Server x3650 M3
  • IBM BladeCenter HS22V
To disable interrupt remapping, perform one of these options:
  • Run the commands:
      # esxcfg-advcfg -k TRUE iovDisableIR
      # reboot
    To check if interrupt mapping is set after the reboot, run the command:
      # esxcfg-info -c iovDisableIR=TRUE
  • In vSphere Client:
    1. Click Configuration > (Software) Advanced Settings > VMkernel.
    2. Select VMkernel.Boot.iovDisableIR and click OK.
    3. Reboot the ESX host.
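The post-reboot check can be scripted as a small filter. This is a rough sketch under stated assumptions: the function name `ir_disabled` is hypothetical, and it assumes the host's configuration output contains the literal text `iovDisableIR=TRUE` once the setting has taken effect, which is inferred from the command text above rather than confirmed.

```shell
#!/bin/sh
# Hypothetical helper: reads configuration text on stdin and reports whether
# the string iovDisableIR=TRUE appears in it. That the ESX tooling prints the
# setting in exactly this form is an assumption.
ir_disabled() {
    if grep -q 'iovDisableIR=TRUE'; then
        echo "interrupt remapping disabled"
    else
        echo "interrupt remapping still enabled or setting not found"
        return 1
    fi
}
```

On a host, this might be used as `esxcfg-info -c | ir_disabled`, assuming the trailing `iovDisableIR=TRUE` in the instructions above is the expected output rather than an argument to the command.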