I have two M605 servers (tag GQ3GQH1 is one of them), configured the same way (exact specs below). They used to run Win 2k8R2 and worked fine.
Last week I did a fresh install of Windows 2012, added the Commvault backup client and APC PowerChute, turned on File Dedupe, and added the MPIO role. Then I clustered the servers (I'm calling it Cluster F for now :) and all was well... until
the next day, when the active node failed. The day after that, the other node failed, and it has gone back and forth since. Both servers are getting the same error message:
From the DRAC:
PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 0 Device 4 Function 0) was asserted
From OME:
A bus fatal error was detected on a component at bus 0 device 4 function 0.
I'm assuming device 4 is the HBA? In Windows Device Manager, I see "Intel(R) 5000 Series Chipset PCI Express x8 Port 4-5 - 25F8". Under Properties, its Location shows Bus 0, Device 4, Function 0. Under that entry, I see the two HBAs connected to that PCIe port. Their locations are PCI Slot 1 (PCI bus 9, device 0, function 0) and PCI Slot 1 (PCI bus 9, device 0, function 1).
So it seems to have something to do with either the PCI bus or the HBAs.
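To make that mapping concrete, here is a rough Python sketch of how the location in the DRAC and OME messages lines up with what Device Manager shows. Nothing is queried from the hardware: the device table is typed in by hand from the locations listed above, and the names `parse_bdf` and `DEVICES` are made up for illustration.

```python
import re

# Matches "bus 0 device 4 function 0" in either the DRAC or the OME wording.
BDF_PATTERN = re.compile(
    r"bus\s+(\d+)\s+device\s+(\d+)\s+function\s+(\d+)", re.IGNORECASE
)


def parse_bdf(message):
    """Return (bus, device, function) found in a log message, or None."""
    match = BDF_PATTERN.search(message)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Hand-copied from Device Manager (assumption: a static table, not a live query).
DEVICES = {
    (0, 4, 0): "Intel(R) 5000 Series Chipset PCI Express x8 Port 4-5 - 25F8",
    (9, 0, 0): "HBA, PCI Slot 1, function 0",
    (9, 0, 1): "HBA, PCI Slot 1, function 1",
}

drac = ("PCIE Fatal Err: Critical Event sensor, bus fatal error "
        "(Bus 0 Device 4 Function 0) was asserted")
ome = ("A bus fatal error was detected on a component at "
       "bus 0 device 4 function 0.")

for msg in (drac, ome):
    bdf = parse_bdf(msg)
    print(bdf, "->", DEVICES.get(bdf, "unknown device"))
```

Both messages resolve to the PCI Express port at bus 0, device 4, function 0, not to the HBAs themselves, which Device Manager lists on bus 9 underneath that port.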
I can't imagine what could be wrong here, on both nodes. Both servers are in the same chassis slots as they were when running Win 2k8R2, and they are using the same SAN storage/LUNs, so nothing changed there. The HBAs are using the driver from Win2012, which is QLogic 9.1.9.205 (5/21/2012).
Server Specs:
2x E5450s, 8x 2GB DIMMs and a QLogic QME2472 HBA on fabric B (slot MEZZ_B1 and MEZZ_C1). BIOS is v2.4.0, Broadcom FW 7.4.8 (driver 7.0.1.36).