Yesterday I gracefully rebooted our PowerEdge T710 server to apply updates. Unfortunately it did not come back up. When I arrived today the machine was powered on and sitting at the prompt "The system halted because system power exceeds capacity." with the options to press F1 to continue or F2 to enter setup. I booted the machine as normal and looked at the OpenManage logs:
Sun Dec 22 09:24:23 2013 | Cannot communicate with power supply 2. | |||
Mon Dec 23 20:12:02 2013 | Cannot communicate with power supply 1. | |||
Mon Dec 23 20:12:04 2013 | Communication has been restored to power supply 1. | |||
Tue Dec 24 08:32:36 2013 | Cannot communicate with power supply 1. | |||
Tue Dec 24 08:32:37 2013 | Communication has been restored to power supply 1. | |||
Thu Dec 26 08:44:46 2013 | Communication has been restored to power supply 2. | |||
Thu Dec 26 08:44:46 2013 | Cannot communicate with power supply 2. | |||
Sat Dec 28 23:37:32 2013 | Cannot communicate with power supply 1. | |||
Sat Dec 28 23:37:33 2013 | Communication has been restored to power supply 1. | |||
Mon Dec 30 02:42:50 2013 | Cannot communicate with power supply 1. | |||
Mon Dec 30 02:42:51 2013 | Communication has been restored to power supply 1. | |||
Tue Dec 31 17:23:16 2013 | Communication has been restored to power supply 1. | |||
Tue Dec 31 17:23:16 2013 | Cannot communicate with power supply 1. | |||
Tue Dec 31 22:46:53 2013 | Communication has been restored to power supply 2. | |||
Tue Dec 31 22:46:53 2013 | Cannot communicate with power supply 2. | |||
Fri Jan 17 02:32:53 2014 | Communication has been restored to power supply 2. | |||
Fri Jan 17 02:32:53 2014 | Cannot communicate with power supply 2. | |||
Mon Jan 27 20:23:16 2014 | An OS graceful shut-down occurred. | |||
Mon Jan 27 20:23:17 2014 | OEM software event. | |||
Mon Jan 27 20:24:37 2014 | The system halted because system power exceeds capacity. | |||
Tue Jan 28 10:04:21 2014 | OEM software event. | |||
Tue Jan 28 10:04:21 2014 | C: boot completed. |
Both PSUs seem to be running fine; there are no amber lights on either to indicate a fault. Oddly, in the logs PSU1 loses communication and comes back, then PSU2 loses communication and comes back. The machine has been working so flawlessly that I hadn't even checked the logs. The UPS that the PSUs are plugged into reports no problems, and they are not on any kind of "Y" cable. Is this a malfunctioning sensor? A lot of what I have read seems to blame iDRAC or BMC firmware, but if that were the case I'd assume it would have been fixed by now. Any ideas, anyone?
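
In case it helps anyone look at the pattern, here is a rough sketch of how the communication events could be tallied per PSU from a text export of the log. It assumes the "timestamp | message |" layout shown above; the function name `tally_psu_events` and the `LOG` sample are only for illustration, not anything provided by OpenManage.

```python
import re
from collections import Counter

# A few lines copied from the log above. The "timestamp | message |" layout
# is assumed from this particular export and may differ elsewhere.
LOG = """\
Sun Dec 22 09:24:23 2013 | Cannot communicate with power supply 2. | |||
Mon Dec 23 20:12:02 2013 | Cannot communicate with power supply 1. | |||
Mon Dec 23 20:12:04 2013 | Communication has been restored to power supply 1. | |||
Mon Jan 27 20:24:37 2014 | The system halted because system power exceeds capacity. | |||
"""

# Matches the two PSU communication messages and captures the PSU number.
PSU_EVENT = re.compile(
    r"(Cannot communicate with|Communication has been restored to) power supply (\d+)"
)


def tally_psu_events(log_text):
    """Count 'lost' and 'restored' events per PSU (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue  # not a "timestamp | message" line
        match = PSU_EVENT.search(fields[1])
        if match:
            kind = "lost" if match.group(1).startswith("Cannot") else "restored"
            counts[(int(match.group(2)), kind)] += 1
    return counts


counts = tally_psu_events(LOG)
for (psu, kind), n in sorted(counts.items()):
    print(f"PSU {psu}: {kind} x{n}")
```

Run against the full log, a tally like this would show whether the dropouts are spread across both supplies or concentrated on one.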