Hi,
We have a PowerEdge R715 running XenServer 6.2, and it crashes almost every time the RAID battery learn cycle is about to complete.
The server is covered by a 4-hour support contract, but the problem has been present for months.
Dell sent an engineer who replaced the RAID controller, the RAID battery and the I/O board (plus some other small related components, but NOT the CPU board with the memory, or the PSUs).
The OS is XenServer 6.2 with all patches applied, and all the Dell hardware is running the latest firmware.
The C1 parameters in the BIOS are also disabled.
At the moment of the crash, this shows up in the OpenManage log (newest entry first):
Status: OK 2334 Fri Mar 21 05:22:37 2014 Storage Service Controller event log: Current capacity of the battery is above threshold: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:37 2014 Storage Service Controller event log: BBU enabled; changing WT virtual disks to WB: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:37 2014 Storage Service Controller event log: Policy change on VD 00/0 to [ID=00,dcp=0d,ccp=0d,ap=0,dc=2,dbgi=0,S=0|0] from [ID=00,dcp=0d,ccp=0c,ap=0,dc=2,dbgi=0,S=0|0]: Controller 0 (PERC H700 Integrated)
Status: Non-Critical 2335 Fri Mar 21 05:22:36 2014 Storage Service Controller event log: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:36 2014 Storage Service Controller event log: Policy change on VD 00/0 to [ID=00,dcp=0d,ccp=0c,ap=0,dc=2,dbgi=0,S=0|0] from [ID=00,dcp=0d,ccp=0d,ap=0,dc=2,dbgi=0,S=0|0]: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:36 2014 Storage Service Controller event log: Battery relearn completed: Controller 0 (PERC H700 Integrated)
Status: Non-Critical 2335 Fri Mar 21 05:22:36 2014 Storage Service Controller event log: Current capacity of the battery is below threshold: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:35 2014 Storage Service Controller event log: Battery relearn started: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:35 2014 Storage Service Controller event log: Battery is discharging: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:35 2014 Storage Service Controller event log: Battery relearn in progress: Controller 0 (PERC H700 Integrated)
Status: Non-Critical 2335 Fri Mar 21 05:22:35 2014 Storage Service Controller event log: Current capacity of the battery is below threshold: Controller 0 (PERC H700 Integrated)
Status: OK 2334 Fri Mar 21 05:22:34 2014 Storage Service Controller event log: Unexpected sense: Encl PD 20 Path 6001f0f0d4ed5b00, CDB: 1c 01 a0 00 04 00, Sense: 5/24/00: Controller 0 (PERC H700 Integrated)
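For anyone who prefers to read these entries oldest-first, here is a minimal Python sketch that parses lines in the format pasted above and puts them in chronological order. It is not a Dell or OpenManage tool. The regular expression and the names `parse_line` and `chronological` are my own assumptions, based only on the lines shown here, so other OpenManage exports may need a different pattern.

```python
import re
from datetime import datetime

# Pattern inferred from the pasted lines only (an assumption, not an
# official OpenManage format): status, 4-digit event ID, timestamp, message.
LINE_RE = re.compile(
    r"Status: (?P<status>.+?) (?P<event_id>\d{4}) "
    r"(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"Storage Service (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "status": m.group("status"),
        "event_id": int(m.group("event_id")),
        "time": datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"),
        "message": m.group("message"),
    }


def chronological(lines):
    """Parse newest-first lines and return them oldest-first.

    The list is reversed first and then sorted with a stable sort, so
    entries sharing the same second keep their reversed listing order.
    """
    parsed = [p for p in map(parse_line, lines) if p is not None]
    parsed.reverse()
    return sorted(parsed, key=lambda e: e["time"])


if __name__ == "__main__":
    import sys

    for entry in chronological(sys.stdin.read().splitlines()):
        print(entry["time"], entry["status"], entry["event_id"], entry["message"])
```

Saved under a hypothetical name such as `omlog.py`, it can be fed the pasted log on standard input. Read oldest-first, the "Unexpected sense" entry at 05:22:34 comes before all of the battery relearn messages.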
They tell me that these warnings are a symptom of something else, not the cause.
Citrix also checked the server and the logs and couldn't find anything pointing to a software crash (and it only happens when the RAID battery learn cycle is about to complete).
DSET doesn't show any errors. We also tested changing the RAID policy to write-back and to write-through by hand, but nothing happens then.
Any ideas?
Ph.