Hello,
I am running vSphere 5.1 on 5 Dell PowerEdge R715s with a NetApp 2040 over NFS. Our cluster services both VMware View (5.0.1) and virtual servers. All servers have identical hardware and were purchased at the same time. Ever since upgrading to 5.0 and installing VMware View, we have experienced ESX hosts just randomly restarting. This is happening to all 5 servers in the cluster. The reboots can happen anywhere from a week to a couple of months apart.
There are no errors in the logs at all. VMware reports that the host just becomes unresponsive, then goes down, restarts, and continues as if nothing ever happened. I have worked with both VMware and Dell server support over the last few months to resolve the issue.
I have had VMware (up to level 3 support) go over the logs, and they have found nothing (in Syslog, VMware support bundles, or DRAC logs). They just suggest going back to Dell, assuming it's a hardware component issue. The logs literally show nothing: no errors or anything.
I have had Dell review my BIOS settings on each server. We have confirmed C states are disabled, virtualization support is enabled, and power settings are set to maximum performance. All VMware patches and Dell firmware are up to date or on stable versions. All Dell hardware checks out from the system services console. Dell has shipped me new PSUs for all the servers, and that hasn't solved the problem. I am at the stage where I am going to make Dell change out the remaining parts, such as the system board, RAM, and CPU, or give me new servers.
However, I am just at a loss as to what could be causing this problem, and everyone else who should have an idea doesn't seem to know either. I can't be the only person who has experienced this type of behavior with Dell servers. Can anyone tell me what I might be missing?