I have 3 R720 servers in my production cluster in a VMware 5.5 environment (vm6, vm7 and vm8). All 3 servers are identical: same hardware, firmware versions, BIOS, etc. The switches are the same (Dell N3024, stacked). I use the onboard BCM5720 with 4 ports for my LAN, 2 ports going to one switch and 2 going to the other.
Here is my issue. Every VM I create on or migrate to vm8 cannot ping the host. When I ping, the first packet gets a reply, but the rest time out. When I open a console for the VM and do not use it for a minute, the screen goes blank and I lose the connection.
I spent several hours on the phone with VMware last week. They verified all the software settings, checked the logs, checked drivers, etc., and they are currently at a loss for the cause. They verified the tg3 vmnic driver version, since some versions cause packet loss. I upgraded to the latest version back in September because I was experiencing high availability issues, which the upgrades resolved. But since I am having these issues now, I cannot use the host for any servers.
Now here is another interesting fact that is making this harder to resolve. If I ping vm8 from my workstation or any other physical server, the first packet times out and I get a reply on every one after that. If I start another ping immediately, there are no timeouts. If I wait 60 seconds or longer and start another ping, the first one times out again.
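To make the idle-time behaviour above easier to reproduce and show to support, here is a rough Python sketch I could run from the workstation. It waits a given number of seconds, sends a short burst of pings, and labels each burst. The function names (`ping_once`, `classify`, `run_idle_test`) are made up for illustration, and `ping_once` assumes a Linux-style `ping` that accepts `-c` and `-W`; on Windows the flags differ and the exit code is less reliable.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the system ping (Linux-style flags assumed).

    Returns True if ping exits with status 0, i.e. a reply was received.
    """
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def classify(results):
    """Label one burst of ping results (list of booleans, in send order)."""
    if all(results):
        return "ok"
    if not any(results):
        return "all lost"
    if not results[0] and all(results[1:]):
        return "first lost"           # pattern seen when pinging vm8 after idling
    if results[0] and not any(results[1:]):
        return "only first answered"  # pattern seen from VMs running on vm8
    return "mixed"


def run_idle_test(idle_gaps, probe, burst=4, sleep=time.sleep):
    """For each idle gap in seconds: wait, send `burst` probes, classify them.

    `probe` is any zero-argument callable returning True on a reply, so the
    logic can be exercised without touching the network.
    """
    report = []
    for gap in idle_gaps:
        sleep(gap)
        results = [probe() for _ in range(burst)]
        report.append((gap, results, classify(results)))
    return report
```

For example, `run_idle_test([5, 30, 60, 90], probe=lambda: ping_once("vm8"))` would show at which idle gap the first packet starts getting lost, which might help narrow down where the roughly 60-second threshold comes from.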
Here's some info on the servers. These are the same for all 3 servers. Only 1 has been having issues, and the issues started on the day it was installed.
BIOS 2.2.2
BCM5720 Firmware version 7.8.16
BCM5719 Slot 5 Firmware version 7.8.16
The 4 ports on BCM5720 go to the LAN
First 2 ports on BCM5719 go to DMZ and the remaining 2 go to my Compellent SAN.
I have tested every port individually on the 5720 to the LAN and still had issues.
Has anyone ever seen an issue like this? I have run out of things to try. Also, this happened when we used Cisco switches. I have been working on the issue for months and need to get it resolved, since we are moving our Citrix farm to VMware.