Hi,
We're currently deploying a number of R620s and have started to integrate the iDRAC SNMP features into our monitoring system. However, we're seeing some oddities that we suspect could be bugs:
Packet loss on ping
We regularly see packet loss when the monitoring system pings the iDRAC. It is neither much nor often, but enough to trigger alarms at random. We've ruled out the network infrastructure as the source, since we get the same results when connecting the iDRAC directly to the monitoring system with a high-quality CAT6 cable. We also see no packet loss to other equipment running Linux, Windows or an embedded RTOS over the same links and switches. The behavior seems to be consistent across all the iDRAC interfaces and versions we're running (including v1.57.57).
After some diagnostics I've found a plausible cause: when two simultaneous ping sessions are started from the same host against one iDRAC interface, one of the sessions stops receiving replies after the first 5-10 packets:
ping 10.100.104.15
PING 10.100.104.15 (10.100.104.15) 56(84) bytes of data.
64 bytes from 10.100.104.15: icmp_seq=1 ttl=64 time=0.295 ms
64 bytes from 10.100.104.15: icmp_seq=2 ttl=64 time=0.341 ms
64 bytes from 10.100.104.15: icmp_seq=3 ttl=64 time=0.291 ms
64 bytes from 10.100.104.15: icmp_seq=4 ttl=64 time=0.342 ms
64 bytes from 10.100.104.15: icmp_seq=5 ttl=64 time=0.332 ms
64 bytes from 10.100.104.15: icmp_seq=6 ttl=64 time=0.352 ms
64 bytes from 10.100.104.15: icmp_seq=7 ttl=64 time=0.349 ms
64 bytes from 10.100.104.15: icmp_seq=8 ttl=64 time=0.341 ms
64 bytes from 10.100.104.15: icmp_seq=9 ttl=64 time=0.362 ms
64 bytes from 10.100.104.15: icmp_seq=10 ttl=64 time=0.344 ms
64 bytes from 10.100.104.15: icmp_seq=11 ttl=64 time=0.335 ms
^C
--- 10.100.104.15 ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 9998ms
rtt min/avg/max/mdev = 0.291/0.334/0.362/0.032 ms
That is the "ok" session, but the second session shows:
ping 10.100.104.15
PING 10.100.104.15 (10.100.104.15) 56(84) bytes of data.
64 bytes from 10.100.104.15: icmp_seq=1 ttl=64 time=0.395 ms
64 bytes from 10.100.104.15: icmp_seq=2 ttl=64 time=0.302 ms
64 bytes from 10.100.104.15: icmp_seq=3 ttl=64 time=0.312 ms
64 bytes from 10.100.104.15: icmp_seq=4 ttl=64 time=0.355 ms
64 bytes from 10.100.104.15: icmp_seq=5 ttl=64 time=0.318 ms
64 bytes from 10.100.104.15: icmp_seq=6 ttl=64 time=0.425 ms
64 bytes from 10.100.104.15: icmp_seq=7 ttl=64 time=0.297 ms
64 bytes from 10.100.104.15: icmp_seq=15 ttl=64 time=0.265 ms
64 bytes from 10.100.104.15: icmp_seq=16 ttl=64 time=0.264 ms
64 bytes from 10.100.104.15: icmp_seq=17 ttl=64 time=0.264 ms
^C
--- 10.100.104.15 ping statistics ---
17 packets transmitted, 10 received, 41% packet loss, time 15996ms
rtt min/avg/max/mdev = 0.264/0.319/0.425/0.057 ms
When the "ok" ping session is stopped, the "stalled" session recovers. For reference, here is the tcpdump capture of both sessions:
15:25:33.574152 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 1, length 64
15:25:33.574531 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 1, length 64
15:25:34.573153 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 2, length 64
15:25:34.573438 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 2, length 64
15:25:35.572154 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 3, length 64
15:25:35.572449 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 3, length 64
15:25:35.588938 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 1, length 64
15:25:35.589219 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 1, length 64
15:25:36.573325 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 4, length 64
15:25:36.573662 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 4, length 64
15:25:36.588858 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 2, length 64
15:25:36.589180 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 2, length 64
15:25:37.572325 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 5, length 64
15:25:37.572626 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 5, length 64
15:25:37.587857 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 3, length 64
15:25:37.588131 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 3, length 64
15:25:38.571329 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 6, length 64
15:25:38.571737 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 6, length 64
15:25:38.587015 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 4, length 64
15:25:38.587337 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 4, length 64
15:25:39.570972 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 7, length 64
15:25:39.571250 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 7, length 64
15:25:39.586984 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 5, length 64
15:25:39.587297 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 5, length 64
15:25:40.571005 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 8, length 64
15:25:40.586967 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 6, length 64
15:25:40.587300 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 6, length 64
15:25:41.571003 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 9, length 64
15:25:41.587007 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 7, length 64
15:25:41.587336 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 7, length 64
15:25:42.571002 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 10, length 64
15:25:42.587005 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 8, length 64
15:25:42.587327 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 8, length 64
15:25:43.571009 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 11, length 64
15:25:43.587009 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 9, length 64
15:25:43.587351 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 9, length 64
15:25:44.570981 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 12, length 64
15:25:44.586969 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 10, length 64
15:25:44.587294 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 10, length 64
15:25:45.571004 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 13, length 64
15:25:45.586996 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 11, length 64
15:25:45.587312 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 11, length 64
15:25:46.570971 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 14, length 64
15:25:47.570997 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 15, length 64
15:25:47.571242 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 15, length 64
15:25:48.571005 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 16, length 64
15:25:48.571248 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 16, length 64
15:25:49.571001 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 17, length 64
15:25:49.571246 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 17, length 64
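To make the gap in the capture easier to spot, here is a rough Python sketch that lists, per ICMP id, the echo requests that never got a reply. The function name `unanswered_requests` and the regular expression are just my own illustration and assume tcpdump's default text output as pasted above. Run over the full capture, it should flag seq 8 to 14 for id 14532 and nothing for id 14533.

```python
import re
from collections import defaultdict

# Matches tcpdump lines such as:
# 15:25:40.571005 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 8, length 64
LINE_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def unanswered_requests(lines):
    """Return {icmp_id: sorted seq numbers that have a request but no reply}.

    Hypothetical helper; assumes tcpdump's default one-line-per-packet format.
    """
    requests = defaultdict(set)
    replies = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # ignore anything that is not an ICMP echo line
        kind = match.group(1)
        icmp_id = int(match.group(2))
        seq = int(match.group(3))
        target = requests if kind == "request" else replies
        target[icmp_id].add(seq)
    return {icmp_id: sorted(requests[icmp_id] - replies[icmp_id])
            for icmp_id in requests}


# A short excerpt from the capture above (seq 7 answered, seq 8 not).
sample = [
    "15:25:39.570972 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 7, length 64",
    "15:25:39.571250 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14532, seq 7, length 64",
    "15:25:40.571005 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14532, seq 8, length 64",
    "15:25:40.586967 IP 10.100.104.254 > 10.100.104.15: ICMP echo request, id 14533, seq 6, length 64",
    "15:25:40.587300 IP 10.100.104.15 > 10.100.104.254: ICMP echo reply, id 14533, seq 6, length 64",
]
print(unanswered_requests(sample))  # {14532: [8], 14533: []}
```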
Does anyone else see similar behavior?
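If anyone wants to try this against their own iDRAC, the two-session test can be scripted roughly as below. This is only a sketch: it assumes a Linux host with the iputils `ping` (the `-c` count option and the "packets transmitted, ... received" summary line shown earlier), and the helper names `parse_summary`, `ping_once` and `ping_in_parallel` are my own.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Summary line as printed by iputils ping, e.g.
# "17 packets transmitted, 10 received, 41% packet loss, time 15996ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def parse_summary(ping_output):
    """Return (transmitted, received) from ping output, or None if not found."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ping_once(host, count):
    """Run a single ping session and return its parsed summary."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_summary(result.stdout)


def ping_in_parallel(host, count=20, sessions=2):
    """Start several ping sessions against one host at roughly the same time."""
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        futures = [pool.submit(ping_once, host, count) for _ in range(sessions)]
        return [future.result() for future in futures]
```

Calling `ping_in_parallel("10.100.104.15")` returns one `(transmitted, received)` pair per session. On an affected interface I would expect one of the pairs to show fewer replies than requests.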
Inconsistent SNMP replies between iDRACs
Strangely, identical servers running the same version (1.57.57) and, to my knowledge, identical configurations return different SNMP replies. More precisely, some interfaces do not respond to all OIDs, e.g.:
snmpwalk -v2c -c public -On 10.100.104.13 .1.3.6.1.4.1.674.10892.5.4.1100.30
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.1.1.1 = INTEGER: 1
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.2.1.1 = INTEGER: 1
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.3.1.1 = INTEGER: 0
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.4.1.1 = INTEGER: 2
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.5.1.1 = INTEGER: 3
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.7.1.1 = INTEGER: 3
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.8.1.1 = STRING: "Intel"
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.9.1.1 = INTEGER: 3
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.10.1.1 = INTEGER: 21
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.11.1.1 = INTEGER: 3600
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.12.1.1 = INTEGER: 2100
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.13.1.1 = INTEGER: 6400
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.14.1.1 = INTEGER: 1200
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.16.1.1 = STRING: "E5"
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.17.1.1 = INTEGER: 6
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.18.1.1 = INTEGER: 6
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.19.1.1 = INTEGER: 12
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.20.1.1 = INTEGER: 4
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.21.1.1 = INTEGER: 29
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.22.1.1 = INTEGER: 29
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.23.1.1 = STRING: "Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz"
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.26.1.1 = STRING: "CPU.Socket.1"
and
snmpwalk -v2c -c public -On 10.100.104.15 .1.3.6.1.4.1.674.10892.5.4.1100.30
.1.3.6.1.4.1.674.10892.5.4.1100.30 = No Such Object available on this agent at this OID
On the iDRACs that do not respond, we are also missing some other information, such as the BIOS version, but we do get voltages, fans, temperatures and disk status. Rebooting, resetting or disconnecting power does not seem to help.
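To find out exactly which OIDs differ between a "good" and a "bad" iDRAC, the two walks can be compared with something like the Python sketch below. It assumes the output was captured with `snmpwalk -On` as above (one `OID = TYPE: value` pair per line); the helper names `parse_walk` and `missing_oids` are hypothetical.

```python
def parse_walk(text):
    """Map each numeric OID in `snmpwalk -On` output to its value string.

    Lines such as "... = No Such Object available on this agent at this OID"
    are skipped, since they indicate that the agent has no data there.
    """
    values = {}
    for line in text.splitlines():
        oid, sep, value = line.partition(" = ")
        if not sep or not oid.startswith("."):
            continue  # not an "OID = value" line
        if value.startswith("No Such"):
            continue  # agent reported the OID as absent
        values[oid] = value
    return values


def oid_key(oid):
    """Sort key that orders OIDs numerically rather than as plain strings."""
    return [int(part) for part in oid.strip(".").split(".")]


def missing_oids(reference, other):
    """Return the OIDs present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other), key=oid_key)


# Excerpts from the two walks above.
good = """\
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.8.1.1 = STRING: "Intel"
.1.3.6.1.4.1.674.10892.5.4.1100.30.1.26.1.1 = STRING: "CPU.Socket.1"
"""
bad = """\
.1.3.6.1.4.1.674.10892.5.4.1100.30 = No Such Object available on this agent at this OID
"""
for oid in missing_oids(parse_walk(good), parse_walk(bad)):
    print(oid)
```

Walking the whole `.1.3.6.1.4.1.674.10892.5` subtree on both hosts and feeding the output through this should give the complete list of what the non-responding iDRACs leave out.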
Any thoughts on what might be wrong?
Thanks in advance
staale