Errors with ESXi and iSCSI - no ping reply (NOP-out) after 5 seconds

Status
Not open for further replies.

dvg_lab

Dabbler
Joined
Jan 27, 2015
Messages
11
Hi Everyone,

I've been successfully running FreeNAS iSCSI with ESXi for about two years. We have two ESXi 5.5 Update 1 hosts (HP DL380 G8) and two FreeNAS 10.3 iSCSI storage servers, built on Asus and Supermicro 2U servers. A week ago we added a third ESXi host, an HP DL380 G9, which requires 5.5 Update 3, so we had to update all the other hosts and vCenter to 5.5 Update 3 as well. After that, iSCSI turned into a hot mess. We didn't change anything on the FreeNAS hosts and didn't even reboot them; we only upgraded ESXi from 5.5 Update 1 to 5.5 Update 3, and the trouble started. We use one FreeNAS host as the backend for the vSphere Data Protection appliance. When VDP starts a backup job, the iSCSI connection breaks after a few hours, and we see 50% packet loss between the iSCSI target's Ethernet interface and the ESXi vmkernel adapter. FreeNAS starts logging:

Code:
Oct  1 01:06:02 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection
Oct  1 01:06:21 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection
Oct  1 01:06:47 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection
Oct  1 01:07:04 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection
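
To see how often the target drops the session, warnings like the ones above can be reduced to the gaps between consecutive drops. This is only a minimal Python sketch: it assumes all log lines are from the same day and use exactly the format shown, and `drop_gaps` is a hypothetical helper name, not anything shipped with FreeNAS.

```python
import re
from datetime import datetime

# Matches the FreeNAS warning format shown in the log excerpt.
DROP = re.compile(
    r"^\w{3}\s+\d+ (?P<time>\d\d:\d\d:\d\d) \S+ WARNING: "
    r"(?P<ip>\S+) \((?P<iqn>[^)]+)\): no ping reply \(NOP-Out\)"
)

def drop_gaps(lines):
    """Return seconds between consecutive drops, keyed by initiator IP.

    Assumption: all lines fall on the same day (the log has no year,
    and this sketch ignores the date fields).
    """
    last_seen = {}
    gaps = {}
    for line in lines:
        m = DROP.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group("time"), "%H:%M:%S")
        ip = m.group("ip")
        if ip in last_seen:
            delta = int((when - last_seen[ip]).total_seconds())
            gaps.setdefault(ip, []).append(delta)
        last_seen[ip] = when
    return gaps

log = [
    "Oct  1 01:06:02 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection",
    "Oct  1 01:06:21 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection",
    "Oct  1 01:06:47 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection",
    "Oct  1 01:07:04 freenas0 WARNING: 10.250.100.31 (iqn.1998-01.com.vmware:esxi01-2207b970): no ping reply (NOP-Out) after 5 seconds; dropping connection",
]

print(drop_gaps(log))
```

For the four lines above, the gaps come out at roughly 17 to 26 seconds, so the initiator appears to reconnect and get dropped again in a short cycle rather than failing once.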


At the same time, vCenter doesn't show any visible errors. The ESXi host running the VDP appliance freezes all virtual machines running on both FreeNAS hosts, and we have to reboot that ESXi host to restore the iSCSI connection. I've just tried disabling the broken Active (I/O) path, and the iSCSI session resumed normally on the second MPIO path. So I think the problem may be somewhere in ESXi, or in the advanced ESXi iSCSI settings, but I don't know which settings could help.

My config:
The FreeNAS hosts have a 2-port 10G Intel 82599ES controller, which we use for MPIO (the first FreeNAS host's IP settings are 10.250.100.15/24 and 10.250.101.15/24).
The ESXi hosts have a 2-port 10G Solarflare 9020 controller (the first ESXi host's IP settings are 10.250.100.31/24 and 10.250.101.31/24).
iSCSI traffic is separated into isolated VLANs 15 and 16 on a Cisco Nexus 5548UP. MTU is 1500 on all interfaces.

So, as I said, this setup worked like a charm before the update to ESXi 5.5u3, and since FreeNAS is not officially supported by VMware, I'm starting this thread here to try to resolve the issue.

Any thoughts?
 

dvg_lab

Dabbler
Joined
Jan 27, 2015
Messages
11
When iSCSI breaks, tcpdump on FreeNAS shows this:

Code:
	10.250.101.15.iscsi-target > 10.250.101.31.23985: Flags [.], cksum 0xe028 (incorrect -> 0x3f69), seq 1035321:1051249, ack 4384, win 16384, options [nop,nop,TS val 2918905205 ecr 20461335], length 15928
08:17:03.781414 IP (tos 0x0, ttl 64, id 64329, offset 0, flags [DF], proto TCP (6), length 52)
	10.250.101.31.23985 > 10.250.101.15.iscsi-target: Flags [.], cksum 0x8f7e (correct), seq 4384, ack 1051249, win 368, options [nop,nop,TS val 20461335 ecr 2918905205], length 0
08:17:03.781424 IP (tos 0x0, ttl 64, id 36153, offset 0, flags [DF], proto TCP (6), length 15980, bad cksum 0 (->8f30)!)
	10.250.101.15.iscsi-target > 10.250.101.31.23985: Flags [.], cksum 0xe028 (incorrect -> 0xcb73), seq 1051249:1067177, ack 4384, win 16384, options [nop,nop,TS val 2918905205 ecr 20461335], length 15928
08:17:03.781886 IP (tos 0x0, ttl 64, id 64331, offset 0, flags [DF], proto TCP (6), length 52)
	10.250.101.31.23985 > 10.250.101.15.iscsi-target: Flags [.], cksum 0x7344 (correct), seq 4384, ack 1058489, win 354, options [nop,nop,TS val 20461335 ecr 2918905205], length 0
08:17:03.781894 IP (tos 0x0, ttl 64, id 36154, offset 0, flags [DF], proto TCP (6), length 7292, bad cksum 0 (->b11f)!)
	10.250.101.15.iscsi-target > 10.250.101.31.23985: Flags [.], cksum 0xe028 (incorrect -> 0xc0c3), seq 1067177:1074417, ack 4384, win 16384, options [nop,nop,TS val 2918905206 ecr 20461335], length 7240
08:17:03.793694 IP (tos 0x0, ttl 64, id 36155, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd64)!)
	10.250.101.15.iscsi-target > 10.250.101.33.50287: Flags [.], cksum 0xe04a (incorrect -> 0x8239), seq 48, ack 49, win 16384, options [nop,nop,TS val 412093456 ecr 172234321], length 0
08:17:03.793706 IP (tos 0x0, ttl 64, id 36158, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd63)!)
	10.250.101.15.iscsi-target > 10.250.101.31.34608: Flags [.], cksum 0xe048 (incorrect -> 0xdffb), seq 48, ack 49, win 16384, options [nop,nop,TS val 3388088616 ecr 20461326], length 0
08:17:03.793710 IP (tos 0x0, ttl 64, id 36159, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd60)!)
	10.250.101.15.iscsi-target > 10.250.101.33.11990: Flags [.], cksum 0xe04a (incorrect -> 0x1ed6), seq 48, ack 49, win 16384, options [nop,nop,TS val 3757705449 ecr 172234321], length 0
08:17:03.793720 IP (tos 0x0, ttl 64, id 36162, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd5e)!)
	10.250.101.15.iscsi-target > 10.250.101.32.43267: Flags [.], cksum 0xe049 (incorrect -> 0x2fd9), seq 48, ack 49, win 16384, options [nop,nop,TS val 1410287613 ecr 33844209], length 0
08:17:03.793727 IP (tos 0x0, ttl 64, id 36164, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd5d)!)
	10.250.101.15.iscsi-target > 10.250.101.31.28913: Flags [.], cksum 0xe048 (incorrect -> 0x0f93), seq 96, ack 49, win 16384, options [nop,nop,TS val 2345204728 ecr 20461326], length 0
08:17:03.793730 IP (tos 0x0, ttl 64, id 36165, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->cd5b)!)
	10.250.101.15.iscsi-target > 10.250.101.32.42019: Flags [.], cksum 0xe049 (incorrect -> 0x5eeb), seq 48, ack 49, win 16384, options [nop,nop,TS val 2223709534 ecr 33844209], length 0
08:17:03.928326 IP (tos 0x0, ttl 255, id 36168, offset 0, flags [none], proto UDP (17), length 467, bad cksum 0 (->dbcc)!)
	10.250.101.15.mdns > 224.0.0.251.mdns: [bad udp cksum 0x52d5 -> 0x3b58!] 0*- [0q] 14/0/5 freenas0._ssh._tcp.local. (Cache flush) [1h15m] TXT "", _services._dns-sd._udp.local. [1h15m] PTR _ssh._tcp.local., _ssh._tcp.local. [1h15m] PTR freenas0._ssh._tcp.local., freenas0._sftp-ssh._tcp.local. (Cache flush) [1h15m] TXT "", _services._dns-sd._udp.local. [1h15m] PTR _sftp-ssh._tcp.local., _sftp-ssh._tcp.local. [1h15m] PTR freenas0._sftp-ssh._tcp.local., freenas0._http._tcp.local. (Cache flush) [1h15m] TXT "", _services._dns-sd._udp.local. [1h15m] PTR _http._tcp.local., _http._tcp.local. [1h15m] PTR freenas0._http._tcp.local., freenas0._ssh._tcp.local. (Cache flush) [2m] SRV freenas0.local.:26 0 0, freenas0._sftp-ssh._tcp.local. (Cache flush) [2m] SRV freenas0.local.:26 0 0, freenas0._http._tcp.local. (Cache flush) [2m] SRV freenas0.local.:80 0 0, 15.101.250.10.in-addr.arpa. (Cache flush) [2m] PTR freenas0.local., freenas0.local. (Cache flush) [2m] A 10.250.101.15 ar: freenas0._ssh._tcp.local. (Cache flush) [1h15m] NSEC, freenas0._sftp-ssh._tcp.local. (Cache flush) [1h15m] NSEC, freenas0._http._tcp.local. (Cache flush) [1h15m] NSEC, 15.101.250.10.in-addr.arpa. (Cache flush) [2m] NSEC, freenas0.local. (Cache flush) [2m] NSEC (439)
08:17:04.026062 IP (tos 0x0, ttl 64, id 36169, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum 0 (->c7b0)!)


Why is the checksum on the iSCSI target's packets incorrect? At the same time, the checksum on the iSCSI initiator's packets is correct.
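
One thing worth checking in a capture like this is whether the incorrect checksums appear only on packets sent by the capturing host. That pattern is what TX checksum offload typically produces, because tcpdump sees outgoing packets before the NIC fills in the checksum; whether offload is actually enabled on these 82599ES ports is an assumption, not something confirmed here. A minimal Python sketch that tallies checksum status per source IP from verbose tcpdump text (`tally_checksums` is a hypothetical helper name):

```python
import re
from collections import Counter

# Matches the indented TCP detail line of verbose tcpdump output, e.g.
#   10.250.101.15.iscsi-target > 10.250.101.31.23985: Flags [.], cksum 0xe028 (incorrect -> ...
TCP_LINE = re.compile(
    r"^\s*(?P<src>\d+\.\d+\.\d+\.\d+)\.\S+ > \S+: Flags \[[^\]]*\], "
    r"cksum 0x[0-9a-f]+ \((?P<status>correct|incorrect)"
)

def tally_checksums(lines):
    """Count TCP checksum status ('correct' / 'incorrect') per source IP."""
    counts = Counter()
    for line in lines:
        m = TCP_LINE.match(line)
        if m:
            counts[(m.group("src"), m.group("status"))] += 1
    return counts

# Two detail lines taken from the capture above.
sample = [
    "\t10.250.101.15.iscsi-target > 10.250.101.31.23985: Flags [.], cksum 0xe028 (incorrect -> 0x3f69), seq 1035321:1051249, ack 4384, win 16384, length 15928",
    "\t10.250.101.31.23985 > 10.250.101.15.iscsi-target: Flags [.], cksum 0x8f7e (correct), seq 4384, ack 1051249, win 368, length 0",
]

for (src, status), n in sorted(tally_checksums(sample).items()):
    print(src, status, n)
```

If every "incorrect" entry belongs to the host where tcpdump runs and every received packet is "correct", the bad checksums are probably a capture artifact rather than the cause of the drops. The 15928-byte segments on a 1500 MTU link point the same way, since segments that large are normally split by the NIC (TSO) after the capture point.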
 