ESXi iSCSI - no ping reply (NOP-Out) after 5 seconds; dropping connection

ChrisD.

Dabbler
Joined
Apr 18, 2022
Messages
26
I assume a vmkping -S 9000 -I <vmkernel interface> <your iSCSI target IP> returns good?
Yup, all good. Well, 8972 rather than 9000, to allow for the ICMP and IP headers.
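For reference, the exact check I run from each host looks roughly like this (vmk1 is just an example interface name from my setup):

# 8972-byte payload plus 28 bytes of ICMP/IP headers = 9000, with fragmentation disallowed (-d)
vmkping -I vmk1 -d -s 8972 <iSCSI target IP>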
Is there any anti-DoS feature or "port security" enabled on the UniFi?
Not that I'm aware of.
Also kind of a blunt instrument, but can you enable flow control on an iSCSI interface on the UniFi for testing?
It's already enabled switch-wide; I don't think it can be enabled or disabled per port (certainly not within the UniFi UI).
 

ChrisD.

Just to update, I put the iSCSI NICs onto a MikroTik CRS305-1G-4S+ (10 Gbps switch) and I'm still seeing the same issue.
 

ChrisD.

You and me both. I've been in the networking/VMware game for quite some time, and although I mainly focus on cloud these days, I still do a fair bit of on-prem work.

I don't think I have mentioned the NICs. The ESXi hosts are using Intel 82599.

One NAS:

root@lando[~]# pciconf -lv | grep -A1 -B3 network
oce0@pci0:5:0:0: class=0x020000 card=0xe72a10df chip=0x071019a2 rev=0x03 hdr=0x00
vendor = 'Emulex Corporation'
device = 'OneConnect 10Gb NIC (be3)'
class = network
subclass = ethernet
oce1@pci0:5:0:1: class=0x020000 card=0xe72a10df chip=0x071019a2 rev=0x03 hdr=0x00
vendor = 'Emulex Corporation'
device = 'OneConnect 10Gb NIC (be3)'
class = network
subclass = ethernet

The other looks to be an Intel 82599 again:

root@truenas[~]# pciconf -lv | grep -A1 -B3 network
ix0@pci0:1:0:0: class=0x020000 card=0x061115d9 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
ix1@pci0:1:0:1: class=0x020000 card=0x061115d9 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet

They happily replicate from one to the other over the other NIC at a sustained 5 Gbps or so when running a replication task.

The second NAS, well, I can rebuild that and do what I want with it without any fear of data loss. I'm tempted to try some other NAS distro and see if the behaviour changes at all.

To be fair, you might be onto something. It could be some weird FreeBSD-ism with this specific type of NIC, which is why I put SCALE onto the second NAS just to see. I even tried the on-board 2.5 Gbps NIC, since SCALE picked that up, but no dice.
 

ChrisD.

Just to update, I disabled VAAI on both hosts and so far there have been no lock-ups or iSCSI sense messages in vmkernel.log.
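For anyone following along, these are the standard DataMover/VMFS3 advanced options I toggled on each host (syntax from memory, so double-check before pasting):

# Turn off the three block-storage VAAI primitives (0 = disabled, 1 = enabled)
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 0
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedInit -i 0
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 0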

Is there perhaps a bug within TrueNAS with VAAI that only shows up when using >1 Gbps NICs and NVMe-backed storage?

I will of course monitor and report back.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Just to update, I disabled VAAI on both hosts and so far there have been no lock-ups or iSCSI sense messages in vmkernel.log.
Good catch. If it's a VAAI primitive screwing things up I'll hedge my bets on ATS (HardwareAcceleratedLocking) - maybe re-enable (un-disable?) that one and see if the timeouts return.
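Something along these lines should flip only the ATS primitive back on while leaving the others off (verify the option path on your build):

# Re-enable ATS only; XCOPY and Block Zero stay disabled
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 1
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking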
 

ChrisD.

Good catch. If it's a VAAI primitive screwing things up I'll hedge my bets on ATS (HardwareAcceleratedLocking) - maybe re-enable (un-disable?) that one and see if the timeouts return.
It's actually HardwareAcceleratedMove = 0 which has made the difference. I'm still testing; I had one lock-up earlier, although I've made so many configuration changes it's difficult to keep track of what I have and haven't changed.

I've rebuilt both hosts now and HardwareAcceleratedMove=0 is the only change from a default install.

Is there any further debugging I can do on this from the TrueNAS perspective, i.e. trying to work out why it's having an issue with VAAI?
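While I wait for ideas on the TrueNAS side, this is roughly what I'm watching from the ESXi side (standard commands, nothing exotic):

# Show which VAAI primitives the hosts believe each device supports
esxcli storage core device vaai status get
# Watch vmkernel.log live for VAAI/XCOPY and sense-data noise
tail -f /var/log/vmkernel.log | grep -iE 'vaai|xcopy|sense data'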
 