ESXi iSCSI - no ping reply (NOP-Out) after 5 seconds; dropping connection

ChrisD.

Dabbler
Joined
Apr 18, 2022
Messages
26
I assume a vmkping -S 9000 -I <vmkernel interface> <your iSCSI target IP> returns good?
Yup, all good. Well, 8972 rather than 9000, to allow for the ICMP and IP headers.
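For reference, the exact check I run from each host looks roughly like this (vmk1 is just an example interface name from my setup):

# 8972-byte payload plus 28 bytes of ICMP/IP headers = 9000, with fragmentation disallowed (-d)
vmkping -I vmk1 -d -s 8972 <iSCSI target IP>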
Is there any anti-DoS feature or "port security" enabled on the UniFi?
Not that I'm aware of.
Also kind of a blunt instrument, but can you enable flow control on an iSCSI interface on the UniFi for testing?
It's already enabled switch-wide; I don't think it can be enabled or disabled per port (certainly not within the UniFi UI).
 

ChrisD.

Just to update, I put the iSCSI NICs onto a MikroTik CRS305-1G-4S+ (10 Gbps switch) and I'm still seeing the same issue.
 

ChrisD.

You and me both. I've been in the networking/VMware game for quite some time, and although I mainly focus on cloud these days, I still do a fair bit of on-prem work.

I don't think I have mentioned the NICs. The ESXi hosts are using Intel 82599.

One NAS:

root@lando[~]# pciconf -lv | grep -A1 -B3 network
oce0@pci0:5:0:0: class=0x020000 card=0xe72a10df chip=0x071019a2 rev=0x03 hdr=0x00
vendor = 'Emulex Corporation'
device = 'OneConnect 10Gb NIC (be3)'
class = network
subclass = ethernet
oce1@pci0:5:0:1: class=0x020000 card=0xe72a10df chip=0x071019a2 rev=0x03 hdr=0x00
vendor = 'Emulex Corporation'
device = 'OneConnect 10Gb NIC (be3)'
class = network
subclass = ethernet

The other looks to be an Intel 82599 again:

root@truenas[~]# pciconf -lv | grep -A1 -B3 network
ix0@pci0:1:0:0: class=0x020000 card=0x061115d9 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
ix1@pci0:1:0:1: class=0x020000 card=0x061115d9 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet

They happily replicate from one to the other over the other NIC at a sustained 5 Gbps or so when running a replication task.

The second NAS, well, I can rebuild that and do what I want with it without any fear of data loss. I'm tempted to try some other NAS distro and see if the behaviour changes at all.

To be fair, you might be onto something. It could be some weird FreeBSD-ism with this specific type of NIC, which is why I put SCALE onto the second NAS just to see. I even tried the on-board 2.5 Gbps NIC, since SCALE picked that up, but no dice.
 

ChrisD.

Just to update, I disabled VAAI on both hosts and so far there have been no lock-ups or iSCSI sense messages in vmkernel.log.
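For anyone following along, these are the standard DataMover/VMFS3 advanced options I toggled on each host (syntax from memory, so double-check before pasting):

# Turn off the three block-storage VAAI primitives (0 = disabled, 1 = enabled)
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 0
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedInit -i 0
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 0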

Is there perhaps a bug within TrueNAS with VAAI that only shows up when using >1 Gbps NICs and NVMe-backed storage?

I will of course monitor and report back.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Just to update, I disabled VAAI on both hosts and so far there have been no lock-ups or iSCSI sense messages in vmkernel.log.
Good catch. If it's a VAAI primitive screwing things up I'll hedge my bets on ATS (HardwareAcceleratedLocking) - maybe re-enable (un-disable?) that one and see if the timeouts return.
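Something along these lines should flip only the ATS primitive back on while leaving the others off (verify the option path on your build):

# Re-enable ATS only; XCOPY and Block Zero stay disabled
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 1
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking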
 

ChrisD.

Good catch. If it's a VAAI primitive screwing things up I'll hedge my bets on ATS (HardwareAcceleratedLocking) - maybe re-enable (un-disable?) that one and see if the timeouts return.
It's actually HardwareAcceleratedMove = 0 which has made the difference. I'm still testing; I had one lock-up earlier, although I've made so many configuration changes it's difficult to keep track of what I have and haven't changed.

I've rebuilt both hosts now and HardwareAcceleratedMove=0 is the only change from a default install.

Is there any further debugging I can do on this from the TrueNAS perspective, i.e. trying to work out why it's having an issue with VAAI?
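While I wait for ideas on the TrueNAS side, this is roughly what I'm watching from the ESXi side (standard commands, nothing exotic):

# Show which VAAI primitives the hosts believe each device supports
esxcli storage core device vaai status get
# Watch vmkernel.log live for VAAI/XCOPY and sense-data noise
tail -f /var/log/vmkernel.log | grep -iE 'vaai|xcopy|sense data'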
 