iSCSI presenting issues to VMware 6.7

milne301

Cadet
Joined
Mar 8, 2021
Messages
5
Hi guys

I've been racking my brain here for the last 10 hours or so... wondering if you can help me:

setup:

1 x DL380 G9 - VMWARe host for compute
- 2 x 10Gbit cards
- 2 x DACs going to a 10G switch
- iSCSI network set up with VMkernel ports 10.10.10.1 and .2 (/24), using Round Robin

1 x DL380 G7 - for storage - TrueNAS
- LAGG set up to include both 10Gbit ports
- VLAN 100 created on the LAGG (covering the two 10G ports) - IP 10.10.10.10/24
NOTE: all interfaces are using MTU: 9000

Confirmed communication, as TrueNAS and VMware can ping each other on the 10.10.10.x network.

I created an iSCSI share using the wizard, using the DISK (not pools), and added it to VMware via the 10.10.10.10 address. It sees the disk fine, which is 3.1TB. When I try to create the VMFS datastore I get the following (it fails in VMware with "cannot configure host"):

2021-03-08T11:25:31.602Z cpu3:2102603)iscsi_vmk: iscsivmk_ConnNetRegister:2219: socket 0x4310337a4ba0 network tracker id 256862042 tracker.iSCSI.10.10.10.10 associated
2021-03-08T11:25:31.854Z cpu3:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:880: vmhba64:CH:0 T:1 CN:0: iSCSI connection is being marked "ONLINE"
2021-03-08T11:25:31.854Z cpu3:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:881: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:31.854Z cpu3:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:882: Conn [CID: 0 L: 10.10.10.1:50926 R: 10.10.10.10:3260]
2021-03-08T11:25:31.861Z cpu12:2102200)NMP: nmp_ThrottleLogForDevice:3818: last error status from device naa.6589cfc000000b7b531a5f4afb23402c repeated 61 times
2021-03-08T11:25:31.861Z cpu12:2102200)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x84 (0x459b01674c80, 0) to dev "naa.6589cfc000000b7b531a5f4afb23402c" on path "vmhba64:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x7. Act:NONE
2021-03-08T11:25:31.861Z cpu12:2102200)ScsiDeviceIO: 3448: Cmd(0x459b01674c80) 0x84, CmdSN 0x197d from world 0 to dev "naa.6589cfc000000b7b531a5f4afb23402c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x7.
2021-03-08T11:35:13.437Z cpu41:2098058)DVFilter: 6053: Checking disconnected filters for timeouts
2021-03-08T11:39:00.594Z cpu3:2102200)ScsiDeviceIO: 3469: Cmd(0x45bb0104edc0) 0x85, CmdSN 0x16 from world 2099800 to dev "naa.6589cfc000000b7b531a5f4afb23402c" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
[root@localhost:~] tail -n100 /var/log/vmkernel.log
2021-03-08T11:25:05.449Z cpu0:2102200)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:525: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:05.449Z cpu0:2102200)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 10.10.10.1:55223 R: 10.10.10.10:3260]
2021-03-08T11:25:05.449Z cpu0:2102200)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:0 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Online
2021-03-08T11:25:05.449Z cpu0:2102200)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:05.449Z cpu0:2102200)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1237: Conn [CID: 0 L: 10.10.10.1:55223 R: 10.10.10.10:3260]
2021-03-08T11:25:05.449Z cpu27:2097899)WARNING: iscsi_vmk: iscsivmk_StopConnection:699: vmhba64:CH:0 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:6)
2021-03-08T11:25:05.449Z cpu27:2097899)WARNING: iscsi_vmk: iscsivmk_StopConnection:700: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:05.449Z cpu27:2097899)WARNING: iscsi_vmk: iscsivmk_StopConnection:701: Conn [CID: 0 L: 10.10.10.1:55223 R: 10.10.10.10:3260]
2021-03-08T11:25:05.454Z cpu27:2102527)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6589cfc000000b7b531a5f4afb23402c" state in doubt; requested fast path state update...
2021-03-08T11:25:06.450Z cpu27:2102527)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6589cfc000000b7b531a5f4afb23402c" state in doubt; requested fast path state update...
2021-03-08T11:25:07.449Z cpu27:2102527)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6589cfc000000b7b531a5f4afb23402c" state in doubt; requested fast path state update...
2021-03-08T11:25:08.233Z cpu3:2102603)iscsi_vmk: iscsivmk_ConnNetRegister:2191: socket 0x4310335a8200 network resource pool netsched.pools.persist.iscsi associated
2021-03-08T11:25:08.233Z cpu3:2102603)iscsi_vmk: iscsivmk_ConnNetRegister:2219: socket 0x4310335a8200 network tracker id 256862042 tracker.iSCSI.10.10.10.10 associated
2021-03-08T11:25:08.419Z cpu27:2102527)NMP: nmp_ThrottleLogForDevice:3801: last error status from device naa.6589cfc000000b7b531a5f4afb23402c repeated 640 times
2021-03-08T11:25:08.447Z cpu27:2102527)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6589cfc000000b7b531a5f4afb23402c" state in doubt; requested fast path state update...
2021-03-08T11:25:08.487Z cpu4:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:880: vmhba64:CH:0 T:1 CN:0: iSCSI connection is being marked "ONLINE"
2021-03-08T11:25:08.487Z cpu4:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:881: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:08.487Z cpu4:2102603)WARNING: iscsi_vmk: iscsivmk_StartConnection:882: Conn [CID: 0 L: 10.10.10.1:16387 R: 10.10.10.10:3260]
2021-03-08T11:25:08.490Z cpu21:2102201)Uplink: 5853: vmnic4: TSO packet has large MSS (8948) plus L7 offset (66), which can't be fit into MTU (1500)
2021-03-08T11:25:08.727Z cpu21:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:08.997Z cpu21:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:09.327Z cpu21:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:09.777Z cpu21:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:10.467Z cpu21:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:11.637Z cpu12:2102201)Uplink: 5887: vmnic4: Non TSO L2 payload size exceeds uplink MTU. FrameLen: 9014, L3 header offset: 14
2021-03-08T11:25:13.235Z cpu12:2102200)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:522: vmhba64:CH:0 T:1 CN:0: Failed to receive data: Connection reset by peer
2021-03-08T11:25:13.235Z cpu12:2102200)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:525: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]
2021-03-08T11:25:13.235Z cpu12:2102200)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:526: Conn [CID: 0 L: 10.10.10.1:16387 R: 10.10.10.10:3260]
2021-03-08T11:25:13.235Z cpu12:2102200)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1235: vmhba64:CH:0 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Online
2021-03-08T11:25:13.235Z cpu12:2102200)iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1236: Sess [ISID: 00023d000001 TARGET: iqn.2005-10.org.freenas.ctl:test TPGT: 1 TSIH: 0]

It does that in a loop until the creation fails with: "Cannot change the host configuration."

VMware events keep saying it's losing both paths when it tries to create the datastore - I don't know what's going on. Why can it pick the storage up but not do anything with it, then lose connectivity and take it offline? I haven't told TrueNAS to make it read-only either.

I've also tried creating the pool first, then the zvol, and it does the same thing. I've also tried going into the VMware CLI and formatting it as msdos, but that fails to commit.

It's as if it fails when trying to write to the disk or communicate with it.

Any ideas?

Thanks
 

milne301

Cadet
Joined
Mar 8, 2021
Messages
5
Looking over the forum, it seems like this is exactly what I'm seeing - although it's old, so I thought it would have been fixed by now:

 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
VMware is spelled with a little w, just for your info.

Without knowing exactly what hardware you're using, this is nearly impossible to tell. Jumbo frame support works differently depending on card and driver and other factors, some of which are intangible.

NOTE: all interfaces are using MTU: 9000

Confirmed communication, as TrueNAS and VMware can ping each other on the 10.10.10.x network.

So it seems you actually DIDN'T confirm communication, and the hypervisor is telling you that in the error messages. "ping" is not suitable to the task, "ping" with a payload size of 9000 *might* be.
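Something along these lines from the ESXi shell exercises the full frame size end to end (a sketch only; vmk1/vmk2 are assumed names for the iSCSI vmkernel ports, so substitute your own):

# 8972 bytes of payload = 9000 MTU minus the 20-byte IP header and 8-byte ICMP header
# -d sets the don't-fragment bit, so this fails outright if anything in the path is still at 1500
vmkping -I vmk1 -d -s 8972 10.10.10.10
vmkping -I vmk2 -d -s 8972 10.10.10.10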

First, why are you trying to use jumbo frames? This isn't really a good idea. If you REALLY need that last small percentage of speed that jumbo can give you, you are better off using 40Gbps tech. Modern hardware offload means that the interrupt stresses that jumbo was designed to reduce back in the early 2000's on a server just don't really exist on modern hardware, and getting a jumbo-compliant network, while totally possible, represents a potentially huge amount of effort and engineering for a very modest return.

If you're in the data center, getting jumbo working on a large set of machines known to properly support it can be worth it, but small scale users may be very much better off just running standard 1500 and taking advantage of the hardware offload. Jumbo really mostly makes sense on old hardware.

Anyways, your problem is this: The hypervisor is clearly signalling that you have some 1500 MTU-configured stuff in the hypervisor networking stack.
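To see where, the MTU is reported separately for the vSwitch and for each vmkernel port, so a quick walk of the stack from the ESXi shell narrows it down (a sketch using standard commands, nothing environment-specific assumed):

# vSwitch MTU (check the MTU column for the vSwitch carrying the iSCSI vmkernel ports)
esxcfg-vswitch -l
# vmkernel port MTU
esxcli network ip interface list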
 

milne301

Cadet
Joined
Mar 8, 2021
Messages
5
VMware is spelled with a little w, just for your info.

Without knowing exactly what hardware you're using, this is nearly impossible to tell. Jumbo frame support works differently depending on card and driver and other factors, some of which are intangible.



So it seems you actually DIDN'T confirm communication, and the hypervisor is telling you that in the error messages. "ping" is not suitable to the task, "ping" with a payload size of 9000 *might* be.

First, why are you trying to use jumbo frames? This isn't really a good idea. If you REALLY need that last small percentage of speed that jumbo can give you, you are better off using 40Gbps tech. Modern hardware offload means that the interrupt stresses that jumbo was designed to reduce back in the early 2000's on a server just don't really exist on modern hardware, and getting a jumbo-compliant network, while totally possible, represents a potentially huge amount of effort and engineering for a very modest return.

If you're in the data center, getting jumbo working on a large set of machines known to properly support it can be worth it, but small scale users may be very much better off just running standard 1500 and taking advantage of the hardware offload. Jumbo really mostly makes sense on old hardware.

Anyways, your problem is this: The hypervisor is clearly signalling that you have some 1500 MTU-configured stuff in the hypervisor networking stack.


Forgive me, I spelled "VMware" wrong in the first instance, lol.

I'm using jumbo frames because almost everywhere I looked, they say to use jumbo frames for iSCSI traffic...

I can see that it is complaining about 1500 somewhere, but where, I have no idea. The switch ports etc. are on 9000. I guess I can try setting everything to 1500 and see if it works.

Thanks for the analysis.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
2021-03-08T11:25:08.490Z cpu21:2102201)Uplink: 5853: vmnic4: TSO packet has large MSS (8948) plus L7 offset (66), which can't be fit into MTU (1500)

Check the vmkernel interface or the driver for vmnic4 to see if it supports and is configured to accept jumbos. Or as @jgreco suggests, just disable them entirely as you're likely to be bottlenecking elsewhere first.
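For example (a sketch, using the vmnic4 name from your log; adjust if your uplink differs):

# driver and firmware details for the uplink
esxcli network nic get -n vmnic4
# MTU currently applied to each physical NIC
esxcfg-nics -l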
 

milne301

Cadet
Joined
Mar 8, 2021
Messages
5
Check the vmkernel interface or the driver for vmnic4 to see if it supports and is configured to accept jumbos. Or as @jgreco suggests, just disable them entirely as you're likely to be bottlenecking elsewhere first.

Thanks for the reply.

So... I took the MTU setting off the VLAN in TrueNAS and it worked. I now have a 3.1TB datastore, with everything at MTU 9000.
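A quick way to confirm what MTU the TrueNAS interfaces actually ended up with (assuming the interface names are lagg0 and vlan100; adjust to match your setup):

ifconfig lagg0 | grep -i mtu
ifconfig vlan100 | grep -i mtu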

However, the speeds are pretty diabolical for 2 x 10Gb links (around 1.5 Gbps).

VMware:

2 VMkernel ports, one for iSCSI A and one for iSCSI B. When I set up the iSCSI connection, I set it to Round Robin.
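For reference, a way to double-check the path policy from the ESXi shell (a sketch; the naa. ID is the one from the logs above, so substitute your own device):

# confirm the current path selection policy and that both paths show as active
esxcli storage nmp device list -d naa.6589cfc000000b7b531a5f4afb23402c
# set Round Robin explicitly if it isn't already the policy
esxcli storage nmp device set -d naa.6589cfc000000b7b531a5f4afb23402c --psp VMW_PSP_RR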

TrueNAS

2 x 10Gbit connections, which I've put into an aggregated link. I've put VLAN 100 on it (my iSCSI VLAN), with the IP 10.10.10.10.

Should I get rid of the aggregated link? Am I overdoing it?

Switches

2 x 24-port HP 10G 5800s. All of the ports connecting to VMware and the TrueNAS server are trunk ports carrying the required VLANs, including 100. Connections to the switches are 10G DACs.


Any suggestions?
 

milne301

Cadet
Joined
Mar 8, 2021
Messages
5
Just wanted to reply here - I appreciate all your help.

I stupidly changed everything BUT the MTU for vSwitch0, which was still at 1500 (doh!). Once I changed it to 9000 the speed shot up; I'm now running MTU 9000 end to end and getting good speeds.
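For anyone hitting the same thing, both MTUs can be bumped from the ESXi shell (a sketch for a standard vSwitch named vSwitch0 and an iSCSI vmkernel port vmk1 - vmk1 is an assumed name here):

# raise the vSwitch MTU
esxcli network vswitch standard set -v vSwitch0 -m 9000
# raise the vmkernel port MTU to match
esxcli network ip interface set -i vmk1 -m 9000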

Thanks
 