Any suggestions on getting better performance on VMware 7.0/iSCSI

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Hi All,

I am using iSCSI and Fibre Channel QLogic 2564 HBAs. My FC switches are old 4 Gbps Brocade 4900s, and I am wondering if there is any tweaking to do on the iSCSI side for more threads or more buffers. My zpool is made up of 3-way mirrors of 3 TB 7200 RPM SAS drives, presently 21 drives in the pool for 7 stripes, soon to be 14 stripes.
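
For context, that layout would look something like this at pool creation time (a sketch only; the da0..da20 device names are placeholders, not taken from this thread, and the pool name comes from the zvol path quoted later):

# 21 disks arranged as seven 3-way mirror vdevs (3 disks per vdev)
zpool create vol1 \
    mirror da0  da1  da2  \
    mirror da3  da4  da5  \
    mirror da6  da7  da8  \
    mirror da9  da10 da11 \
    mirror da12 da13 da14 \
    mirror da15 da16 da17 \
    mirror da18 da19 da20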

I am using VMware 7.0.2 Build 17867351 and TrueNAS Core 12.0-U4 with dual E5-2660 v2 processors and 192 GB of RAM. The VM is Windows Server 2019 with 8 cores and 8 GB of RAM.

Here is my /etc/ctl.conf

portal-group "default" {
}

lun "truenas1" {
ctl-lun "0"
path "/dev/zvol/vol1/vmware/tns1"
blocksize "512"
option "pblocksize" "0"
serial "246e968dc068000"
device-id "iSCSI Disk 246e968dc068000 "
option "vendor" "TrueNAS"
option "product" "iSCSI Disk"
option "revision" "0123"
option "naa" "0x6589cfc00000070d6306ae680db96d14"
option "insecure_tpc" "on"
option "rpm" "7200"
}

target "iqn.2005-10.org.superstore1.ctl:tns1" {
alias "tns1"

lun "0" "tns1"
}
 

Attachments

  • DiskLUNonVMWAREwithRRfcPATHandIOPSequal1.JPG

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Start by reading this:

You probably don't have enough VDEVs for the IOPS you want/need.

blocksize "512"
I thought I remembered reading somewhere (maybe something @jgreco said, but I can't find it for now) that VMware uses small block sizes for updates (like 32), in which case I think 512 may be too big, but I also found this:

Which pretty clearly says 4K is the right number.

Either way, you need to change what you have there.
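
If you do go to 4K there, the change is normally made in the TrueNAS UI on the extent (ctl.conf is regenerated by the middleware, so editing the file by hand won't stick), and the resulting entry would look roughly like this sketch:

lun "truenas1" {
    path "/dev/zvol/vol1/vmware/tns1"
    blocksize "4096"
    ...
}

Note that this is the logical block size presented to ESXi and is separate from the zvol's own volblocksize (zfs get volblocksize vol1/vmware/tns1 shows that one), and an existing VMFS datastore may not come back cleanly after the presented sector size changes, so treat it as something to set before formatting the datastore.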

Also consider looking into jumbo frames if you go with 4K, as the standard MTU of 1500 won't contain a 4K block.
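
If the iSCSI traffic rides a vmkernel port, a rough sketch of the ESXi side looks like this (vSwitch1, vmk1 and the target address are placeholders, not from this thread):

# Raise MTU on the vSwitch and the iSCSI vmkernel interface
esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000
# Verify the path end to end with a don't-fragment ping sized for MTU 9000
vmkping -I vmk1 -d -s 8972 192.0.2.10

The physical switch ports and the TrueNAS interface have to be set to 9000 as well, or the vmkping above will fail.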

The VM is Windows Server 2019 with 8 cores and 8 GB of RAM.
The block storage thread already tells you that you need 64 GB of RAM for block storage to perform well, so consider that.

Also have a look at the official ZFS tuning doc:
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
I have a lot of RAM and CPU on these boxes. I will do tests with NVMe so that I know it is the disks and not the Fibre Channel that are the issue. I did get a fast Storage vMotion from one LUN to another LUN on TrueNAS Core 12.0-U4 the other day at 750 MB/s, but I was not able to repeat it, and there was a reboot as well as other changes after that.

I think there is a missing link on the Fibre Channel cards when loading the driver/firmware. I assume that when the driver loads it pushes the firmware TrueNAS wants to use.
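
On the TrueNAS (FreeBSD) side, those QLogic HBAs are driven by isp(4), and the firmware can come either from the ispfw module or from the card's own flash; what actually got loaded shows up in the boot messages. A quick check (isp0 is an assumed device name):

dmesg | grep -i isp
sysctl dev.isp.0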
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have a lot of RAM and CPU on these boxes. I will do tests with NVMe so that I know it is the disks and not the Fibre Channel that are the issue. I did get a fast Storage vMotion from one LUN to another LUN on TrueNAS Core 12.0-U4 the other day at 750 MB/s, but I was not able to repeat it, and there was a reboot as well as other changes after that.

The fast svMotion between LUNs is because you've got VAAI XCOPY - the data never left the TrueNAS machine and was copied directly from zvol to zvol.
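
You can confirm which VAAI primitives ESXi sees for that LUN with the command below (assuming ESXi identifies it by the naa value from the ctl.conf earlier in the thread):

esxcli storage core device vaai status get -d naa.6589cfc00000070d6306ae680db96d14

"Clone Status: supported" in the output is the XCOPY primitive.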

At the lower level, check your Brocades for frame loss, ensure that your glass is good and you're not mismatching 50/125 with 62.5/125.
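
On the 4900s themselves, the usual FOS checks are porterrshow for the per-port error counters (crc_err, enc_out, disc_c3), sfpshow for RX/TX light levels, and portshow for negotiated speed and port state; for example (port 0 is a placeholder):

porterrshow
sfpshow 0
portshow 0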

Edit: Just saw your image name - that seems to imply you've already set up RR and IOPS=1. Is this a VMDK on VMFS or ZVOL as RDM?

Check your VMware path selection policy - you can set it to round-robin with a 1 I/O switch using the esxcli command below, but you'll need to reclaim the LUNs (reboot, usually) to have it take full effect.

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "TrueNAS" -M "iSCSI Disk" -P "VMW_PSP_RR" -O "iops=1" -e "TrueNAS iSCSI Claim Rule"
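
To verify the rule took, or to flip an already-claimed LUN to iops=1 without waiting for a reclaim, something along these lines should work (device ID assumed from the naa value in the ctl.conf above):

esxcli storage nmp satp rule list | grep TrueNAS
esxcli storage nmp device list -d naa.6589cfc00000070d6306ae680db96d14
esxcli storage nmp psp roundrobin deviceconfig set -d naa.6589cfc00000070d6306ae680db96d14 -t iops -I 1

The last command only applies if the device is already claimed by VMW_PSP_RR, which the device list output will show.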


You also appear to be using sync=standard (writes significantly faster than reads), which does expose you to a risk of data loss as well. See the resource here:
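
In the meantime, the zvol's current setting can be checked and changed with (dataset path from the ctl.conf above):

zfs get sync vol1/vmware/tns1
zfs set sync=always vol1/vmware/tns1

sync=always is the safe setting for VM block storage, but expect a large write penalty without a proper SLOG device.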

 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
The LUNs were on two different TrueNAS hosts, so it should have been fast.

I am using a VMDK on VMFS, and that VMFS datastore is on a zvol, not a file.

I have looked at the Brocade switch with porterrshow and have seen no errors for many days, even under heavy I/O.

I did find that one of the 48 drives had increasing SMART errors, and another one started a SMART test at noon that was still running 2.5 hours later, so I assume that one is junk. The other bad thing is that I added vdevs over time rather than building the pool from all 48 drives at once; this caused unbalanced I/O across the vdevs, with some way too full at over 90%. Since I have 2 identical TrueNAS hosts, I get to test again once I get all my drives healthy again.
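
For what it's worth, per-vdev fill level is visible with zpool list -v, and the suspect drive's error log with smartctl (da5 is just a placeholder device name):

zpool list -v vol1
smartctl -a /dev/da5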
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
So I have found more drive errors (3 uncorrected errors); one disk let me write to it, but when I did a scrub it got kicked out as a bad drive for read checksum errors.

10:32 to rebuild
09:46 to scrub an identical pool
09:10 to fill the pool with 24.4T of random data from dd via an NVMe disk
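
If you want to re-check those numbers later, zpool status shows the elapsed time once a scrub or resilver finishes, e.g.:

zpool scrub vol1
zpool status -v vol1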
 