Read performance slow - ESXi VMFS6 iSCSI HDD Mirror/RAID 10

apnetworks

Cadet
Joined
Dec 7, 2021
Messages
5
Hello,

I have been reading quite a few of the posts here, and from various sources the general consensus seems to be that for the highest IOPS and throughput you want striped mirror vdevs (a la RAID 10), served over iSCSI rather than NFS.
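
For reference, a rough sketch of that layout from the CLI, assuming a pool named "tank" and placeholder device names (the pool itself was actually built through the TrueNAS GUI):

Code:
# Hypothetical 8-disk layout: four 2-way mirrors striped together ("RAID 10")
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  mirror da6 da7

# Each "mirror-N" group is one vdev; reads and writes stripe across all four
zpool status tank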

I am aware there is a newer version of NFS (NFSv4), but for various reasons iSCSI is just fine with me.

So I put my 12th-gen dual-CPU Dell server to work with 64GB of RAM and 8x 2TB 7200 RPM 4Kn Seagate Enterprise drives, and added the ol' read SSD cache (L2ARC) and write SSD cache (SLOG). (I could add more memory, but I'm not sure it's justified when the TrueNAS Mini X+ is using 32GB and posting better reads with WD Red 5400 RPM drives.)

Connected slot A on the ESXi host to slot B on the TrueNAS box and mapped the iSCSI target to the software iSCSI adapter over a 10GbE link.

Ran the base install of Server 2016 off of the iSCSI share.

Put together some numbers with CrystalDiskMark.

[Attachment: Crystal_Disk_L2ARC_Trimmed.png]


That's when I stopped and said... hey, everyone else is getting ~400 MB/s reads on average in their testing. FIO results from an Ubuntu VM for sequential reads are abysmal as well.

Wondering if I am missing something here, and how to go about troubleshooting that bottleneck.

From what I've read, playing with the record/block sizes isn't going to help, as ZFS "bobs and weaves" and uses whatever block size it wants up to the limit that is set, using smaller blocks when appropriate.
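
For what it's worth, here's how I've been checking those block sizes on the TrueNAS side (zvol and pool names are placeholders; volblocksize is fixed at zvol creation time, so it can only be inspected here, not changed):

Code:
# Block size of the zvol backing the iSCSI extent
zfs get volblocksize tank/esxi-zvol

# For regular datasets, recordsize is only an upper bound;
# ZFS writes smaller blocks when it makes sense
zfs get recordsize tank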

Seeing as how the latest VMFS uses 1MB block sizes with 8KB sub-blocks, I would assume sub-blocks are being transferred in 8KB chunks. iSCSI is sending with the standard 1500-byte MTU (not jumbo frames) to the TrueNAS storage over the 10GbE link.

So I would assume ~1.5KB chunks are flying across the wire five or six times per ESXi sub-block, to a TrueNAS unit that sees the 1500-byte packets, reassembles them into 8KB chunks, and writes each one out as two 4K sectors to a 4Kn drive?

That all gets very confusing, and it seems to describe the write path anyway, which I am not having an issue with.

So reversing that process for reads would be: read two 4K sectors from the drives, then send them ~1.5KB at a time to VMware to assemble an 8KB sub-block.
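
In case it matters, these are the commands I'd use to check (and, if I decide to test jumbo frames, change) the MTU on each end; interface and vSwitch names are placeholders for my setup, and the TrueNAS change would really be made persistent through the GUI:

Code:
# TrueNAS (FreeBSD) side: confirm the current MTU on the 10GbE interface
ifconfig mlxen0 | grep mtu

# ESXi side: check the iSCSI vmkernel port, then raise it (and its vSwitch)
# to 9000 if testing jumbo frames end to end
esxcli network ip interface list
esxcli network ip interface set -i vmk1 -m 9000
esxcli network vswitch standard set -v vSwitch1 -m 9000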

The only options I have experimented with so far (rough CLI equivalents are sketched below):

• SLOG / no SLOG - didn't make a difference (not worried about write performance right now, so that was expected)
• L2ARC cache SSD(s) - didn't make a difference
• Disabled "atime" on the zvol

The benchmarks I have run so far have all been from within the ESXi environment; I am currently installing the Phoronix Test Suite directly onto the TrueNAS server to test the drives outside of the iSCSI path.
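
In the meantime, a couple of quick sanity checks I plan to run directly from the TrueNAS shell, bypassing iSCSI entirely (da0 is a placeholder; the idea is to read one of the ST2000 data disks raw):

Code:
# Raw sequential read from a single drive
dd if=/dev/da0 of=/dev/null bs=1M count=4096

# FreeBSD's built-in transfer-rate test
diskinfo -t /dev/da0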

I attempted to download and run the script from ftp.sol.net but links are broken. I have checked that with an FTP client and it is not there.

[Attachment: Sol.Net.Missing.png]



Any other recommendations to increase read performance for this setup?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I attempted to download and run the script from ftp.sol.net but links are broken. I have checked that with an FTP client and it is not there.

Presumably referring to solnet-array-test-v2?

"Try harder, or with a working FTP client"

Code:
% fetch ftp://ftp.sol.net/incoming/solnet-array-test-v2.sh
solnet-array-test-v2.sh                       100% of 9589  B   24 MBps


Works fine. That is not an ephemeral FTP server, and stuff put there generally sits there forever. Some of it has been there for a quarter of a century (the thing has a frickin' copy of FreeBSD-1.0R). I *promise* you it's there. The directory is not open to directory listing, however.

It's not a bad idea to check using that tool to see if your array has any I/O idiosyncrasies, but your problem may be a bit more general.

Make sure you've read up:

[linked resource thread]

and

[linked resource thread]
Without some idea of how you've designed your pool, my main comments are that it feels like your current setup might only have two or three vdevs, and is tight on memory, so you tend to lose out a bit on performance. ZFS is incredibly dependent on its ability to trade off seeks for ARC/L2ARC, and the way you get stellar block storage performance out of a ZFS filer is to resource it to an almost insane level. If you go far enough, such as making sure that the entire working set is cacheable in ARC/L2ARC, you get performance that can be hard to tell apart from SSD performance, out of mere HDDs.

But this requires a LOT of resources, and in my opinion, unless you NEED the shared resources, it may be cheaper to just buy SSD these days. A RAID1 of two 870 Evo 1TBs and a good (used) RAID controller is $300-$400, and highly expandable for just the cost of additional SSD.
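
A quick way to sanity-check how much of your working set is actually being served from cache (sysctl names as on FreeBSD's OpenZFS; run from the TrueNAS shell):

Code:
# ARC hit/miss counters and current/maximum ARC size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max

# L2ARC effectiveness, if you have cache devices attached
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses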
 

apnetworks

Cadet
Joined
Dec 7, 2021
Messages
5
It seems I was mistaken; using the fetch command for the sol.net script worked as described, via an SSH session to the TrueNAS box.

It seems the array is getting lower read speed than a single drive would.

One thing I noticed after reviewing the sol.net results is that it included the SD card mirror (the TrueNAS boot drives) as part of the "lun 0" dataset, and then went on to test that device when I selected the defaults. I'm wondering if da8 is somehow being included within lun 0 even though it isn't set up in the TrueNAS GUI? Perhaps that is the cause of the slow reads?

[Attachment: 1638972886537.png]



Here's my vdev setup

[Attachment: Vdev_stripe_mirror_raid10.png]
 

Attachments

  • Free_Nas_Sol.Net.png (86.7 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
the sol.net results is that it has included an SD card mirror (TrueNAS boot drives) as part of the "lun 0" dataset.

This isn't a ZFS-specific tool and it has no idea what your datasets are. It dates from the late '90's, so, "pre-ZFS". It's still a great tool.

When you select "all disks", it uses camcontrol and targets all disks. Your boot disk is just another disk to UNIX.

If you wanted to limit it, select option 2) and then provide something it can grep for, such as ST2000
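
You can preview what that pattern will match before handing it to the script; the boot SD cards and the MFI devices shouldn't show up:

Code:
# Only the eight data disks should appear here
camcontrol devlist | grep ST2000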

I also notice that under "cache" you seem to have MFI-based devices. The MFI driver is specifically one of the things I was targeting when I wrote:

[linked resource thread]
 

apnetworks

Cadet
Joined
Dec 7, 2021
Messages
5
I am aware of the pass-through settings and have set them on the direct-attached storage enclosure for my LSI-based Dell PERC. I'm just wondering if it is normal to see multiple devices at target 0 <dataset name> that are not actually part of the dataset when running the 'camcontrol devlist' command.

[Attachment: Target0_LUN0.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
if it is normal to see multiple devices at target 0 <dataset name> that are not actually part of the dataset

UNIX does not have a concept of "<dataset name>". A dataset is something that you establish on top of a ZFS pool. The low level SCSI stuff neither knows nor cares about ZFS or what you might have done up top. It's like wondering why you aren't seeing /usr or /var listed here.

Individual hard drives always (almost always) show up as lun 0 because the array concepts that gave us target+lun naming do not make sense for individual disks, and they are merely enumerated as targets on a given SCSI bus. So your last ST2000 HDD is showing up on scbus7 and has been determined to be target 7 on scbus7. It also happened to wind up as /dev/da7 but this is sorta random luck.
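
If you want to see exactly where a given device node landed, something like this will show the bus/target/LUN tuple for it (da7 here is just an example):

Code:
# Which scbus/target/lun a specific device node is attached to
camcontrol devlist | grep da7

# Verbose form also shows the controller each scbus hangs off
camcontrol devlist -v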
 

apnetworks

Cadet
Joined
Dec 7, 2021
Messages
5
Seems like there are a few things I can try with equipment on hand: another backplane/enclosure and a different Dell PERC RAID card. Perhaps moving the L2ARC SSDs over to the enclosure and away from the onboard RAID controller may be beneficial as well. I'll post my results when I swap things around.
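
For the record, the cache shuffle should be safe to do from the CLI as well (device names are placeholders; the GUI can do the same thing), and I'll watch the per-vdev numbers while re-testing:

Code:
# L2ARC devices can be removed and re-added without risk to the pool data
zpool remove tank da9        # drop the cache SSD before moving it
zpool add tank cache da9     # re-add it once it's on the new controller

# Watch per-vdev throughput during the next benchmark run
zpool iostat -v tank 5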

Thank you for your help!
 