Slow iSCSI Read Performance

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
I have a TrueNAS system that acts as an iSCSI SAN for two ESXi hosts.
The iSCSI network is 10Gb and segmented from the data network. Jumbo frames are turned on, though I'm not sure they're working properly; I have a MikroTik CRS305-1G-4S+IN and configuring jumbo frames on it is awkward.
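One way I figure I can sanity-check jumbo frames end to end is a don't-fragment ping with a full 9000-byte frame payload (just a sketch; the addresses are placeholders for my iSCSI interfaces):

  # From an ESXi host toward the TrueNAS iSCSI address (8972 = 9000 minus IP/ICMP headers)
  vmkping -d -s 8972 <truenas-iscsi-ip>

  # From the TrueNAS shell back toward an ESXi vmkernel port (-D sets don't-fragment)
  ping -D -s 8972 <esxi-vmk-ip>

If those fail while a normal ping works, jumbo frames aren't clean end to end (switch, vSwitch, vmkernel port, and TrueNAS interface all need MTU 9000).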
TrueNAS Specs:
E3-1245 V2 (Hardly ever see it go above 5% usage, not sure if this is normal)
32GB ECC
12 x 2TB drives, set up as six striped 2-way mirrors.
TrueNAS version 12.0-U2

ESXi hosts: running 7.0, Ryzen 2700X, 32GB RAM.

I am getting very slow iSCSI read performance. Right now I am trying to back up a single 6TB VM; reads are incredibly slow, and trying to do anything on any other VM while it runs is impossible.

This also shows up when doing a Storage vMotion of a VM from the iSCSI datastore to a local SSD on either of my two ESXi hosts. Transferring the VM to a local datastore shows reads of only 30-80 MB/sec.

I have about 10 VMs in total, with very low usage in terms of compute. I do have a media server VM, so that does use quite a bit of compute here and there.

I have tried using a 500GB L2ARC drive, but haven't seen any really noticeable difference in performance with it.

When running CrystalDiskMark tests on VMs stored on the SAN, I hit 10Gb speeds and saturate the iSCSI network. Writing VMs to the SAN gets speeds of 300-400 MB/sec. But trying to do anything involving reads seems painfully slow, and with the backup going the VMs are unusable. The backup is only reading the VM at 30-70 MB/sec.

Any ideas on how to troubleshoot this?
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
I just performed a test with the SSD I have in the system. I created a new pool with just the SSD and created an iSCSI LUN from it. vMotions to the SSD iSCSI pool run at full speed. vMotions from the iSCSI SSD pool to a local SSD quickly drop off, and vMotioning back runs at full speed. vMotion from the iSCSI SSD pool to the standard iSCSI pool gets full performance, while vMotioning from the standard iSCSI pool to the SSD pool is slow.

TL;DR - Storage vMotion performance:
iSCSI to local SSD - slow, 30-80 MB/sec
local SSD to iSCSI - normal, 300-400 MB/sec
iSCSI SSD to local SSD - slow, 50-100 MB/sec
local SSD to iSCSI SSD - normal, 300-400 MB/sec
iSCSI to iSCSI SSD - slow, 50-100 MB/sec
iSCSI SSD to iSCSI - normal, 300-400 MB/sec
local SSD on one host to local SSD on another host - OK-ish, 200-300 MB/sec
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
32GB is actually pretty low for your application. Do you also have L2ARC and SLOG in your pool? Also, what 10G NICs are you using on the TrueNAS side, and how are they configured?
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
32GB is actually pretty low for your application. Do you also have L2ARC and SLOG in your pool? Also, what 10G NICs are you using on the TrueNAS side, and how are they configured?

Dell Broadcom 57810s all around. I was using L2ARC with the 500GB 850 Pro, but it didn't really seem to help. I haven't tried a SLOG, but again, writes don't seem to be an issue; it really only seems to be reads.
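For what it's worth, one way I think I can check whether the L2ARC was actually absorbing reads is the standard FreeBSD ZFS counters (a rough sketch; just sampling the raw kstats before and after a slow transfer):

  # ARC hit/miss counters
  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

  # L2ARC hit/miss counters (only meaningful while the cache device is attached)
  sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses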
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, those use the if_bxe driver. Have you done any tuning on these via sysctl? In particular, do you have hardware offload enabled?
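For example, something along these lines from a TrueNAS shell will show what the driver is currently doing (assuming the first port comes up as bxe0; adjust the unit number to match your system):

  # Interface options - look for TXCSUM, RXCSUM, TSO4, LRO, VLAN_HWTSO and the MTU
  ifconfig bxe0

  # Per-port driver sysctls and their current values
  sysctl dev.bxe.0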
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The reason your iSCSI "write" speeds are blistering fast is because it doesn't do synchronous (safe) writes by default. Right now it's effectively "writing into RAM" whenever you svMotion data into your TN machine, and then ZFS spools the contents of RAM to disk as it's able.

A few questions for you about pool config:
  1. Is your ZVOL the entire size of your main pool (or up to the default max of 80%)?
  2. Is it a sparse/thin ZVOL?
  3. I assume you're using VMFS6, but can you confirm this?
  4. How full is the system, both from the ZFS and VMFS perspective?
The backup of the 6TB VM may simply be beating your array into submission, because the constant cache-misses are hitting your spindles hard. What does a snapshot of gstat -dp look like when you're trying to migrate data off of your VMFS to local SSD (or performing a heavy read benchmark)?
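For questions 1 and 2, something like this from the TrueNAS shell will answer them quickly (a sketch; "tank/iscsi-zvol" is a placeholder for your actual pool/zvol), and the same session can capture the gstat view:

  # ZVOL size vs. pool, sparse or not (refreservation = none means sparse), and sync behavior
  zfs get volsize,refreservation,sync tank/iscsi-zvol
  zpool list tank

  # Per-disk busy% and latency while the svMotion/backup read is running
  gstat -dp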
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
OK, those use the if_bxe driver. Have you done any tuning on these via sysctl? In particular, do you have hardware offload enabled?

I have a few tunables configured, but I'm not sure what they are. Any in particular to consider for the bxe cards?

I can post a full list later
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
All the if_bxe tunables are listed in the man page.
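E.g., from a TrueNAS shell (and if you do end up changing any hw.bxe.* loader tunables, they go in under System > Tunables with Type set to "loader"):

  # Driver man page, including the hw.bxe.* loader tunables and their defaults
  man 4 bxe

  # Short descriptions of each per-port knob (first port assumed to be unit 0)
  sysctl -d dev.bxe.0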
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
All the if_bxe tunables are listed in the man page.

Yeahhhhh, I would consider myself a beginner when it comes to that. Any keys in particular I should be setting and any idea on how I would do that?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, so you're using the default settings. Try these tunables for 10Gbps networking: [link]

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
OK, so you're using the default settings. Try these tunables for 10Gbps networking:


These are the system tunables I have configured. I'm not sure why, though; I think I found them somewhere while trying to troubleshoot on my own without really knowing what I was doing. I don't fully understand all the items in here, so if something should be removed, let me know.

Are there any tunables for the bxe card I should be considering?

Should I add all the tunables listed in the link you sent? https://www.truenas.com/community/t...sue-in-one-direction.85552/page-3#post-605543
 

Attachments

  • Screenshot 2021-02-26 174843.png (56.3 KB)

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
The reason your iSCSI "write" speeds are blistering fast is because it doesn't do synchronous (safe) writes by default. Right now it's effectively "writing into RAM" whenever you svMotion data into your TN machine, and then ZFS spools the contents of RAM to disk as it's able.

A few questions for you about pool config:
  1. Is your ZVOL the entire size of your main pool (or up to the default max of 80%)?
  2. Is it a sparse/thin ZVOL?
  3. I assume you're using VMFS6, but can you confirm this?
  4. How full is the system, both from the ZFS and VMFS perspective?
The backup of the 6TB VM may simply be beating your array into submission, because the constant cache-misses are hitting your spindles hard. What does a snapshot of gstat -dp look like when you're trying to migrate data off of your VMFS to local SSD (or performing a heavy read benchmark)?


1. Yes; pool size is 10.47TB and the ZVOL is 10.46TB. (I created a separate pool with the SSD and will test whether a ZVOL under 80% helps.)
2. I am 90% certain they are not sparse/thin, but I am not exactly certain how to check (see the sketch below).
3. Can confirm they are VMFS6 datastores.
4. The pool reports 99% used space on the dashboard. "zpool list" shows 10.9T size, 4.48T allocated, 6.39T free. Within vCenter, the datastore shows 6.46TB used. All VMs have 40-80GB system drives, 10 VMs in total, each probably using about half of that space internally. The main storage on the media server VM reports 4.3TB used of the 6TB allocated to it.
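If it helps, I believe a space breakdown like this would show whether a thick zvol's refreservation is why the dashboard says 99% while zpool list only shows ~4.5T allocated (pool/zvol names here are placeholders for mine):

  # USEDREFRESERV = space held by the zvol's reservation rather than by written data
  zfs list -o space tank
  zfs get refreservation,volsize tank/iscsi-zvol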
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, those are confirmed working tunables for 10G networking.
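For reference, the usual FreeBSD 10G sysctls are along these lines; take this as a generic illustration of the kind of values involved (not necessarily the exact set from the linked post), added under System > Tunables with Type "sysctl":

  kern.ipc.maxsockbuf=16777216       # allow larger socket buffers
  net.inet.tcp.sendbuf_max=16777216  # bigger TCP send window ceiling
  net.inet.tcp.recvbuf_max=16777216  # bigger TCP receive window ceiling
  net.inet.tcp.mssdflt=1448          # sane default MSS
  net.inet.tcp.abc_l_var=44          # more aggressive congestion window growth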
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
Yes, those are confirmed working tunables for 10G networking.

Yes, mine are good?

Or yes, I should add all the tunables in the link you listed?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Add the tunables from the link I posted.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I would generally advise against "tuning" with parameters that come from somewhat "dubious" sources, in the sense that they are not known to work under all circumstances. And the latter is a pretty rare thing, because such parameters almost always represent trade-offs: increasing one value may improve write performance for scenario A but hurt read performance for scenario B. Or think of indexes in a relational database: they are great for read performance but slow down writes. Applying a set of parameters without understanding in detail(!) how they relate to a given situation is a dangerous thing.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
The reason your iSCSI "write" speeds are blistering fast is because it doesn't do synchronous (safe) writes by default. Right now it's effectively "writing into RAM" whenever you svMotion data into your TN machine, and then ZFS spools the contents of RAM to disk as it's able.

A few questions for you about pool config:
  1. Is your ZVOL the entire size of your main pool (or up to the default max of 80%)?
  2. Is it a sparse/thin ZVOL?
  3. I assume you're using VMFS6, but can you confirm this?
  4. How full is the system, both from the ZFS and VMFS perspective?
The backup of the 6TB VM may simply be beating your array into submission, because the constant cache-misses are hitting your spindles hard. What does a snapshot of gstat -dp look like when you're trying to migrate data off of your VMFS to local SSD (or performing a heavy read benchmark)?


As for gstat -dp, here is a screenshot. I monitored the entire transfer and average busy was about 30%. Every so often one drive would spike to 50%, but nothing crazy that I can see. Transferring to the iSCSI SAN shows a different story.

Also attached are CrystalDiskMark results for 1GB and 32GB. I am guessing the 1GB test is mostly irrelevant, because it will all be served from the iSCSI service's RAM, so reads and writes will reflect RAM speed and not actual disk speed?

The last two screenshots are a vMotion test of a 100GB VM from the iSCSI SAN to local SSD, which takes about 20-30 minutes; the values listed are pretty much what it was for the entire transfer. Then transferring the same VM back to the iSCSI SAN.
 

Attachments

  • vMotion from iSCSI.png (147.4 KB)
  • vMotion to iSCSI.png (23.9 KB)
  • Crystal Disk 1GB.png (72.9 KB)
  • Crystal Disk 32GB.png (76.3 KB)
  • vMotion from iSCSI 100GB VM.png (147.2 KB)
  • vMotion to iSCSI 100GB vm.png (23.8 KB)

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
Add the tunables from the link I posted.

Look at my most recent posts. The tunables you suggested seem to have more to do with networking than anything else. I'm not sure this is a network-related issue, since I'm able to saturate the connection with the CrystalDiskMark 1GB test, and that is reading and writing to the RAM/ZFS cache.

If you think I should still configure them, let me know.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
clifford64, we here are jealous of your numbers. We're lucky to hit 250 MB/sec. Looks like I'll start a new thread, but man, I'd love to have your numbers.
 

clifford64

Explorer
Joined
Aug 18, 2019
Messages
87
clifford64, we here are jealous of your numbers. We're lucky to hit 250 MB/sec. Looks like I'll start a new thread, but man, I'd love to have your numbers.

That's VM performance, and VM performance is fine. Everything runs and starts really fast. It's the vMotioning and such that I have problems with. That would also probably explain why backups are slow, since the backup is mounting the vCenter snapshot.
 