Slow Performance, High Latency, Speed Drops on TrueNAS 12

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
Greetings y'all, hoping to get some help wrapping my head around the issue I'm seeing. I have always had issues similar to the experience described in the following post,
which I attributed to hardware or some other inadequate configuration, but performance was tolerable: I could burst to 10Gb line rate now and again and sustain roughly 6Gb/s until the speed crashed. After updating to 12.0-U8 I'm experiencing noticeably worse performance, and I would like some assistance with it.

This is a home lab environment primarily hosting VM storage and game storage (largely Steam) over iSCSI, due to game-mounting requirements and issues. I will out myself right out of the gate and state that the zvols sit on RAIDZ1 (3 drives), and I know that is not ideal for block storage, per @jgreco and his wonderful posts and guides. While it is a less-than-ideal setup, I haven't yet procured the drive(s) needed to move everything over and blow away the pool. That said, the network performance between the NAS and Windows 11 over iSCSI is abysmal at the moment (jumbo frames or not). Based on guides and posts from others troubleshooting similar problems, I'm posting the performance across the drives, which shows I should be able to hit 10G speeds (which I did prior to 12.0-U8).
Screenshot 2022-02-07 174430.jpg
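To help rule the network path in or out separately from the pool, I can run a raw iperf3 test between the NAS and the Windows box (iperf3 is normally available on TrueNAS CORE; the address, stream count, and duration below are just placeholders):

    # on the TrueNAS shell
    iperf3 -s

    # on the Windows 11 client (an iperf3 build for Windows assumed to be installed)
    iperf3.exe -c 192.168.10.10 -P 4 -t 30

Line-rate results here, with and without jumbo frames, would point the finger at the storage stack rather than the network.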

I'm sure we'd want better for block storage, but by and large the speed has been acceptable.

My pool configuration is as follows, with 3 SSDs for SLOG and L2ARC and spinning-rust SAS drives for the rest:
Screenshot 2022-02-07 174634.jpg


Pool usage is shown here; note it says 24%, which is roughly accurate given the one VM in its datastore and the ~4 TB of games on the iSCSI share.
Screenshot 2022-02-07 174736.jpg


As previously mentioned, I have 10GbE with 4 virtual NICs for MPIO; however, since I'm the only one using the iSCSI share and ESXi connects once to the VM datastore, I can (and likely will) trim that back if it's not needed.
Screenshot 2022-02-07 174829.jpg


My tunables are as follows:
Screenshot 2022-02-07 174659.jpg

Screenshot 2022-02-07 174709.jpg


Attached are some graphs of the performance I see when attempting to do any work on the NAS, be it a single file move, streaming a file, a benchmark, etc. I'm at a loss as to what has changed or why performance is tanking. Network latency shoots through the roof on any file access, and when the connection to the NAS hits 99-100% utilization, latency can be anywhere from 100 to 1,000 ms, bringing everything to a screeching halt. While this graph shows 104 MB/s, that is actually a luxury compared to the forthcoming CrystalDiskMark results, and is likely due to a larger file transfer.
Screenshot 2022-02-07 174802.jpg
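If it helps, I can also capture per-vdev latency from the shell while reproducing this (the pool name below is a placeholder; the -l latency columns are available on the OpenZFS 2.0 that ships with 12.0):

    # per-vdev bandwidth plus read/write/sync latency, refreshed every second
    zpool iostat -v -l tank 1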


Here, in all its glory, is what I'm experiencing from a file-transfer perspective. I had previously seen almost 1 GB/s in the SEQ1M sections, though given my setup I never expected it to be that high, nor consistently so. That said, read performance is now terrible and is even outdone by the current write performance, which is backwards from what I've come to expect. There have been no changes to hardware or software other than the update to 12.0-U8.
Screenshot 2022-02-07 175059.jpg
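To check whether the regression is in the pool itself or in the iSCSI/network path, a rough local test on the NAS is something like the following sketch (dataset path and size are placeholders; /dev/random is used instead of /dev/zero so compression doesn't skew the numbers, and the read-back may be served from ARC, so this mostly validates the write path):

    # write ~4 GiB of incompressible data locally on the NAS
    dd if=/dev/random of=/mnt/tank/dd-test.bin bs=1m count=4096

    # read it back
    dd if=/mnt/tank/dd-test.bin of=/dev/null bs=1m

    # clean up
    rm /mnt/tank/dd-test.bin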


Thanks for taking the time to look through this and assist; it is greatly appreciated. I'm by no means an expert, but I attempted to follow the best practices laid out here as I migrated from Xpenology. I know one thing is currently wrong (the need to switch from RAIDZ to mirrors), but I believe my main and current issue lies elsewhere.
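For the eventual RAIDZ1-to-mirrors move, my rough plan (a sketch only; pool names are placeholders, and the GUI replication task is the supported route on TrueNAS) is a recursive snapshot plus send/receive onto a freshly created mirror pool:

    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs receive -F newtank

-R carries the whole dataset tree, including the iSCSI zvols and their properties, and -F lets the receive overwrite the target pool's empty root dataset.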



My Specs
Supermicro H8DGU
2x Opteron 6378, 16 cores each (dual socket)
ESXi 7
96 GB DDR3
LSI 6 Gbps SAS HBA (I believe in IT mode but would need to confirm; see the check after this list)
Mellanox MT27500 (ConnectX-3) 10G dual-port fiber NIC

TrueNAS 12.0-U8 VM
12 vCPUs
64 GB RAM
HBA passthrough, all 32 bays
4 NICs (VMXNET 3)

10G Brocade 24-port fiber switch for interconnects
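To confirm the HBA mode, assuming it's a SAS2008/2308-generation card and the sas2flash utility is present on the system (SAS3 cards use sas3flash instead):

    sas2flash -list
    # look for the firmware/product ID ending in "-IT" rather than "-IR"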
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
My pool configuration is as follows, with 3 SSDs for SLOG and L2ARC and spinning-rust SAS drives for the rest
I've got some news for you...

You aren't running L2ARC, but it seems you have a special VDEV (for Metadata?).

Depending on the hardware (what SSDs you're using) your SLOG may be part of the problem if it's not up to the task.
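If you want to quantify that, FreeBSD's diskinfo can measure synchronous write latency, which is exactly what SLOG duty cares about. Be aware the -w write test is destructive to data on the target device, so only run it against a disk that isn't part of a pool (for example a SLOG candidate before it's added); da9 below is a placeholder:

    diskinfo -wS /dev/da9

Consumer SATA SSDs typically show sync write latencies in the hundreds of microseconds to milliseconds here, while Optane/enterprise devices with power-loss protection come in far lower.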

You're not going to be improving anything with 4 virtual NICs for one user.

Since you've already identified the biggest problem (using RAIDZ1 for an IOPS-driven block storage/sync writes workload), I don't think there's much more I can advise you on.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
Thank you @sretalla for the response. You mentioned I'm not running L2ARC... I thought that was how it was represented in the drive list with the "special" vdev. The 3 SSDs are Samsung 850s (540 MB/s max).

With respect to the 4 virtual NICs, I agree; however, the intent was never for me to be the only one using the datastore in this way.

The RAIDZ1 performance had previously been perfectly adequate; after the update, however, I'm seeing a fifth to a tenth of the performance I had become accustomed to, along with insane latency. I can certainly see a misconfigured L2ARC/SLOG contributing to that, but per my understanding (and the guides I followed in the beginning) I believe it to be correct.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
You might want to check the SSDs with SMART to see how much write endurance they have left, and how much space they have left (zpool iostat -v). A sVDEV can be quite punishing with writes; I have a system using Optanes for that duty, and with 300+ TB written to them in under a year I'm happy with that choice.
If your sVDEVs run out of space, you'll overflow to the spinners.
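Concretely, something like this (device and pool names are placeholders):

    # wear / total writes on one of the SSDs
    smartctl -a /dev/da3 | egrep -i "wear|lba|percent"

    # per-vdev capacity and usage, including the special vdev
    zpool iostat -v tank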
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Oh, BTW: good to see you're keeping the same amount of redundancy on the sVDEVs as on your data VDEVs.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You mentioned I'm not running L2ARC... I thought that was how it was represented in the drive list with the "special" vdev. The 3 SSDs are Samsung 850s (540 MB/s max)
L2ARC will show up as cache, not special, and it also shouldn't be mirrored (I'm not sure you even can mirror it).
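You can confirm it from the shell with zpool status; the layout below is only an illustrative sketch (pool and device names are made up), showing where each role appears in the output:

    zpool status -v tank

    #   tank
    #     raidz1-0       <- data vdev (the spinning SAS drives)
    #       da1 da2 da3
    #   special
    #     mirror-1       <- metadata (special) vdev: this is what you have
    #   logs
    #     da8p1          <- SLOG
    #   cache
    #     da9            <- an L2ARC, if present, shows up under this heading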

What you have is a special VDEV that is critical to your pool and can't be removed. Those Samsung SSDs are OK for that task.

That model of Samsung SSD is terrible for SLOG and you may find improvement just by removing it.

If you're serious about getting 4x 10G, you'll need to invest in Optane... check out the "finding the best SLOG" thread.
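For reference, a log vdev can be removed (and a cache device added later) without rebuilding the pool. On TrueNAS the pool Status page in the GUI is the supported way to do it, but the CLI equivalent, with placeholder names, looks like this:

    # remove the SLOG device from the pool
    zpool remove tank da8p1

    # later, if you want one of those SSDs as an actual L2ARC instead
    zpool add tank cache da9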
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Your spec list doesn't have any disks in it.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
That model of Samsung SSD is terrible for SLOG and you may find improvement just by removing it.
Done; however, performance is the same. I'm seeing some mentions of 12 nerfing iSCSI performance, so I may need to roll back.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
@c77dk You might want to check the SSDs with SMART to see how much write endurance they have left, and how much space they have left (zpool iostat -v).

Here is the output of that
Screenshot 2022-02-08 103512.jpg
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Here is the output of that
The sVDEV isn't full, so that part seems OK.

Then you might want to check how busy your disks are when you see the issue (gstat); if you're lucky, one of them will stand out like a sore thumb and need replacing.
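For example, in a second SSH session while reproducing the load:

    # physical providers only, refreshed every second
    gstat -p -I 1s

Watch the %busy and ms/r / ms/w columns; a single disk pegged at 100% busy with far higher latency than its siblings is the usual tell.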
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
So here is the picture after running the same load test. Primarily it is hitting da3 (the newly configured L2ARC), and afterwards it flushes to the drives, hence the high busy time across the other drives. That busy period is extremely short and was difficult to capture. The last picture is the gstat output when streaming a game, with high latency and almost 100% drive usage as reported by Windows. Another curious point is that it does not seem to be using the ARC at all, as I don't get any bursts of speed.

Screenshot 2022-02-09 070700.jpg


Screenshot 2022-02-09 071121.jpg


Screenshot 2022-02-09 071419.jpg
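On the ARC question, the hit/miss counters should at least show whether reads are being served from RAM at all (these are standard OpenZFS sysctls on FreeBSD-based TrueNAS; arc_summary presents the same data more readably if it's installed):

    sysctl kstat.zfs.misc.arcstats.size \
           kstat.zfs.misc.arcstats.hits \
           kstat.zfs.misc.arcstats.misses

A hits-to-misses ratio that barely moves while streaming the game would back up the impression that ARC isn't helping here.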
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
After finding this post and jgreco's suggestions, I reduced the VM's CPU cores from 12 to 8 and reserved them (it wouldn't let me set latency sensitivity to High, which is frustrating, since that also changes how the vNICs function), and it looks like performance is back up where I was expecting it to be. Thank you all!

Screenshot 2022-02-11 140648.jpg
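For anyone hitting the same thing, the CPU contention can be checked from the ESXi host shell with esxtop before and after the change:

    esxtop
    # press 'c' for the CPU view and watch %RDY and %CSTP for the TrueNAS VM;
    # as a common rule of thumb, sustained %RDY above roughly 5 per vCPU points
    # at scheduling contention, which is what dropping from 12 to 8 reserved
    # cores relieved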
 
