Slow Performance, High Latency, Speed Drops on TrueNAS 12

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
Greetings y'all, hoping to get some help wrapping my head around the issue I'm seeing. I have always had issues similar to the experience described in the following post,
which I attributed to hardware or some other inadequate configuration, but performance was tolerable: I could burst to 10Gb line rate now and again and sustain roughly 6Gb/s until the speed crashed. After updating to 12.0-U8 I'm experiencing noticeably worse performance, and I would like some assistance with it.

This is a home lab environment primarily hosting VM storage and game storage (largely Steam) over iSCSI, due to game-mounting requirements and issues. I will out myself right out of the gate and state that the zvols sit on RAIDZ1 (3 drives), and I know that is not ideal for block storage, per @jgreco and his wonderful posts and guides. While it is a less-than-ideal setup, I haven't yet procured the drive(s) needed to move everything over and blow away the pool. That said, the network performance between the NAS and Windows 11 over iSCSI is abysmal at the moment (jumbo frames or not). Based on guides and posts from others troubleshooting similar problems, I'm posting the performance across the drives, which shows I should be able to hit 10G speeds (which I did prior to 12.0-U8).
Screenshot 2022-02-07 174430.jpg
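To help rule the network path in or out separately from the pool, I can run a raw iperf3 test between the NAS and the Windows box (iperf3 is normally available on TrueNAS CORE; the address, stream count, and duration below are just placeholders):

    # on the TrueNAS shell
    iperf3 -s

    # on the Windows 11 client (an iperf3 build for Windows assumed to be installed)
    iperf3.exe -c 192.168.10.10 -P 4 -t 30

Line-rate results here, with and without jumbo frames, would point the finger at the storage stack rather than the network.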

I'm sure we'd want better for block storage, but by and large the speed has been acceptable.

My pool configuration is as follows, with 3 SSDs for SLOG and L2ARC and spinning-rust SAS drives for the rest:
Screenshot 2022-02-07 174634.jpg


Pool usage is shown here; note it says 24%, which is roughly accurate given the one VM in its datastore and the ~4 TB of games on the iSCSI share.
Screenshot 2022-02-07 174736.jpg


As previously mentioned, I have 10GbE with 4 virtual NICs for MPIO; however, since I'm the only one using the iSCSI share and ESXi connects once to the VM datastore, I can (and likely will) trim that back if it's not needed.
Screenshot 2022-02-07 174829.jpg


My tunables are as follows:
Screenshot 2022-02-07 174659.jpg

Screenshot 2022-02-07 174709.jpg


Attached are some graphs of the performance I see when attempting to do any work on the NAS, be it a single file move, streaming a file, a benchmark, etc. I'm at a loss as to what has changed or why performance is tanking. Network latency shoots through the roof on any file access, and when the connection to the NAS hits 99-100% utilization, latency can be anywhere from 100 to 1,000 ms, bringing everything to a screeching halt. While this graph shows 104 MB/s, that is actually a luxury compared to the forthcoming CrystalDiskMark results, and is likely due to a larger file transfer.
Screenshot 2022-02-07 174802.jpg
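If it helps, I can also capture per-vdev latency from the shell while reproducing this (the pool name below is a placeholder; the -l latency columns are available on the OpenZFS 2.0 that ships with 12.0):

    # per-vdev bandwidth plus read/write/sync latency, refreshed every second
    zpool iostat -v -l tank 1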


Here, in all its glory, is what I'm experiencing from a file-transfer perspective. I had previously seen almost 1 GB/s in the SEQ1M sections, though given my setup I never expected it to be that high, nor consistently so. That said, read performance is now terrible and is even outdone by the current write performance, which is backwards from what I've come to expect. There have been no changes to hardware or software other than the update to 12.0-U8.
Screenshot 2022-02-07 175059.jpg
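To check whether the regression is in the pool itself or in the iSCSI/network path, a rough local test on the NAS is something like the following sketch (dataset path and size are placeholders; /dev/random is used instead of /dev/zero so compression doesn't skew the numbers, and the read-back may be served from ARC, so this mostly validates the write path):

    # write ~4 GiB of incompressible data locally on the NAS
    dd if=/dev/random of=/mnt/tank/dd-test.bin bs=1m count=4096

    # read it back
    dd if=/mnt/tank/dd-test.bin of=/dev/null bs=1m

    # clean up
    rm /mnt/tank/dd-test.bin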


Thanks for taking the time to look through this and assist; it is greatly appreciated. I'm by no means an expert, but I attempted to follow the best practices laid out here as I migrated from Xpenology. I know one thing is currently wrong (the need to switch from RAIDZ to mirrors), but I believe my main and current issue lies elsewhere.
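For the eventual RAIDZ1-to-mirrors move, my rough plan (a sketch only; pool names are placeholders, and the GUI replication task is the supported route on TrueNAS) is a recursive snapshot plus send/receive onto a freshly created mirror pool:

    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs receive -F newtank

-R carries the whole dataset tree, including the iSCSI zvols and their properties, and -F lets the receive overwrite the target pool's empty root dataset.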



My Specs
Supermicro H8DGU
2x Opteron 6378, 16 cores each (dual socket)
ESXi 7
96 GB DDR3
LSI 6 Gbps SAS HBA (I believe in IT mode but would need to confirm; see the check after this list)
Mellanox MT27500 (ConnectX-3) 10G dual-port fiber NIC

TrueNAS 12.0-U8 VM
12 vCPUs
64 GB RAM
HBA passthrough, all 32 bays
4 NICs (VMXNET 3)

10G Brocade 24-port fiber switch for interconnects
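To confirm the HBA mode, assuming it's a SAS2008/2308-generation card and the sas2flash utility is present on the system (SAS3 cards use sas3flash instead):

    sas2flash -list
    # look for the firmware/product ID ending in "-IT" rather than "-IR"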
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
My pool configuration is as follows, with 3 SSDs for SLOG and L2ARC and spinning-rust SAS drives for the rest
I've got some news for you...

You aren't running L2ARC, but it seems you have a special VDEV (for Metadata?).

Depending on the hardware (what SSDs you're using) your SLOG may be part of the problem if it's not up to the task.
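If you want to quantify that, FreeBSD's diskinfo can measure synchronous write latency, which is exactly what SLOG duty cares about. Be aware the -w write test is destructive to data on the target device, so only run it against a disk that isn't part of a pool (for example a SLOG candidate before it's added); da9 below is a placeholder:

    diskinfo -wS /dev/da9

Consumer SATA SSDs typically show sync write latencies in the hundreds of microseconds to milliseconds here, while Optane/enterprise devices with power-loss protection come in far lower.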

You're not going to be improving anything with 4 virtual NICs for one user.

Since you've already identified the biggest problem (using RAIDZ1 for an IOPS-driven block storage/sync writes workload), I don't think there's much more I can advise you on.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
Thank you @sretalla for the response. You mentioned I'm not running L2ARC... I thought that was how it was represented in the drive list with the "special" vdev. The 3 SSDs are Samsung 850s (540 MB/s max).

With respect to the 4 virtual NICs, I agree; however, the intent was never for me to be the only one using the datastore in this way.

The RAIDZ1 performance had previously been perfectly adequate; after the update, however, I'm seeing a fifth to a tenth of the performance I had become accustomed to, along with insane latency. I can certainly see a misconfigured L2ARC/SLOG contributing to that, but per my understanding (and the guides I followed in the beginning) I believe it to be correct.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
You might want to check the SSDs with SMART to see how much write endurance they have left, and how much space they have left (zpool iostat -v). A sVDEV can be quite punishing with writes; I have a system using Optanes for that duty, and with 300+ TB written to them in under a year I'm happy with that choice.
If your sVDEVs run out of space, you'll overflow to the spinners.
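Concretely, something like this (device and pool names are placeholders):

    # wear / total writes on one of the SSDs
    smartctl -a /dev/da3 | egrep -i "wear|lba|percent"

    # per-vdev capacity and usage, including the special vdev
    zpool iostat -v tank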
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Oh, BTW: good to see you're keeping the same amount of redundancy on the sVDEVs as on your data VDEVs.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You mentioned I'm not running L2ARC... I thought that was how it was represented in the drive list with the "special" vdev. The 3 SSDs are Samsung 850s (540 MB/s max)
L2ARC will show up as cache, not special, and it also shouldn't be mirrored (I'm not sure you even can mirror it).
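You can confirm it from the shell with zpool status; the layout below is only an illustrative sketch (pool and device names are made up), showing where each role appears in the output:

    zpool status -v tank

    #   tank
    #     raidz1-0       <- data vdev (the spinning SAS drives)
    #       da1 da2 da3
    #   special
    #     mirror-1       <- metadata (special) vdev: this is what you have
    #   logs
    #     da8p1          <- SLOG
    #   cache
    #     da9            <- an L2ARC, if present, shows up under this heading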

What you have is a special VDEV that is critical to your pool and can't be removed. Those Samsung SSDs are OK for that task.

That model of Samsung SSD is terrible for SLOG and you may find improvement just by removing it.

If you're serious about getting 4x 10G, you'll need to invest in Optane... check out the "finding the best SLOG" thread.
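For reference, a log vdev can be removed (and a cache device added later) without rebuilding the pool. On TrueNAS the pool Status page in the GUI is the supported way to do it, but the CLI equivalent, with placeholder names, looks like this:

    # remove the SLOG device from the pool
    zpool remove tank da8p1

    # later, if you want one of those SSDs as an actual L2ARC instead
    zpool add tank cache da9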
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Your spec list doesn't have any disks in it.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
That model of Samsung SSD is terrible for SLOG and you may find improvement just by removing it.
Done; however, performance is the same. I'm seeing some mentions of 12 nerfing iSCSI performance, so I may need to roll back.
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
@c77dk You might want to check the SSDs with SMART to see how much write endurance they have left, and how much space they have left (zpool iostat -v).

Here is the output of that
Screenshot 2022-02-08 103512.jpg
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Here is the output of that
The sVDEV isn't full, so that part seems OK.

Then you might want to check how busy your disks are when you see the issue (gstat); if you're lucky, one of them will stand out like a sore thumb and need replacing.
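For example, in a second SSH session while reproducing the load:

    # physical providers only, refreshed every second
    gstat -p -I 1s

Watch the %busy and ms/r / ms/w columns; a single disk pegged at 100% busy with far higher latency than its siblings is the usual tell.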
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
So here is the picture after running the same load test. Primarily it is hitting da3 (the newly configured L2ARC), and afterwards it flushes to the drives, hence the high busy time across the other drives. That busy period is extremely short and was difficult to capture. The last picture is the gstat output when streaming a game, with high latency and almost 100% drive usage as reported by Windows. Another curious point is that it does not seem to be using the ARC at all, as I don't get any bursts of speed.

Screenshot 2022-02-09 070700.jpg


Screenshot 2022-02-09 071121.jpg


Screenshot 2022-02-09 071419.jpg
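On the ARC question, the hit/miss counters should at least show whether reads are being served from RAM at all (these are standard OpenZFS sysctls on FreeBSD-based TrueNAS; arc_summary presents the same data more readably if it's installed):

    sysctl kstat.zfs.misc.arcstats.size \
           kstat.zfs.misc.arcstats.hits \
           kstat.zfs.misc.arcstats.misses

A hits-to-misses ratio that barely moves while streaming the game would back up the impression that ARC isn't helping here.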
 

Frollo

Cadet
Joined
Feb 7, 2022
Messages
8
After finding this post and jgreco's suggestions, I reduced the VM's CPU cores from 12 to 8 and reserved them (it wouldn't let me set latency sensitivity to High, which is frustrating, since that also changes how the vNICs function), and it looks like performance is back up where I was expecting it to be. Thank you all!

Screenshot 2022-02-11 140648.jpg
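For anyone hitting the same thing, the CPU contention can be checked from the ESXi host shell with esxtop before and after the change:

    esxtop
    # press 'c' for the CPU view and watch %RDY and %CSTP for the TrueNAS VM;
    # as a common rule of thumb, sustained %RDY above roughly 5 per vCPU points
    # at scheduling contention, which is what dropping from 12 to 8 reserved
    # cores relieved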
 
