TrueNAS 13.0-U3.1 write performance issues

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Hi,

I have a TrueNAS setup consisting of 12 x 10TB Seagate X10 7200rpm SAS disks in an HP P2000 SAS cabinet connected to an LSI 2308 adapter, and it has been working flawlessly until now. I have run both scrub tasks and S.M.A.R.T. long tests without any errors or issues. The machine runs on an HPE DL380 Gen9 with the adapter passed through from VMware ESXi 6.7, and it has worked without issues for 3-4 months now.

Pool type: Z2
Pool size: 82TB
Pool size used: 23% (19TB)
RAM allocated for TrueNAS: 128GB
CPU allocated for TrueNAS: 8 vcpu

What has started happening now is that I'm getting consistent 850-950MB/s read performance (close to 10GbE speeds), but only 20-25MB/s write performance. Writes start out strong, but after 2-3GB they slow down to a consistent, abysmal 20-25MB/s and stay there for the rest of the transfer.

When I set this server up I transferred the ~18-19TB of data to TrueNAS without any issues, and only about 1-2TB of large files has been added since the installation. Performance-wise I was getting 200-250MB/s write speeds when I transferred those files, so I don't understand what the issue is right now.

I have checked and changed the network adapter types, no difference. I have also written files between VMs to rule out possible LAN issues, but the problem is the same. I'm running out of ideas here and any help is appreciated!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
maybe look at your fragmentation (shouldn't be an issue with a pool only 23% full... still worth a look).

zpool get fragmentation poolname
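
It should come back with something like this (where "tank" is just a placeholder pool name and the value shown is only an example):

NAME  PROPERTY       VALUE  SOURCE
tank  fragmentation  0%     -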

How do you go with fio on that pool? Can you still get write speeds for very large files to stay fast locally?
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
maybe look at your fragmentation (shouldn't be an issue with a pool only 23% full... still worth a look).

zpool get fragmentation poolname

How do you go with fio on that pool? Can you still get write speeds for very large files to stay fast locally?
Fragmentation is 0% on both pools (boot and my large pool). I think you need to clarify "FIO" as it's not something I'm familiar with. But the pool itself is set up like this:
[Screenshot: pool layout]
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Found some information about fio as a benchmarking tool. But I'd appreciate some command-line parameters to do a local speed measurement with fio to see where the bottleneck is. Thanks in advance :)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Running it something like this would be a start:

fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=50g --io_size=1500g --blocksize=128k --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting

cd to a location on your pool first.
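
So the whole thing would look something like this (the path is just a placeholder for wherever your pool/dataset is mounted):

cd /mnt/poolname/dataset    # placeholder path; use your actual pool mountpoint
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=50g --io_size=1500g --blocksize=128k --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting
rm fio-tempfile.dat    # clean up the test file afterwards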
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Ok, managed to figure it out. Here's the result:

[Screenshot: fio results]


But what annoys me is that, looking at disk performance, the load seems to land on the first three disks in the pool (da2 to da4); the other disks are nowhere near 100% busy. But if you check disk I/O, the load is spread across all disks. I don't get it :(. See below:

[Screenshots: disk busy and disk I/O reporting graphs]
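
For reference, the same per-disk numbers can be pulled from the shell with something like this (poolname is a placeholder):

zpool iostat -v poolname 5    # per-disk read/write throughput and IOPS
gstat -p    # per-disk %busy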
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Because I've seen it recently in a couple threads, I'll ask - are you using deduplication?
Nope, no dedup, no snapshots or anything fancy. Just a basic TrueNAS server with a Z2 pool of twelve 10TB disks created mostly with default settings. The majority of the load is CIFS, but a small 2TB volume is created and mounted as iSCSI block storage for backups. Same issue there though: performance drops to ~20MB/s after writing 3-5GB if I copy data there.
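
For anyone wanting to double-check the same settings from the shell, something like this works (poolname is a placeholder):

zfs get dedup,sync,compression poolname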
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Update:

I rebooted all my switches and routers, so apparently something weird was going on with the ARP cache in my equipment that only affected TrueNAS. I wish I could say what it was, but I really can't. Strangely enough, all other network equipment, servers and so forth on the same network or on the same VMware server worked just fine and could keep up network-wise. Even if I copied files from a Windows Server to an SMB share (or iSCSI) on the NAS located on the same VMware hypervisor, the symptom erupted after 3-5GB of data. After rebooting the network equipment everything is back to normal. And trust me, I had rebooted the servers and hypervisor, and power-cycled the disk cabinet + servers, and it still persisted in the same strange way. This was just a last-ditch effort before getting Christmas drunk :wink:. So I guess everything is good again *knock on wood*
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That's a bit unusual, in that a network device reboot addressed it, because the graphs suggested we have a peek at the SMART details for da2 through da4, as they're the ones showing that excessive "Busy" time in your charts.
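
Something like this should pull the full SMART details for each of those disks (da2 shown as an example; repeat for da3 and da4):

smartctl -a /dev/da2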
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
That's a bit unusual, in that a network device reboot addressed it, because the graphs suggested we have a peek at the SMART details for da2 through da4, as they're the ones showing that excessive "Busy" time in your charts.
Sure is... I'm running new "full SMART" tests right now. But after rebooting the peripheral equipment, the disk status and busy status look about identical across the board. And looking at older SMART test results, they show about the same usage time and statistics (no errors).
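
For reference, the shell equivalent per disk would be something like this (da2 as an example):

smartctl -t long /dev/da2    # start a long (extended) self-test
smartctl -a /dev/da2    # check progress and results later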
 