TrueNAS 13.0-U3.1 write performance issues

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Hi,

I have a TrueNAS setup consisting of 12 x 10TB Seagate X10 7200rpm SAS disks in an HP P2000 SAS cabinet connected to an LSI 2308 adapter, and it has been working flawlessly until now. I have run both scrub tasks and S.M.A.R.T. long tests without any errors or issues. The machine runs on an HPE DL380 Gen9 with the adapter passed through from VMware ESXi 6.7, and it has worked without issues for 3-4 months now.

Pool type: Z2
Pool size: 82TB
Pool size used: 23% (19TB)
RAM allocated for TrueNAS: 128GB
CPU allocated for TrueNAS: 8 vcpu

What has started happening now is that I'm getting consistent 850-950MB/s read performance (close to 10GbE speeds), but only 20-25MB/s write performance. Writes start out strong, but after 2-3GB they slow down to a consistent, abysmal 20-25MB/s and stay there for the rest of the transfer.

When I set this server up I transferred the ~18-19TB of data to TrueNAS without any issues, and only about 1-2TB of large files has been added since the installation. Performance-wise I was getting 200-250MB/s write speeds when I transferred those files, so I don't understand what the issue is right now.

I have checked and changed the network adapter types, no difference. I have also written files between VMs to rule out possible LAN issues, but the problem is the same. I'm running out of ideas here and any help is appreciated!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
maybe look at your fragmentation (shouldn't be an issue with a pool only 23% full... still worth a look).

zpool get fragmentation poolname
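
It should come back with something like this (where "tank" is just a placeholder pool name and the value shown is only an example):

NAME  PROPERTY       VALUE  SOURCE
tank  fragmentation  0%     -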

How do you go with fio on that pool? Can you still get write speeds for very large files to stay fast locally?
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
maybe look at your fragmentation (shouldn't be an issue with a pool only 23% full... still worth a look).

zpool get fragmentation poolname

How do you go with fio on that pool? Can you still get write speeds for very large files to stay fast locally?
Fragmentation is 0% on both pools (boot and my large pool). I think you need to clarify "FIO" as it's not something I'm familiar with. But the pool itself is set up like this:
[Screenshot: pool layout]
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Found some information about fio as a benchmarking tool. But I'd appreciate some command-line parameters to do a local speed measurement with fio to see where the bottleneck is. Thanks in advance :)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Running it something like this would be a start:

fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=50g --io_size=1500g --blocksize=128k --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting

cd to a location on your pool first.
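
So the whole thing would look something like this (the path is just a placeholder for wherever your pool/dataset is mounted):

cd /mnt/poolname/dataset    # placeholder path; use your actual pool mountpoint
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=50g --io_size=1500g --blocksize=128k --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting
rm fio-tempfile.dat    # clean up the test file afterwards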
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Ok, managed to figure it out. Here's the result:

[Screenshot: fio results]


But what annoys me is that, looking at disk performance, the load seems to land on the first three disks in the pool (da2 to da4); the other disks are nowhere near 100% busy. But if you check disk I/O, the load is spread across all disks. I don't get it :(. See below:

[Screenshots: disk busy and disk I/O reporting graphs]
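
For reference, the same per-disk numbers can be pulled from the shell with something like this (poolname is a placeholder):

zpool iostat -v poolname 5    # per-disk read/write throughput and IOPS
gstat -p    # per-disk %busy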
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Because I've seen it recently in a couple threads, I'll ask - are you using deduplication?
Nope, no dedup, no snapshots or anything fancy. Just a basic TrueNAS server with a Z2 pool of twelve 10TB disks created mostly with default settings. The majority of the load is CIFS, but a small 2TB volume is created and mounted as iSCSI block storage for backups. Same issue there though: performance drops to ~20MB/s after writing 3-5GB if I copy data there.
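
For anyone wanting to double-check the same settings from the shell, something like this works (poolname is a placeholder):

zfs get dedup,sync,compression poolname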
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
Update:

I rebooted all my switches and routers, so apparently something weird was going on with the ARP cache in my equipment that only affected TrueNAS. I wish I could say what it was, but I really can't. Strangely enough, all other network equipment, servers and so forth on the same network or on the same VMware server worked just fine and could keep up network-wise. Even if I copied files from a Windows Server to an SMB share (or iSCSI) on the NAS located on the same VMware hypervisor, the symptom erupted after 3-5GB of data. After rebooting the network equipment everything is back to normal. And trust me, I had rebooted the servers and hypervisor, and power-cycled the disk cabinet + servers, and it still persisted in the same strange way. This was just a last-ditch effort before getting Christmas drunk :wink:. So I guess everything is good again *knock on wood*
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That's a bit unusual, in that a network device reboot addressed it, because the graphs suggested we have a peek at the SMART details for da2 through da4, as they're the ones showing that excessive "Busy" time in your charts.
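
Something like this should pull the full SMART details for each of those disks (da2 shown as an example; repeat for da3 and da4):

smartctl -a /dev/da2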
 

Mullerhawk

Cadet
Joined
Nov 14, 2022
Messages
7
That's a bit unusual, in that a network device reboot addressed it, because the graphs suggested we have a peek at the SMART details for da2 through da4, as they're the ones showing that excessive "Busy" time in your charts.
Sure is... I'm running new "full SMART" tests right now. But after rebooting the peripheral equipment, the disk status and busy status look about identical across the board. And looking at older SMART test results, they show about the same usage time and statistics (no errors).
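
For reference, the shell equivalent per disk would be something like this (da2 as an example):

smartctl -t long /dev/da2    # start a long (extended) self-test
smartctl -a /dev/da2    # check progress and results later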
 