Slow write speed, drops after a few seconds

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks very much,
How would these look from the pool and dataset point of view?
Can all of them be part of one pool, combining all the space?

Yes, it's all one pool with all the datasets on it. That's the beauty of ZFS expandability.
You can start with 3 vdevs and then add additional vdevs over time if capacity runs short.
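For anyone reading along, a rough sketch of what that expansion looks like at the command line - the pool name "tank" and the disk names below are placeholders, not anything from this thread:

zpool status tank

zpool add tank mirror da8 da9

The first command shows the current vdev layout; the second attaches a new two-disk mirror vdev, and its capacity becomes available to the pool immediately with no rebuild required.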
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
If the first few gigs are copied into RAM, is there any way to increase the RAM used by TrueNAS and improve the write speed for a few more gigs?!

I tried adding an SSD as cache and as log, but neither made any difference!
As @morganL mentioned, knowing your workload is important - it's certainly possible to manually increase the amount of "data copied into RAM", or "dirty data" as it's called in ZFS, but if your general everyday workload is geared more towards small files and random I/O, you're likely to get better results from increasing the pool width and changing to mirrors instead of RAIDZ, as mentioned.

You've accurately identified the issue, though, in terms of the first portion of those large files being "copied into RAM" - by default, this is limited to 4GB in OpenZFS, with a throttle starting to come in at 60% of that, which is why you see that rapid ingest and then the slope down to the ~200 MB/s that your pool was capable of writing in a steady state.

Increasing the amount of RAM allowed for "pending writes" will of course risk robbing your read cache by the same amount, which could hurt your VM performance if it's merrily running along getting its data served from RAM, and then the VM data gets evicted because the system is accepting large_file.bin at 10Gbps line speed and needs to hold onto it in RAM until it's been fully spun off to disk in the background.

The reality is that without assigning enough RAM for that "dirty data" to hold the largest possible file you'll ever load (e.g. 20GB), you will at some point have to pull up on the handbrake and slow things down. You can create bad edge cases as well if you have a network that's much faster than the pool - if it takes 20 seconds to "fill your pending writes" and 200 seconds to "empty them", then anything in that 180-second gap runs the risk of getting delayed. There's a metaphor about "trying to water a potted plant with a firehose" in my signature that seems like it's on the money here.
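If you want to see those limits on your own system, a quick sketch - the paths below are for OpenZFS on Linux (TrueNAS SCALE), and on CORE the same tunables live under the vfs.zfs sysctl tree:

cat /sys/module/zfs/parameters/zfs_dirty_data_max

cat /sys/module/zfs/parameters/zfs_delay_min_dirty_percent

The first value is the "pending writes" cap in bytes (the 4GB ceiling mentioned above), and the second is the percentage of that cap at which the write throttle starts delaying new writes (60 by default). Raising zfs_dirty_data_max buys a longer burst at line speed, but it comes straight out of the RAM that would otherwise be serving reads.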
 

wonders

Dabbler
Joined
Jun 21, 2011
Messages
23
Thank you very much for the detailed explanation! Very interesting how ZFS works!
I have ordered my caddies, and hopefully soon I can install all the SAS drives as @morganL suggested.
 

wonders

Dabbler
Joined
Jun 21, 2011
Messages
23
@morganL I just got my hard drives and an additional H220 HBA card, and installed them in the server exactly the way you suggested (7 mirrors and 2 hot spares). Wow - I am getting a solid and consistent 1.1 GB/s for sequential read and write, and between 12 and 15 MB/s for random I/O, which is amazing compared to last time, when it was about 100 KB/s.
I even get over 60 MB/s if I use Robocopy multithreaded, which is basically what I will be using to back up my projects regularly!
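For reference, a minimal sketch of the kind of multithreaded Robocopy invocation meant here - the source and destination paths are placeholders, not the actual ones used:

robocopy D:\Projects \\truenas\backup\Projects /MIR /MT:16

/MT:16 runs 16 copy threads in parallel (Robocopy defaults to 8 if you pass /MT with no number), which keeps many small files in flight at once and lifts random-I/O throughput over SMB; /MIR mirrors the source tree into the destination, which fits a recurring project backup.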

Thank you very much for all your help!
 

wonders

Dabbler
Joined
Jun 21, 2011
Messages
23
Okay, I spoke too soon!
Yesterday everything was smooth and stable. I shut down the server in the evening, and this morning when I turned it on again I noticed that something is making everything slow - the same file is now taking over 3 minutes to copy!
It starts at a good speed, then just drops to almost zero and gets stuck there for over 3 minutes, then it jumps up again, but at a much lower speed than yesterday (around 600 MB/s).
I haven't changed anything: the pool is the same, there is only one dataset, and sync is set to default (standard). I literally didn't change anything!
Checked the network and it seems stable, running at the full 9.5 Gb/s (iperf3).

Checked the server and everything seems fine in terms of power and hardware; the disks seem to be healthy!

[screenshot attached]


Another example, something is wrong.....

[screenshot attached]




UPDATE:

Found the bugger: DEDUP. I had enabled dedup - disabling it made everything stable and blazing fast!
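For anyone hitting the same wall: dedup is a per-dataset property, so the check and the fix look roughly like this (the dataset name is a placeholder):

zfs get dedup tank/projects

zfs set dedup=off tank/projects

Note that turning it off only stops new writes from being deduplicated - blocks written while dedup was on stay in the dedup table until they are deleted or rewritten, so some overhead can linger.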
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
UPDATE:

Found the bugger: DEDUP. I had enabled dedup - disabling it made everything stable and blazing fast!

Yep, that will definitely cause the hard-stall issues you were seeing, and I suspect there are more lurking in the background

Can you paste the output of:

zpool list

so we can see what kind of reduction rate you're getting from deduplication, and

zpool status -D

to see what that reduction is costing you in terms of memory footprint?
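As a very rough yardstick while waiting on those numbers (the figures here are illustrative, not measured from this system): each dedup-table entry costs somewhere around 320 bytes of RAM to keep resident, so a DDT with, say, 10 million entries works out to roughly 10,000,000 × 320 bytes ≈ 3 GB - memory that is no longer available as read cache. The entry counts in the zpool status -D histogram let you run that arithmetic for your own pool.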
 