TrueNAS taking long pauses during transfers

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Been trying to figure this out for a while now, and still unable to, so any help on where to look next would be appreciated.

Here's the problem statement:
When running a replication task (ssh+netcat) there are long pauses during the transfer. I can see the network spike up to around 100+ MB/s, stay consistent, then drop to nothing. It stays in this paused state for a random amount of time, typically around 30 seconds or more and then wakes up and is transferring at full speed again for all of 5 seconds.
There are no errors, and the CPU on the destination TrueNAS is churning away doing something. When I run iostat on the destination side, I can see all 15 drives are busy at around 80-90% utilization, primarily with writes. When transfer kicks in again for roughly 5 seconds, the utilization on all 15 drives jumps to almost 100% on each drive.
I assume TrueNAS is doing something that I'm not knowledgeable enough to notice/detect... but what is it?? Sometimes it will run for minutes at a time without issue, but mostly it just does what I described above. Running zpool status shows no errors and it's not scrubbing or resilvering... the only indication that it's doing something is the CPU utilization and the iostat is showing all 15 drives churning away on something.
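
For reference, here's roughly how I'm watching it on the destination while the transfer stalls ("tank" is just a placeholder pool name):

  # per-vdev activity on the receiving pool, refreshed every 5 seconds
  zpool iostat -v tank 5

  # per-disk utilization (iostat -x on SCALE; gstat does the same job on CORE)
  iostat -x 5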

Here's what is staying the same:
  • The ZFS pools are the same since the beginning. One is 15 x 3TB SAS drives, and the other is 15 x 2TB SAS drives.
  • ZFS pools are running raidz2
  • ZFS pools are sitting around 58% utilized on the array with 3TB drives (source), and 78% utilized on the array with 2TB drives (destination)
  • Both pools are in separate storage shelves - KTN-STL3
  • Both pools have a mix of datasets that have some lz4 compression, encryption, and deduplication (dedupe is only covering 2 TB of data)
Here's what I have changed trying to solve the problem:
  • New servers (DL560, DL380, DL360p, SuperMicro 8x????) including virtualizing with Proxmox
  • Adjusted memory from 8GB to 24GB
  • Tried a replacement KTN-STL3
  • Swapped out SAS controllers: SAS2008, SAS2308, SAS2208, and whatever the HP SAS controller is
  • Have tried TrueNAS Core and SCALE
  • Replaced all network cables, switches, network cards
  • Changed boot device from HDD to SSD to USB
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
"When I run iostat on the destination side, I can see all 15 drives are busy at around 80-90% utilization, primarily with writes. "

Your drives are busy. You can't transfer faster than they can go.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
I'm getting about 5-10 MB/s throughput for hours at a time. A single drive can perform at 20x that speed or more, and running raidz the pool should perform far beyond what a single drive can. The drives are doing something, and it's not writing the data being transferred; they're busy doing something else. I just don't know how to determine what that something else is.
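
For comparison, a rough sequential read off a single raw disk (device name is just a placeholder):

  # read ~4 GB straight off one SAS drive and let dd report the rate
  dd if=/dev/da0 of=/dev/null bs=1M count=4096

Even a single drive manages well over 100 MB/s that way, which is where the 20x figure comes from.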
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
deduplication (dedupe is only covering 2 TB of data)

Well there's your problem.

Adjusted memory from 8GB to 24GB

Oh my god. You have maybe 33TB of available pool space, and typical guidance would be to have about 5GB of RAM per TB of pool, so 5x33 -> roughly 165GB, call it 160GB to 192GB of RAM for dedup.

Please do go read up on dedup. Your DDT's are killing your system.
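
If you want to see what those DDTs are actually costing you, start with something like this ("tank" is a placeholder pool name):

  # DDT entry count, plus how much of the table sits on disk vs in core
  zpool status -D tank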
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Sorry, I must not have explained it correctly above. I do not have 33TB or anything close to that which is undergoing dedup, I have a dataset that is set to dedup and it only has about 2 TB of data in it.

Based on what I have read, I shouldn't need a crazy amount of RAM for only 2 TB of dedup data; at the 1-3 GB per TB the docs mention, that's only around 2-6 GB for the dedup table, well within the 24GB in this box. Here are the TrueNAS docs covering the RAM recommendations for dedup: https://www.truenas.com/docs/references/zfsdeduplication/#ram

After reading that, you may be onto something in regard to dedup. The destination server that the replication process pushes to, which is where the bottleneck is occurring, is typically powered off. I only start it to perform a backup replication task and then shut it down when it completes.

Maybe the dedup cache isn't loaded into memory yet and it's churning away trying to build that cache as it's receiving data from the replication task.

Few questions in my mind right now in case anyone has a quick answer...
  1. If I disable dedup on a dataset, will it simply stop performing dedup when writing to that dataset? If so, my next troubleshooting step would be to do this (roughly the commands sketched below).
  2. When replicating a pool from one host to another, is it attempting to perform the dedup on the destination system? My assumption was that the dedup had already been performed on the source system and a replication of that dataset wouldn't cause another dedup to occur during replication.
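
For reference, question 1 in command form would be roughly this (pool/dataset names are placeholders):

  # check whether dedup is enabled on the backup dataset, then turn it off for new writes
  zfs get dedup tank/backups
  zfs set dedup=off tank/backups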
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I do not have 33TB or anything close to that which is undergoing dedup, I have a vdev that is set to dedup and it only has about 2 TB of data in it.

dedup has pool-wide scope. This makes things ... complicated.
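
You can see the mismatch yourself with something like this ("tank" is a placeholder pool name):

  # the dedup property is set per dataset...
  zfs get -r dedup tank

  # ...but the dedup table and its ratio are tracked for the whole pool
  zpool get dedupratio tank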

Based on what I have read, I shouldn't need a crazy amount of RAM for only 2 TB of dedup data.

Well, perhaps, but, in practice, that doesn't seem to pan out as well as people would like.

Here's the TrueNas docs covering RAM recommendations for dedup: https://www.truenas.com/docs/references/zfsdeduplication/#ram

Gee, thanks. I had ... no clue. heh.

Look, I understand the desire to interpret words optimistically, and I'm even fine with saying that iXsystems likes to write in a manner that leads to optimistic interpretations. But look at this:

Pools suitable for deduplication, with deduplication ratios of 3x or more (data can be reduced to a third or less in size), might only need 1-3 GB of RAM per 1 TB of data

The operative words here are "might only", and they're 1000% correct, it MIGHT only, but it might ALSO need 5GB-per-TB, or there are even ways to make it need much more than that. Pools with modest dedup ratios are a trainwreck for DDT ARC consumption.

When the system does not contain sufficient RAM, it cannot cache DDT in memory when read and system performance can decrease.

"can decrease" is more like "performance runs into a brick wall."

So I will happily concede that this is more art than science, because the real way to determine the amount of RAM needed is to look at the amount of DDT and ARC being used, and base it on that. But typical experience suggests that 5GB per TB is a really swell starting point.
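
If you want to make that measurement yourself, it's roughly this ("tank" is a placeholder pool name; the ARC checks differ between CORE and SCALE):

  # detailed DDT statistics, including entry counts and in-core size
  zdb -DD tank

  # current ARC size vs its ceiling
  sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max    # CORE
  grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats                # SCALE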

Maybe the dedup cache isn't loaded into memory yet and it's churning away trying to build that cache as it's receiving data from the replication task.

Yup.

If I disable dedup on a vdev, will it simply stop performing dedup when writing to that vdev?

Basically, once you've enabled dedup, the only way to get rid of it is to tear down the pool. Disabling dedup still leaves you with a mess, and even deleting the dedup'ed data isn't really the same.

You might want to go have a read-through of


which is generally very insightful.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
jgreco - thank you

I had no idea that dedup would be this impactful to the overall pool. I am using it on the dataset where I send my daily backups which have a high amount of duplication, but I have other ways/tools to recover that without using zfs dedup.

I will run a full scrub on the backup pool for sanity and then rebuild my main pool without dedup. My guess is that rather than a typical ZFS restore, I'll have to use an rsync job for the restore so I don't carry over the pool settings and snapshots.
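
If I do go the rsync route, the job would look roughly like this (paths are placeholders):

  # copy the data, permissions, ACLs, and xattrs, but no ZFS properties or snapshots
  rsync -aHAX --info=progress2 /mnt/backuppool/backups/ /mnt/mainpool/restored/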

I could probably still use dedup for backups if I just create a separate pool of limited size, and if it doesn't work out, then I can easily wipe out that small pool vs the larger one I'm having to rebuild now.

Will update this thread after I rebuild and restore the pool. I guess it's time to install that QSFP card before attempting this. :)
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I had no idea that dedup would be this impactful to the overall pool. I am using it on the vdev where I send my daily backups which have a high amount of duplication, but I have other ways/tools to recover that without using zfs dedup.
Just a little poke to remind you to check up on your terminology. You almost certainly don't mean VDEV. Maybe you're talking about a dataset or a pool.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Just a little poke to remind you to check up on your terminology. You almost certainly don't mean VDEV. Maybe you're talking about a dataset or a pool.
Thanks - it was late. :) Updated my posts to use dataset instead.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Had some delays, but so far the new pool without even a hint of dedup has been working great. The replication tasks to restore the data/snapshots (while not retaining dataset configuration) have been chugging along smooth as silk without any pausing. Running a few restore jobs at a time to keep it busy, I'm seeing around 300-500 MB/s throughput over the network.
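
For anyone who finds this later, the restore was essentially replication without letting the old dataset properties come along. At the command line that would look roughly like this, assuming your version has the -x receive flag (pool, dataset, and snapshot names are placeholders):

  # plain send without properties, so dedup doesn't follow the data to the new pool
  zfs send backuppool/backups@migrate | zfs receive newpool/backups

  # or, for a full recursive stream, strip the property on the receiving side
  zfs send -R backuppool/backups@migrate | zfs receive -x dedup newpool/backups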

I have learned my lesson about dedup and will keep it out of my pools from now on. :) Thank you for the help!!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
daily backups which have a high amount of duplication
compression can make a huge saving if the data is very similar, without the overhead of dedup. lz4 is basically no performance hit, while gzip and zstd can give large benefits, particularly for mostly static data
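
for example, something like this on a backup dataset (name is a placeholder; only newly written blocks pick up the new algorithm):

  # heavier compression for mostly write-once backup data
  zfs set compression=zstd tank/backups

  # check the setting and the ratio it actually achieves
  zfs get compression,compressratio tank/backups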
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Yes, I leave lz4 on everything as it's almost free. I don't bother too much with any other compression format, as they consume more CPU without a huge difference in compression.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
it depends on what it is compressing and what it's being used for. zstd is supposed to be very fast to read, but with "meh" write speed, so if you write rarely it might be best.
if you write AND read rarely, gzip could be useful.
compression won't help much with already compressed stuff, like most audio and movie files, but if you have a huge log archive on storage you could see high ratios.
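
something like this if you want to tune it per dataset (names and levels are just examples):

  # rarely written, still read now and then: a higher zstd level
  zfs set compression=zstd-9 tank/archive

  # rarely written AND rarely read, like an old log archive: gzip
  zfs set compression=gzip-9 tank/logs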
 
Top