TrueNAS taking long pauses during transfers

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Been trying to figure this out for a while now, and still unable to, so any help on where to look next would be appreciated.

Here's the problem statement:
When running a replication task (ssh+netcat) there are long pauses during the transfer. I can see the network spike up to around 100+ MB/s, stay consistent, then drop to nothing. It stays in this paused state for a random amount of time, typically around 30 seconds or more and then wakes up and is transferring at full speed again for all of 5 seconds.
There are no errors, and the CPU on the destination TrueNAS is churning away doing something. When I run iostat on the destination side, I can see all 15 drives are busy at around 80-90% utilization, primarily with writes. When transfer kicks in again for roughly 5 seconds, the utilization on all 15 drives jumps to almost 100% on each drive.
I assume TrueNAS is doing something that I'm not knowledgeable enough to notice/detect... but what is it?? Sometimes it will run for minutes at a time without issue, but mostly it just does what I described above. Running zpool status shows no errors and it's not scrubbing or resilvering... the only indication that it's doing something is the CPU utilization and the iostat is showing all 15 drives churning away on something.
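
For reference, here's roughly how I'm watching it on the destination while the transfer stalls ("tank" is just a placeholder pool name):

  # per-vdev activity on the receiving pool, refreshed every 5 seconds
  zpool iostat -v tank 5

  # per-disk utilization (iostat -x on SCALE; gstat does the same job on CORE)
  iostat -x 5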

Here's what is staying the same:
  • The ZFS pools are the same since the beginning. One is 15 x 3TB SAS drives, and the other is 15 x 2TB SAS drives.
  • ZFS pools are running raidz2
  • ZFS pools are sitting around 58% utilized on the array with 3TB drives (source), and 78% utilized on the array with 2TB drives (destination)
  • Both pools are in separate storage shelves - KTN-STL3
  • Both pools have a mix of datasets that have some lz4 compression, encryption, and deduplication (dedupe is only covering 2 TB of data)
Here's what I have changed trying to solve the problem:
  • New servers (DL560, DL380, DL360p, SuperMicro 8x????) including virtualizing with Proxmox
  • Adjusted memory from 8GB to 24GB
  • Tried a replacement KTN-STL3
  • Swapped out SAS controllers: SAS2008, SAS2308, SAS2208, and whatever the HP SAS controller is
  • Have tried TrueNAS Core and SCALE
  • Replaced all network cables, switches, network cards
  • Changed boot device from HDD to SSD to USB
 

Morris

Contributor
Joined
Nov 21, 2020
Messages
120
"When I run iostat on the destination side, I can see all 15 drives are busy at around 80-90% utilization, primarily with writes. "

Your drives are busy. You can't transfer faster than they can go.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
I'm getting about 5-10 MB/s throughput for hours at a time. A single drive can perform at 20x that speed or more, and running raidz the pool should perform far beyond what a single drive can. The drives are doing something, and it's not writing the data being transferred; they're busy doing something else. I just don't know how to determine what that something else is.
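
For comparison, a rough sequential read off a single raw disk (device name is just a placeholder):

  # read ~4 GB straight off one SAS drive and let dd report the rate
  dd if=/dev/da0 of=/dev/null bs=1M count=4096

Even a single drive manages well over 100 MB/s that way, which is where the 20x figure comes from.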
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
deduplication (dedupe is only covering 2 TB of data)

Well there's your problem.

Adjusted memory from 8GB to 24GB

Oh my god. You have maybe 33TB of available pool space, and typical guidance would be to have about 5GB of RAM per TB of pool, so 5x33 -> roughly 165GB, call it 160GB to 192GB of RAM for dedup.

Please do go read up on dedup. Your DDT's are killing your system.
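
If you want to see what those DDTs are actually costing you, start with something like this ("tank" is a placeholder pool name):

  # DDT entry count, plus how much of the table sits on disk vs in core
  zpool status -D tank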
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Sorry, I must not have explained it correctly above. I do not have 33TB or anything close to that which is undergoing dedup, I have a dataset that is set to dedup and it only has about 2 TB of data in it.

Based on what I have read, I shouldn't need a crazy amount of RAM for only 2 TB of dedup data; at the 1-3 GB per TB the docs mention, that's only around 2-6 GB for the dedup table, well within the 24GB in this box. Here are the TrueNAS docs covering the RAM recommendations for dedup: https://www.truenas.com/docs/references/zfsdeduplication/#ram

After reading that, you may be onto something in regard to dedup. The destination server that the replication process pushes to, which is where the bottleneck is occurring, is typically powered off. I only start it to perform a backup replication task and then shut it down when it completes.

Maybe the dedup cache isn't loaded into memory yet and it's churning away trying to build that cache as it's receiving data from the replication task.

Few questions in my mind right now in case anyone has a quick answer...
  1. If I disable dedup on a dataset, will it simply stop performing dedup when writing to that dataset? If so, my next troubleshooting step would be to do this (roughly the commands sketched below).
  2. When replicating a pool from one host to another, is it attempting to perform the dedup on the destination system? My assumption was that the dedup had already been performed on the source system and a replication of that dataset wouldn't cause another dedup to occur during replication.
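
For reference, question 1 in command form would be roughly this (pool/dataset names are placeholders):

  # check whether dedup is enabled on the backup dataset, then turn it off for new writes
  zfs get dedup tank/backups
  zfs set dedup=off tank/backups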
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I do not have 33TB or anything close to that which is undergoing dedup, I have a vdev that is set to dedup and it only has about 2 TB of data in it.

dedup has pool-wide scope. This makes things ... complicated.
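
You can see the mismatch yourself with something like this ("tank" is a placeholder pool name):

  # the dedup property is set per dataset...
  zfs get -r dedup tank

  # ...but the dedup table and its ratio are tracked for the whole pool
  zpool get dedupratio tank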

Based on what I have read, I shouldn't need a crazy amount of RAM for only 2 TB of dedup data.

Well, perhaps, but, in practice, that doesn't seem to pan out as well as people would like.

Here's the TrueNas docs covering RAM recommendations for dedup: https://www.truenas.com/docs/references/zfsdeduplication/#ram

Gee, thanks. I had ... no clue. heh.

Look, I understand the desire to interpret words optimistically, and I'm even fine with saying that iXsystems likes to write in a manner that leads to optimistic interpretations. But look at this:

Pools suitable for deduplication, with deduplication ratios of 3x or more (data can be reduced to a third or less in size), might only need 1-3 GB of RAM per 1 TB of data

The operative words here are "might only", and they're 1000% correct, it MIGHT only, but it might ALSO need 5GB-per-TB, or there are even ways to make it need much more than that. Pools with modest dedup ratios are a trainwreck for DDT ARC consumption.

When the system does not contain sufficient RAM, it cannot cache DDT in memory when read and system performance can decrease.

"can decrease" is more like "performance runs into a brick wall."

So I will happily concede that this is more art than science, because the real way to determine the amount of RAM needed is to look at the amount of DDT and ARC being used, and base it on that. But typical experience suggests that 5GB per TB is a really swell starting point.
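
If you want to make that measurement yourself, it's roughly this ("tank" is a placeholder pool name; the ARC checks differ between CORE and SCALE):

  # detailed DDT statistics, including entry counts and in-core size
  zdb -DD tank

  # current ARC size vs its ceiling
  sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max    # CORE
  grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats                # SCALE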

Maybe the dedup cache isn't loaded into memory yet and it's churning away trying to build that cache as it's receiving data from the replication task.

Yup.

If I disable dedup on a vdev, will it simply stop performing dedup when writing to that vdev?

Basically, once you've enabled dedup, the only way to get rid of it is to tear down the pool. Disabling dedup still leaves you with a mess, and even deleting the dedup'ed data isn't really the same.

You might want to go have a read-through of


which is generally very insightful.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
jgreco - thank you

I had no idea that dedup would be this impactful to the overall pool. I am using it on the dataset where I send my daily backups which have a high amount of duplication, but I have other ways/tools to recover that without using zfs dedup.

I will run a full scrub on the backup pool for sanity and then rebuild my main pool without dedup. My guess is that rather than a typical ZFS restore, I'll have to use an rsync job for the restore so I don't carry over the pool settings and snapshots.
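
If I do go the rsync route, the job would look roughly like this (paths are placeholders):

  # copy the data, permissions, ACLs, and xattrs, but no ZFS properties or snapshots
  rsync -aHAX --info=progress2 /mnt/backuppool/backups/ /mnt/mainpool/restored/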

I could probably still use dedup for backups if I just create a separate pool of limited size, and if it doesn't work out, then I can easily wipe out that small pool vs the larger one I'm having to rebuild now.

Will update this thread after I rebuild and restore the pool. I guess it's time to install that QSFP card before attempting this. :)
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I had no idea that dedup would be this impactful to the overall pool. I am using it on the vdev where I send my daily backups which have a high amount of duplication, but I have other ways/tools to recover that without using zfs dedup.
Just a little poke to remind you to check up on your terminology. You almost certainly don't mean VDEV. Maybe you're talking about a dataset or a pool.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Just a little poke to remind you to check up on your terminology. You almost certainly don't mean VDEV. Maybe you're talking about a dataset or a pool.
Thanks - it was late. :) Updated my posts to use dataset instead.
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Had some delays, but so far the new pool without even a hint of dedup has been working great. The replication tasks to restore the data/snapshots (while not retaining dataset configuration) have been chugging along smooth as silk without any pausing. Running a few restore jobs at a time to keep it busy, I'm seeing around 300-500 MB/s throughput over the network.
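
For anyone who finds this later, the restore was essentially replication without letting the old dataset properties come along. At the command line that would look roughly like this, assuming your version has the -x receive flag (pool, dataset, and snapshot names are placeholders):

  # plain send without properties, so dedup doesn't follow the data to the new pool
  zfs send backuppool/backups@migrate | zfs receive newpool/backups

  # or, for a full recursive stream, strip the property on the receiving side
  zfs send -R backuppool/backups@migrate | zfs receive -x dedup newpool/backups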

I have learned my lesson about dedup and will keep it out of my pools from now on. :) Thank you for the help!!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
daily backups which have a high amount of duplication
compression can make a huge saving if the data is very similar, without the overhead of dedup. lz4 is basically no performance hit, while gzip and zstd can give large benefits, particularly for mostly static data
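
for example, something like this on a backup dataset (name is a placeholder; only newly written blocks pick up the new algorithm):

  # heavier compression for mostly write-once backup data
  zfs set compression=zstd tank/backups

  # check the setting and the ratio it actually achieves
  zfs get compression,compressratio tank/backups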
 

tn2100

Cadet
Joined
Mar 8, 2022
Messages
9
Yes, I leave lz4 on everything as it's almost free. I don't bother too much with any other compression format, as they consume more CPU without a huge difference in compression.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
it depends on what it is compressing and what it's being used for. zstd is supposed to be very fast to read, but with "meh" write speed, so if you write rarely it might be best.
if you write AND read rarely, gzip could be useful.
compression won't help much with already compressed stuff, like most audio and movie files, but if you have a huge log archive on storage you could see high ratios.
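
something like this if you want to tune it per dataset (names and levels are just examples):

  # rarely written, still read now and then: a higher zstd level
  zfs set compression=zstd-9 tank/archive

  # rarely written AND rarely read, like an old log archive: gzip
  zfs set compression=gzip-9 tank/logs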
 
Top