Encrypted dataset reads: Poor performance + WebUI inaccessible

lvz

Cadet
Joined
Apr 30, 2023
Messages
4
Hi,

I'm seeing large slowdowns when reading files from encrypted datasets. When this happens, the TrueNAS WebUI becomes completely inaccessible (I see "Waiting for Active TrueNAS controller to come up..." during the entirety of the transfer). Also, while the CPU (6C 12T) is showing very little usage, the system load jumps up to 16+ (normally when idle it's below 1).

I don't see this slowdown when copying files from unencrypted datasets, even when the destination is an encrypted dataset.

Here's a summary of the behavior (60 GB data transfer between datasets):
  1. unencrypted->unencrypted (within same dataset): 125MB/s
  2. unencrypted->encrypted: 121MB/s
  3. encrypted->encrypted (within same dataset): 18MB/s + inaccessible WebUI
  4. encrypted->unencrypted: 23MB/s + inaccessible WebUI

Does anyone have any ideas as to why I'm seeing the slowdown? Are there any tweaks I can try that might help?

Is it normal for the TrueNAS WebUI to hang when there's lots of disk activity (even if that activity isn't in the boot or system/app pool)?




Here's how I tested the transfers above:
  1. Created 2 test datasets in my "Media" pool (HDD mirrors, see system specs below), 1 unencrypted, 1 encrypted:
    dataset options:
      sync: disabled
      compression: off
      atime: off
      zfs deduplication: off
      encrypted dataset: passphrase, pbkdf2iters: 350000 (default), algorithm: AES-256-GCM (default)
  2. Created 3 random 20GB files in each dataset:
    for i in {1..3}; do dd if=/dev/urandom of=./20gfile-$i.txt bs=4k iflag=fullblock,count_bytes count=20G; done
  3. Tested transfer speeds between datasets:
    time rsync -avh --progress /path/to/20gfile-* .
    (I see similar behavior with cp)
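
While a transfer runs, a rough way to watch the load and per-vdev disk activity (a sketch using standard Linux tools; "Media" is the pool named below):
    watch -n 5 uptime        # load average vs. the mostly idle CPU
    zpool iostat -v Media 5  # per-vdev read/write throughput every 5 seconds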

Here's my system information:
TrueNAS-SCALE-22.12.2

ASRock Rack E3C246D2I
Intel Xeon E-2246G @ 3.60 GHz (6C 12T)
32GB RAM (2x16 Kingston KSM26ED8/16MR)

Boot pool:
128GB SSD (M4-CT128M4SSD2)

System/app pool:
VDEV 1 (mirror): 2x 500GB SSD (850 EVO / 870 EVO)

Media pool:
VDEV 1 (mirror): 2x 6TB HDD (2x WD60EFZX - CMR)
VDEV 2 (mirror): 2x 14TB HDD (2x WD140EFGX - CMR)

Network: using onboard Intel i210 (1GbE)
Storage: using onboard OCuLink/SATA ports

Thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Is it normal for the TrueNAS WebUI to hang when there's lots of disk activity

I've seen SCALE stall in some cases, which appears to be possibly related to swapping activity. Usually it seems to recover within maybe ten seconds but I have to say it sure isn't impressing me with Linux.
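
A quick way to check whether swap is actually in play during one of those stalls (a rough sketch, assuming the usual procps tools):
Code:
free -h       # compare swap "used" before and during a stall
vmstat 1 10   # watch the si/so columns for swap-in/swap-out while it hangs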
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Yeap, between Linux memory management, k3s constantly using 10+% of CPU cycles (more than my Windows VM, for crying out loud), and the stability issues people are reporting... I really wonder how SCALE is getting so many adopters.
 
Joined
Oct 22, 2019
Messages
3,641
Outside of "testing" purposes, I would leave "sync" at its default setting and "compression" on (i.e., LZ4).

From my understanding, even during asynchronous operations (such as with SMB), writing metadata still requires confirmation that it has been safely flushed to persistent storage.

As for compression, having any type of compression enabled removes extraneous "padding" at the end of records, and it still compresses metadata.

So there's really not much benefit, and some potential risk, in disabling both of these settings on a dataset (unless you have a very explicit need to do so).
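
If you want to flip those back from the shell later, it's one property at a time (a sketch; "Media/test-enc" stands in for your dataset name):
Code:
zfs set sync=standard Media/test-enc
zfs set compression=lz4 Media/test-enc
zfs get sync,compression Media/test-enc   # confirm the current values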

As for the very slow read speeds with encrypted datasets, this could be due to the state of ZFS on Linux. I've experienced the same issue on Arch Linux-based distros using ZFS. Read speeds will be fast one day, then after a kernel upgrade and reboot, read speeds are once again slow. (Same speeds you're reporting, maxing out at approximately 25 MB/s.)

Your "icp_aes_impl" parameter for the "icp" module might be using "cycle" or "generic" or "x86_64", instead of "aesni". The default setting of "fastest" will choose the first available option in the order of aesni -> x86_64 -> generic

I believe regressions can surface on Linux with each iteration of ZFS and/or the Linux kernel.
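
You can check which implementation is currently selected, and try to pin one, via the module parameters (a sketch; I've included the GCM parameter too, since your datasets use AES-GCM):
Code:
cat /sys/module/icp/parameters/icp_aes_impl   # the active choice is shown in [brackets]
cat /sys/module/icp/parameters/icp_gcm_impl
echo aesni > /sys/module/icp/parameters/icp_aes_impl   # attempt to force it; may be rejected, and best not tried on a live system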
 

lvz

Cadet
Joined
Apr 30, 2023
Messages
4
I've seen SCALE stall in some cases, which appears to be possibly related to swapping activity. Usually it seems to recover within maybe ten seconds but I have to say it sure isn't impressing me with Linux.
Outside of the encrypted reads, I've also seen it hang at random times and recover shortly afterwards. In this case, though, it hangs for the entire encrypted read (meaning approx. an hour with no TrueNAS WebUI while the 60GB transfer test is in progress).

Outside of the sake of "testing", I would leave "sync" to its default setting, and "compression" on (i.e, LZ4).
Yeah, for my real datasets, I have these options defaulted/on. I just wanted to eliminate any potential bottlenecks while diagnosing this issue.

As for the very slow read speeds with encrypted datasets, this could be due to the state of ZFS on Linux. I've experienced the same issue on Arch Linux-based distros using ZFS. Read speeds will be fast one day, then after a kernel upgrade and reboot, read speeds are once again slow. (Same speeds you're reporting, maxing out at approximately 25 MB/s.)
Thanks, that's good info. I wonder why only reads appear to be impacted. If the decryption were unoptimized/inefficient, I'd expect high CPU usage, which I don't see in my case (just a guess on my part).

Your "icp_aes_impl" parameter for the "icp" module might be using "cycle" or "generic" or "x86_64", instead of "aesni". The default setting of "fastest" will choose the first available option in the order of aesni -> x86_64 -> generic
My icp_aes_impl parameter looks like it's set to "fastest":
Code:
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni

I'll see if forcing it to "aesni" changes anything. I'll also try out the different dataset encryption options.


I appreciate all the help/info!
 
Joined
Oct 22, 2019
Messages
3,641
I wonder why only reads appear to be impacted.
Writes "seem" faster, because they are async, and simply "write to RAM" when the write operation apparently "completes".


I'll see if forcing it to "aesni" changes anything.
Probably won't make a difference. "fastest" will use the best available option, and trying to force it to "aesni" will likely just hit you with "invalid". (Plus, I wouldn't play around with this on a live system.)


Out of curiosity, what specific kernel version is the latest TrueNAS SCALE running? I'd be curious to see whether this "encryption performance issue" goes away on an older version of SCALE, and more specifically with an older kernel.

(I believe SCALE originally started with 5.10.xx, and is currently on 5.15.xx?)

EDIT: Might be related, yet not finalized: https://github.com/openzfs/zfs/pull/14531

At this point, I can only say that "ZFS works and performs way better on FreeBSD than Linux."

It just... does.
 
Joined
Oct 22, 2019
Messages
3,641
Is this a new test system?

Would it be possible to install TrueNAS Core 13.0-U3.1 and measure its read performance for encrypted datasets? Assuming you haven't fully configured and set up your SCALE environment.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Out of curiosity, what specific kernel version is the latest TrueNAS SCALE running? I'd be curious to see whether this "encryption performance issue" goes away on an older version of SCALE, and more specifically with an older kernel.
I don't know if I'm running the latest version (TrueNAS-SCALE-22.12.0), because I'm only running this in a VM for experimental purposes.

EDIT:
Looks like the latest is 22.12.2, so I should be close enough.

But anyways...
Code:
root@nas3[~]# uname -a
Linux nas3 5.15.79+truenas #1 SMP Tue Dec 13 12:40:04 UTC 2022 x86_64 GNU/Linux
 

lvz

Cadet
Joined
Apr 30, 2023
Messages
4
Out of curiosity, what specific kernel version is the latest TrueNAS SCALE running?
Here's TrueNAS-SCALE-22.12.2:
Linux nas 5.15.79+truenas #1 SMP Mon Apr 10 14:00:27 UTC 2023 x86_64 GNU/Linux

Is this a new test system?

Would it be possible to install TrueNAS Core 13.0-U3.1 and measure its read performance for encrypted datasets? Assuming you haven't fully configured and set up your SCALE environment.
Unfortunately, it's already set up and in use, so I wouldn't want to switch over to Core at this point.
 

lvz

Cadet
Joined
Apr 30, 2023
Messages
4
After upgrading to TrueNAS-SCALE-23.10.0 and upgrading my pools to the new ZFS version, I'm no longer seeing major drops in encrypted read performance, and the related WebUI hangs during encrypted reads are gone as well.

I re-ran the tests from the 1st post, and am getting ~120MB/s encrypted reads now.

In case anyone's wondering, here are the software versions for SCALE 22.12.4 (slow encrypted reads):
5.15.131+truenas, ZFS version 2.1.12-1

And here's 23.10.0:
6.1.55-production+truenas, ZFS version 2.2.0-rc4
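
For anyone wanting to compare on their own box, the kernel and ZFS versions come straight from the standard commands:
Code:
uname -r       # kernel version
zfs version    # OpenZFS userland and kernel module versions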
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Strange; I've never had encrypted read issues on any version of Scale: no stalls, nothing. Makes me curious what the actual issue was! Scale is and has been rock solid for me.
 