Performance Feedback

c9dan

Cadet
Joined
Dec 5, 2022
Messages
2
Greetings,

About 5 years ago I got to build my dream FreeNAS system, and it's been wonderful. After the initial churn it's run well with little interaction required. I've used FreeNAS for other things, including virtualization storage and storage virtualization, but my main NAS has been simple and effective. Recently I've been having some performance issues and I'm hoping for some feedback on troubleshooting.

Thanks for any feedback.

Some notes:

-In hindsight I know the pool should have been 4 vdevs mirrored, but raidz3 is what I've got
-MEM is 32GB ECC unbuffered and effectively maxed
-Disk upgrade about 3 years ago where each 6TB drive was replaced with a 10 TB drive. The pool remained intact.
-System was recently upgraded from 11.3 to 13
-System is using legacy geli encryption
-Some deduplication on one dataset used for desktop image backups
-iperf tests show reasonable throughput (~600Mbps)
-disk write speeds are < 1 MB/s during the performance impact (I suspect this might improve after rebooting; any ideas what could cause a temporary slowdown that was cleared by a reboot?)
-I have an NVMe drive to use as ZIL/SLOG and L2ARC, but wanted to see if there was anything I should check before adding that in

Code:
# zpool iostat -v tank
                                                      capacity     operations     bandwidth 
pool                                                alloc   free   read  write   read  write
--------------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                                40.0T  32.5T    852    356   152M  5.26M
  raidz3-0                                          40.0T  32.5T    852    356   152M  5.26M
    gptid/1a578bfe-fdc3-11e9-a688-001b21cda10c.eli      -      -    115     46  19.1M   689K
    gptid/381b2e6f-fcd5-11e9-a688-001b21cda10c.eli      -      -    108     45  18.9M   674K
    gptid/51925046-fbdb-11e9-b117-001b21cda10c.eli      -      -    101     44  18.9M   666K
    gptid/9e20beb7-fabb-11e9-8673-001b21cda10c.eli      -      -    125     42  19.2M   662K
    gptid/a1feedb9-f9c8-11e9-820f-001b21cda10c.eli      -      -     93     46  19.1M   689K
    gptid/16fa399d-f8cb-11e9-820f-001b21cda10c.eli      -      -     93     45  18.9M   676K
    gptid/303846be-f7b1-11e9-a2c5-001b21cda10c.eli      -      -    102     44  18.9M   667K
    gptid/5e2ebacd-f6a6-11e9-9cee-001b21cda10c.eli      -      -    112     42  19.3M   664K
--------------------------------------------------  -----  -----  -----  -----  -----  -----

# zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
freenas-boot  14.9G  9.47G  5.41G        -         -      -    63%  1.00x    ONLINE  -
tank          72.5T  40.0T  32.5T        -         -    29%    55%  1.65x    ONLINE  /mnt

Iperf results:
    [  5]   0.00-10.01  sec   764 MBytes   641 Mbits/sec                  receiver


Motherboard make and model:
ASROCK RACK E3C224D4I-14S
CPU make and model:
Intel(R) Xeon(R) CPU E3-1230L v3 @ 1.80GHz
RAM quantity:
32GB ECC
hw.physmem: 34263973888
hw.usermem: 3492909056
hw.realmem: 34359738368
Hard drives, quantity, model numbers, and RAID configuration, including boot drives:
Pool in question:
Raidz3
8 x 10 TB Western Digital
Mediasize: 10000831348736 (9.1T)
descr: ATA WDC WD100EMAZ-00
Boot drives 2 x 16 GB Sandisk USB
Hard disk controllers:
LSI 2308
Network cards:
Intel® Ethernet Controller I210-AT
Intel® Ethernet Converged Network Adapter X520-DA2
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello! You've done a good job of providing system information and hardware/software details here, so thank you for that. Let's dig in.

The biggest culprit for the low performance here is likely your use of deduplication. ZFS deduplication requires a significant amount of memory, and causes a lot of metadata I/O - which your pool isn't set up to handle well, but more on that later. Can you run zpool status -D and post the output of the line that looks similar to

Code:
dedup: DDT entries 34188575, size 829B on disk, 267B in core


Multiply the number of entries by the bytes in core, and that will tell you how much RAM is being used by the deduplication tables. (Do the same for "size on disk" to see the disk space used.)
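
If you'd rather not do the multiplication by hand, something like this should pull the numbers straight out of that line. It's a rough sketch that assumes the "dedup: DDT entries N, size XB on disk, YB in core" format shown above; field positions can shift between ZFS versions, so sanity-check it against the raw output:

Code:
# Multiply the DDT entry count by the per-entry sizes; awk reads the
# leading digits of "34188575," "829B" and "267B" as plain numbers.
zpool status -D tank | awk '/DDT entries/ {
    printf "DDT: %.1f GiB in core, %.1f GiB on disk\n",
        $4 * $9 / (1024*1024*1024), $4 * $6 / (1024*1024*1024)
}'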

Then run zpool list - post the results here, and you should be able to see the degree of data being saved under the DEDUP column. These two snippets will tell you just how much space you're saving, and how much RAM it's costing you to do so.

The next pain point is, as you correctly assumed, your pool layout. The RAIDZ3 layout isn't nearly as performant as the 4x2-way mirrors would be - when you write small records (like metadata) to a wide Z3, you get a bunch of write amplification from having to make three parity writes for that single record, as opposed to those same four disks being able to handle two parallel writes if they were working as mirrors.
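
To put rough numbers on that, here's a sketch assuming 4 KiB sectors (ashift=12):

Code:
# One 4 KiB metadata record on the 8-wide RAIDZ3:
#   1 data sector + 3 parity sectors = 4 disk writes for 4 KiB of payload
# The same record on a 4 x 2-way mirror layout:
#   2 disk writes (both sides of one mirror), while the other three
#   mirror pairs stay free to service other I/O in parallel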

Finally, the E3-L Xeons bring a lower clock speed with their lower TDP, and you're asking yours to do both deduplication and legacy GELI encryption on top of its regular ZFS duties - and certain processes like SMB are single-threaded, so this will hit a transfer ceiling much earlier than a full-wattage chip would.

For the comment of "performance improves after reboot" - is this read or write performance? Reads may recover after a reboot, as you won't have filled your RAM with the deduplication tables yet, and writes may work for a brief period, but once you write a backup job to the dedup-enabled dataset, my hunch is that your pool is filling RAM with the deduplication tables and trying to compare them. This makes your disks busy, which slows down all of the other I/O as well.

Let's look at these points first before we look at any NVMe devices - but if you aren't doing virtualization storage here, an SLOG is likely not useful. L2ARC might have an edge case here, but let's look at the known quantities first, especially that dedup setup.
 

c9dan

Cadet
Joined
Dec 5, 2022
Messages
2
Sorry for the duplicate post; I don't see an immediate option to remove it, but I will keep looking.

The biggest culprit for the low performance here is likely your use of deduplication. ZFS deduplication requires a significant amount of memory, and causes a lot of metadata I/O - which your pool isn't set up to handle well, but more on that later. Can you run zpool status -D and post the output of the line that looks similar to

Code:
dedup: DDT entries 858313, size 9.88K on disk, 3.19K in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    21.0M   2.62T   2.38T   2.37T    21.0M   2.62T   2.38T   2.37T
     2    7.55M    966G    789G    787G    20.6M   2.58T   2.08T   2.07T
     4    2.11M    270G    224G    224G    9.25M   1.16T    998G    998G
     8     133K   16.6G   15.6G   15.6G    1.35M    173G    162G    162G
    16    15.5K   1.94G   1.54G   1.55G     337K   42.1G   33.8G   33.9G
    32    1.27K    163M   39.5M   41.7M    50.1K   6.25G   1.64G   1.73G
    64    1.35K    173M    147M    147M     127K   15.9G   13.7G   13.7G
   128      115   14.4M   4.04M   4.35M    20.1K   2.51G    609M    665M
   256       38   4.75M    240K    411K    12.5K   1.56G   72.9M    131M
   512       21   2.62M     84K    192K    14.8K   1.85G   59.3M    135M
    1K       17   2.00M   64.5K    155K    22.0K   2.61G   84.1M    201M
    2K        3    384K     12K   27.4K    9.05K   1.13G   36.2M   82.6M
    4K        3    384K     12K   27.4K    13.5K   1.69G   54.2M    124M
 Total    30.8M   3.84T   3.38T   3.38T    52.8M   6.59T   5.63T   5.63T


My math comes back with about 2611 MB; I'd expect that's not a showstopper, but it's surely wasteful with limited memory.
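
Spelling that out against the summary line and the table totals above (treating the K/T suffixes as binary units, which I'm assuming is how ZFS reports them):

Code:
# RAM:   858,313 entries x 3.19 KiB in core  ~= 2.6 GiB held in ARC
# Disk:  858,313 entries x 9.88 KiB on disk  ~= 8.1 GiB of DDT on the pool
# Saved: 5.63T referenced - 3.38T allocated  ~= 2.25T (ratio ~1.67x)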

Then run zpool list - post the results here, and you should be able to see the degree of data being saved under the DEDUP column. These two snippets will tell you just how much space you're saving, and how much RAM it's costing you to do so.

Code:
# zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
freenas-boot  14.9G  9.47G  5.41G        -         -      -    63%  1.00x    ONLINE  -
tank          72.5T  40.0T  32.5T        -         -    29%    55%  1.66x    ONLINE  /mnt


I know this specific use of deduplication isn't stellar. My thought process when setting it up was: if the same data is being backed up weekly, I should be able to snapshot and have a point-in-time reference to any date's backup while not using any more real space. I hadn't considered that compression on the backups could change the data written each week, and a rogue (aren't they all?) Windows update somehow marked a 1TB scratch data drive as a system disk, so it's now being included in the backup. The dataset expanded to about 11TB and I was able to cut it back to 5TB, but this dataset can likely be rebuilt outright without any risk, so I will look at that.

The next pain point is, as you correctly assumed, your pool layout. The RAIDZ3 layout isn't nearly as performant as the 4x2-way mirrors would be - when you write small records (like metadata) to a wide Z3, you get a bunch of write amplification from having to make three parity writes for that single record, as opposed to those same four disks being able to handle two parallel writes if they were working as mirrors.

I think the long-term plan needs to be to rebuild this pool with striped mirrors. I have two 'copies' of the data, one off-site, but I'm not keen on reducing that to one copy while doing the rebuild. I may be able to find some options to store the data elsewhere; otherwise, perhaps it is time for a new box. It's served me faithfully and definitely been worth the investment over its lifetime.
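
For my own notes, the target layout would be something along these lines (purely illustrative; da0..da7 are placeholder device names, and in practice I'd build it through the GUI):

Code:
# 4 x 2-way mirrors from the same eight 10TB disks (hypothetical names)
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    mirror da6 da7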

Finally, the E3-L Xeons bring a lower clock speed with their lower TDP, and you're asking yours to do both deduplication and legacy GELI encryption on top of its regular ZFS duties - and certain processes like SMB are single-threaded, so this will hit a transfer ceiling much earlier than a full-wattage chip would.

Understood. From a resource perspective, I would suggest that SMB + encryption are not exceptionally compute-intensive; I don't have specific numbers to back this up other than that I have seen reads and writes in the realm of 400 MB/s (another pool, mirrored SSD, over 10Gb Ethernet) on this same chip, so I am suspecting something with the specific disks or pool configuration.

For the comment of "performance improves after reboot" - is this read or write performance? Reads may recover after a reboot, as you won't have filled your RAM with the deduplication tables yet, and writes may work for a brief period, but once you write a backup job to the dedup-enabled dataset, my hunch is that your pool is filling RAM with the deduplication tables and trying to compare them. This makes your disks busy, which slows down all of the other I/O as well.

It's both, but write performance specifically is sometimes almost unusable. Last night I was getting about 600 KB/s of throughput; this morning, after a reboot, it was closer to 10 MB/s.

Thanks for your feedback! I'll look into migrating the backups and eliminating the deduplicated dataset; I can probably do that in under a week.
 