viniciusferrao (Contributor; joined Mar 30, 2013; 192 messages)
Hello, I'm considering enabling deduplication on a new pool that will serve some Linux repositories. Since the repositories share a lot of common packages, the pool should benefit from deduplication.
Unfortunately, I only had the idea after mirroring almost 9 TB of data, so if I go ahead with it I'll need to redownload everything, since deduplication only applies to data written after it is enabled.
That said, I've done some tests with zdb, and it seems I can increase space efficiency by up to 1.33x:
Code:
root@truenas-repos[/mnt]# zdb -U /data/zfs/zpool.cache -S repos0
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    45.3M   5.61T   5.48T   5.48T    45.3M   5.61T   5.48T   5.48T
     2    10.3M   1.27T   1.24T   1.24T    21.4M   2.64T   2.57T   2.57T
     4    1.27M    160G    156G    156G    5.57M    704G    686G    686G
     8     142K   17.7G   17.4G   17.4G    1.19M    152G    149G    149G
    16    8.48K   1.04G   1.00G   1.00G     167K   20.6G   19.7G   19.7G
    32    2.79K    355M    331M    331M     132K   16.4G   15.3G   15.4G
    64      999    124M    123M    123M    77.1K   9.57G   9.48G   9.48G
   128        9    774K    350K    370K    1.36K    113M   46.7M   50.1M
   256        1    128K      4K   6.73K      510   63.8M   1.99M   3.35M
 Total    57.0M   7.06T   6.89T   6.89T    73.8M   9.13T   8.91T   8.91T

dedup = 1.29, compress = 1.02, copies = 1.00, dedup * compress / copies = 1.33
I will also disable zstd compression, since it saves almost nothing, and stick with lz4.
If my calculations are correct, I will need roughly 17.81 GB of system RAM to hold the deduplication table for the data as it stands. The system has 32 GB of RAM in total, so this would severely cut into the available memory, but the pool is already about 75% full:
Code:
root@truenas-repos[/mnt]# zfs list
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
boot-pool                                                2.65G  25.0G    96K  none
boot-pool/ROOT                                           2.64G  25.0G    96K  none
boot-pool/ROOT/22.12.2                                   2.64G  25.0G  2.63G  legacy
boot-pool/ROOT/Initial-Install                              8K  25.0G  2.63G  /
boot-pool/grub                                           8.20M  25.0G  8.20M  legacy
repos0                                                   8.97T  3.14T   209K  /mnt/repos0
repos0/.system                                            712M  3.14T   645M  legacy
repos0/.system/configs-40cc91dacae0491e84781ab81ded8ba4  2.64M  3.14T  2.64M  legacy
repos0/.system/cores                                      162K  1024M   162K  legacy
repos0/.system/ctdb_shared_vol                            162K  3.14T   162K  legacy
repos0/.system/glusterd                                   175K  3.14T   175K  legacy
repos0/.system/rrd-40cc91dacae0491e84781ab81ded8ba4      34.9M  3.14T  34.9M  legacy
repos0/.system/samba4                                     875K  3.14T   875K  legacy
repos0/.system/services                                   162K  3.14T   162K  legacy
repos0/.system/syslog-40cc91dacae0491e84781ab81ded8ba4   27.7M  3.14T  27.7M  legacy
repos0/.system/webui                                      162K  3.14T   162K  legacy
repos0/repos                                             8.97T  3.14T  8.97T  /mnt/repos0/repos
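For anyone checking the RAM figure, it can be reproduced from the "Total" row of the zdb histogram. This is only a rough sketch under two assumptions that are not stated in the post: that each unique (allocated) block costs about 320 bytes of RAM, a commonly quoted rule of thumb, and that zdb's "M" suffix means 2^20. The function name is illustrative, and the real per-entry cost may differ between ZFS versions.

```python
# Rough estimate of dedup table (DDT) memory from the `zdb -S` totals.
# ASSUMPTION: ~320 bytes of RAM per unique block (rule of thumb, not from the post).
# ASSUMPTION: zdb's "M" suffix is binary, i.e. 2**20 blocks.
BYTES_PER_DDT_ENTRY = 320


def ddt_ram_gib(allocated_blocks_mi: float,
                bytes_per_entry: int = BYTES_PER_DDT_ENTRY) -> float:
    """Estimated DDT size in GiB, given allocated blocks in units of 2**20."""
    entries = allocated_blocks_mi * 2**20
    return entries * bytes_per_entry / 2**30


# "Total" row, allocated column: 57.0M blocks
estimate = ddt_ram_gib(57.0)
print(f"Estimated DDT size: about {estimate:.1f} GiB")
```

Under those assumptions the estimate comes out close to the figure quoted above. It grows linearly with the number of unique blocks, so it would rise as more data is mirrored.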
With that data in hand, I would like to hear opinions about deduplication.
The question is: what is the state of deduplication today? In the past, if there wasn't enough RAM to hold the deduplication table, the pool could not be mounted, effectively losing its data. Is this still true? Is deduplication worthwhile in my scenario? It would cut used disk space by roughly 22% (a 1.29x dedup ratio), which seems good to me.
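As a side note on the numbers, a ratio translates into a space saving of 1 - 1/ratio, which is smaller than the fractional part of the ratio might suggest. A minimal sketch using the two ratios from the zdb summary line (the function name is illustrative):

```python
def space_saved_fraction(ratio: float) -> float:
    """Fraction of space saved for a given reduction ratio (e.g. 1.29 for 1.29x)."""
    return 1.0 - 1.0 / ratio


# Ratios taken from the zdb summary line.
for label, ratio in (("dedup only", 1.29), ("dedup * compress / copies", 1.33)):
    saved = space_saved_fraction(ratio)
    print(f"{label}: {ratio:.2f}x -> {saved:.1%} less space used")
```

Since compression is already applied to the data on disk, the dedup-only ratio is probably the more relevant one for estimating what enabling deduplication would free up.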
Please consider that:
* Performance is not that important. It cannot be neglected, but since this is a local mirror it does not need to be extremely fast; being faster than the WAN is sufficient.
* I don't have any dedicated SSDs on this pool, nor can I add any to the system.
* There's no other server to hold the data and I don't have the resources ($$$) to increase disk size.
* It's a single RAID-Z pool with 8x 2TB NL-SAS disks. Single parity.
* Running the latest version of SCALE: TrueNAS-SCALE-22.12.2
Thanks in advance for any opinions on this matter.