
ZFS de-Duplication - Or why you shouldn't use de-dup

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Arwen submitted a new resource:

ZFS de-Duplication - Or why you shouldn't use de-dup - ZFS de-duplication essentials

The TrueNAS forums occasionally have people who come across ZFS de-duplication and want to investigate its use, or who think it is a good idea and want to implement it.

Here are some suggested configuration details:
  • Understand that you need CPU power to compare ZFS blocks for every write to a de-dup dataset or zVol. This also means writes are delayed until the de-dup comparison is complete. So faster CPU cores can work better with de-dup than more cores.
  • In most cases...

Read more about this resource...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The "1.25GB for 1TB" ratio often vastly underrepresents the requirements of deduplication.

More often we see the "5GB per 1TB" suggestion, and notably that applies to a dataset with an average record size of 64KB. If deduplication is applied to an iSCSI ZVOL (with its default volblocksize of 16K) the result is potentially 4x the memory usage, or "20GB per 1TB".
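
As a back-of-the-envelope check on those figures, here is a small Python sketch. It assumes the commonly quoted estimate of roughly 320 bytes of RAM per unique block in the dedup table, which is an approximation rather than an exact ZFS constant:

[CODE=python]
# Rough DDT memory estimate: number of unique blocks multiplied by an assumed
# ~320 bytes of RAM per dedup-table entry (a commonly quoted approximation).

def ddt_ram_gib(data_tib: float, block_kib: int, bytes_per_entry: int = 320) -> float:
    """Estimate dedup-table RAM in GiB for data_tib TiB of unique data."""
    blocks = data_tib * 2**40 / (block_kib * 2**10)
    return blocks * bytes_per_entry / 2**30

print(ddt_ram_gib(1, 64))   # ~5 GiB per TiB at 64 KiB records
print(ddt_ram_gib(1, 16))   # ~20 GiB per TiB at 16 KiB volblocksize
[/CODE]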

I would suggest leveraging and/or including a pointer to the resource from @Stilez on their experimentation with, and the hardware required to get, performant results (it took a dedup vdev on Optane devices) as a reference for deduplication demanding significant hardware.

 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Forum moderators, can we get the discussion moved out of:
Forums -> TrueNAS -> FreeNAS (Legacy Software Releases) -> FreeNAS Help & support -> General Questions and Help
This is really not a "FreeNAS (Legacy Software Releases)" resource.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
The "1.25GB for 1TB" ratio often vastly underrepresents the requirements of deduplication.

More often, we see the "5GB per 1TB" suggestion and that notably also applies to a dataset with an average record size of 64KB. If deduplication is applied to an iSCSI ZVOL (with a default volblocksize of 16K) this will result in the potential for 4x the memory usage, or "20GB per 1TB"

I would suggest leveraging and/or including a point to the resource from @Stilez on their experimentations with, and hardware requirements in order to get performant results (it required a dedup vdev using Optane devices) as a reference for deduplication requiring significant hardware.

I've added both points: I copied your memory suggestion, and put the @Stilez resource in a suggested reading section.

Keep the suggestions coming. We want something straightforward that new users who want to use de-dup can be pointed to, so that at least they are informed.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Arwen said:
Forum moderators, can we get the discussion moved out of:
Forums -> TrueNAS -> FreeNAS (Legacy Software Releases) -> FreeNAS Help & support -> General Questions and Help
This is really not a "FreeNAS (Legacy Software Releases)" resource.

Moved to Operation and Performance, which seems like the most accurate place for the discussion thread.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
HoneyBadger said:
Moved to Operation and Performance, which seems like the most accurate place for the discussion thread.
Thank you.

We should probably move some more of the Resource discussion threads out of the legacy FreeNAS sub-forums, as the implication is that the resource would then only apply to legacy FreeNAS.
 

Okeur75

Dabbler
Joined
Nov 16, 2022
Messages
36
Thank you for the resource.
I would add, based on my personal experience, that if you really want to test de-dup, you should do it on a dedicated pool (even though it can be enabled on a particular dataset).
From what I experienced, an issue on a deduplication-enabled dataset can prevent you from mounting the whole pool (see my thread here).

Also, since dedup relies even more heavily on memory than ZFS already does, maybe emphasize the need for ECC RAM (already strongly recommended for ZFS, but even more so if you use dedup).

And for "spelling update" you missed a word here : when you a program like RSync
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Okeur75 said:
Thank you for the resource.
I would add, based on my personal experience, that if you really want to test de-dup, you should do it on a dedicated pool (even though it can be enabled on a particular dataset).
From what I experienced, an issue on a deduplication-enabled dataset can prevent you from mounting the whole pool (see my thread here).

Also, since dedup relies even more heavily on memory than ZFS already does, maybe emphasize the need for ECC RAM (already strongly recommended for ZFS, but even more so if you use dedup).

And as a spelling update, you missed a word here: "when you a program like RSync".
Thank you.

Added this section:
  • In some cases, it is better to use a dedicated pool for de-dup than to share one pool between de-dupped datasets and datasets without de-dup. That way, if pool problems arise, they will not affect all your data.

I've added a suggestion for the ECC RAM:
  • Because de-dup is memory intensive, some people suggest that ECC memory is even more important for this use.

I've fixed the wording for:
when you update with a program like RSync
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Something occurred to me.

Is the ZFS de-dup checksum table sorted?
If not, would sorting the de-dup checksum table improve performance?

I mean, imagine that you have ten 256-bit checksums you have to compare against your new one. If they are not in sorted order, you have to check every single one against your new one. BUT, if they are in sorted order like this:

New checksum - 5678...

Checksum table:
1234...
2345...
3456...
4567...
6789...
7890...
8901...
9012..
9999...

As you can see, you only have to compare the first 64-bit word of the first five checksums before you find that your new one is unique. The same would apply to some degree if there was a match, because you avoid all the unnecessary compares.

Plus, some search techniques for sorted data would apply, like starting the compare in the middle, then checking the middle entry of the upper half or lower half, and continuing until you hit either a match or a unique result.

Now obviously the de-dup table would need to be a sorted linked list, so that you can add a new entry anywhere in the middle, not just at the ends.
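
For illustration only, here is a minimal Python sketch of the sorted-table idea above, using made-up checksum strings and a binary search; it is not a claim about how ZFS actually stores or searches its dedup table:

[CODE=python]
import bisect

# Hypothetical, already-sorted checksum table (short hex strings here for
# readability; real dedup checksums such as SHA-256 are 256-bit values).
table = ["1234...", "2345...", "3456...", "4567...",
         "6789...", "7890...", "8901...", "9012...", "9999..."]

def is_duplicate(checksum: str) -> bool:
    """Binary search: roughly log2(n) compares instead of checking every entry."""
    i = bisect.bisect_left(table, checksum)
    return i < len(table) and table[i] == checksum

def add_checksum(checksum: str) -> None:
    """Keep the table sorted as new, unique checksums arrive."""
    bisect.insort(table, checksum)

print(is_duplicate("5678..."))   # False: the new checksum is unique
[/CODE]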

So, did my half-awake brain figure out how ZFS de-dup works?
Or did I come up with a huge optimization?

I searched the web, but could not find an answer (for my half-awake brain...).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It might be worthwhile to go into file-level deduplication strategies a little bit more.

File-level deduplication: This is cheap and super-easy, without any significant requirements. I like Phil Karn's dupmerge (look for "KA9Q dupmerge" to find the source code).

The code removes duplicate files and replaces copies other than the first with a hardlink back to the first. However, it isn't automatic; you have to run something like a script to cause it to do the deduplication. It also relies on you not changing the contents of the files, so it is mostly useful for archival file access.
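
Not dupmerge itself, but a minimal Python sketch of the same file-level idea (hash whole files, then replace later duplicate copies with hardlinks to the first); it is illustrative only and, like dupmerge, only safe for files that will not be modified afterwards:

[CODE=python]
import hashlib
import os
import sys

def dedup_tree(root: str) -> None:
    """Replace duplicate regular files under root with hardlinks to the first copy seen."""
    seen = {}                                # content hash -> first path seen
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:      # hash the whole file contents
                digest = hashlib.sha256(f.read()).hexdigest()
            first = seen.setdefault(digest, path)
            if first != path and not os.path.samefile(first, path):
                os.unlink(path)              # drop the duplicate...
                os.link(first, path)         # ...and hardlink it to the original
                                             # (assumes one filesystem; hardlinks
                                             # cannot cross filesystems)

if __name__ == "__main__":
    dedup_tree(sys.argv[1])
[/CODE]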
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
jgreco said:
It might be worthwhile to go into file-level deduplication strategies a little bit more.

File-level deduplication: This is cheap and super-easy, without any significant requirements. I like Phil Karn's dupmerge (look for "KA9Q dupmerge" to find the source code).

The code removes duplicate files and replaces copies other than the first with a hardlink back to the first. However, it isn't automatic; you have to run something like a script to cause it to do the deduplication. It also relies on you not changing the contents of the files, so it is mostly useful for archival file access.
Done. I made a separate section, and included RSync, which can do something similar.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Arwen said:
Done. I made a separate section, and included RSync, which can do something similar.

Excellent. You rock. You do a very nice job of writing on your resources; I'm just a little jealous. :smile:
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
jgreco said:
Excellent. You rock. You do a very nice job of writing on your resources; I'm just a little jealous. :smile:
I am a professional writer after all, and even have several published works!
 