ZFS Deduplication: can it be switched on and off without losing data?


dirkme

Contributor
Joined
Jul 19, 2017
Messages
162
Dear FreeNAS Friends,

What I mean by that is: if I try ZFS deduplication and find that it costs too much performance, can I then switch it back off without losing data?

As far as I understand, ZFS deduplication creates a kind of link to a duplicate. If I switch it back off, will that link be replaced with a full file again?
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Hi,
No, it won't be. The "links" won't be replaced, and switching off may not help performance.
New data written after the switch won't use deduplication. You can remove dedup from a file by copying it, but I'm not sure that helps performance either.
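Something like this, for example (pool, dataset, and file names are just placeholders):

  # switch dedup off for the dataset first
  zfs set dedup=off tank/data
  # rewriting a file re-allocates its blocks, now without dedup
  cp /mnt/tank/data/file.jpg /mnt/tank/data/file.jpg.tmp
  mv /mnt/tank/data/file.jpg.tmp /mnt/tank/data/file.jpg
  # note: snapshots will still hold on to the old deduped blocks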
 

dirkme

Contributor
Joined
Jul 19, 2017
Messages
162
Hi,
No, it won't be. The "links" won't be replaced, and switching off may not help performance.
New data written after the switch won't use deduplication. You can remove dedup from a file by copying it, but I'm not sure that helps performance either.

The idea was something like NextCloud: if someone shares one file with 10 users, only one copy is stored. I was wondering, if you then switch off dedup, would those 10 users still have the file in their NextCloud? Otherwise, once you switch it on you are stuck, or you have to move the data to a new dataset, and you would still lose the deduplication of files that sat in multiple folders. Sigh.
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Switching off dedup does not change anything in the file system.
Deduplicated files (blocks, to be more accurate) will stay deduplicated.
The setting only applies to data written after you switch it off.

The key point is that if you have a performance issue, switching dedup off is not enough.
You will have to recreate the pool.
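To illustrate (pool and dataset names are placeholders):

  zfs set dedup=off tank/data   # only affects data written from now on
  zpool get dedupratio tank     # existing deduped blocks keep the ratio above 1.00x
  zdb -DD tank                  # the dedup table (DDT) is still there on disk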
 

dirkme

Contributor
Joined
Jul 19, 2017
Messages
162
Switching off dedup does not change anything in the file system.
Deduplicated files (blocks, to be more accurate) will stay deduplicated.
The setting only applies to data written after you switch it off.

The key point is that if you have a performance issue, switching dedup off is not enough.
You will have to recreate the pool.

I will play around with a small dataset and see what it does to my performance, and whether it is worth switching on at all.

I will update this post with my findings. If anyone has more explanation, advantages, or best-usage tips, that would be great. Experience reports are welcome too :smile:
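My rough plan, in case anyone wants to reproduce it (pool and dataset names are just what I'd use):

  zfs create -o dedup=on tank/dedup-test
  # copy some sample data into /mnt/tank/dedup-test, then check the savings:
  zpool get dedupratio tank                   # pool-wide dedup ratio
  zfs get used,compressratio tank/dedup-test  # space actually consumed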
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Just curious... Why are you interested in dedup? What kind of data do you plan to store?
 

dirkme

Contributor
Joined
Jul 19, 2017
Messages
162
Just curious... Why are you interested in dedup? What kind of data do you plan to store?

I have a few hundred MB of photos; some are sorted and some are still scattered across folders and sub-folders.

I suspect there are many duplicates, so I thought dedup would save significant space.

However, I tested with a 99 GB dataset and saved about 9 GB at most.

But the performance decrease is massive and, in my opinion, not worth it.
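For what it's worth, 99 GB with 9 GB saved works out to a dedup ratio of only about 1.1x. If anyone wants to estimate the ratio before enabling dedup at all, ZFS can simulate it on existing data (pool name is a placeholder):

  # builds a dedup table in memory and prints a histogram plus an
  # estimated dedup ratio, without changing anything on disk
  zdb -S tank

Be warned that this can take a long time and a lot of RAM on a big pool.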
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Dedupe requires quite a bit of CPU power and lots of memory. Your system may not have sufficient resources to do it effectively.
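As a rough worked example (the ~320 bytes per DDT entry is the commonly quoted in-core figure; real numbers depend on your record size and data):

  1 TiB of unique data / 128 KiB average block size = 8,388,608 blocks
  8,388,608 blocks x ~320 bytes per DDT entry ≈ 2.5 GiB of RAM

In practice average block sizes are usually smaller, which is why the usual planning figure is around 5 GB of RAM per TB of deduplicated data.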

I'm not really surprised by your results. Dedupe is not designed for media, and generally has limited benefit on files that are already compressed, such as MP3, MP4, and JPEG.

Thanks for sharing your interesting experiment.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I have a few hundred MB of photos; some are sorted and some are still scattered across folders and sub-folders.

I suspect there are many duplicates, so I thought dedup would save significant space.

However, I tested with a 99 GB dataset and saved about 9 GB at most.

But the performance decrease is massive and, in my opinion, not worth it.
You're looking at the wrong tool. Your static set of duplicate-containing data is best suited to a client-side solution.

I've used dupeGuru successfully in the past, and it also has versions that can look for similar images (and audio) even if the files themselves are different (different type, resolution or encoding).
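If you'd rather do a quick first pass from the FreeNAS shell, something like this finds exact duplicates by checksum (the path is a placeholder; it only catches byte-identical files, unlike dupeGuru's similarity matching, and it breaks on filenames with spaces):

  # print the path of every file whose MD5 matches an earlier file
  find /mnt/tank/photos -type f -exec md5 -r {} + | sort | awk 'seen[$1]++ {print $2}'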
 

dirkme

Contributor
Joined
Jul 19, 2017
Messages
162
You're looking at the wrong tool. Your static set of duplicate-containing data is best suited to a client-side solution.

I've used dupeGuru successfully in the past, and it also has versions that can look for similar images (and audio) even if the files themselves are different (different type, resolution or encoding).

Is that available for FreeNAS?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419

Rattlebattle79

Dabbler
Joined
Aug 1, 2016
Messages
10
If you have to ask whether it's a good idea to use dedup, don't use it. You need to be at a much higher level as a sysadmin before you start fiddling with dedup. It can have severe consequences if you don't have a good plan and strategy and know exactly what you are doing!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If you have to ask whether it's a good idea to use dedup, don't use it. You need to be at a much higher level as a sysadmin before you start fiddling with dedup. It can have severe consequences if you don't have a good plan and strategy and know exactly what you are doing!
Hence my suggestion for a sort of offline (from the server's perspective) dedup.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm currently using rdfind from a jail to dedupe images.
I've also had good results with duff.
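Typical invocations, in case it helps (the path is a placeholder):

  rdfind -dryrun true /mnt/tank/photos         # report only; findings go to results.txt
  rdfind -makehardlinks true /mnt/tank/photos  # replace duplicates with hard links
  duff -r /mnt/tank/photos                     # list duplicate files recursively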
 