tl;dr:
De-dup really is a specialized use case. If you think you need it, you don't. You will know when you need it and can actually implement it wisely.
There have been a few posts about using de-dup. So, let's first get some things out of the way:
- De-dup is a memory hog, as the de-dup table(s) must reside in memory. The more data de-dupped, the more memory needed.
- De-dup can be a CPU hog on writes (every block written has to be checked against the de-dup table for matches).
- De-dup in general wants stronger checksum algorithms (which tend to be slower) to prevent hash collisions.
- De-dup can't be retroactively enabled (and have all the data magically de-dupped). Data has to be written to a de-dup enabled dataset. Plus, ALL its support data (the other data you want to de-dup against) has to be in the same dataset.
- De-dup can't be retroactively disabled. All dataset(s) that use de-dup would have to be copied to dataset(s) without de-dup, and then the source dataset(s) destroyed. Until then, the memory impact still exists.
- De-dup is checksum dependent. Changing the checksum algorithm on an active de-dup dataset prevents de-dup from de-dupping new data against old data.
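To put a rough number on the memory point above, here is a back-of-the-envelope sketch in Python. It is not from the original post. It assumes about 320 bytes of RAM per de-dup table entry (a commonly quoted rule of thumb, not a guarantee) and that every block is unique, so treat the result as a worst-case ballpark. The function name and its defaults are made up for illustration.

```python
def estimate_ddt_ram(data_bytes, avg_block_bytes=128 * 1024, bytes_per_entry=320):
    """Worst-case de-dup table size in bytes, assuming every block is unique.

    bytes_per_entry=320 is an assumed rule-of-thumb figure, not a measured one.
    """
    blocks = -(-data_bytes // avg_block_bytes)  # ceiling division
    return blocks * bytes_per_entry


TIB = 1024 ** 4
GIB = 1024 ** 3

# Same amount of data, two different average block sizes.
for block_kib in (128, 8):
    ram = estimate_ddt_ram(1 * TIB, avg_block_bytes=block_kib * 1024)
    print(f"1 TiB at {block_kib} KiB blocks: ~{ram / GIB:.1f} GiB of de-dup table")
```

Under those assumptions, 1 TiB of data in 128 KiB blocks comes to about 2.5 GiB of table, and the same data in 8 KiB blocks comes to about 40 GiB. Smaller blocks mean more entries, and more entries mean more memory.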
- Create a backup dataset
- Create a client dataset inside the backup dataset
- Use any file-by-file full backup tool to initially populate the client dataset
- Snapshot the client dataset (I use the date for the name, as in @20181002)
- For all future backups, use rsync to copy only the files that have changed, then snapshot again afterwards
NOTE: This does not do real de-dup. But it does save space on things that don't change.
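The backup steps above could be sketched roughly like this in Python. This is a dry run that only builds the command lines and does not execute anything. The pool name `tank`, the client name `laptop`, the `/mnt/<dataset>` mount point and the rsync flags `-a --delete` are all assumptions for illustration, so adjust them to your own setup. rsync also stands in here for the initial full copy, though any file-by-file tool would do.

```python
from datetime import date


def backup_commands(pool, client, source_dir, today, first_run=False):
    """Return the commands for one backup run as argument lists (nothing is run)."""
    backup_ds = f"{pool}/backup"            # hypothetical backup dataset
    client_ds = f"{backup_ds}/{client}"     # hypothetical client dataset inside it
    target = f"/mnt/{client_ds}/"           # assumes datasets are mounted under /mnt

    cmds = []
    if first_run:
        cmds.append(["zfs", "create", backup_ds])
        cmds.append(["zfs", "create", client_ds])

    # Copy only what changed; --delete drops files that vanished from the source.
    cmds.append(["rsync", "-a", "--delete", source_dir.rstrip("/") + "/", target])

    # Snapshot named after the date, as in @20181002.
    cmds.append(["zfs", "snapshot", f"{client_ds}@{today:%Y%m%d}"])
    return cmds


if __name__ == "__main__":
    for cmd in backup_commands("tank", "laptop", "/home/user", date.today(), first_run=True):
        print(" ".join(cmd))
```

Files that rsync leaves untouched keep sharing their blocks with the earlier snapshots, which is where the space saving comes from.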
Edit: Put the take-away at the top. Re-worded a bit for proper English syntax.
Edit: Per suggestion, added line item about not being able to disable de-dup easily.