
My experiments in building a home server capable of handling fast + consistent deduplication

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Stilez submitted a new resource:

My experiments in building a home server capable of handling fast + consistent deduplication

AIM:

To help people looking at deduplication on TrueNAS 12+, by sharing what I've found along the way making it work on my own system.

On sustained mixed loads, such as 50GB+ file copies and multiple simultaneous transfers, TrueNAS 12 with a deduped pool and default config now gives me almost totally consistent and reliable ~300 MB/sec client-to-server, server-to-client, and server-to-server. I couldn't get close to that on 11.3 and earlier.

BACKGROUND.... WHY WOULD I WANT DEDUP ANYWAY...

Read more about this resource...
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Looking forward to reading the next installments... don't leave us hanging.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
It's moving the pool over the next few days, while I rebuild my workstation. Upcoming stuff I want to try, then write up:

Tunables that help performance
Testing out iSCSI as an alternative to a local disk
Seeing how snapshot deletion speed compares

I spent years trying to get good performance with dedup. Now that I can, I want to put up what I've learned about doing it.

But it's got a bunch of replication and disk resilvering to do before then.

Omens are good. A mass recursive delete of 957 snapshots across 17 datasets (16,269 snaps total) just took 48 minutes. That's a dedup snap destroy rate of roughly 1/6 of a second each (48 minutes ÷ 16,269 snaps), down from tens of seconds per snap before the special vdev + Optane, after moving to 12-BETA and adding them.
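For anyone curious what that kind of mass recursive delete looks like in practice, here's a rough sketch of the commands involved; the pool/dataset name "tank/data" and the snapshot naming are placeholders, not my actual layout.

Code:
# Destroy one snapshot name recursively across a dataset and all its children:
time zfs destroy -rv tank/data@auto-2020-06-01

# Or loop over every snapshot name held directly by the dataset:
for snap in $(zfs list -H -d 1 -t snapshot -o name tank/data | cut -d@ -f2); do
    zfs destroy -r "tank/data@${snap}"
done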
 
Last edited:

Trexx

Dabbler
Joined
Apr 18, 2021
Messages
29
It would be helpful to see a before/after of your hardware configuration (DDT, L2ARC, etc.) and tuning parameters.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
I'm pretty sure reads do not require a dedup table lookup. Writes require a lookup and a modification, yes, but I don't think reads do.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
This post was written a little while back, and I was curious whether there has been any update to it based on newer software releases.
 

Jamberry

Contributor
Joined
May 3, 2017
Messages
106
I am sure this is very interesting information for dedup users. For home users, though, you skip over the most important part pretty quickly, in my opinion:

I'm sure the detail can have holes picked in it. Why don't I do backups differently, or incrementally? Use RAIDZ? Use consumer rather than enterprise disks? Whatever. The upshot is, my pool is highly dedupable, so I decided to go dedup and build a server capable of it.

Is your pool really "highly dedupable"?
If we can achieve the same storage savings results with snapshots, why bother with dedup? Your pool might be "highly dedupable", but it is also "highly snapshotable" :smile:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Is your pool really "highly dedupable"?
If we can achieve the same storage savings results with snapshots, why bother with dedup? Your pool might be "highly dedupable", but it is also "highly snapshotable" :smile:

Snapshots and dedup are two entirely different things.

A snapshot does share blocks with the live dataset and with other snapshots, as long as those blocks are not rewritten (a rewrite forces a CoW reallocation and additional space consumption). But if you install the same OS image twice, you have written all those blocks twice, and it takes twice the space a single image would take.

ZFS does allow you to clone, which creates a read-write copy of a snapshot. The clone shares blocks with the original snapshot as long as neither the original nor the clone is written to. Once written, a block forces a CoW reallocation and additional space consumption, even if it was rewritten with identical contents.
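To make the block-sharing point concrete, a minimal sketch with placeholder dataset names (an illustration, not a recommendation):

Code:
# Snapshot a "golden" OS image, then clone it.
# At creation the clone shares every block with the snapshot and consumes almost no space;
# space only grows as either side rewrites blocks (CoW), even if the new contents are identical.
zfs snapshot tank/os-image@golden
zfs clone tank/os-image@golden tank/os-copy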

Dedup causes ZFS to store a table of the hash values of blocks. When a block is written that hashes to the value of an existing block, that block is NOT written to the pool; it is instead indirected through the dedup table. This means that, for the above case, if you install the same OS image twice, you have written all the blocks one time, and the second attempt results in no new writes, just references through the dedup table. Only a single copy of the data is stored in the pool.

If we can achieve the same storage savings results with snapshots, why bother with dedup?

Because snapshots can only do their space-saving trick in one direction. If you write two OS images out on a system that does snapshots, you use twice the space. As I said, there's a way to do cloning that can mitigate that somewhat, but if you install an OS image, clone it, then boot into both images and do an OS update, you end up with significant growth in disk utilization because many of the blocks are no longer shared (they were rewritten with new contents). Dedup, on the other hand, makes it possible for even the rewritten blocks to be shared. Any write on the dedup'ed pool results in only one copy of any given unique block being stored, even if identical blocks are written at completely different times.

This can become significant if you are running hundreds of virtual machine images or something like that. Highly dedupable is not the same thing as highly snapshotable.
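For anyone wondering whether their own pool really is "highly dedupable", a rough sketch of how to check; the pool name "tank" is a placeholder.

Code:
# Simulate dedup on the pool's existing data without enabling it
# (this walks the whole pool, so it can take a while and use a lot of RAM):
zdb -S tank

# On a pool that already has dedup enabled:
zpool get dedupratio tank    # achieved dedup ratio
zdb -DD tank                 # dedup table (DDT) histogram and size statistics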
 

Jamberry

Contributor
Joined
May 3, 2017
Messages
106
Yeah, true. I thought too much about my own use case, which is mostly not long-living VMs, and not much changes after the initial install. Snapshots probably also don't work very well for Windows with lots of updates.
 