
My experiments in building a home server capable of handling fast + consistent deduplication

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Stilez submitted a new resource:

My experiments in building a home server capable of handling fast + consistent deduplication

AIM:

To help people looking at deduplication on TrueNAS 12+, by sharing what I've found along the way making it work on my own system.

On sustained mixed loads, such as 50GB+ file copies and multiple simultaneous transfers, TrueNAS 12 with a deduped pool and default config now gives me almost totally consistent and reliable ~300 MB/sec client-to-server, server-to-client, and server-to-server. I couldn't get close to that on 11.3 and earlier.

BACKGROUND.... WHY WOULD I WANT DEDUP ANYWAY...

Read more about this resource...
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Looking forward to reading the next installments... don't leave us hanging.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
It's moving the pool over the next few days, while I rebuild my workstation. Upcoming stuff I want to try, then write up:

Tunables that help performance
Testing out iSCSI as an alternative to a local disk
Seeing how snapshot deletion speed compares

I spent years trying to get good performance with dedup. Now that I can, I want to put up what I've learned about doing it.

But it's got a bunch of replication and disk resilvering to do before then.

Omens are good. A mass recursive delete of 957 snapshots across 17 datasets (16,269 snaps total) just took 48 minutes. That's a dedup snap destroy rate of roughly 1/6 of a second each (48 minutes ÷ 16,269 snaps), down from tens of seconds per snap before the special vdev + Optane, after moving to 12-BETA and adding them.
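For anyone curious what that kind of mass recursive delete looks like in practice, here's a rough sketch of the commands involved; the pool/dataset name "tank/data" and the snapshot naming are placeholders, not my actual layout.

Code:
# Destroy one snapshot name recursively across a dataset and all its children:
time zfs destroy -rv tank/data@auto-2020-06-01

# Or loop over every snapshot name held directly by the dataset:
for snap in $(zfs list -H -d 1 -t snapshot -o name tank/data | cut -d@ -f2); do
    zfs destroy -r "tank/data@${snap}"
done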
 
Last edited:

Trexx

Dabbler
Joined
Apr 18, 2021
Messages
29
It would be helpful to see a before/after of your hardware configuration (DDT, L2ARC, etc.) and tuning parameters.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
I'm pretty sure reads do not require a dedup table lookup. Writes require a lookup and a modification, yes, but I don't think reads do.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
This post was written a little while back, and I was curious whether there has been any update to it based on newer software releases.
 

Jamberry

Contributor
Joined
May 3, 2017
Messages
106
I am sure this is very interesting information for dedup users. For home users, though, you skip over the most important part pretty quickly, in my opinion:

I'm sure the detail can have holes picked in it. Why don't I do backups differently, or incrementally? Use RAIDZ? Use consumer rather than enterprise disks? Whatever. The upshot is, my pool is highly dedupable, so I decided to go dedup and build a server capable of it.

Is your pool really "highly dedupable"?
If we can achieve the same storage savings results with snapshots, why bother with dedup? Your pool might be "highly dedupable", but it is also "highly snapshotable" :smile:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Is your pool really "highly dedupable"?
If we can achieve the same storage savings results with snapshots, why bother with dedup? Your pool might be "highly dedupable", but it is also "highly snapshotable" :smile:

Snapshots and dedup are two entirely different things.

A snapshot does share blocks with the live dataset and with other snapshots, as long as those blocks are not rewritten (a rewrite forces a CoW reallocation and additional space consumption). But if you install the same OS image twice, you have written all those blocks twice, and it takes twice the space a single image would take.

ZFS does allow you to clone, which creates a read-write copy of a snapshot. The clone shares blocks with the original snapshot as long as neither the original nor the clone is written to. Once written, a block forces a CoW reallocation and additional space consumption, even if it was rewritten with identical contents.
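To make the block-sharing point concrete, a minimal sketch with placeholder dataset names (an illustration, not a recommendation):

Code:
# Snapshot a "golden" OS image, then clone it.
# At creation the clone shares every block with the snapshot and consumes almost no space;
# space only grows as either side rewrites blocks (CoW), even if the new contents are identical.
zfs snapshot tank/os-image@golden
zfs clone tank/os-image@golden tank/os-copy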

Dedup causes ZFS to store a table of the hash values of blocks. When a block is written that hashes to the value of an existing block, that block is NOT written to the pool; it is instead indirected through the dedup table. This means that, for the above case, if you install the same OS image twice, you have written all the blocks one time, and the second attempt results in no new writes, just references through the dedup table. Only a single copy of the data is stored in the pool.

If we can achieve the same storage savings results with snapshots, why bother with dedup?

Because snapshots can only do their space-saving trick in one direction. If you write two OS images out on a system that does snapshots, you use twice the space. As I said, there's a way to do cloning that can mitigate that somewhat, but if you install an OS image, clone it, then boot into both images and do an OS update, you end up with significant growth in disk utilization because many of the blocks are no longer shared (they were rewritten with new contents). Dedup, on the other hand, makes it possible for even the rewritten blocks to be shared. Any write on the dedup'ed pool results in only one copy of any given unique block being stored, even if identical blocks are written at completely different times.

This can become significant if you are running hundreds of virtual machine images or something like that. Highly dedupable is not the same thing as highly snapshotable.
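For anyone wondering whether their own pool really is "highly dedupable", a rough sketch of how to check; the pool name "tank" is a placeholder.

Code:
# Simulate dedup on the pool's existing data without enabling it
# (this walks the whole pool, so it can take a while and use a lot of RAM):
zdb -S tank

# On a pool that already has dedup enabled:
zpool get dedupratio tank    # achieved dedup ratio
zdb -DD tank                 # dedup table (DDT) histogram and size statistics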
 

Jamberry

Contributor
Joined
May 3, 2017
Messages
106
Yeah, true. I thought too much about my own use case, which is mostly not long-living VMs, and not much changes after the initial install. Snapshots probably also don't work very well for Windows with lots of updates.
 