Help upgrade for deduplication (keeping chassis)

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
Hi everybody,
Sorry for the long and confusing post, but I'm trying to explain my needs as best I can.
I need to upgrade the mobo/RAM/CPU of my backup server, an old Netgear ReadyData 5200 running
TrueNAS-13.0-U3.1, and I want to/must keep the original chassis, mainly for these reasons:
It Just Works (so no need to spend extra money)
it has two 700 W hot-swappable redundant PSUs
it has 12x 3.5" HDD bays (already filled up)
The reason I want to upgrade is deduplication: the system is always starved for resources and often
grinds to a halt, forcing me to hard-reset it, which is not good for the disks.
The system right now has "only" 32 GB of RAM, which is the maximum this motherboard supports, but if I've
read ZFS Deduplication correctly that is not enough, because running "zpool status -D" gives me these numbers
dedup: DDT entries 101436443, size 611B on disk, 366B in core
which tells me it needs at least 36 GB of RAM right now just to hold the DDT (366*101436443, correct?).
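(Working that multiplication out as a sanity check: 101,436,443 entries x 366 bytes is roughly 37 GB, or about 34.6 GiB, so either way more than the 32 GB installed.)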
The current system specs are:
(2U chassis)
Supermicro X8SI6-F motherboard with 2 SAS ports (one unused), 2x 1 GbE, IPMI controller
Intel Xeon X3450, 4 cores/8 threads @ 2.67 GHz
32 GB DDR3 non-ECC RAM
RAIDZ1 pool of 12x 3 TB spinning disks, 36 TB raw (connected to the SAS controller)
I would like to upgrade to something like this without spending too much:
(keep the 2U chassis)
motherboard with 2 SAS connectors, NVMe slot(s), 2x 10 GbE, IPMI controller
Intel Xeon Silver/Gold CPU with at least 8C/16T
64 GB DDR4 ECC RAM (should be enough for my pool... right?)
2 SSDs/NVMe drives in a mirror as a special vdev for storing the DDT
keep the pool/disks because I don't need more space or speed (from SSDs, for example)
The PSU has one 24-pin ATX power cable and one 8-pin CPU power cable in use, plus one
4-pin spare cable, so I need to keep this in mind when looking for a new mobo/CPU.
So far this is what I'm leaning towards, which should fit in the chassis:
Motherboard: Supermicro X11SPH-nCTPF, it has everything I listed above with the right power connectors
CPU: Intel Xeon Silver 4208, 8 cores/16 threads @ 2.10 GHz, which I hope is sufficient for the foreseeable future
RAM: 4x 16 GB @ 2400 MHz (no particular brand); I wouldn't buy higher-clocked memory if it's not strictly necessary
Now, as for storing the DDT on mirrored SSDs/NVMe drives:
is it a good solution to speed up deduplication and data access, given that my pool is on spinning disks?
The new mobo has SATA connectors like the old one, but I don't have extra space in the front bays for installing
more disks (they would be connected to the SAS controller anyway).
I was thinking of some kind of adapter/bracket to mount 2.5" disks in the chassis PCI slot area (not a SATA-to-PCIe adapter)
but I can't find one, or I don't know how to search for it properly.
NVMe is another problem: the new motherboard has only one M.2 slot, but there are 2 OCuLink connectors that
I don't really know how to use. I tried looking for some OCuLink-to-NVMe adapter to install in the chassis PCI slots,
but I can't find anything useful; can someone please help me with this too?
Other suggestions for speeding up the system, like ZIL/SLOG/L2ARC?

I really need deduplication because I can keep way more data than with compression alone. I already ran
a test on another system with only compression enabled (no dedup) and it filled up in a few days of
backups, instead of months with dedup on the production system (my current dedup ratio is always over 2.66x).
Please kindly help, any suggestions are welcome; tell me if you need more info or clarifications.

Thanks
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry I can't help with the de-duplication part, as that is a less used feature around here, and pretty complicated.

However, I run with both compression and snapshots for my backups. By using snapshots I have pseudo de-duplication because my clients use RSync. If the file to be backed up has not changed, a new copy is not written to the NAS. Thus, I can have dozens of snapshots and "de-duplicated" backups without the complexity of actual ZFS De-Duplication. This gives me the ability to go far back in time for backups, if I need to restore a file from a year or more ago. And yet not take up space for each backup session.

This scheme is not perfect. I back up 3 Linux clients: desktop, laptop and media server. But I use different paths for each client, so there is no pseudo de-duplication between those 3 backups (which might contain many of the same Linux OS files).
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
For DDT you want to go for Optanes - try searching the forum for DDT; there was quite a good round of testing and a write-up when the sVDEV feature was introduced.

Not much other than Optane can deliver the required IOPS without burning out fast.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
This - mighty Optanes are among the few drives that will host dedup tables with low latency and almost no wear. There seems to be a "fire sale" on Optane going on right now: I am seeing 1 TB 905P drives going for 350-400 USD. With a pair of these (and at the very least 32 GB of RAM), dedup becomes a viable option.
Good for those who can snap up some of these drives.
Mind that a "special vdev" is pool-critical and requires redundancy (2-way or preferably 3-way mirror), while hosting the DDT on a persistent L2ARC requires no redundancy but still requires enough RAM to manage the L2ARC (typically L2ARC = 5-10x RAM; it's not clear whether a metadata-only L2ARC changes the calculation).
Deduplication is still a tricky feature, which is perhaps best avoided.
 

dxun

Explorer
Joined
Jan 24, 2016
Messages
52
Agreed - even with Optane and its exceptional endurance, I don't think I'd be brave enough to run a dedup vdev without a 2-way mirror (a 3-way mirror of Optanes might be a bit too much, IMO).

Aside from the necessary RAM, I'd also mention the CPU cost, since basically every written block has to have its hash computed and compared against the dedup table. A single hash is not an expensive computation, but at the scale this happens with dedup it adds up and can crush lesser CPUs - as a bare minimum, I wouldn't run dedup on anything that isn't a high-speed 4-core part with hyperthreading.
 

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
First of all, a big thanks for all your replies!
Mind that this is a backup system: it will mainly run nightly backup tasks, so during the day it is mostly idle.
My biggest and heaviest backup task is the fileserver, a Windows Server with 1 TB of data that does an
incremental backup every day and a full backup every week; the other systems are Linux machines that send
compressed files of various contents (DBs, directories, etc.).
Now a question comes to mind: does dedup also work on compressed files, if the content doesn't change much?
Right now I have one big pool with dedup enabled on all datasets, but if compressed files don't benefit
from dedup maybe I should enable it only on the datasets that really need it and disable it on the others,
to take some of the load off the system.
About special vdevs/Optane: how much storage do I need for the DDT? Would a 256 GB Optane be sufficient?
And I still have to figure out how OCuLink works, any hints about it?

Thanks
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Dedup works at block level. If the raw block content does not change, dedup dedups; if the block content changes, the changed block is stored - regardless of whether the block belongs to a compressed file or not. (With ZFS compression on, everything is compressed anyway!)
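If you do end up enabling dedup only on selected datasets, note that it is a per-dataset property, so something along these lines would do (dataset names are just examples):
zfs set dedup=off tank/linux-archives   # new writes to this dataset are no longer deduped
zfs get -r dedup tank                   # check the current setting for every dataset in the pool
Existing deduped blocks stay in the DDT until they are rewritten or freed, so the table only shrinks gradually as old data is deleted.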

The guideline is 5 GB of DDT per TB of deduped data.

"Oculink" is just a connector, there's no magic involved, just ascertain which protocol your particular connector carries (PCIe lanes? SATA lanes? either by BIOS switch?) and find the right cable/adapter for your application (and the SFF-8000-something nomenclature along the way).
For NVMe drives, the easiest way is probably to go for U.2 drives, and thus look for a "SFF-8611" (OCuLink) to "SFF-8639" (U.2) cable.
 

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
The guideline is 5 GB of DDT per TB of deduped data.
So just to be extra sure: I have a 32 TB pool and the dedup ratio is currently 3.25. Would I need 5 GB x 32 TB of DDT storage?
Or, since you said "per TB of deduped data", 5 GB x (32 TB x 3.25)? Or do you mean only the used space?

Thanks
 
Last edited:

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I would really like to stress the idea of handling duplicates from backup jobs by other means than ZFS deduplication.

To illustrate my point let's take network connection reliability as a scenario. Traditionally, this is done by TCP (OSI layer 4). While this has the advantage of being a generic mechanism (like ZFS deduplication), it also has the disadvantage of being a generic mechanism; like most things in life there are trade-offs involved. Because of certain limitations of TCP, a number of applications, especially those that are sensitive about latency (e.g. cluster replication, high-frequency trading), have implemented their own reliability mechanisms (in OSI layer 7) on top of UDP.

Similarly, it is really worth looking at the backups in more detail. ZFS snapshots have already been thrown into the discussion.

How are the backups done at the moment? What software? What change rate? The more details you give, the better we can help.
 

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
The Windows server backup is done through software that sends differential/full backups to an SMB share;
the Linux servers simply send their backups via scp - they're not on ZFS right now.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Linux does quite well with RSync, at least for me. There are two ways to transfer data with RSync: via SSH or via the RSync protocol.

SSH has the advantages of:
  • Authentication by user, (and potentially with keys)
  • Encrypting the network traffic
  • Access on the destination is limited to what the user can access
The RSync protocol has the advantages of:
  • Faster access because the network traffic is not encrypted, (assumed to be LAN not WAN)
  • RSync can list specific places on destination that can be accessed, (either R/O or R/W)
  • RSync can limit access to specific client servers
I think MS-Windows has an RSync client program available.


The advantage of RSync with ZFS snapshots is that it is always a "full" backup, but only differential data is transferred and written. Each snapshot taken after a RSync backup would describe the client server at the time of the backup.
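A rough sketch of what one client job plus a snapshot on the NAS could look like, with paths, user, dataset and snapshot names purely as examples:
rsync -aH --delete /home/ backupuser@nas:/mnt/tank/backups/laptop/   # push only changed files over SSH
zfs snapshot tank/backups@laptop-$(date +%F)                         # run on the NAS after the RSync finishes
Each snapshot then acts as one of those "full" restore points, while only the changed data takes up new space.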
 

dxun

Explorer
Joined
Jan 24, 2016
Messages
52
So just to be extra sure: I have a 32 TB pool and the dedup ratio is currently 3.25. Would I need 5 GB x 32 TB of DDT storage?
Or, since you said "per TB of deduped data", 5 GB x (32 TB x 3.25)? Or do you mean only the used space?

Thanks

What @Etorix provided is just guidance - a decent starting point carved out by empirical reports over the years. It is by no means anything more, and you should treat it as just that. These questions have been asked many times over the years, and I would encourage you to search the forums for "dedup sizing" (and similar) search terms - the answers you find will give you much more in-depth knowledge than this thread is likely to offer.

Here are a few posts that might help:
  • @Stilez 's excellent posts/writeups on dedup:
    • writeup on his use case that seems similar to yours - essentially how to build a home server capable of fast and consistent deduplication on a data set of a size comparable to yours
    • writeup on Optane and DDTs
  • a good thread on superuser.com that explains a bit on how that "5 GB for 1 TB of data" came to be
  • with zdb -U /data/zfs/zpool.cache -S YourPoolNameHere you can run a dedup simulation on your pool and get estimated savings (taken from @HoneyBadger 's post on dedup)
This should be a good starting point for future discussions.
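For reference, on TrueNAS CORE that simulation is run as below (replace the pool name with yours; the scan walks the whole pool, so it can take a while and use a fair amount of RAM at your size):
zdb -U /data/zfs/zpool.cache -S YourPoolNameHere
The output is a simulated DDT histogram; the summary line at the end reports the estimated dedup ratio (and the combined dedup x compress figure) you could expect.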

One thing to note is that there isn't deliberate coyness in the answers - unfortunately, the only true way to understand the impact of deduplication on a system is to actually do it and observe. The posts and advice here can give you an idea of how to size the system before you start, but ultimately the performance and actual savings will be largely determined by the shape of the data itself. Some data sets will yield excellent deduplication; others will be impervious to it.

If there is any concrete suggestion I could make, based on the system description and the dataset sizes in your starting post, it looks to me like you have two major bottlenecks: the amount of RAM and the disks where the DDT is stored:
  • hosting the DDT in a special vdev on Optane is going to be your best bet at having it run performantly
    • it looks like the DDT is going to be around 160 GB (5 [GB] x 32 [TB of data] = 160 GB)
      • but this is only the starting requirement: at what rate will the data set grow? Is there an upper limit?
  • 64 GB looks like the minimum for even discussing dedup on a 32 TB pool, assuming
    • DDT is hosted on Optane
    • machine is being used for backup only - no other purposes or users
  • unless you don't care too much about losing the data, a mirrored DDT special vdev is a must
    • then again, this is a backup machine, so one Optane might be ok?
  • I wouldn't dream of hosting my data on non-ECC RAM
    • that goes even more so for something that relies so heavily on correct checksums/hashes, such as the deduplication process

Other suggestions for speeding up the system, like ZIL/SLOG/L2ARC?

Some users on this forum may take serious umbrage at this sentence :) I would suggest reading this fantastic writeup by @jgreco multiple times, to consolidate your understanding of what the ZIL is and when an SLOG would be used. Do not mistake the SLOG/L2ARC for "caches" - they are not that.

In short, adding an SLOG and/or L2ARC will at best do nothing to improve deduplication performance. More likely, the latter will further hobble it by consuming part of the remaining RAM for its own bookkeeping.
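If you want to see where the RAM is actually going before and after any change, the arc_summary tool available from the TrueNAS shell is handy:
arc_summary | less   # ARC size and breakdown, metadata usage, and L2ARC header overhead if an L2ARC exists
That is an easy way to confirm whether the ARC is being squeezed by dedup (or by an oversized L2ARC).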
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
One thing that is generally not clear: ZFS De-Duplication is an advanced feature. The average user should not be using it, and can easily get into trouble with dedup. Setup, maintenance, upgrades and usage all have specific requirements that can differ from normal ZFS & TrueNAS servers.

For example, anyone using De-Dup should probably not be running any VMs, jails, dockers, apps, etc. on the same server. People might say: but with the server idle 80% of the time, I want to share its resources. Except it is in that other 20% that you NEED all the resources (mostly memory, but some CPU too).

This is not to say TrueNAS should restrict ZFS De-Duplication. Or that the original poster is using it wrong. This note is more for others finding this thread.
 

dxun

Explorer
Joined
Jan 24, 2016
Messages
52
Except it is in that other 20% that you NEED all the resources (mostly memory, but some CPU too).

This is the salient observation - people often forget to consider what happens in exceptional circumstances, such as pool scrubbing and/or resilvering. Intuitively, deduplication is going to put a dramatically larger burden on the system under these conditions, and if the platform is not sized appropriately, pain and disappointment will ensue.

A note for others finding this thread as well - be sure to consider, plan for, and test your system under these conditions, as it might grind to a halt at the very moment you need it the most.
 

Davide Zanon

Dabbler
Joined
Jan 25, 2017
Messages
44
Happy New Year folks!
Thanks for all your replies and very useful suggestions, I have a lot to read now to better understand things.
Alas, I know very well that I need dedup for my environment and I'm aware of the burden involved, so taking it
out is out of the question for me. What this post is really about is a hardware upgrade good enough to sustain my
needs, and as I suspected it is mostly about RAM and NVMe, like @dxun pointed out. While I know there's
always room for improvement, I'll concentrate on those two points right now; the CPU I linked should be enough.
RAM: the pool will not grow beyond its current size; in edge cases it will reach at most 75-77% used space.
I'm not planning to run any VM or jail right now, aside from a UrBackup jail that I'm testing on another test system,
but that shouldn't take much RAM/CPU since it basically only copies things over the network to the NAS, so I'll start
with 64 GB of ECC RAM and upgrade later if need be; only time/testing will tell.
NVMe: it seems I don't have much choice but to buy Optanes for storing the DDT, so I'll stick with those drives.
I'll have to figure out how to install them inside the chassis because I don't have free slots in the front bays.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
NVMe: it seems I don't have much choice but to buy Optanes for storing the DDT, so I'll stick with those drives.
I'll have to figure out how to install them inside the chassis because I don't have free slots in the front bays.
I wouldn't expect an NVMe drive to go in a 3.5" bay anyway.
The easy and safe option is to have one drive (Optane would be best, but a regular SSD could do) host the DDT as a persistent, metadata-only L2ARC. This will install in an M.2 or PCIe slot.
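For reference, a minimal sketch of that setup from the CLI, assuming a pool named tank and the NVMe drive showing up as nvd0 (both names are examples; the GUI can do the same thing):
zpool add tank cache nvd0              # attach the drive as L2ARC
zfs set secondarycache=metadata tank   # cache only metadata, which includes the DDT
On TrueNAS 13 the L2ARC should already survive reboots, since the vfs.zfs.l2arc.rebuild_enabled tunable defaults to on.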
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Alas, I know very well that I need dedup for my environment [..]
At the risk of sounding impolite, with the information provided there is a good chance that snapshots would be an equally suitable solution. But it is of course your decision whether or not you want to pursue this approach.
 
Last edited: