Deduplication not working?

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
I've been running TrueNAS for a while at home, but I'm now running a test at work with backup sets.

We copied a similar dataset to a Windows server with post-processing dedup enabled and were getting around 60% dedup. They are monthly copies of the same data, kept for backup purposes, so I expect the data to be highly deduplicable.

I fired up a FreeNAS box with an SSD to test whether we could get the same benefit using ZFS inline dedup, but I'm seeing zero dedup benefit.

Note: this is a test dataset, so there's no SSD redundancy; the data is stored elsewhere.

Is there something I'm doing wrong, other than just enabling dedup?

config:

        NAME                                            STATE     READ WRITE CKSUM
        deduptest                                       ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/e1b4e4c2-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1c0f7df-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e21d3384-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1a880ed-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1da3026-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e2053e11-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1f912f2-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e3250fe1-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e30e5b8b-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e337469a-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e36cd0a6-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e34d095e-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
        dedup
          gptid/e18000ce-77ab-11ec-b5ac-1418776d9ff3    ONLINE       0     0     0

errors: No known data errors

dedup: DDT entries 457192758, size 516B on disk, 152B in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     436M   54.5T   54.5T   54.5T     436M   54.5T   54.5T   54.5T
     2     191K   23.9G   23.9G   23.8G     395K   49.4G   49.4G   49.4G
     4       80     10M   9.64M   9.66M      330   41.2M   39.2M   39.3M
     8        4    512K     32K   64.0K       40      5M    320K    640K
 Total     436M   54.5T   54.5T   54.5T     436M   54.5T   54.5T   54.5T

root@truenas[~]# zpool list
NAME        SIZE   ALLOC   FREE   CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
boot-pool   262G   1.20G   261G   -        -           0%    0%  1.00x  ONLINE  -
deduptest  87.5T   71.9T  15.6T   -        -           9%   82%  1.00x  ONLINE  /mnt
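
(For anyone sanity-checking the same thing: the pool-wide ratio and the per-dataset property can be read straight off the CLI. The pool name below is the one from the output above; this is just a rough check, not a diagnosis.)

zpool get dedupratio deduptest     # pool-wide ratio, same figure as the DEDUP column in 'zpool list'
zfs get -r dedup deduptest         # confirm dedup=on (or a checksum name) on every dataset the copies land in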
[Attachment: Screenshot 2022-01-28 085909.png]


Am I doing something wrong here?

[Attachment: Screenshot 2022-01-28 090114.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
How much memory is in the system?

You've rewritten the data to the pool while dedup was enabled? Dedup will not go back and deduplicate data that already existed on the pool.
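
(Roughly speaking, the property only affects writes made while it is on. Something like the following, with a purely illustrative child dataset name, shows the sequence that matters:)

zfs set dedup=on deduptest/backups   # hypothetical dataset; only blocks written AFTER this point get DDT entries
zpool get dedupratio deduptest       # stays at 1.00x for anything written before dedup was turned on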
 

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
16GB RAM (old test box lying around)
Dedup table is on the SSD (it's a test, so I'm not worried about speed)

Yep, dedup was enabled at pool creation, and all the data was copied after that point.
[Attachment: Screenshot 2022-01-28 091927.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
16GB RAM (old test box lying around)

Ah, yeah, I don't think that's going to work.

Dedup requires perhaps 5GB of RAM per TB of stored data. You have 71.9T, so you should probably have ~384GB RAM.

dedup: DDT entries 457192758, size 516B on disk, 152B in core

And this is confirmed here: if you multiply 500 million entries by 152 bytes in core, that works out to about 76GB *JUST* for DDT entries, and it doesn't follow that you can do this on a 72GB or 96GB host. You really need a reasonably-sized ZFS host to begin with: ZFS wants to see 1GB of RAM per TB of disk. That's potentially a "soft" target in some cases, but I don't think it is here. You need 72GB for base RAM, plus the acknowledged overhead of 76GB more, so that's something like 160GB of RAM, which is about the bare minimum I would think might let this possibly work. You're off by an order of magnitude.
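
(The rough arithmetic behind that, using the DDT line quoted just above; this is only the quoted numbers multiplied out, nothing measured:)

# 457,192,758 DDT entries x 152 bytes held in core
echo $(( 457192758 * 152 ))    # = 69,493,299,216 bytes, call it ~70GB (or ~76GB if you round the count up to 500M)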

I think.

Having the DDT on a dedicated SSD vdev does reduce the memory requirements, which is why you MIGHT not need the 384GB that conventional dedup sizing says you need, but that doesn't mean you can do this on a 16GB system.
 

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
I thought the DDT RAM requirements were pretty much waived when using an SSD, since the DDT would be stored on the SSD rather than in RAM, eliminating the RAM requirement.


There are no issues with the filesystem mounting, or any of the other issues I would expect to encounter if I didn't have enough RAM/space.

It's odd that it is silently failing to dedup rather than actually failing somewhere and logging it (or I haven't found the log entry yet).

There are no messages in the logs around this either.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There are no issues with the filesystem mounting, or any of the other issues I would expect to encounter if I didn't have enough RAM/space.

What issues would you expect to encounter if you didn't have enough RAM? Alarm bells? Fire? :smile:

See, I've been talking about ZFS memory requirements for years, and one of the things people sometimes challenge me with is stuff like this, or the fact that there's no actual hard-and-fast limiting mechanism. The main effect of not having enough RAM is that performance falls off a cliff. So I am used to having these surreal discussions with people who think they are magically exempt from ZFS memory sizing guidelines, sometimes because "the GUI didn't stop me" or "some blog post suggested it was OK".

Years ago, we actually saw newbies coming in with 4GB RAM systems, fatal panics, and trashed ZFS pools. To this day, we don't know the root cause of that, but it was that which caused me to place an "8GB RAM" minimum memory requirement in the manual. But everyone agrees that such a panic SHOULD not happen, and that pool corruption should NEVER happen.

That doesn't mean that ZFS is okay with dramatically undersized RAM. In a non-dedup environment, I would hazard a guess that you and your 72TB backup pool might be able to get away with a 32GB system, but going much under that, your performance is likely to degrade over time as ZFS struggles to cache metadata while the free space fragments and it becomes harder to allocate space.

To the best of my knowledge, dedup special allocation classes don't actually eliminate the need for RAM; they just reduce it somewhat. This is great, because the old requirement for the DDT to be held entirely in RAM resulted in incredibly onerous sizing.
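
(For context, the allocation class being discussed is the separate "dedup" vdev visible in the OP's zpool status output. It's attached along these lines, with a placeholder device path; it keeps the on-disk copy of the DDT on the SSD, but the working copy still has to sit in ARC/RAM:)

zpool add deduptest dedup /dev/<ssd>   # <ssd> is illustrative; adds a dedup-class vdev that holds the on-disk DDT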

You might want to check in with @Stilez who has been keeping closer tabs on developments in dedup-land, and has written about it in some detail.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I thought the DDT RAM requirements were pretty much waived when using an SSD, since the DDT would be stored on the SSD rather than in RAM, eliminating the RAM requirement.

Not so, sorry.

The DDT is stored in the pool, like all metadata. But like all metadata, it needs to be hauled into RAM before it can be used. From my experience, which I've documented on this forum: my pool is about 31TB capacity and 50% full, and dedup means it's hitting something like half a million 4K I/O requests per second, sustained for tens of minutes, as it loads needed DDT data into RAM (ARC) or writes out updated DDT. That's just the DDT read load from the pool, required to perform a client-to-server write of a single 30GB+ file across Samba, with nothing else at all happening. Where exactly do you imagine that kind of data stream and data load can go in the system you describe? It can't, and that's the problem.
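
(If anyone wants to see that load on their own test pool, the per-vdev I/O view makes it fairly obvious during a large copy; the pool name below is the OP's:)

zpool iostat -v deduptest 5    # per-vdev ops and bandwidth every 5 seconds; watch the dedup vdev during a big write
zpool status -D deduptest      # DDT entry count and per-entry in-core size, the same figures quoted earlier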

The guide and forum here are hard-core about DDT hardware demands and compromises. Dedup will demand a lot of RAM. It'll also demand a lot from your disks in mixed 4K I/O (which is really the hardest load a disk system can face). If you solve that, it'll quite possibly starve your CPU, because it needs to hash all that data in real time, too, for dedup checking.

The deal isn't free extra space for a little extra cost in RAM. The deal is: you take your hardware to a whole new level of cost and capability, and you get dedup that is only modestly slowed down and limited, but at least functional.

To give an idea, my server was built specifically to be optimised for dedup with about 20TB of deduped data. It's got about 12 x 10TB HDDs for the pool (4 x 3-way mirror), mirrored 480GB Optane 905s for metadata/DDT, the same again for SLOG/ZIL, about 256GB of RAM with about 96GB reserved just for DDT/metadata, an 8-core Xeon E5, and Chelsio T5 cards with serious offloading of network tasks, and it also isn't running any jails or VMs, just Samba and SSH on a 10G LAN. Guess what all of that expense gets me? Just 200-300 MB/s on Samba, and regular CPU starvation during scrub or other operations, so I have to pause those when I need Samba to be responsive. Because even with all that, it's still not truly capable enough for dedup. Perhaps I ought to upgrade to a 40-core CPU, except that Alex Motin, a ZFS developer, tried that a while back and reported it wasn't enough to solve the issue either. (CPU starvation because of hashing demand is only an issue once other bottlenecks are fixed, so historically it wasn't likely until modern NVMe SSDs arrived.) ZFS needed extra CPU-based throttle options, which have now been added but are still less than perfect and may yet need refinement. Maybe that puts in context just how demanding it is to get good dedup. It's still cheaper than the power, disks, and controllers needed to store the same data non-deduped, though, which is why I do it.

Sorry to be hard on you; I don't mean to be. But it really is that way.
 