Deduplication not working?

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
I've been running TrueNAS for a while at home, but I'm now running a test at work with backup sets.

We copied a similar dataset to a Windows server with post-processing dedup enabled and were getting around 60% dedup. They are monthly copies of the same data, kept for backup purposes, so I expect the data to be highly deduplicable.

I fired up a FreeNAS box with an SSD to test whether we could get the same benefit using ZFS inline dedup, but I'm seeing zero dedup benefit.

Note: this is a test dataset, so there's no SSD redundancy; the data is stored elsewhere.

Is there something I'm doing wrong, other than just enabling dedup?

config:

        NAME                                            STATE     READ WRITE CKSUM
        deduptest                                       ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/e1b4e4c2-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1c0f7df-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e21d3384-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1a880ed-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1da3026-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e2053e11-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e1f912f2-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e3250fe1-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e30e5b8b-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e337469a-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e36cd0a6-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
            gptid/e34d095e-77ab-11ec-b5ac-1418776d9ff3  ONLINE       0     0     0
        dedup
          gptid/e18000ce-77ab-11ec-b5ac-1418776d9ff3    ONLINE       0     0     0

errors: No known data errors

dedup: DDT entries 457192758, size 516B on disk, 152B in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     436M   54.5T   54.5T   54.5T     436M   54.5T   54.5T   54.5T
     2     191K   23.9G   23.9G   23.8G     395K   49.4G   49.4G   49.4G
     4       80     10M   9.64M   9.66M      330   41.2M   39.2M   39.3M
     8        4    512K     32K   64.0K       40      5M    320K    640K
 Total     436M   54.5T   54.5T   54.5T     436M   54.5T   54.5T   54.5T

root@truenas[~]# zpool list
NAME        SIZE   ALLOC   FREE   CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
boot-pool   262G   1.20G   261G   -        -           0%    0%  1.00x  ONLINE  -
deduptest  87.5T   71.9T  15.6T   -        -           9%   82%  1.00x  ONLINE  /mnt
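
(For anyone sanity-checking the same thing: the pool-wide ratio and the per-dataset property can be read straight off the CLI. The pool name below is the one from the output above; this is just a rough check, not a diagnosis.)

zpool get dedupratio deduptest     # pool-wide ratio, same figure as the DEDUP column in 'zpool list'
zfs get -r dedup deduptest         # confirm dedup=on (or a checksum name) on every dataset the copies land in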
[Attachment: Screenshot 2022-01-28 085909.png]


Am I doing something wrong here?

[Attachment: Screenshot 2022-01-28 090114.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
How much memory is in the system?

You've rewritten the data to the pool while dedup was enabled? Dedup will not go back and deduplicate data that already existed on the pool.
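
(Roughly speaking, the property only affects writes made while it is on. Something like the following, with a purely illustrative child dataset name, shows the sequence that matters:)

zfs set dedup=on deduptest/backups   # hypothetical dataset; only blocks written AFTER this point get DDT entries
zpool get dedupratio deduptest       # stays at 1.00x for anything written before dedup was turned on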
 

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
16GB RAM (old test box lying around)
Dedup table is on the SSD (it's a test, so I'm not worried about speed)

Yep, dedup was enabled at pool creation, and all the data was copied after that point.
[Attachment: Screenshot 2022-01-28 091927.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
16GB RAM (old test box lying around)

Ah, yeah, I don't think that's going to work.

Dedup requires perhaps 5GB of RAM per TB of stored data. You have 71.9T, so you should probably have ~384GB RAM.

dedup: DDT entries 457192758, size 516B on disk, 152B in core

And this is confirmed here: if you multiply 500 million entries by 152 bytes in core, that works out to about 76GB *JUST* for DDT entries, and it doesn't follow that you can do this on a 72GB or 96GB host. You really need a reasonably-sized ZFS host to begin with: ZFS wants to see 1GB of RAM per TB of disk. That's potentially a "soft" target in some cases, but I don't think it is here. You need 72GB for base RAM, plus the acknowledged overhead of 76GB more, so that's something like 160GB of RAM, which is about the bare minimum I would think might let this possibly work. You're off by an order of magnitude.
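
(The rough arithmetic behind that, using the DDT line quoted just above; this is only the quoted numbers multiplied out, nothing measured:)

# 457,192,758 DDT entries x 152 bytes held in core
echo $(( 457192758 * 152 ))    # = 69,493,299,216 bytes, call it ~70GB (or ~76GB if you round the count up to 500M)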

I think.

Having the DDT on a dedicated SSD vdev does reduce the memory requirements, which is why you MIGHT not need the 384GB that conventional dedup sizing says you need, but that doesn't mean you can do this on a 16GB system.
 

onthax

Explorer
Joined
Jan 31, 2012
Messages
81
I thought the DDT RAM requirements were pretty much waived when using an SSD, since the DDT would be stored on the SSD rather than in RAM, eliminating the RAM requirement.


There are no issues with the filesystem mounting, or any of the other issues I would expect to encounter if I didn't have enough RAM/space.

It's odd that it is silently failing to dedup rather than actually failing somewhere and logging it (or I haven't found the log entry yet).

There are no messages in the logs around this either.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There are no issues with the filesystem mounting, or any of the other issues I would expect to encounter if I didn't have enough RAM/space.

What issues would you expect to encounter if you didn't have enough RAM? Alarm bells? Fire? :smile:

See, I've been talking about ZFS memory requirements for years, and one of the things people sometimes challenge me with is stuff like this, or the fact that there's no actual hard-and-fast limiting mechanism. The main effect of not having enough RAM is that performance falls off a cliff. So I am used to having these surreal discussions with people who think they are magically exempt from ZFS memory sizing guidelines, sometimes because "the GUI didn't stop me" or "some blog post suggested it was OK".

Years ago, we actually saw newbies coming in with 4GB RAM systems, fatal panics, and trashed ZFS pools. To this day, we don't know the root cause of that, but it was that which caused me to place an "8GB RAM" minimum memory requirement in the manual. But everyone agrees that such a panic SHOULD not happen, and that pool corruption should NEVER happen.

That doesn't mean that ZFS is okay with dramatically undersized RAM. In a non-dedup environment, I would hazard a guess that you and your 72TB backup pool might be able to get away with a 32GB system, but going much under that, your performance is likely to degrade over time as ZFS struggles to cache metadata while the free space fragments and it becomes harder to allocate space.

To the best of my knowledge, dedup special allocation classes don't actually eliminate the need for RAM; they just reduce it somewhat. This is great, because the old requirement for the DDT to be held entirely in RAM resulted in incredibly onerous sizing.
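
(For context, the allocation class being discussed is the separate "dedup" vdev visible in the OP's zpool status output. It's attached along these lines, with a placeholder device path; it keeps the on-disk copy of the DDT on the SSD, but the working copy still has to sit in ARC/RAM:)

zpool add deduptest dedup /dev/<ssd>   # <ssd> is illustrative; adds a dedup-class vdev that holds the on-disk DDT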

You might want to check in with @Stilez who has been keeping closer tabs on developments in dedup-land, and has written about it in some detail.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I thought the DDT RAM requirements were pretty much waived when using an SSD, since the DDT would be stored on the SSD rather than in RAM, eliminating the RAM requirement.

Not so, sorry.

The DDT is stored in the pool, like all metadata. But like all metadata, it needs to be hauled into RAM before it can be used. From my experience, which I've documented on this forum: my pool is about 31TB capacity and 50% full, and dedup means it's hitting something like half a million 4K I/O requests per second, sustained for tens of minutes, as it loads needed DDT data into RAM (ARC) or writes out updated DDT. That's just the DDT read load from the pool, required to perform a client-to-server write of a single 30GB+ file across Samba, with nothing else at all happening. Where exactly do you imagine that kind of data stream and data load can go in the system you describe? It can't, and that's the problem.
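
(If anyone wants to see that load on their own test pool, the per-vdev I/O view makes it fairly obvious during a large copy; the pool name below is the OP's:)

zpool iostat -v deduptest 5    # per-vdev ops and bandwidth every 5 seconds; watch the dedup vdev during a big write
zpool status -D deduptest      # DDT entry count and per-entry in-core size, the same figures quoted earlier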

The guide and forum here are hard-core about DDT hardware demands and compromises. Dedup will demand a lot of RAM. It'll also demand a lot from your disks in mixed 4K I/O (which is really the hardest load a disk system can face). If you solve that, it'll quite possibly starve your CPU, because it needs to hash all that data in real time, too, for dedup checking.

The deal isn't free extra space for a little extra cost in RAM. The deal is: you take your hardware to a whole new level of cost and capability, and you get dedup that is only modestly slowed down and limited, but at least functional.

To give an idea, my server was built specifically to be optimised for dedup with about 20TB of deduped data. It's got about 12 x 10TB HDDs for the pool (4 x 3-way mirror), mirrored 480GB Optane 905s for metadata/DDT, the same again for SLOG/ZIL, about 256GB of RAM with about 96GB reserved just for DDT/metadata, an 8-core Xeon E5, and Chelsio T5 cards with serious offloading of network tasks, and it also isn't running any jails or VMs, just Samba and SSH on a 10G LAN. Guess what all of that expense gets me? Just 200-300 MB/s on Samba, and regular CPU starvation during scrub or other operations, so I have to pause those when I need Samba to be responsive. Because even with all that, it's still not truly capable enough for dedup. Perhaps I ought to upgrade to a 40-core CPU, except that Alex Motin, a ZFS developer, tried that a while back and reported it wasn't enough to solve the issue either. (CPU starvation because of hashing demand is only an issue once other bottlenecks are fixed, so historically it wasn't likely until modern NVMe SSDs arrived.) ZFS needed extra CPU-based throttle options, which have now been added but are still less than perfect and may yet need refinement. Maybe that puts in context just how demanding it is to get good dedup. It's still cheaper than the power, disks, and controllers needed to store the same data non-deduped, though, which is why I do it.

Sorry to be hard on you; I don't mean to be. But it really is that way.
 