Recommendations for ZSTD? Especially when used with dedup.

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
With ZSTD now in 12-Beta2.1, and being a compression system with a lot more flexibility than LZ4 and its predecessors, is there a good resource, or an iXsystems recommendation, for adopting ZSTD over LZ4: which ZSTD levels are recommended, and how/when should each be used? Or is it all "try them all and see!"?

Also, for deduped pools, does ZSTD tend to produce as many dedupable blocks as LZ4? For pools that legitimately run dedup and get many multiples of deduplication, a 20% gain in compression is trivial if it costs 0.5x in dedup as a result.

I could replicate my 40TB pool half a dozen times and see what the outcome is (for my pool, at least), but maybe someone already has a preliminary idea of what's typical?

Any insight and guidance valued.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I expect zstd to be pretty much the same as LZ4 when it comes to dedup, since neither salts the data (why would they?). Same block in, same compressed data out.

That comes with one big caveat, though: You can't change the level, or the compressed data will be different. You could also conceivably be bitten by changes to the zstd compressor, but that's something of an unknown at this stage. The same would have applied to LZ4, but a static version of the LZ4 code was imported and never really updated, so this was never an issue (it missed out on some compression improvements, though). The goal with zstd is to update it in the future, so definitely keep this in mind.
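
A quick way to convince yourself of the determinism part, as a userland sketch using the third-party python-zstandard bindings (not ZFS's in-kernel compressor, but the same argument applies):

Code:
# Sketch with python-zstandard (pip install zstandard); ZFS's in-kernel zstd is a
# different build, but the determinism point is the same: identical input plus
# identical level (and library version) -> identical compressed bytes.
import zstandard as zstd

block = b"the same record written twice " * 1000

same_a = zstd.ZstdCompressor(level=3).compress(block)
same_b = zstd.ZstdCompressor(level=3).compress(block)
other  = zstd.ZstdCompressor(level=19).compress(block)

print(same_a == same_b)   # True: same level, same bytes out, so dedup can match them
print(same_a == other)    # almost always False: change the level and the on-disk bytes change

# Decompression doesn't care which level (or, in future, which zstd version) wrote the frame:
dctx = zstd.ZstdDecompressor()
print(dctx.decompress(same_a) == dctx.decompress(other) == block)   # True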

More generally, you'll probably want to experiment with how high you can push the compression while maintaining the performance you want. If you want faster than LZ4, go for the negative levels. The rest (the positive levels) typically sits between LZ4 and gzip.
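
If you want numbers for your own data, a rough userland benchmark like the sketch below is usually enough to find the knee of the curve (python-zstandard again, with the stdlib zlib as a gzip stand-in; the file path is just a placeholder for a sample of your data). Absolute throughput won't match ZFS's in-kernel path, but the relative ordering between levels tends to hold.

Code:
# Rough ratio/throughput comparison; swap the placeholder path for a representative sample.
import time, zlib
import zstandard as zstd   # third-party: pip install zstandard

def measure(label, compress, data):
    t0 = time.perf_counter()
    out = compress(data)
    dt = time.perf_counter() - t0
    print(f"{label:>8}: ratio {len(data) / len(out):5.2f}  {len(data) / dt / 1e6:8.1f} MB/s")

data = open("/path/to/a/representative/file", "rb").read()   # placeholder path

for level in (1, 3, 9, 19):
    measure(f"zstd-{level}", zstd.ZstdCompressor(level=level).compress, data)
measure("gzip-6", lambda d: zlib.compress(d, 6), data)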
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Still waiting for the 4K sized block dedupe and to hell with compression. Some of us have workloads where that would be amazing.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I expect zstd to be pretty much the same as LZ4 when it comes to dedup, since neither salts the data (why would they?)

I thought otherwise? man zpool-features (BETA 2.1) states under edonr and skein hashing/checksum methods:

This implementation also utilizes the new salted checksumming functionality in ZFS, which means that the checksum is pre-seeded with a secret 256-bit random key (stored on the pool) before being fed the data block to be checksummed. Thus the produced checksums are unique to a given pool, preventing hash collision attacks on systems with dedup.

I'd be amazed if that wasn't the case for the others as well - that someone added that functionality for two of them but not the rest. Although it doesn't say this for SHA512 or ZSTD, I did find https://github.com/openzfs/zfs/pull/10277/files - the OpenZFS ZSTD code; search for "SALT" in it, and it seems to be salted internally as well.
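
As a rough illustration of what "pre-seeded with a secret 256-bit key" means in practice (blake2b from Python's hashlib used here purely as a stand-in, since Skein and Edon-R aren't in the standard library):

Code:
# Keyed-checksum sketch; blake2b stands in for the salted Skein/Edon-R checksums.
import hashlib, secrets

block = b"an identical data block " * 1000

pool_a_key = secrets.token_bytes(32)   # the secret 256-bit key stored on pool A
pool_b_key = secrets.token_bytes(32)   # a different pool holds a different key

sum_a1 = hashlib.blake2b(block, key=pool_a_key, digest_size=32).hexdigest()
sum_a2 = hashlib.blake2b(block, key=pool_a_key, digest_size=32).hexdigest()
sum_b  = hashlib.blake2b(block, key=pool_b_key, digest_size=32).hexdigest()

print(sum_a1 == sum_a2)   # True: same block, same pool key -> same checksum
print(sum_a1 == sum_b)    # False: the same block checksums differently on another pool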
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Still waiting for the 4K sized block dedupe and to hell with compression. Some of us have workloads where that would be amazing.
Can you elaborate? What do you mean by "4K block sized dedup" that isn't already there in 12-Beta?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I thought otherwise? man zpool-features (BETA 2.1) states under edonr and skein hashing/checksum methods:

This implementation also utilizes the new salted checksumming functionality in ZFS, which means that the checksum is pre-seeded with a secret 256-bit random key (stored on the pool) before being fed the data block to be checksummed. Thus the produced checksums are unique to a given pool, preventing hash collision attacks on systems with dedup.

I'd be amazed if that wasn't the case for the others as well - that someone added that functionality for two of them but not the rest. Although it doesn't say this for SHA512 or ZSTD, I did find https://github.com/openzfs/zfs/pull/10277/files - the OpenZFS ZSTD code; search for "SALT" in it, and it seems to be salted internally as well.
Well, you're mixing compression and checksumming. Some (all?) checksums can take a pool-wide salt, but that's not going to make two identical blocks on the same pool have different checksums; they'll just be different between pools. On a tangent, I wonder how that interacts with send/recv...

Fundamentally, compression benefits in no way from adding a salt (add it before and it's extra randomness in your input, add it after and gain nothing of value). So, the overall process would be:

Raw data -> compression -> checksumming -> dedup

Where the checksum's salt is effectively just an implementation detail, because it's a constant. That leaves the compression step, which for a given algorithm implementation with the same settings will yield the same compressed output. Which means that what you need to worry about is changes to the compressor.
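
A toy model of that pipeline, just to make the ordering concrete (plain Python, with sha256 standing in for the pool's checksum and a dict standing in for the dedup table; obviously not ZFS code):

Code:
# Toy write path: raw data -> compression -> (salted) checksum -> dedup lookup.
import hashlib, secrets
import zstandard as zstd   # third-party bindings standing in for ZFS's zstd

POOL_SALT = secrets.token_bytes(32)    # per-pool constant, generated once at pool creation

def write_block(raw: bytes, level: int, ddt: dict) -> str:
    compressed = zstd.ZstdCompressor(level=level).compress(raw)
    checksum = hashlib.sha256(POOL_SALT + compressed).hexdigest()
    if checksum in ddt:
        return "deduped"               # an identical compressed block is already on disk
    ddt[checksum] = compressed
    return "stored"

ddt = {}
block = b"a 128K record's worth of data " * 4000
print(write_block(block, 3, ddt))      # stored
print(write_block(block, 3, ddt))      # deduped: same salt, same level, identical bytes
print(write_block(block, 19, ddt))     # stored again: changing the level (almost certainly) breaks the match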
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Well, you're mixing compression and checksumming. Some (all?) checksums can take a pool-wide salt, but that's not going to make two identical blocks on the same pool have different checksums; they'll just be different between pools. On a tangent, I wonder how that interacts with send/recv...

Fundamentally, compression benefits in no way from adding a salt (add it before and it's extra randomness in your input, add it after and gain nothing of value). So, the overall process would be:

Raw data -> compression -> checksumming -> dedup

Where the checksum's salt is effectively just an implementation detail, because it's a constant. That leaves the compression step, which for a given algorithm implementation with the same settings will yield the same compressed output. Which means that what you need to worry about is changes to the compressor.
If I understand your meaning correctly, what you're saying is roughly:
  1. The compression step has no earthly reason to include a salt, so whichever compressor is used, blocks that are the same will compress to the same output and therefore be matched at dedup, and blocks that are different will compress differently and not be matched at dedup;
  2. Therefore, whatever compressor (or none) is used out of LZ4 and ZSTD-*, deduplication will result in the same profile of deduping;
  3. Therefore the only difference in on-disk speed and size will come from the compression algorithm chosen, because dedup will find exactly the same profile of dedupable data for all compressors;
  4. Dedup may be faster or slower depending on which hash/verify is selected, but that doesn't affect the disk space/dedup bucket profile.
Roughly correct? Anything I'm missing?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah, that sounds about right. The only thing I'd add is that zstd a year from now may yield different output from zstd now. It may or may not be a concern.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
But it'll have to be backwards compatible to read old pools. So presumably the only changes, if any, will be to newly written data, and even that will stay readable going forward.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
About ZSTD and target audience:

The primary use case for ZSTD would be "write-not-that-often, read-many" workloads, preferably with multiple write streams at the same time, since the ZFS compression stack gets its best results when multiple streams of data are compressed simultaneously (that's a general theme with ZFS, not something ZSTD-specific, but it gets highlighted with ZSTD compared to LZ4 because ZSTD is slower per core).

About ZSTD and Dedupe in general:
ZSTD on ZFS has not been thoroughly tested with, or designed primarily around, dedupe. I'm not saying it would be bad or worse than LZ4, just that dedupe effectiveness was not part of the design process.

About ZSTD and the future:
ZSTD is designed with easy upgradeability in mind, because that wasn't done for LZ4, which has led to a situation where we are now dependent on a not-very-performant version of LZ4. Versioning is recorded on-disk, and ZSTD will always be able to decompress older data. Compression, however, will use the updated compression stack, and thus, indeed, as @Ericloewe pointed out: blocks written after an upgrade won't match the old blocks and therefore won't qualify for deduplication.

So, in short:
While ZSTD will remain backwards compatible, new blocks won't be the same after some updates.
(And no, it will not be an option in the future to keep using the older ZSTD version.)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'll add that it is perfectly typical for compressors to improve while maintaining compatibility with the same decompression code. To illustrate this, imagine a dictionary-based compression algorithm. If a new management scheme is used to optimize usage of the dictionary (e.g. choosing the most frequent segment instead of whatever comes first), you can reduce the size of your output while not changing a thing on the decompression side of things.
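
A toy version of that, if it helps (a made-up dictionary container, nothing to do with zstd's real format): two compressors that pick their dictionary segment differently, and one decompressor that reads the output of both.

Code:
# Made-up container: [1-byte segment length][segment][body with every occurrence of the
# segment replaced by the reserved byte 0xFF]. Assumes 0xFF never occurs in the input.
from collections import Counter

MARK = b"\xff"

def _encode(data: bytes, segment: bytes) -> bytes:
    return bytes([len(segment)]) + segment + data.replace(segment, MARK)

def compress_old(data: bytes, seg_len: int = 4) -> bytes:
    """Old compressor: use whatever segment repeats first."""
    for i in range(len(data) - seg_len + 1):
        seg = data[i:i + seg_len]
        if data.count(seg) > 1:
            return _encode(data, seg)
    return _encode(data, data[:seg_len])

def compress_new(data: bytes, seg_len: int = 4) -> bytes:
    """'Improved' compressor: use the most frequent segment instead."""
    counts = Counter(data[i:i + seg_len] for i in range(len(data) - seg_len + 1))
    seg, _ = counts.most_common(1)[0]
    return _encode(data, seg)

def decompress(blob: bytes) -> bytes:
    """One decompressor reads either output: the dictionary travels with the data."""
    seg_len = blob[0]
    return blob[1 + seg_len:].replace(MARK, blob[1:1 + seg_len])

data = b"abcd-xyzw-xyzw-xyzw-abcd-end"
old, new = compress_old(data), compress_new(data)
print(decompress(old) == decompress(new) == data)   # True: nothing changed on the decompression side
print(len(old), len(new))                           # but the newer compressor's output is smaller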
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Compression, however, will use the updated compression stack, and thus, indeed, as @Ericloewe pointed out: blocks written after an upgrade won't match the old blocks and therefore won't qualify for deduplication.

So, in short:
While ZSTD will remain backwards compatible, new blocks won't be the same after some updates.
(And no, it will not be an option in the future to keep using the older ZSTD version.)
HEY! Ouch!!!

Okay, worst case, pool replication every major version or so. Not the most fatal thing......
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's not certain that it will be the case; it's just something to keep track of. If it turns out to be a bigger problem for dedup than you can handle, LZ4 might be better overall, as a side effect of it being stuck in the past.
 