Checksum Algorithms

Status
Not open for further replies.

GTAXL

Cadet
Joined
Jan 8, 2018
Messages
5
Hello, I've heard that the default checksum algo is fletcher4, which I've heard is bloody fast but not "cryptographically secure" like the other options, and can be prone to hash collisions. While a collision may be extremely rare, for some files I want the chance to be zero. Several others have always said to set the algo to SHA256, but I'm hearing that SHA512 is more secure/reliable and much faster than SHA256, and now I'm hearing the new skein is even faster and just as secure. Is there like a matrix grid somewhere that'll show the different algos on a chart for speed/throughput and secure/reliable? Like, if SHA512 or skein is faster than SHA256, then how does its speed compare with fletcher4? What performance hit would I see from skein or the SHA variants on my system? Specs in sig. Does a CPU having AES-NI help?

Also can algos be set dataset wide or is it pool wide only? If they can be set per dataset, that'd be great: then my most critical documents can get a solid algo whilst stuff like videos can get fletcher4.

Thanks!
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
The hashes and collisions don't matter unless they're being used for things that transform what's written to disk. Think dedupe. With your tiny server, you should worry more about fire/flood/the apocalypse than a hash collision with random bit rot. I mean, it's more likely that you win the lottery and get hit by lightning giving you super powers, and then get hit by lightning again from a different storm that takes your powers away. The main time people want sha256/512 is when they're running deduplication on large amounts of data. Like 100s of terabytes.
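
For a rough sense of scale in the dedupe case, here is a minimal back-of-the-envelope sketch in Python. It assumes an idealized hash whose outputs are uniformly random (which describes the goal of sha256/sha512, not fletcher4), 100 TiB of unique data, and 128 KiB records; those figures and the helper name `birthday_bound` are illustrative assumptions, not numbers from any real pool.

```python
from fractions import Fraction

RECORD_SIZE = 128 * 1024   # bytes per record (assumed 128 KiB recordsize)
POOL_BYTES = 100 * 2**40   # 100 TiB of unique data (illustrative figure)

def birthday_bound(num_blocks, hash_bits):
    """Upper bound on the chance that any two of num_blocks distinct blocks
    share a digest, assuming an ideal hash: (number of pairs) / 2**hash_bits."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return Fraction(pairs, 2**hash_bits)

num_blocks = POOL_BYTES // RECORD_SIZE
for bits in (256, 512):
    p = birthday_bound(num_blocks, bits)
    print(f"{bits}-bit ideal hash, {num_blocks} blocks: p <= {float(p):.3e}")
```

The bound counts every pair of blocks, which is why it matters for dedupe (any two blocks colliding is a problem) and not for plain verification, where a corrupted block only has to differ from its own stored checksum.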
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Is there like a matrix grid somewhere that'll show the different algos on a chart for speed/throughput and secure/reliable?
No, there isn't. The speed and throughput depend on so many other, much bigger factors that a chart like that wouldn't make sense. It's like asking what tires will make your car the fastest.
Does a CPU having AES-NI help?
AES acceleration helps with AES. Remember, this is a hash stored in metadata, not encryption.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Also can algos be set dataset wide
Yes. zfs set checksum=sha512 pool/dataset

But I'd agree with @kdragon75 that, unless you're doing deduplication, there's just no reason to do this.
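
If you end up scripting this across several datasets, a minimal Python sketch like the one below could wrap the command shown above. The helper names, the `dry_run` flag, and the dataset name `pool/documents` are hypothetical; the allowed list is limited to the algorithms mentioned in this thread, and nothing is executed unless `dry_run` is turned off.

```python
import subprocess

# Only the checksum values discussed in this thread; ZFS accepts others too.
KNOWN_CHECKSUMS = {"fletcher4", "sha256", "sha512", "skein"}

def checksum_command(dataset, algorithm):
    """Build the argv list for: zfs set checksum=<algorithm> <dataset>."""
    if algorithm not in KNOWN_CHECKSUMS:
        raise ValueError(f"unexpected checksum algorithm: {algorithm!r}")
    if not dataset or dataset.startswith("-"):
        raise ValueError(f"suspicious dataset name: {dataset!r}")
    return ["zfs", "set", f"checksum={algorithm}", dataset]

def set_checksum(dataset, algorithm, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = checksum_command(dataset, algorithm)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if zfs exits non-zero
    return cmd

# Dry run: prints the command without touching any pool.
print(" ".join(set_checksum("pool/documents", "sha512")))
```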
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The hashes and collisions don't matter unless they're being used for things that transform what's written to disk. Think dedupe.

Correct. Since the checksum is being used for verification and not for data reduction such as dedupe (which defaults to sha256 anyways), a hash collision could only arise from bitrot altering a record in exactly the manner needed to produce the same checksum.

For example, here are two famous blocks that create an MD5 hash collision:

Code:
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70

d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70

Can you spot the differences? ;)

Both of these have the same MD5 sum - 79054025255fb1a26e4bc422aef54eb4 - even though six bits differ between them. Flip more than six, fewer than six, or flip one "wrong bit" and you don't collide.

And that's only within a 128-byte block. Imagine trying to hash-collide a 128 KiB ZFS record.

Now imagine that happening through natural forces such as electromagnetic interference, solar flares, or alien meddling (what have you). Getting that result from random entropy is beyond "winning the lottery" and well into "primordial soup spontaneously forming life several billion years ago."

But if you actually want to protect against that, ZFS will let you, as described by @danb35's command. Note that this only applies to newly written data.
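
The collision between the two 128-byte blocks quoted above can be checked with a few lines of Python. This is a small sketch using only the standard `hashlib` module; the hex strings are copied from the blocks in this post, and the helper `differing_bits` is a name made up for illustration. It also hashes both blocks with SHA-256 to show that a stronger hash tells them apart.

```python
import hashlib

# The two colliding blocks, copied from the hex dump above (128 bytes each).
BLOCK_A = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
BLOCK_B = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

def differing_bits(a, b):
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

md5_a = hashlib.md5(BLOCK_A).hexdigest()
md5_b = hashlib.md5(BLOCK_B).hexdigest()
sha_a = hashlib.sha256(BLOCK_A).hexdigest()
sha_b = hashlib.sha256(BLOCK_B).hexdigest()

print("bits that differ:", differing_bits(BLOCK_A, BLOCK_B))
print("MD5 equal:       ", md5_a == md5_b, md5_a)
print("SHA-256 equal:   ", sha_a == sha_b)
```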
 

GTAXL

Cadet
Joined
Jan 8, 2018
Messages
5
Thanks for the replies. Since it can be set at the dataset level, I can create new datasets to play around with and see the performance. I would mainly only apply it to critical documents, stuff that wouldn't need a whole lot of throughput anyway. Is the performance degradation really that bad when going from fletcher4 to anything else? That's what you guys seem to make it out to be. If there is little to no performance hit, then what's the reason not to use a stronger algo, if it's more secure and lowers the chances even further? Now the real question, skein vs SHA512?

You've already solved this for me by saying it can be set per dataset. :) I can freely play around with it, whereas if it was only for the entire pool, not so much. And yes, I'm aware it's for newly written data only. :) :p
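
Before creating test datasets, a quick user-space comparison can give a first impression of raw hash speed on a given CPU. The sketch below is only a rough proxy under stated assumptions: Python's `hashlib` has no fletcher4 or skein, so it compares sha256 and sha512 only, and it measures whatever implementation Python links against, not the ZFS one. The buffer size mirrors the 128 KiB record mentioned earlier; `ROUNDS` and the function name are arbitrary choices.

```python
import hashlib
import os
import time

RECORD_SIZE = 128 * 1024   # one 128 KiB buffer, standing in for a ZFS record
ROUNDS = 200               # arbitrary; raise it for steadier numbers

def throughput_mib_s(name, data, rounds=ROUNDS):
    """Hash the same buffer `rounds` times and return MiB per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return (len(data) * rounds) / elapsed / 2**20

record = os.urandom(RECORD_SIZE)
results = {name: throughput_mib_s(name, record) for name in ("sha256", "sha512")}
for name, rate in results.items():
    print(f"{name}: {rate:.0f} MiB/s")
```

Results vary with the CPU and the library build, so treat the output as a hint and measure real writes to a test dataset for anything that matters.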
 