ZFS deduplication default hash

Status
Not open for further replies.

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
Hello guys,

I have a quick question about ZFS deduplication in FreeNAS. I spent some time searching for this on the internet, but found no suitable result.
I have one specific dataset with 400 GB of data on which deduplication is enabled (the dedup table is fairly small, about 700 MB). I set deduplication to "ON" via the GUI. I know there is an option called "VERIFY", where every block whose checksum matches an existing block is also compared byte by byte, to rule out the case that the blocks are not actually identical.
That got me thinking about hash collisions. The documentation says hash collisions are rare these days, so the verify option is only needed if you are somewhat paranoid. Since there are dozens of hash algorithms, which one does OpenZFS use for block comparison by default when dedup is enabled via the GUI? Is it fletcher4? I saw somewhere that SHA-256 can be enabled, but only via the shell, so I assume it is optional.
Could anyone be so kind as to answer that question? I am really interested in this.

Another question on this, if I may ask:
What happens when a hash collision occurs, so that ZFS thinks two blocks are identical when they aren't? What happens to the file when I open it? Does the system recognize at that point that the checksum is bad and self-heal from the mirror, or does it go undetected so that you end up with a corrupt file?
Because that would be the point where I would spend the extra horsepower of my Xeon on the verify function, as I want maximum integrity for this data too.

Thank you in advance.
IceBoosteR
 
dlavigne

Guest
If you don't get an answer here, let us know if you get one on an OpenZFS or FreeBSD forum.
 

IceBoosteR

Hello dlavigne,

At the moment I have no accounts on any OpenZFS or FreeBSD forum, so I hope to get an answer here :)
If anything new comes up on this topic, I'll let you know.
 

IceBoosteR

I would like to bump this back to the top. Maybe someone will pick this up...
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Hello there,

Surprisingly, that kind of information seems quite difficult to find...
The only useful information I found about the hashing algorithms is the following extract from this Oracle blog post, where fletcher4 is not considered a "trustworthy" hash function.

Selecting a checksum

Given the ability to detect hash collisions as described above, it is possible to use much weaker (but faster) hash functions in combination with the 'verify' option to provide faster dedup. ZFS offers this option for the fletcher4 checksum, which is quite fast:

zfs set dedup=fletcher4,verify tank

The tradeoff is that unlike SHA256, fletcher4 is not a pseudo-random hash function, and therefore cannot be trusted not to collide. It is therefore only suitable for dedup when combined with the 'verify' option, which detects and resolves hash collisions. On systems with a very high data ingest rate of largely duplicate data, this may provide better overall performance than a secure hash without collision verification.
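
The behaviour described in the excerpt can be sketched with a toy model. This is only an illustration, not ZFS code: `DedupStore`, `weak_checksum`, and `sha256_checksum` are hypothetical names, and the byte-sum checksum is a deliberately weak stand-in chosen so that a collision is easy to produce (it is not fletcher4). The sketch shows that a weak checksum without verify can hand back the wrong block on read, while the same checksum with verify does not.

```python
import hashlib


def weak_checksum(block: bytes) -> int:
    """Deliberately weak stand-in checksum (byte sum mod 2**16). NOT fletcher4."""
    return sum(block) % 65536


def sha256_checksum(block: bytes) -> bytes:
    """Strong hash, as used in the excerpt's comparison."""
    return hashlib.sha256(block).digest()


class DedupStore:
    """Toy dedup table: maps a checksum to the blocks stored under it."""

    def __init__(self, checksum, verify: bool):
        self.checksum = checksum
        self.verify = verify
        self.table = {}  # checksum -> list of stored blocks

    def write(self, block: bytes):
        key = self.checksum(block)
        candidates = self.table.setdefault(key, [])
        for idx, stored in enumerate(candidates):
            # Without verify, a checksum match alone counts as a duplicate.
            # With verify, the bytes must match as well.
            if not self.verify or stored == block:
                return (key, idx)
        # No acceptable match: store the block as new data.
        candidates.append(block)
        return (key, len(candidates) - 1)

    def read(self, ref) -> bytes:
        key, idx = ref
        return self.table[key][idx]


def reads_back_correctly(checksum, verify: bool) -> bool:
    """Write two different blocks that collide under weak_checksum, read the second back."""
    store = DedupStore(checksum, verify)
    store.write(b"ab")
    ref = store.write(b"ba")  # same byte sum as b"ab"
    return store.read(ref) == b"ba"


print("weak, no verify:", reads_back_correctly(weak_checksum, verify=False))
print("weak, verify:   ", reads_back_correctly(weak_checksum, verify=True))
print("sha256, no verify:", reads_back_correctly(sha256_checksum, verify=False))
```

In the no-verify case the stored block still matches its own checksum, so in this model nothing flags the read as bad. How real ZFS lays out colliding blocks on disk is not modelled here.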

I also tried enabling dedup on my FreeNAS 10 test rig to see which algorithm would be used by default, but I was not able to find out...
 

IceBoosteR

Hi,
thank you for your answer.
I totally agree. I don't know why there is no information on this topic. Deduplication is a key feature of ZFS, so this information should be on the internet.
This blog post is really interesting. It raises a lot of questions: Is fletcher4 not the default? Is it SHA-256?
If it is SHA-256, everything should be fine, as collisions are extremely rare with it. The performance cost would be acceptable compared to using something as collision-prone as fletcher4 ;)
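
To put a rough number on "extremely rare", here is a back-of-the-envelope sketch using the birthday bound. The assumptions are mine, not from the thread: the 400 GB are treated as 400 GiB of entirely unique data, the record size is the ZFS default of 128 KiB, and the hash output is treated as uniformly random. `collision_probability` is a hypothetical helper name, and the 32-bit case is only there for contrast.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Assumed workload: 400 GiB of unique data in 128 KiB records.
num_blocks = (400 * 1024**3) // (128 * 1024)

print("blocks:", num_blocks)
print("256-bit hash:", collision_probability(num_blocks, 256))
print("32-bit hash: ", collision_probability(num_blocks, 32))
```

Under these assumptions the 256-bit result is vanishingly small, while a 32-bit hash would be almost certain to collide somewhere in the dataset.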

I will quickly set up a test environment and update this post.

Edit: As the FN10 shell is no longer a plain Unix-style shell, I cannot create anything with zpool. I don't have a FN9.10 test VM around, so I cannot confirm that.

Edit: Okay, I just set up a FreeNAS 10 VM. If you ssh into the VM, you get a clean Unix-style CLI ;)
There you can run the zfs commands, and here we go:

IamHappy.png

algo.png

It is as simple as: man zfs
And you get all your answers. I have attached an image for better understanding.
So by default, when deduplication is set to "ON", SHA-256 is used for comparing the blocks. That is good because, as darkwarrior mentioned, fletcher4 is not safe for this. When you set dedup to "VERIFY", SHA-256 is also used, but in addition a byte-by-byte comparison is made.

At work I got in touch with a colleague who set up some Oracle boxes, and on these machines it is possible to use fletcher4, but OpenZFS is sensible enough to lock it out. To be fair, those boxes are about 5 years old, which is a long time for hardware and its support ;)
Anyway, I am happy to have my answer to the important part ;)

But what about this:
What happens when a hash collision occurs, so that ZFS thinks two blocks are identical when they aren't? What happens to the file when I open it? Does the system recognize at that point that the checksum is bad and self-heal from the mirror, or does it go undetected so that you end up with a corrupt file?

Thanks
IceBoosteR
 

IceBoosteR

dlavigne, I promised to get back to you with an answer, and here it is.
Just look at my previous post.

IceBoosteR
 