SOLVED What are the possible effects of a memory bit-flip error on ZFS vs other filesystems?

jblack

Dabbler
Joined
Feb 2, 2023
Messages
19
I know the issue of ECC vs non-ECC memory has been brought up countless times, but I am seeing so much conflicting info out there. Some people say that a single bit-flip will destroy all data in the whole pool, while others say it might corrupt a single file.
As I understand it, on most filesystems, a bit flip will go largely undetected and will usually just lead to corruption of the file that bit is part of. I keep hearing how it can be much more detrimental on ZFS, but I don't understand how. As I understand it, ZFS has some checks in place to detect errors in files, but those only take effect after the data has been written to the drive initially, so it seems like a bit flip would just result in corrupted data being written to it, which ZFS would just assume is good data, leading to the same undetected corruption as with any other filesystem.
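To make the timing point above concrete, here is a minimal sketch in Python. It is a toy model, not ZFS internals: `flip_bit`, `write_block`, and `read_ok` are hypothetical helpers, and SHA-256 simply stands in for whatever checksum a pool is configured to use. It assumes the checksum is computed over whatever bytes are in memory at write time.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated memory error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

def write_block(data: bytes) -> tuple[bytes, str]:
    """Toy 'write': store the bytes we were handed plus their checksum."""
    return data, hashlib.sha256(data).hexdigest()

def read_ok(block: tuple[bytes, str]) -> bool:
    """Toy 'read': does the stored data still match its stored checksum?"""
    data, stored_checksum = block
    return hashlib.sha256(data).hexdigest() == stored_checksum

original = b"family photo bytes"

# Case 1: the bit flips in RAM BEFORE the checksum is computed.
# The checksum is taken over the already-damaged bytes, so it matches.
block = write_block(flip_bit(original, 3))
flip_before_detected = not read_ok(block)

# Case 2: the bit flips AFTER the checksum was computed
# (on disk, or in a buffer on its way there).
data, stored_checksum = write_block(original)
block = (flip_bit(data, 3), stored_checksum)
flip_after_detected = not read_ok(block)

print("flip before checksum detected:", flip_before_detected)
print("flip after checksum detected: ", flip_after_detected)
```

In this model only the second case is caught, which matches the intuition in the question: a checksum can only vouch for the bytes it was computed over.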

So my question is just what are the possible (reasonably speaking) effects of a memory bit-flip error on ZFS, and how do they differ from other filesystems like EXT4 or BTRFS?

I'm trying to build a low-power, budget-friendly NAS and trying to decide whether I need ECC, since enterprise hardware (meaning hardware that supports ECC) is either low-power or cheap but rarely both, whereas consumer stuff can pretty easily be both. The data isn't super critical, so a corrupted file from a bit-flip is a risk I'm willing to take, but losing all my data from a bit-flip is not.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
A single flipped bit will only rarely affect more than one file (for that to happen it would need to land in metadata rather than file data, although if you're unlucky the impact could reach all the way up to the entire pool).

The key difference with other filesystems is that they won't notice if there's an issue in a file, whereas ZFS will always ensure what was given to it is what it will give back.

Other filesystems (and ZFS) can (and do) get corrupted by flipped bits.

ZFS can heal itself in a properly designed pool in most cases (depending on when the flip happens). Other filesystems leave you to discover the file corruption yourself, through whatever application consumes the data.
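As a rough illustration of what "heal itself in a properly designed pool" can mean, here is a toy two-way mirror in Python. This is a sketch under stated assumptions, not how ZFS is actually implemented: `ToyMirror` and its fields are hypothetical, SHA-256 is a stand-in checksum, and the flip is assumed to hit one stored copy after the checksum was computed correctly.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ToyMirror:
    """Hypothetical model: N copies of one block plus a single expected checksum."""

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]
        self.repairs = 0

    def read(self) -> bytes:
        # Find any copy that still matches the expected checksum.
        good = next(
            (bytes(c) for c in self.copies if checksum(bytes(c)) == self.expected),
            None,
        )
        if good is None:
            # No redundancy left to repair from: the error is reported, not hidden.
            raise IOError("no copy matches the checksum")
        # Overwrite every damaged copy with the good one ("self-healing").
        for i, c in enumerate(self.copies):
            if bytes(c) != good:
                self.copies[i] = bytearray(good)
                self.repairs += 1
        return good

# With redundancy: one copy is damaged, the read still succeeds and repairs it.
mirror = ToyMirror(b"important data")
mirror.copies[0][0] ^= 0x01
healed = mirror.read()

# Without redundancy: the damage is detected but cannot be repaired.
single = ToyMirror(b"important data", copies=1)
single.copies[0][0] ^= 0x01
try:
    single.read()
    single_detected = False
except IOError:
    single_detected = True
```

The design point is that detection needs only the checksum, while repair also needs a second copy that still verifies.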

It all comes down to how much you love your data... ZFS or not, how willing are you to have your data corrupted?
 

jblack

So with any filesystem, a bit-flip error can corrupt the whole pool?
Assuming no ECC, is ZFS any safer than something like BTRFS?
 

sretalla

jblack said: "So with any filesystem, a bit-flip error can corrupt the whole pool?"
Depends what's in memory when the bit flips... if it's any critical part of the partition/file table, maybe yes.

Also... mostly only ZFS uses pools (most other systems would call those volumes).

jblack said: "Assuming no ECC, is ZFS any safer than something like BTRFS?"
ZFS has a lot more hours of testing behind it than BTRFS, but the two are largely similar in intent and operation.

Just read this article if you really want to get into the finer points, but if your intention is data integrity, ZFS is best.

 
winnielinnie

Joined
Oct 22, 2019
Messages
3,641
jblack said: "As I understand it, ZFS has some checks in place to detect errors in files, but those only take effect after the data has been written to the drive initially, so it seems like a bit flip would just result in corrupted data being written to it, which ZFS would just assume is good data, leading to the same undetected corruption as with any other filesystem."
If this is a flaw of ZFS, then a poorly written letter (sealed in a security envelope) is a flaw of your postal service. Would you expect your mailman to open the letter, read it, and contact the sender to ask "Is this what you really meant to write?"


jblack said: "So my question is just what are the possible (reasonably speaking) effects of a memory bit-flip error on ZFS"
World War I was started because of a bit-flip. This was before ZFS was developed, I believe.


jblack said: "but losing all my data from a bit-flip is not."
That's the "paper shredder" theory about ZFS scrubs. (The "Scrub of Death".) Only theoretical, and even the theory itself is questionable.


Your concerns are not exclusive to ZFS. They apply to all filesystems. So how would using ZFS be riskier?

"Winnie Cars boast an A+ safety rating, leagues ahead of other car manufacturers. But driving their cars over a cliff won't do anything to protect the passengers. If this is true, how can Winnie Cars be considered safe?"
 

sretalla

It's probably worth mentioning (as I'm reminded by @winnielinnie) that damage confined to the metadata would almost always be recoverable, either with a transaction group rollback or with recovery tools (although most of those are not free).
 

jblack

winnielinnie said: "If this is a flaw of ZFS, then so is a poorly written letter (secured in a safety envelope) a flaw of your postal service. Would you expect your mailman to open the letter, read it, and contact the sender to make sure 'Is this what you really meant to write?'"
I was not pointing to this as a flaw of ZFS, hence the "as with any other filesystem" at the end of that sentence.
winnielinnie said: "Your concerns are not exclusive to ZFS. They apply to all filesystems. So how would using ZFS be riskier?"
This was my exact question, because I've heard multiple times that using non-ECC with ZFS is riskier. From the answers above, that doesn't seem to be the case, but I've heard it enough times that I had to ask the question.
 