"RAID-Z is required for automatic repair capabilities"

Joined
Oct 22, 2019
Messages
3,584
With regard to ZFS, I was always under the impression that scheduled scrubs would detect data corruption and bit rot, and attempt to fix it.

My assumption so far...
Upon reading a (corrupted) record, if the computed checksum doesn't match the expected checksum, it is flagged as a checksum error (i.e., "corrupted"). However, as long as there is some level of redundancy across the underlying devices, an automatic repair is possible, correct? For instance, if it's a mirrored vDev, there's a good copy of the record on the other device(s), which ZFS can use to rewrite the record on the drive in question.
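If it helps to make that concrete, here is a hedged sketch (pool and device names are placeholders, not anything from the thread): a scrub reads every record, verifies it against its checksum, and rewrites any bad copy from the good side of the mirror.

    zpool create tank mirror ada1 ada2   # two-way mirrored vDev (placeholder devices)
    zpool scrub tank                     # read every record and verify its checksum
    zpool status -v tank                 # the CKSUM column counts errors; the "scan:" line reports how much the scrub repaired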



Yet, according to this blog post by Jody Bruchon (originally written in 2017, updated in 2020),
I strongly advocate for people using what fits their specific needs, and two years ago, there was a strong ZFS fanatical element on r/DataHoarder that was aggressively pushing ZFS as a data integrity panacea that all people should use, but leaving out critical things like RAID-Z being required for automatic repair capabilities.

[...]

If you use ZFS, you have to use RAID-Z, otherwise you will get none of the advertised protection that ZFS offers other than detecting degraded data

[...]

It’s the same exact issue as ZFS without RAID-Z: you have ZFS checksums that can detect an integrity error but you have no way to fix it, so the data is still lost and backups are your only salvation.

[...]

My main point is that a lot of the information kicking around the internet about ZFS is misleading or lacking some critical points, i.e. RAID-Z being a requirement for ZFS automatic self-healing, arguably one of the most severe omissions in most ZFS evangelism, because what good is detecting bit rot if the rotten data is permanently lost anyway?



Why would RAID-Z be mandatory for this auto-repair feature? What am I misunderstanding about this process? Does a scrub only detect corruption, without being able to fix it, unless your pool is made of RAID-Z vDevs? For what it's worth, I have only ever used mirrored vDevs for my pools.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,111
The first question would be: Why would you trust a piece of rant on a blog over official documentation?

Short answer: any kind of redundancy (raidzN, mirror, copies=2) allows ZFS to repair data corruption; single-drive vdevs with copies=1 do not.
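To make the other two options concrete (the mirror case is sketched in the first post above), a hedged sketch follows; pool, dataset, and device names are placeholders, and the two lines are independent alternatives, not one sequence:

    zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6   # raidz2 vdev: bad blocks are rebuilt from parity
    zfs set copies=2 tank/important                          # extra copies of every block, even on a single-disk vdev
                                                             # (helps against bad sectors, not against a failed disk)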

Longer answer: The blogger is ranting in defence of RAID5 and arguing that this configuration still has use cases over RAID6 or over the more elaborate ZFS solutions. He never mentions ZFS mirrors, and he mentions RAID 10 only twice, dismissing it immediately as an enterprise-only setup that is too expensive for the little guy, which is his explicit and exclusive viewpoint.
Assuming that one reads him carefully and performs regular backups of the RAID5 array as he mandates, he may have a point. But the little guy is unlikely to take all due care and bother to have a second NAS set up to automatically back up his first NAS, and RAID5 on large HDDs with no proper backup remains a disaster waiting to happen to the little guy's precious data.
 
Joined
Oct 22, 2019
Messages
3,584
The first question would be: Why would you trust a piece of rant on a blog over official documentation?
I didn't, but I genuinely wondered whether I had missed an important detail, because he reiterates the same point specifically about RAID-Z multiple times, as if he knows something that ZFS users are unaware of. I read the (old) Oracle and the OpenZFS documentation regarding bit rot, and they make no mention of RAID-Z being required; simply that some level of redundancy needs to be used. (My original and current assumption.)

In fact, I stumbled across such documentation around the same time I found his article (and immediately noticed his tone, but still read through it.)

- - -

If you search with Google (without quotes): how does zfs protect against bit rot

...his blog post is the third or fourth result! :oops:

- - -

In his post, I found no mention of mirror vDevs, nor of "copies=N", nor of single-device vDevs. That's why I continued to read through some of his comments, as he kept reiterating "RAID-Z is required." Other readers were thankful that he brought such a "glaring omission" to light and agreed that the ZFS community is not being honest about bit rot and the auto-repair of corrupted data. I read through it with much suspicion, as it didn't make sense on the surface (and thank you for clarifying my doubts about his article!)

My main concern is that his post is at the top of the results for what I believe are common keywords in a Google search regarding ZFS and data integrity, yet neither his updates nor the main body clarifies that "you need some level of redundancy for protection against data corruption". I figure most people who get into ZFS and care about data integrity use redundancy (mirror, RAID-Z) by default anyway, which only adds to the confusion around his main argument that ZFS cannot safeguard and repair corrupted data. (My belief is that most people who go out of their way to build ZFS pools use underlying redundancy across multiple drives.)

Yes, it's true that redundancy and protection against bit rot are not a substitute for a smart backup strategy, but for a blog post that hits the "top of the list" when searching for how ZFS protects against bit rot, I think it does a disservice to curious users who want to switch over to ZFS for their long-term data storage.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,111
Well, you might have found another instance where Google ranking is less than adequate: possibly a result of taking buzz (the number of pages referring to an angry, polemical blog post) as a proxy for relevance, and most likely a result of defining, by design, "ranking relevance" as "clickbait around which we can sell ads" rather than the definition you expect. :rolleyes:
Why anyone would take a rant as comprehensive and thoughtful technical documentation is still quite obscure to me, anyway…

The author might have come across some particularly stupid zealot who insisted that his data was protected by the magic power of ZFS even though he used single-drive vdevs (never bet against human stupidity…), and had good reason to be angry. Still, his tone does him a major disservice. It is also quite obvious that he is focused on a very specific use case, has decided from the outset that ZFS is unnecessary overhead for this use case and that 50% space efficiency is not acceptable, so he's not going to discuss RAID 1 or mirrors, and "copies=2" must sound like a complete non-starter to him (admittedly, this is not a very common option). I note that he mentions "bit rot" and long rebuild times but conspicuously NOT the unrecoverable read error rate of HDDs, which is the original reason why RAID5 was declared "dead"; for his argument that there's still life in RAID5, that's an even more glaring omission than not mentioning mirrors as a source of redundancy.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,458
Why would RAID-Z be mandatory for this auto-repair feature?
It isn't, and he's wrong. Google be damned, PageRank is a very imperfect algorithm, and any idiot can post anything they want on their blog or on Reddit. Yes, you're absolutely right; mirrors (or copies=2) will do the same thing, and more efficiently, as the data doesn't have to be reconstructed from parity (though this is going to be a very minor issue).

Now, I'm not on board the "RAID5 is dead" train; the argument makes some assumptions that are invalid with ZFS. I don't use RAIDZ, nor do I often recommend it, but it's far from the oft-claimed death sentence for your pool on resilver.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Now, I'm not on board the "RAID5 is dead" train; the argument makes some assumptions that are invalid with ZFS. I don't use RAIDZ, nor do I often recommend it, but it's far from the oft-claimed death sentence for your pool on resilver.
The "RAID5 is dead" argument is about naive "hardware" implementations that declare an entire disk dead as soon as there are too many unrecoverable read errors on the device. The probability of encountering another one of those during a rebuild approaches 1 with increasing disk sizes. One of the strengths of ZFS is that it operates strictly on a per-block basis and will happily reconstruct one data block from disks 1, 2 and 3 and another one from disks 1, 2 and 4, even if there are unreadable blocks on both disks 3 and 4, as long as they do not overlap.
As far as I have read, simple RAID implementations don't do that but simply declare entire disks "dead" as soon as UREs occur.

Kind regards,
Patrick
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,458
The "RAID5 is dead" argument is about naive "hardware" implementations
True. And as applied to ZFS, it assumes that a URE in critical pool metadata will destroy your pool, in apparent ignorance of the fact that all metadata is stored in multiple copies (six copies, I believe, of the uberblock).
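If you're curious, you can peek at some of that on-disk redundancy yourself with zdb; a hedged sketch (pool and device names are placeholders):

    zdb -l /dev/ada1p2   # dump the vdev labels; ZFS keeps four of them on every member device
    zdb -u tank          # display the pool's currently active uberblock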
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
@danb35 - That brings up a point that I have experienced: checksum errors on my media server's striped pool (aka 2 disks, no data redundancy). (But I have multiple backups of the media... and the media does not change much.)

When errors occur, 99% of the errors have been in larger video files. Easy enough to restore from backups.

However, I once saw a checksum error, but nothing listed as failing. Additional scrubs showed nothing wrong. I puzzled over that for a long time. Finally, I figured out that it had been in metadata.

So your point about metadata having multiple copies is a good reminder. File/directory metadata has 2 copies (unless you set "copies=2", in which case this metadata increases to 3 copies). More important metadata has 3 copies (unless you set "copies=2", in which case it increases to 4 copies). And if you have "copies=3", then metadata goes up by yet another copy. At least that's how I think it works. Of course, the uberblocks have lots more copies per disk.
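For reference, the dataset properties involved can be inspected like this; a hedged sketch (the dataset name is a placeholder, and redundant_metadata is an OpenZFS property):

    zfs get copies,redundant_metadata tank/media
    # "copies" duplicates user data (and, as described above, pushes the metadata copy counts up);
    # "redundant_metadata" (all/most) controls the extra copies kept for the metadata itself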
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
In fact, I stumbled across such documentation around the same time I found his article (and immediately noticed his tone, but still read through it.)

- - -

If you search with Google (without quotes): how does zfs protect against bit rot

...his blog post is the third or fourth result! :oops:

- - -

Well, by linking to it you just raised its score! :) No offense intended, but you know how this works.

Probably that rant was linked from several non-ZFS and/or Church of Licenseology bigotry blogs. :tongue:
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I would take a slightly different approach to interpreting what the "quoted article" was saying.

Just by using ZFS (in stripe), you won't protect against bit rot... you can detect it but not fix it.

Where it went "wrong" was to describe (and now this is my interpretation) all methods of redundancy in ZFS as "RAIDZ", either a mistaken understanding or a misuse of terminology, but I don't see anywhere that the author specifically says mirrors or copies=2 are not "RAIDZ equivalent" (he doesn't even mention them at all).

So for me, the article is trying to point out the misconception that all ZFS pools are safe from bit rot, just doing it with the wrong words.
 
Joined
Jan 4, 2014
Messages
1,644
The article resonated with me. Putting aside the rant, I thought the writer took a fairly pragmatic view of file systems. The key point I took away was that, irrespective of all the data protection mechanisms a particular file system offers or fails to offer, they're still no substitute for backing up data.
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
Just by using ZFS (in stripe), you won't protect against bit rot... you can detect it but not fix it.
Which is something other filesystems won't do at all. If you need high performance and don't need redundancy, at least you can detect bit rot. There are use cases for which that is still appropriate.
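As a hedged illustration (the pool name is a placeholder), this is what detection without repair looks like on a redundancy-free pool:

    zpool scrub stripe
    zpool status -v stripe   # non-zero CKSUM counters plus a "Permanent errors have been detected in the
                             # following files:" list; ZFS names the damaged files but cannot heal them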

Where it went "wrong" was to describe (and now this is my interpretation) all methods of redundancy in ZFS as "RAIDZ", either a mistaken understanding or a misuse of terminology, but I don't see anywhere that the author specifically says mirrors or copies=2 are not "RAIDZ equivalent" (he doesn't even mention them at all).

So for me, the article is trying to point out the misconception that all ZFS pools are safe from bit rot, just doing it with the wrong words.
Yes, but an article is made of words. If you don't choose them properly...

Unfortunately it's a widespread issue with misconceptions propagating like the plague. A popular article will be quoted endlessly.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
TL;DR of article seems to be:

[attached image]


Personally, I have never seen anyone suggest that ZFS can magically repair corrupted data without a known good copy (e.g. from sufficiently redundant vdevs in mirror/raidz). That's just silly, and if I caught anyone suggesting such a thing I'd make an attempt to correct them; if they insisted on staying wrong, well, best of luck to you then, I've got better things to do with my time.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
The "RAID5 is dead" argument is about naive "hardware" implementations that declare an entire disk dead as soon as there are too many unrecoverable read errors on the device. The probability of encountering another one of those during a rebuild approaches 1 with increasing disk sizes. One of the strengths of ZFS is that it operates strictly on a per-block basis and will happily reconstruct one data block from disks 1, 2 and 3 and another one from disks 1, 2 and 4, even if there are unreadable blocks on both disks 3 and 4, as long as they do not overlap.
As far as I have read, simple RAID implementations don't do that but simply declare entire disks "dead" as soon as UREs occur.

Kind regards,
Patrick

Does this mean that RAIDZ2 is much safer than RAID5? I'm coming from a Synology system where I opted for RAID6. I am setting up a new system for bulk media and am trying to decide whether I should go with RAIDZ1 or RAIDZ2. I've settled on RAIDZ2 because I am using 8TB disks and the chance of a URE is higher, but it sounds like you could be suggesting that RAIDZ1 is much, much safer than RAID5 in that regard.
 
Joined
Jan 4, 2014
Messages
1,644
Does this mean that RAIDZ2 is much safer than RAID5? I'm coming from a Synology system where I opted for RAID6. I am setting up a new system for bulk media and am trying to decide whether I should go with RAIDZ1 or RAIDZ2. I've settled on RAIDZ2 because I am using 8TB disks and the chance of a URE is higher, but it sounds like you could be suggesting that RAIDZ1 is much, much safer than RAID5 in that regard.
Safe is a relative term. My data is safer when replacing a disk under RAIDZ2 than under RAIDZ1. During the resilvering of the replaced disk, which can take a very long time, the RAIDZ2 pool can suffer a second disk failure and still survive; the RAIDZ1 pool will fail. However, my data is safer on a server with a RAIDZ1 pool replicated to a second server with a RAIDZ1 pool than it is on a single souped-up server with a RAIDZ3 pool. The point is that the Nth degree of redundancy is moot if you don't have a backup.
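For what it's worth, that kind of replication is just snapshots plus zfs send/receive; a hedged sketch (pool, dataset, snapshot, and host names are placeholders):

    zfs snapshot -r tank/media@weekly-01
    zfs send -R tank/media@weekly-01 | ssh backup-nas zfs receive -F backuppool/media
    # later runs only ship the changes since the previous snapshot:
    zfs snapshot -r tank/media@weekly-02
    zfs send -R -i tank/media@weekly-01 tank/media@weekly-02 | ssh backup-nas zfs receive -F backuppool/media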
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Thanks. I will stick with RAIDZ2 after doing some more reading and getting your inputs.

The pool in question will be for storing bulk media, which will not be backed up at all (unless I can figure out a nice and cheap way to back up 20+ TB...)
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
...
The pool in question will be for storing bulk media, which will not be backed up at all (unless I can figure out a nice and cheap way to back up 20+ TB...)
My NAS is backed up to a large external disk (a single-disk ZFS pool). In my case, it works quite well. It takes a while, but once the initial data was loaded, the only long part is the ZFS scrub before each backup. (At home, I only run backups about once a month, and I rotate 3 disks, so scrubbing at every backup works out to each disk being scrubbed every 3 months, which makes sense.)
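A rotated single-disk backup pool like that can be driven with a handful of commands; a hedged sketch (pool, dataset, and snapshot names are placeholders):

    zpool import backup01              # attach this month's rotated backup disk
    zpool scrub backup01               # verify last month's data first; watch progress with: zpool status backup01
    zfs send -i tank/media@last-month tank/media@this-month | zfs receive backup01/media
    zpool export backup01              # cleanly detach the pool before unplugging the disk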

There are several ways to run larger backups. For example, 2 disks in a ZFS striped pool. If it's for backups, using a striped disk pool may be okay.


Another way, which I wish had an automated tool, is distribution based on file size. Assume the source is large: multiple disks in one file system or pool. The destination is also multiple disks, but independent file systems for safety (so as not to lose the whole pool on the failure of one). Sort the source by largest files first, then back those up to the disks, trying to fit each file on 1 disk. Once they are done, proceed with the medium-sized files, then the small files.

This allows each file to be stored on a single disk, so on a failure you would only lose the files on that disk. Since it's a backup, simply buy a replacement disk and re-run the backup; it will re-fill the failed disk with all the files not on the other disk(s).

Again, I wish there were a tool to do this automatically: something like rsync, but spreading files across multiple targets based on size. (A rough sketch of the idea is below.)

I may not have described this well.
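For what it's worth, here is one simplified way the idea could look as a script. This is purely a hypothetical sketch: the paths are placeholders, it assumes GNU find and rsync are installed, it only handles two targets, and a real tool would need to cope with odd filenames.

    #!/bin/sh
    # Hypothetical sketch only, not a tested tool: spread files across two backup
    # targets, largest first, always writing to the target with the most free space.
    SRC=/mnt/tank/media
    T1=/mnt/backup1
    T2=/mnt/backup2

    # list files as "<size> <path>", biggest first (GNU find)
    find "$SRC" -type f -printf '%s %p\n' | sort -rn | while read -r size path; do
        free1=$(df -Pk "$T1" | awk 'NR==2 {print $4}')
        free2=$(df -Pk "$T2" | awk 'NR==2 {print $4}')
        if [ "$free1" -ge "$free2" ]; then dest="$T1"; else dest="$T2"; fi
        # --relative keeps the directory layout below the /./ marker
        rsync -a --relative "$SRC/./${path#$SRC/}" "$dest/"
    done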
 