"RAID-Z is required for automatic repair capabilities"

Joined
Oct 22, 2019
Messages
3,584
With regard to ZFS, I was always under the impression that scheduled scrubs would detect data corruption and bit rot, and attempt to fix it.

My assumption so far...
Upon reading a (corrupted) record, if the computed checksum doesn't match the expected checksum, it is flagged as a checksum error (i.e., "corrupted"). However, as long as there is some level of redundancy across the underlying devices, an automatic repair is possible, correct? For instance, if it's a mirrored vDev, there's a good copy of the record on the other device(s), which ZFS can use to rewrite the record on the drive in question.
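If it helps to make that concrete, here is a hedged sketch (pool and device names are placeholders, not anything from the thread): a scrub reads every record, verifies it against its checksum, and rewrites any bad copy from the good side of the mirror.

    zpool create tank mirror ada1 ada2   # two-way mirrored vDev (placeholder devices)
    zpool scrub tank                     # read every record and verify its checksum
    zpool status -v tank                 # the CKSUM column counts errors; the "scan:" line reports how much the scrub repaired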



Yet, according to this blog post by Jody Bruchon (originally written in 2017, updated in 2020),
I strongly advocate for people using what fits their specific needs, and two years ago, there was a strong ZFS fanatical element on r/DataHoarder that was aggressively pushing ZFS as a data integrity panacea that all people should use, but leaving out critical things like RAID-Z being required for automatic repair capabilities.

[...]

If you use ZFS, you have to use RAID-Z, otherwise you will get none of the advertised protection that ZFS offers other than detecting degraded data

[...]

It’s the same exact issue as ZFS without RAID-Z: you have ZFS checksums that can detect an integrity error but you have no way to fix it, so the data is still lost and backups are your only salvation.

[...]

My main point is that a lot of the information kicking around the internet about ZFS is misleading or lacking some critical points, i.e. RAID-Z being a requirement for ZFS automatic self-healing, arguably one of the most severe omissions in most ZFS evangelism, because what good is detecting bit rot if the rotten data is permanently lost anyway?



Why would RAID-Z be mandatory for this auto-repair feature? What am I misunderstanding about this process? Does a scrub only detect corruption, without being able to fix it, unless your pool is made of RAID-Z vDevs? For what it's worth, I have only ever used mirrored vDevs for my pools.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,111
The first question would be: Why would you trust a piece of rant on a blog over official documentation?

Short answer: any kind of redundancy (raidzN, mirror, copies=2) allows ZFS to repair data corruption; single-drive vdevs with copies=1 do not.
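To make the other two options concrete (the mirror case is sketched in the first post above), a hedged sketch follows; pool, dataset, and device names are placeholders, and the two lines are independent alternatives, not one sequence:

    zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6   # raidz2 vdev: bad blocks are rebuilt from parity
    zfs set copies=2 tank/important                          # extra copies of every block, even on a single-disk vdev
                                                             # (helps against bad sectors, not against a failed disk)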

Longer answer: The blogger is ranting in defence of RAID5 and arguing that this configuration still has use cases over RAID6 or over the more elaborate ZFS solutions. He never mentions ZFS mirrors, and he mentions RAID 10 only twice, dismissing it immediately as an enterprise-only setup that is too expensive for the little guy, which is his explicit and exclusive viewpoint.
Assuming that one reads him carefully and performs regular backups of the RAID5 array as he mandates, he may have a point. But the little guy is unlikely to take all due care and bother to have a second NAS set up to automatically back up his first NAS, and RAID5 on large HDDs with no proper backup remains a disaster waiting to happen to the little guy's precious data.
 
Joined
Oct 22, 2019
Messages
3,584
The first question would be: Why would you trust a piece of rant on a blog over official documentation?
I didn't, but I genuinely wondered whether I had missed an important detail, because he reiterates the same point specifically about RAID-Z multiple times, as if he knows something that ZFS users are unaware of. I read the (old) Oracle and the OpenZFS documentation regarding bit rot, and they make no mention of RAID-Z being required; simply that some level of redundancy needs to be used. (My original and current assumption.)

In fact, I stumbled across such documentation around the same time I found his article (and immediately noticed his tone, but still read through it.)

- - -

If you search with Google (without quotes): how does zfs protect against bit rot

...his blog post is the third or fourth result! :oops:

- - -

In his post, I found no mention of mirror vDevs, nor of "copies=N", nor of single-device vDevs. That's why I continued to read through some of his comments, as he kept reiterating "RAID-Z is required." Other readers were thankful that he brought such a "glaring omission" to light and agreed that the ZFS community is not being honest about bit rot and the auto-repair of corrupted data. I read through it with much suspicion, as it didn't make sense on the surface (and thank you for clarifying my doubts about his article!)

My main concern is that his post is at the top of the results for what I believe are common keywords in a Google search regarding ZFS and data integrity, yet neither his updates nor the main body clarifies that "you need some level of redundancy for protection against data corruption". I figure most people who get into ZFS and care about data integrity use redundancy (mirror, RAID-Z) by default anyway, which only adds to the confusion around his main argument that ZFS cannot safeguard and repair corrupted data. (My belief is that most people who go out of their way to build ZFS pools use underlying redundancy across multiple drives.)

Yes, it's true that redundancy and protection against bit rot are not a substitute for a smart backup strategy, but for a blog post that hits the "top of the list" when searching for how ZFS protects against bit rot, I think it does a disservice to curious users who want to switch over to ZFS for their long-term data storage.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,111
Well, you might have found another instance where Google ranking is less than adequate: possibly a result of taking buzz (the number of pages referring to an angry, polemical blog post) as a proxy for relevance, and most likely a result of defining, by design, "ranking relevance" as "clickbait around which we can sell ads" rather than the definition you expect. :rolleyes:
Why anyone would take a rant as comprehensive and thoughtful technical documentation is still quite obscure to me, anyway…

The author might have come across some particularly stupid zealot who insisted that his data was protected by the magic power of ZFS even though he used single-drive vdevs (never bet against human stupidity…), and had good reason to be angry. Still, his tone does him a major disservice. It is also quite obvious that he is focused on a very specific use case, has decided from the outset that ZFS is unnecessary overhead for this use case and that 50% space efficiency is not acceptable, so he's not going to discuss RAID 1 or mirrors, and "copies=2" must sound like a complete non-starter to him (admittedly, this is not a very common option). I note that he mentions "bit rot" and long rebuild times but conspicuously NOT the unrecoverable read error rate of HDDs, which is the original reason why RAID5 was declared "dead"; for his argument that there's still life in RAID5, that's an even more glaring omission than not mentioning mirrors as a source of redundancy.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,458
Why would RAID-Z be mandatory for this auto-repair feature?
It isn't, and he's wrong. Google be damned, PageRank is a very imperfect algorithm, and any idiot can post anything they want on their blog or on Reddit. Yes, you're absolutely right; mirrors (or copies=2) will do the same thing, and more efficiently, as the data doesn't have to be reconstructed from parity (though this is going to be a very minor issue).

Now, I'm not on board the "RAID5 is dead" train; the argument makes some assumptions that are invalid with ZFS. I don't use RAIDZ, nor do I often recommend it, but it's far from the oft-claimed death sentence for your pool on resilver.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Now, I'm not on board the "RAID5 is dead" train; the argument makes some assumptions that are invalid with ZFS. I don't use RAIDZ, nor do I often recommend it, but it's far from the oft-claimed death sentence for your pool on resilver.
The "RAID5 is dead" argument is about naive "hardware" implementations that declare an entire disk dead as soon as there are too many unrecoverable read errors on the device. The probability of encountering another one of those during a rebuild approaches 1 with increasing disk sizes. One of the strengths of ZFS is that it operates strictly on a per-block basis and will happily reconstruct one data block from disks 1, 2 and 3 and another one from disks 1, 2 and 4, even if there are unreadable blocks on both disks 3 and 4, as long as they do not overlap.
As far as I have read, simple RAID implementations don't do that but simply declare entire disks "dead" as soon as UREs occur.

Kind regards,
Patrick
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,458
The "RAID5 is dead" argument is about naive "hardware" implementations
True. And as applied to ZFS, it assumes that a URE in critical pool metadata will destroy your pool, in apparent ignorance of the fact that all metadata is stored in multiple copies (six copies, I believe, of the uberblock).
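If you're curious, you can peek at some of that on-disk redundancy yourself with zdb; a hedged sketch (pool and device names are placeholders):

    zdb -l /dev/ada1p2   # dump the vdev labels; ZFS keeps four of them on every member device
    zdb -u tank          # display the pool's currently active uberblock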
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
@danb35 - That brings up a point that I have experienced: checksum errors on my media server's striped pool (aka 2 disks, no data redundancy). (But I have multiple backups of the media... and the media does not change much.)

When errors occur, 99% of the errors have been in larger video files. Easy enough to restore from backups.

However, I once saw a checksum error, but nothing listed as failing. Additional scrubs showed nothing wrong. I puzzled over that for a long time. Finally, I figured out that it had been in metadata.

So your point about metadata having multiple copies is a good reminder. File/directory metadata has 2 copies (unless you set "copies=2", in which case this metadata increases to 3 copies). More important metadata has 3 copies (unless you set "copies=2", in which case it increases to 4 copies). And if you have "copies=3", then metadata goes up by yet another copy. At least that's how I think it works. Of course, the uberblocks have lots more copies per disk.
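For reference, the dataset properties involved can be inspected like this; a hedged sketch (the dataset name is a placeholder, and redundant_metadata is an OpenZFS property):

    zfs get copies,redundant_metadata tank/media
    # "copies" duplicates user data (and, as described above, pushes the metadata copy counts up);
    # "redundant_metadata" (all/most) controls the extra copies kept for the metadata itself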
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
In fact, I stumbled across such documentation around the same time I found his article (and immediately noticed his tone, but still read through it.)

- - -

If you search with Google (without quotes): how does zfs protect against bit rot

...his blog post is the third or fourth result! :oops:

- - -

Well, by linking to it you just raised its score! :) No offense intended, but you know how this works.

Probably that rant was linked from several non-ZFS and/or Church of Licenseology bigotry blogs. :tongue:
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I would take a slightly different approach to interpreting what the "quoted article" was saying.

Just by using ZFS (in stripe), you won't protect against bit rot... you can detect it but not fix it.

Where it went "wrong" was to describe (and now this is my interpretation) all methods of redundancy in ZFS as "RAIDZ", either a mistaken understanding or a misuse of terminology, but I don't see anywhere that the author specifically says mirrors or copies=2 are not "RAIDZ equivalent" (he doesn't even mention them at all).

So for me, the article is trying to point out the misconception that all ZFS pools are safe from bit rot, just doing it with the wrong words.
 
Joined
Jan 4, 2014
Messages
1,644
The article resonated with me. Putting aside the rant, I thought the writer took a fairly pragmatic view of file systems. The key point I took away was that, irrespective of all the data protection mechanisms a particular file system offers or fails to offer, they're still no substitute for backing up data.
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
Just by using ZFS (in stripe), you won't protect against bit rot... you can detect it but not fix it.
Which is something other filesystems won't do at all. If you need high performance and don't need redundancy, at least you can detect bit rot. There are use cases for which that is still appropriate.
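As a hedged illustration (the pool name is a placeholder), this is what detection without repair looks like on a redundancy-free pool:

    zpool scrub stripe
    zpool status -v stripe   # non-zero CKSUM counters plus a "Permanent errors have been detected in the
                             # following files:" list; ZFS names the damaged files but cannot heal them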

Where it went "wrong" was to describe (and now this is my interpretation) all methods of redundancy in ZFS as "RAIDZ", either a mistaken understanding or a misuse of terminology, but I don't see anywhere that the author specifically says mirrors or copies=2 are not "RAIDZ equivalent" (he doesn't even mention them at all).

So for me, the article is trying to point out the misconception that all ZFS pools are safe from bit rot, just doing it with the wrong words.
Yes, but an article is made of words. If you don't choose them properly...

Unfortunately it's a widespread issue with misconceptions propagating like the plague. A popular article will be quoted endlessly.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
TL;DR of article seems to be:

[attached image]


Personally, I have never seen anyone suggest that ZFS can magically repair corrupted data without a known good copy (e.g. from sufficiently redundant vdevs in mirror/raidz). That's just silly, and if I caught anyone suggesting such a thing I'd make an attempt to correct them; if they insisted on staying wrong, well, best of luck to you then, I've got better things to do with my time.
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
The "RAID5 is dead" argument is about naive "hardware" implementations that declare an entire disk dead as soon as there are too many unrecoverable read errors on the device. The probability of encountering another one of those during a rebuild approaches 1 with increasing disk sizes. One of the strengths of ZFS is that it operates strictly on a per-block basis and will happily reconstruct one data block from disks 1, 2 and 3 and another one from disks 1, 2 and 4, even if there are unreadable blocks on both disks 3 and 4, as long as they do not overlap.
As far as I have read, simple RAID implementations don't do that but simply declare entire disks "dead" as soon as UREs occur.

Kind regards,
Patrick

Does this mean that RAIDZ2 is much safer than RAID5? I'm coming from a Synology system where I opted for RAID6. I am setting up a new system for bulk media and am trying to decide whether I should go with RAIDZ1 or RAIDZ2. I've settled on RAIDZ2 because I am using 8TB disks and the chance of a URE is higher, but it sounds like you could be suggesting that RAIDZ1 is much, much safer than RAID5 in that regard.
 
Joined
Jan 4, 2014
Messages
1,644
Does this mean that RAIDZ2 is much safer than RAID5? I'm coming from a Synology system where I opted for RAID6. I am setting up a new system for bulk media and am trying to decide whether I should go with RAIDZ1 or RAIDZ2. I've settled on RAIDZ2 because I am using 8TB disks and the chance of a URE is higher, but it sounds like you could be suggesting that RAIDZ1 is much, much safer than RAID5 in that regard.
Safe is a relative term. My data is safer when replacing a disk under RAIDZ2 than under RAIDZ1. During the resilvering of the replaced disk, which can take a very long time, the RAIDZ2 pool can suffer a second disk failure and still survive; the RAIDZ1 pool will fail. However, my data is safer on a server with a RAIDZ1 pool replicated to a second server with a RAIDZ1 pool than it is on a single souped-up server with a RAIDZ3 pool. The point is that the Nth degree of redundancy is moot if you don't have a backup.
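For what it's worth, that kind of replication is just snapshots plus zfs send/receive; a hedged sketch (pool, dataset, snapshot, and host names are placeholders):

    zfs snapshot -r tank/media@weekly-01
    zfs send -R tank/media@weekly-01 | ssh backup-nas zfs receive -F backuppool/media
    # later runs only ship the changes since the previous snapshot:
    zfs snapshot -r tank/media@weekly-02
    zfs send -R -i tank/media@weekly-01 tank/media@weekly-02 | ssh backup-nas zfs receive -F backuppool/media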
 

HarambeLives

Contributor
Joined
Jul 19, 2021
Messages
153
Thanks. I will stick with RAIDZ2 after doing some more reading and getting your inputs.

The pool in question will be for storing bulk media, which will not be backed up at all (unless I can figure out a nice and cheap way to back up 20+ TB...)
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
...
The pool in question will be for storing bulk media, which will not be backed up at all (unless I can figure out a nice and cheap way to back up 20+ TB...)
My NAS is backed up to a large external disk (a single-disk ZFS pool). In my case, it works quite well. It takes a while, but once the initial data was loaded, the only long part is the ZFS scrub before each backup. (At home, I only run backups about once a month, and I rotate 3 disks, so scrubbing at every backup works out to each disk being scrubbed every 3 months, which makes sense.)
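A rotated single-disk backup pool like that can be driven with a handful of commands; a hedged sketch (pool, dataset, and snapshot names are placeholders):

    zpool import backup01              # attach this month's rotated backup disk
    zpool scrub backup01               # verify last month's data first; watch progress with: zpool status backup01
    zfs send -i tank/media@last-month tank/media@this-month | zfs receive backup01/media
    zpool export backup01              # cleanly detach the pool before unplugging the disk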

There are several ways to run larger backups. For example, 2 disks in a ZFS striped pool. If it's for backups, using a striped disk pool may be okay.


Another way, which I wish had an automated tool, is distribution based on file size. Assume the source is large: multiple disks in one file system or pool. The destination is also multiple disks, but independent file systems for safety (so as not to lose the whole pool on the failure of one). Sort the source by largest files first, then back those up to the disks, trying to fit each file on 1 disk. Once they are done, proceed with the medium-sized files, then the small files.

This allows each file to be stored on a single disk, so on a failure you would only lose the files on that disk. Since it's a backup, simply buy a replacement disk and re-run the backup; it will re-fill the failed disk with all the files not on the other disk(s).

Again, I wish there were a tool to do this automatically: something like rsync, but spreading files across multiple targets based on size. (A rough sketch of the idea is below.)

I may not have described this well.
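For what it's worth, here is one simplified way the idea could look as a script. This is purely a hypothetical sketch: the paths are placeholders, it assumes GNU find and rsync are installed, it only handles two targets, and a real tool would need to cope with odd filenames.

    #!/bin/sh
    # Hypothetical sketch only, not a tested tool: spread files across two backup
    # targets, largest first, always writing to the target with the most free space.
    SRC=/mnt/tank/media
    T1=/mnt/backup1
    T2=/mnt/backup2

    # list files as "<size> <path>", biggest first (GNU find)
    find "$SRC" -type f -printf '%s %p\n' | sort -rn | while read -r size path; do
        free1=$(df -Pk "$T1" | awk 'NR==2 {print $4}')
        free2=$(df -Pk "$T2" | awk 'NR==2 {print $4}')
        if [ "$free1" -ge "$free2" ]; then dest="$T1"; else dest="$T2"; fi
        # --relative keeps the directory layout below the /./ marker
        rsync -a --relative "$SRC/./${path#$SRC/}" "$dest/"
    done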
 