ZFS and RAIDZ1 Resilvering

ZFS Noob · Jan 3, 2014

We're all aware of the problems with using RAID5 in the modern world: with big drives there is a high probability of hitting an unrecoverable read error (URE) during a resilver, and *poof* goes the volume.

This knowledge is tied to traditional hardware RAID controllers, however. I would like to know how ZFS's inherent strengths affect this issue.
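To put a rough number on "high probability of a URE on resilver", here is a minimal Python sketch. It assumes the commonly quoted consumer-drive datasheet figure of one URE per 10^14 bits read (10^15 for many enterprise drives) and, unrealistically, that errors are independent; real errors tend to cluster on a failing disk. The 4 x 4 TB array is a hypothetical example.

```python
import math


def p_ure_during_rebuild(bytes_read: float, bits_per_ure: float = 1e14) -> float:
    """Probability of at least one URE while reading `bytes_read` bytes.

    Assumes independent errors at a rate of one per `bits_per_ure` bits,
    i.e. P(no URE) = (1 - 1/bits_per_ure) ** bits_read.
    """
    bits_read = bytes_read * 8
    # log1p/expm1 keep precision when 1/bits_per_ure is tiny
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))


if __name__ == "__main__":
    TB = 1e12  # drive vendors use decimal terabytes
    # Hypothetical 4 x 4 TB RAID5/RAIDZ1: a full rebuild reads the 3 surviving disks.
    rebuild_bytes = 3 * 4 * TB
    for spec in (1e14, 1e15):
        p = p_ure_during_rebuild(rebuild_bytes, spec)
        print(f"1 URE per {spec:.0e} bits: P(at least one URE) = {p:.1%}")
```

Under these assumptions the model gives roughly a 62% chance of at least one URE for the 10^14 figure and roughly 9% for 10^15. Treat it as an illustration of why bigger drives make the problem worse, not as a prediction; ZFS also only reads allocated blocks during a resilver, so a partly full pool reads less than a whole-disk hardware rebuild.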

Issues that might matter:

1. Ditto blocks are used, but by default these only protect metadata and aren't used for the file data itself, so they look useful only if the URE happens while reading metadata.
2. Does ZFS identify a URE when rebuilding parity, realize the sector's data is lost, and choose to destroy the zpool, or is it possible to get ZFS to continue restoring the rest of the pool even though that sector is lost?

If the second point is something configurable, then someone with hundreds of thousands of photos on FreeNAS would probably prefer to learn that one image somewhere is now corrupted rather than see the entire pool disappear.
Can someone clue me in? Is ZFS better with RAIDZ1 than hardware RAID controllers are with RAID5?

Dusan · Jan 3, 2014

Here's a quick experiment I did with a ZFS mirror; RAIDZ should behave the same way (it will finish the resilver and report permanent errors): "What happens if URE is encountered during mirrored vdev resilver?"

cyberjock · Jan 3, 2014

UREs can potentially corrupt two things: file data or metadata. If it's file data, then zpool status -v will show you the damaged file, and you got off lucky. Metadata normally has two copies, so ZFS should be able to use the backup. But the backup can also be bad if your failing disk has trashed both copies.
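The "metadata normally has two copies" point is what the question calls ditto blocks. Here is a minimal Python sketch of the read side, under simplifying assumptions: each block pointer stores a checksum plus the location of every copy, and a read tries the copies in order until one verifies. SHA-256 stands in for ZFS's default fletcher4 checksum, and the `BlockPointer`/`read_block` names and the in-memory "disk" dict are hypothetical. File data normally gets a single copy, although the `copies` dataset property (for example `zfs set copies=2 pool/photos`, where the dataset name is made up) can request more for newly written data.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    checksum: bytes       # checksum of the block, stored in the parent block
    locations: list[int]  # one entry per copy; metadata gets 2+ copies by default


def read_block(bp: BlockPointer, disk: dict[int, bytes]) -> bytes:
    """Return the first copy whose checksum verifies; raise if every copy is bad."""
    for loc in bp.locations:
        data = disk.get(loc)  # None models an unreadable sector (URE)
        if data is not None and hashlib.sha256(data).digest() == bp.checksum:
            return data
    raise OSError("no valid copy left: permanent error")


if __name__ == "__main__":
    meta = b"directory entries"
    disk = {100: meta, 200: meta}
    bp = BlockPointer(hashlib.sha256(meta).digest(), [100, 200])
    del disk[100]                           # URE on the first copy
    print(read_block(bp, disk) == meta)     # the second copy still verifies
    disk[200] = b"trashed by failing disk"  # now both copies are bad
    try:
        read_block(bp, disk)
    except OSError as err:
        print(err)
```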

ZFS will continue to rebuild after a read error. As long as the disk doesn't drop offline, ZFS will continue. Now, the definition of "continue" varies widely. If the error trashes your metadata, the pool may unmount itself or cause a kernel panic, which basically leaves you with a useless pool (and probably an unmountable one, too).
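The "keep going and report what was lost" behaviour can be illustrated with a toy single-parity model in Python. This is a sketch under heavy simplifying assumptions: fixed-width stripes of equal-size chunks with one XOR parity column, whereas real RAIDZ uses variable-width stripes and resilvers by walking the pool's block tree. The `resilver` function and its arguments are hypothetical. The point is the control flow: a URE on a surviving disk costs only the stripe it lands in, and that stripe is recorded as a permanent error instead of the whole rebuild being abandoned.

```python
from functools import reduce


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings column by column (single-parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def resilver(stripes: list[list[bytes]], failed: int, ure: set[tuple[int, int]]):
    """Rebuild column `failed` of every stripe from the surviving columns.

    stripes: each stripe is a list of equal-length chunks (data columns + parity).
    ure: (stripe_index, column) pairs that return a read error during the rebuild.
    Returns (rebuilt, permanent_errors); rebuilt[i] is None for a lost stripe.
    """
    rebuilt, permanent_errors = [], []
    for i, stripe in enumerate(stripes):
        survivors = [col for col in range(len(stripe)) if col != failed]
        if any((i, col) in ure for col in survivors):
            # No redundancy is left, so this stripe cannot be rebuilt: record it
            # (the analogue of a file listed by "zpool status -v") and move on.
            rebuilt.append(None)
            permanent_errors.append(i)
            continue
        rebuilt.append(xor_bytes(*(stripe[col] for col in survivors)))
    return rebuilt, permanent_errors


if __name__ == "__main__":
    data = [[b"\x01\x02", b"\x10\x20", b"\xaa\xbb"],
            [b"\x03\x04", b"\x30\x40", b"\xcc\xdd"]]
    stripes = [s + [xor_bytes(*s)] for s in data]  # append the parity column
    # Disk 0 was replaced; disk 2 throws a URE while stripe 1 is being read.
    rebuilt, errors = resilver(stripes, failed=0, ure={(1, 2)})
    print(rebuilt, errors)
```

In this toy model every stripe is equally important; in real ZFS the damage depends on what the lost block held. A file block shows up as a damaged file, while lost metadata can take out much more, as described above.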

Is ZFS better with RAIDZ1 than hardware RAID controllers are with RAID5?

It is, but it isn't. Deep down, there are things that can make RAIDZ1 potentially better than hardware RAID5. But the reality, based on what forum users have experienced, is that you aren't likely to have a single bad sector that trashes your pool. What is likely is that you'll either have no errors at all, or you'll have rampant errors on another disk.

ZFS is designed with the expectation that it can always find and correct its own errors, without exception. I cannot stress the "without exception" part enough. The most important thing is recognizing this truth. This is also why there is no chkdsk or fsck for ZFS. Scrubs go through all of the file system data and fix any errors using parity data. Finding an error is easy because everything is checksummed; fixing it depends on having parity data. If you have a RAIDZ1 and you have to replace a disk, you have no parity data left until the resilver finishes, so any read error is potentially fatal to ZFS. The key is to always be in a position where your data has parity data for the off chance that an error occurs. With RAIDZ1 you lose that at the one time you need it most: during a resilver.
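The split between finding errors (checksums) and fixing them (parity), and why a degraded RAIDZ1 keeps only the finding half, can be sketched in Python. Assumptions: one block laid out as data columns plus one XOR parity column, a SHA-256 checksum of the data standing in for ZFS's default fletcher4, and a repair strategy that simply tries rebuilding each column in turn until the checksum matches. That is only loosely modelled on RAIDZ; `scrub_block` and its status strings are hypothetical.

```python
import hashlib
from functools import reduce


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings column by column (single-parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def scrub_block(columns, checksum, missing=None):
    """Verify one block stored as data columns followed by one parity column.

    checksum: digest of the concatenated data columns, kept in the parent block.
    missing: index of a column whose disk is gone (degraded vdev), else None.
    Returns (status, data_columns) with status "ok", "repaired",
    "reconstructed" or "unrecoverable" (data_columns is None in the last case).
    """
    n_data = len(columns) - 1

    def verifies(cols):
        return hashlib.sha256(b"".join(cols[:n_data])).digest() == checksum

    if missing is None and verifies(columns):
        return "ok", columns[:n_data]
    # Checksum failed (or a column is absent): assume one column is bad and
    # rebuild it from the others. With a disk missing, only that column can be
    # rebuilt, because the single parity is already spent on it.
    suspects = [missing] if missing is not None else range(len(columns))
    for bad in suspects:
        trial = list(columns)
        trial[bad] = xor_bytes(*(c for i, c in enumerate(columns) if i != bad))
        if verifies(trial):
            status = "reconstructed" if missing is not None else "repaired"
            return status, trial[:n_data]
    return "unrecoverable", None


if __name__ == "__main__":
    data = [b"ab", b"cd", b"ef"]
    parity = xor_bytes(*data)
    checksum = hashlib.sha256(b"".join(data)).digest()
    # Healthy vdev, silent corruption on disk 1: detected and repaired.
    print(scrub_block([b"ab", b"XX", b"ef", parity], checksum))
    # Degraded vdev (disk 0 missing) plus the same corruption: detected only.
    print(scrub_block([None, b"XX", b"ef", parity], checksum, missing=0))
```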

All of this touches on why ECC RAM is so important. ZFS cannot fix errors it can't detect. And if the errors happen in RAM, ZFS loses the ability to fix them, so you can see pool corruption even though you have sufficient redundancy! People don't realize this simple fact and go about their merry way until their pool fails. Then they are shocked when their RAIDZ3 pool, with perfectly working disks, is corrupt without any warning.
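One concrete way a RAM error defeats redundancy fits in a few lines of Python: if a bit flips in memory after the data is handed to ZFS but before its checksum is computed, the checksum is calculated over the already-damaged bytes. Every later read and scrub then verifies successfully, so parity is never consulted. This is a deliberately simplified, hypothetical model of the write path (`write_block` and `scrub_ok` are not ZFS functions, and SHA-256 stands in for the pool's checksum); it shows one failure mode, not everything non-ECC RAM can do.

```python
import hashlib


def write_block(buf: bytearray, flipped_bit=None):
    """Toy write path: the data sits in RAM, then is checksummed and written.

    flipped_bit: hypothetical index of a bit flipped by a RAM error before the
    checksum is computed (None means the RAM behaved).
    Returns (bytes_written, checksum_stored).
    """
    if flipped_bit is not None:
        byte_index, bit = divmod(flipped_bit, 8)
        buf[byte_index] ^= 1 << bit
    return bytes(buf), hashlib.sha256(buf).digest()


def scrub_ok(written: bytes, stored_checksum: bytes) -> bool:
    """What a later scrub can check: does the data match its own checksum?"""
    return hashlib.sha256(written).digest() == stored_checksum


if __name__ == "__main__":
    original = b"family photo bytes"
    written, stored = write_block(bytearray(original), flipped_bit=3)
    print("scrub passes:", scrub_ok(written, stored))
    print("data intact: ", written == original)
```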

Hopefully it makes more sense. ;)

ZFS Noob · Jan 3, 2014

That's actually pretty outstanding. Thanks, folks. :)
