RAID1 and Bit Rot - a theoretical question please help :)

traderjay · Jun 17, 2016

This is a question for the pros here -

Since RAID 1 puts a duplicate copy of the same data on two drives, let's say one drive experiences bit rot that affects a file. When the file is accessed, how does the controller determine which drive to read from?

My guess is that unless the controller actively scans all the data and compares it against a checksum, opening the corrupted file might just produce an error. Or does the controller then re-read it from the second drive?

Bidule0hm · Jun 18, 2016

There are checksums, so ZFS knows which data is good and which isn't. If the data read doesn't match its checksum, ZFS reads the block from the other drive and checks it against the checksum again. If that copy is good, it's served to the user; if it's bad too, you have a serious problem with your pool and the data isn't served.
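To make that read path concrete, here is a minimal sketch of a checksum-verified mirror read. It is not ZFS code: the names `checksum`, `mirror_read` and `ChecksumError` are hypothetical, and SHA-256 from the standard library stands in for the real block checksum (ZFS defaults to fletcher4). The key idea is that the expected checksum is stored separately from the data (in ZFS, in the parent block pointer), so each side of the mirror can be verified independently.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when no side of the mirror matches the stored checksum."""


def checksum(data: bytes) -> str:
    # Stand-in for the ZFS block checksum (ZFS defaults to fletcher4;
    # SHA-256 is used here only because it is in the standard library).
    return hashlib.sha256(data).hexdigest()


def mirror_read(copies: list[bytes], expected: str) -> bytes:
    """Return the first mirror copy whose checksum matches `expected`."""
    for data in copies:
        if checksum(data) == expected:
            return data  # verified good: serve it
        # mismatch: fall through and try the next side of the mirror
    # every side failed verification, so refuse to serve the block
    raise ChecksumError("all mirror copies failed checksum verification")


good = b"hello world"
rotted = b"hello wprld"  # one changed character standing in for bit rot
expected = checksum(good)  # recorded when the block was written

print(mirror_read([rotted, good], expected))  # b'hello world'
```

If both copies were rotted, `mirror_read` would raise instead of returning data, which matches the "data isn't served" case above.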

danb35 · Jun 18, 2016

traderjay said:
Since RAID 1 puts a duplicate copy of the same data on two drives, let's say one drive experiences bit rot that affects a file. When the file is accessed, how does the controller determine which drive to read from?

With RAID 1 (i.e., non-ZFS mirroring), the controller has no way to know, and is unlikely to recognize that there's a difference between what's on one disk and the other. It might serve up the good data or it might serve up the bad data.
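A toy model of that situation, assuming the controller simply picks a side by some internal policy (modelled here as a random choice, which is an assumption; real controllers use their own heuristics). The function name `raid1_read` is hypothetical. Because nothing is verified, whichever copy happens to be read is what gets served.

```python
import random


def raid1_read(copies: list[bytes], rng: random.Random) -> bytes:
    """Serve a block from a plain mirror that stores no checksums."""
    # The controller picks a side by its own policy (modelled here as a
    # random choice) and returns whatever that drive hands back, unverified.
    return copies[rng.randrange(len(copies))]


good = b"hello world"
rotted = b"hello wprld"  # silent corruption: the drive reports no error
rng = random.Random(42)

served = [raid1_read([good, rotted], rng) for _ in range(10)]
print(served.count(good), "good reads,", served.count(rotted), "bad reads")

# Comparing the two sides shows *that* they differ, but with no checksum
# there is nothing to say which one is correct.
print("sides disagree:", good != rotted)
```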

With ZFS mirrors, it works as @Bidule0hm says: each block of data is checksummed, and if the data on one disk is incorrect, ZFS looks to the other disk. If the data on that disk is correct, ZFS serves it and also corrects the data on the first disk.
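The repair step can be sketched like this, again as a simplified model rather than actual ZFS code. `self_healing_read` is a hypothetical name, and SHA-256 stands in for the block checksum. Sides are tried in order; once a copy verifies, every side that failed before it is overwritten with the good data.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 as a stand-in for the ZFS block checksum.
    return hashlib.sha256(data).hexdigest()


def self_healing_read(copies: list[bytes], expected: str) -> tuple[bytes, list[int]]:
    """Read a mirrored block, rewriting any side that failed verification.

    Sides are tried in order; returns the verified data and the indices
    of the copies that were repaired.
    """
    failed: list[int] = []
    for i, data in enumerate(copies):
        if checksum(data) == expected:
            for j in failed:
                copies[j] = data  # overwrite the bad copy with the good one
            return data, failed
        failed.append(i)  # remember this side so it can be repaired
    raise OSError("unrecoverable: no side of the mirror matches the checksum")


good = b"hello world"
copies = [b"hello wprld", good]  # side 0 has rotted
data, repaired = self_healing_read(copies, checksum(good))
print(data, repaired, copies[0] == good)  # b'hello world' [0] True
```

Note that this only repairs sides it actually tried; a copy that is never read stays unchecked until something else reads it.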

jgreco · Jun 21, 2016

traderjay said:
This is a question for the pros here -

Since RAID 1 puts a duplicate copy of the same data on two drives, let's say one drive experiences bit rot that affects a file. When the file is accessed, how does the controller determine which drive to read from?

My guess is that unless the controller actively scans all the data and compares it against a checksum, opening the corrupted file might just produce an error. Or does the controller then re-read it from the second drive?

So you're talking about a hardware RAID controller?

A better-quality RAID controller will typically issue the read command to the less busy side of the mirror.
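One simple way to pick the "less busy" side is to compare how many commands are outstanding on each drive. This is only an illustrative assumption: actual controller heuristics are vendor-specific and not documented here, and `MirrorSide` and `pick_side` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MirrorSide:
    name: str
    outstanding: int = 0  # read/write commands currently queued on this drive


def pick_side(sides: list[MirrorSide]) -> MirrorSide:
    # Send the read to the drive with the fewest commands in flight;
    # min() keeps the first side on a tie.
    return min(sides, key=lambda side: side.outstanding)


sides = [MirrorSide("disk0", outstanding=3), MirrorSide("disk1")]
for _ in range(4):
    chosen = pick_side(sides)
    chosen.outstanding += 1  # the new read now occupies that drive's queue
    print("read ->", chosen.name)
```

Here the first three reads go to the idle `disk1`; once the queues are level, the tie goes to `disk0`. Nothing in this choice considers whether the data on either side is correct.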

As previously noted, there's no way to identify that one of the blocks is in error, because there's no checksum. If the drive reports that the block was read correctly, the controller assumes that to be true. If the drive reports an error, the controller will re-read the block from the other side of the mirror.
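A sketch of that behaviour, with each drive modelled as a plain function and a hypothetical `DriveReadError` standing in for a drive-reported read failure (none of these names come from a real controller API). A reported error triggers a retry on the other side, but corrupted data returned without an error is passed straight through.

```python
from typing import Callable


class DriveReadError(Exception):
    """The drive itself reported that it could not read the block."""


def controller_read(drives: list[Callable[[int], bytes]], lba: int) -> bytes:
    """Read a block from a hardware-style mirror with no checksums."""
    for read in drives:
        try:
            # Data the drive returns without an error is trusted as-is.
            return read(lba)
        except DriveReadError:
            continue  # the drive admitted failure: try the other side
    raise DriveReadError(f"block {lba} unreadable on every side of the mirror")


def dead_sector(lba: int) -> bytes:
    raise DriveReadError(f"unrecoverable read error at block {lba}")


def healthy(lba: int) -> bytes:
    return b"hello world"


def silently_rotted(lba: int) -> bytes:
    return b"hello wprld"  # wrong data, but no error reported


print(controller_read([dead_sector, healthy], 7))      # b'hello world'
print(controller_read([silently_rotted, healthy], 7))  # b'hello wprld'
```

The second call is the bit rot case from the question: the controller never consults the healthy side because the first drive did not report a problem.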

traderjay · Jun 23, 2016

Thanks all for the replies!
