Pool checksum Error

bieringm

Cadet
Joined
Oct 15, 2021
Messages
5
Hey all. I tried to search, but it was difficult to find my exact error.
Here is the result of zpool status:
  pool: Den11TB
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 2.24G in 00:15:05 with 0 errors on Thu Oct 14 18:41:22 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Den11TB                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a524f39a-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     0
            gptid/a5932ce2-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     2
            gptid/a5a74be5-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     0
            gptid/a5bdbbcd-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     0
        cache
          gptid/a2e180f0-a62a-11eb-9e22-fcaa143b07ac    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:22 with 0 errors on Mon Oct 11 03:45:22 2021
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          ada0p2     ONLINE       0     0     0

errors: No known data errors

I know the error is with gptid/a5932ce2-a62a-11eb-9e22-fcaa143b07ac, but I am not sure what the error is.

Thanks for the help.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,599
That is simple. ZFS detected a checksum error and was able to correct it automatically from redundancy. The part about "One or more devices has experienced an unrecoverable error." simply means that a re-read of the same blocks from the same disk returned an error. That would be fatal without the RAID-Z1 redundancy that you have from the other disks.

Nothing to worry about in the short term. Just keep an eye on that specific disk to see if errors increase. You can also check the SMART output to see if there are any internal problems the disk can report.

If this is the first occurrence of a problem with that disk, you can use zpool clear to clear the error. (But keep track elsewhere of when this occurred, which disk, what error, checksum in this case, and how many.)
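
For keeping that record, here is a minimal Python sketch that picks out the rows of pasted `zpool status` text whose READ, WRITE, or CKSUM counter is not zero. The function name `devices_with_errors` and the list of device states are my own assumptions for illustration, not part of any ZFS tool, and the counters are kept as strings because they may be abbreviated on large counts.

```python
# Device states assumed to appear in the STATE column (illustrative list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for each row with a nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        parts = line.split()
        # A device row looks like: NAME STATE READ WRITE CKSUM
        if len(parts) < 5 or parts[1] not in STATES:
            continue
        name, read, write, cksum = parts[0], parts[2], parts[3], parts[4]
        if (read, write, cksum) != ("0", "0", "0"):
            flagged.append((name, read, write, cksum))
    return flagged

# Sample rows copied from the output posted at the top of this thread.
sample = """\
        NAME                                            STATE     READ WRITE CKSUM
        Den11TB                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a524f39a-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     0
            gptid/a5932ce2-a62a-11eb-9e22-fcaa143b07ac  ONLINE       0     0     2
"""

for row in devices_with_errors(sample):
    print(row)
```

Saving the flagged rows with a date before running zpool clear gives you something to compare against if the same disk shows errors again.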
 

enjoywithme

Dabbler
Joined
Dec 23, 2014
Messages
13
From time to time I have this problem too. I checked the hard disk with the vendor's diagnostic tool and it found no problems.
I suspect unstable power could be the cause. After I added a UPS the errors still occur, but much less often.
The real problem is that I found some files were corrupt! Some RAR files cannot be extracted, and the SFV check fails for them.
ZFS said the errors were repaired, but that does not seem to be true.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,599
If ZFS says the files are good, it's possible that the corruption occurred before the files were put on ZFS. That would mean ZFS generated its checksum on bad data (but it's a good checksum).

People have literally said that they have seen no corruption on their MS-Windows NTFS file systems, or on their Linux EXT3/4 file systems. So why use ZFS?

This is one of the quirks of using ZFS. People say everything was fine on the source computer. Except without any verification, how does someone know their files were good?
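
One way to get that verification is to record a hash of every file on the source computer before copying and compare it with the copy afterwards. The sketch below is only an illustration of the idea: the helper names `sha256_of`, `build_manifest`, and `changed_files` are made up for this example, and SHA-256 is my choice of hash, not something ZFS requires.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_files(before, after):
    """Names from `before` that are missing from `after` or hash differently."""
    return sorted(name for name in before if after.get(name) != before[name])
```

Build one manifest on the source directory and one on the copy, then pass both to `changed_files`. An empty result means the copy matches what was on the source at the time; it still cannot tell you whether the source was already bad before the first manifest was taken.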

All that said, there have been data integrity bugs in ZFS. They are rare, and tend to get fixed very quickly after identification. That can take time, both to recognize that the data loss is due to ZFS and to find the cause. (Sometimes rolling back patches or new features might be warranted.)
 

ASap

Dabbler
Joined
Dec 15, 2022
Messages
23
Well-explained and totally makes sense to me. So, how can we identify or repair these pre-corrupted files?
I'd like to identify the pre-corrupted files and then repair them if possible; otherwise, delete them, so the checksum error does not recur.
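
Where an SFV file exists for the data (as with the RAR sets mentioned earlier in the thread), the files that fail it can at least be listed. This is a rough sketch, assuming the usual SFV layout of one `filename CRC32` pair per line with `;` starting a comment; the function names `crc32_of` and `failed_sfv_entries` are hypothetical. It only identifies bad files and cannot repair them.

```python
import zlib
from pathlib import Path

def crc32_of(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def failed_sfv_entries(sfv_path):
    """Return the names listed in the SFV file that are missing or mismatched."""
    sfv_path = Path(sfv_path)
    failed = []
    for line in sfv_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.rsplit(None, 1)  # file name may contain spaces
        if len(parts) != 2:
            continue
        name, expected = parts
        target = sfv_path.parent / name
        if not target.is_file() or crc32_of(target) != int(expected, 16):
            failed.append(name)
    return failed
```

Files reported here would need to be restored from a known-good copy, since a checksum can only say that the content differs, not what it should have been.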
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
When ZFS detects corruption, it's based on the checksums stored with each block when the data was first written on ZFS (and can be corrected automatically if there's parity stored).

If corruption is detected, it's because ZFS is noticing that what it's reading back isn't what it had written earlier.

When the detection is in the CKSUM column, it indicates the cabling or controller is somehow a potential part of the cause, since the disk hasn't indicated a read error itself (would be seen in the READ column otherwise), so check your cabling for that disk.
 

ASap

Dabbler
Joined
Dec 15, 2022
Messages
23
My SATA cables are all new: two are Dynamix brand and the other two are the originals that came with the motherboard. My point is, if it's a cabling issue, how come cables from two different manufacturers have the same error?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I don't see your issue documented here... is it this one?

If you're seeing multiple CKSUM errors on multiple disks (I note you only see 1 on 3 of your 4), then the SATA controller comes into focus (maybe also power supply).
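
The rule of thumb from this thread can be written down as a small Python sketch. It is only a restatement of the hints given above, not a diagnostic tool; the function name `triage` and the wording of the hints are made up for illustration.

```python
def triage(counters):
    """Turn per-device error counts into the hints discussed in this thread.

    counters maps a device name to a (read, write, cksum) tuple of integers.
    """
    hints = []
    cksum_devices = [dev for dev, (_r, _w, c) in counters.items() if c > 0]
    for dev, (read, _write, cksum) in counters.items():
        if read > 0:
            # The disk itself reported a failed read.
            hints.append(f"{dev}: READ errors, check the disk's SMART output")
        elif cksum > 0:
            # Data came back without a disk error but did not match its checksum.
            hints.append(f"{dev}: CKSUM errors only, check the cabling for this disk")
    if len(cksum_devices) > 1:
        hints.append("CKSUM errors on several disks, look at the SATA controller and power supply")
    return hints

# Hypothetical counts: one disk with 2 checksum errors, as in the first post.
for hint in triage({"disk1": (0, 0, 0), "disk2": (0, 0, 2)}):
    print(hint)
```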
 

ASap

Dabbler
Joined
Dec 15, 2022
Messages
23
Thanks for jumping in, @sretalla.

Yes, that's the one.

Can you explain why the power supply might cause the checksum errors? And where should I look to find evidence for it?
 