Unrecoverable Checksum Error - best course of action

paulinventome

Explorer
Joined
May 18, 2015
Messages
62
Hi,

So I have 3 SSDs running in RAIDZ. One of them disappeared, which I think was cable-related since I had just put a cover on, so it may not be the drive. I shut down, rebooted, and now I get this on the pool:

Pool Core state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.


And I see 1 checksum error on the disk that disappeared.

What I don't understand is what I should do next. Can I clear the error, or should I rebuild the whole pool? It's early days and everything is backed up, so a rebuild would only cost time.

Thanks!
Paul
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
More info on your hardware, please, per the Forum Rules.
 

paulinventome


Samuel Tai


Unfortunately, this is why we don't run RAID controllers in JBOD mode. It will appear to run fine initially, but will inevitably go south because the RAID controller and OpenZFS will try to do the same things while unaware of what the other is doing.

You should back up your data, switch to an HBA in IT mode, and recreate your pool and reload from backup.
 

paulinventome


Hi Samuel,

I think this might be the wrong thread for this reply! I'm not running a RAID controller in this case, so everything is managed by TrueNAS.

Kindest
Paul
 

Samuel Tai

Your signature says you have a Highpoint RAID controller in your system.
 

paulinventome

My bad, I wrote that too quickly. It's an HBA card from LSI. I have no hardware RAID; I am just using TrueNAS as recommended.

In the end, as I hadn't fully copied over what's living on the SSDs, I rebuilt the pool. But for the future, I would like to know whether an error like this can simply be cleared, needs a repair, or calls for a rebuild. I'm 99.9% sure this error was caused by a dodgy cable connection. SMART reports all the drives as healthy (they're all new).

Kindest
Paul
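
On the question of whether a single checksum error like this can be cleared: a rough way to see which device is carrying the error is to read the counter columns of `zpool status`. The snippet below is only a sketch under assumptions. The sample text and the device names (`ada0` to `ada2`) are made up for illustration, the function name `devices_with_errors` is hypothetical, and only the pool name `Core` comes from the alert quoted above.

```python
import re

# Hypothetical sample in the layout `zpool status` prints.
# The pool name "Core" comes from the alert; the device names are made up.
SAMPLE_STATUS = """\
  pool: Core
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        Core        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     1
            ada2    ONLINE       0     0     0

errors: No known data errors
"""

# One config row: indented name, a device state, then READ / WRITE / CKSUM.
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a non-zero counter.

    Counters are kept as strings because large values may be abbreviated.
    """
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, blank line, or non-config text
        name = match.group(1)
        counters = match.groups()[2:]
        if any(c != "0" for c in counters):
            flagged[name] = counters
    return flagged


if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE_STATUS).items():
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

If the flagged device turns out to be healthy (good SMART data, cable reseated), the usual sequence is `zpool scrub Core` to re-verify all data against its checksums, then `zpool clear Core` to reset the error counters once the scrub finishes clean. Treat that as a general pattern rather than advice specific to this system. If errors return after a scrub, the drive, cable, or port deserves a closer look.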
 