Resync/Repair back one disk in Raidz2 pool

Peter Jakab · Feb 8, 2024

Hi All,

In the middle of a migration project (from FreeNAS 11.2 to the latest version, by reinstalling on a new disk and importing the pools) I ran into a small mistake.
When I imported the pool into the new TrueNAS Core, I saw that it was degraded because one of the six member disks of the RAIDZ2 vdev was disconnected. After shutting down, I found that the power connector had simply come loose from that disk. As a result, that disk dropped out of sync with the rest of the pool.

I booted back into the old FreeNAS 11.2 after reconnecting the disk's power, but the pool, of course, still reports a problem. The status output recommends running zpool clear or zpool replace (but there is no new disk in my case).
The checksum errors below belong to the out-of-sync da02 disk:

pool: TriSister
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: resilvered 14.5M in 0 days 00:00:02 with 0 errors on Thu Feb 8 12:39:27 2024
config:

NAME STATE READ WRITE CKSUM
TriSister ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/6c76a843-d0eb-11e5-b759-6805ca39b31e ONLINE 0 0 0
gptid/83fab4d6-896f-11e5-adf3-6805ca39b31e ONLINE 0 0 0
gptid/0beb4ce8-71b6-11e9-a83a-0cc47a500de0 ONLINE 0 0 0
gptid/a53eaf0e-b55d-11ee-99fd-0cc47a500de0 ONLINE 0 0 16
gptid/e7efe02a-019f-11e6-a4cd-6805ca39b31e ONLINE 0 0 0
gptid/99a7568a-1b7f-11ea-9e7d-0cc47a500de0 ONLINE 0 0 0

errors: No known data errors

I have not used zpool clear because that disk definitely holds stale data (even though I did not read from or write to the pool myself). A resilver via "replace" is not an option either, because the disk is not a new one.
I tried a scrub from the GUI first to repair these small differences. But it seems to have stopped repairing for the moment (if I understand the message below correctly) because of "too many errors".

This is the current status of the pool in the middle of the scrub/repair process:

zpool status -x
pool: TriSister
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub in progress since Thu Feb 8 12:54:24 2024
7.97T scanned at 1.37G/s, 3.29T issued at 577M/s, 9.20T total
352K repaired, 35.74% done, 0 days 02:59:03 to go
config:

NAME STATE READ WRITE CKSUM
TriSister DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/6c76a843-d0eb-11e5-b759-6805ca39b31e ONLINE 0 0 0
gptid/83fab4d6-896f-11e5-adf3-6805ca39b31e ONLINE 0 0 0
gptid/0beb4ce8-71b6-11e9-a83a-0cc47a500de0 ONLINE 0 0 0
gptid/a53eaf0e-b55d-11ee-99fd-0cc47a500de0 DEGRADED 0 0 103 too many errors (repairing)
gptid/e7efe02a-019f-11e6-a4cd-6805ca39b31e ONLINE 0 0 0
gptid/99a7568a-1b7f-11ea-9e7d-0cc47a500de0 ONLINE 0 0 0

errors: No known data errors
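
For anyone who wants to pick the affected disk out of this kind of output automatically, here is a minimal sketch. It assumes the config rows follow the NAME STATE READ WRITE CKSUM layout shown above; the names `VdevLine`, `parse_config` and `devices_with_checksum_errors` are made up for this example and are not part of any ZFS tooling.

```python
from typing import List, NamedTuple

class VdevLine(NamedTuple):
    name: str
    state: str
    read: int
    write: int
    cksum: int
    note: str

# Device states this sketch recognises in the STATE column (assumed list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

def parse_config(status_text: str) -> List[VdevLine]:
    """Extract the config rows (pool, vdevs, disks) from zpool status text."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # A config row looks like: NAME STATE READ WRITE CKSUM [note...]
        if len(parts) < 5 or parts[1] not in STATES:
            continue
        try:
            read, write, cksum = (int(p) for p in parts[2:5])
        except ValueError:
            continue  # non-integer counters are skipped in this sketch
        rows.append(VdevLine(parts[0], parts[1], read, write, cksum,
                             " ".join(parts[5:])))
    return rows

def devices_with_checksum_errors(status_text: str) -> List[VdevLine]:
    """Return only the rows whose CKSUM counter is above zero."""
    return [row for row in parse_config(status_text) if row.cksum > 0]

# Sample rows copied from the status output in this post.
sample = """\
NAME STATE READ WRITE CKSUM
TriSister DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/0beb4ce8-71b6-11e9-a83a-0cc47a500de0 ONLINE 0 0 0
gptid/a53eaf0e-b55d-11ee-99fd-0cc47a500de0 DEGRADED 0 0 103 too many errors (repairing)
"""

for row in devices_with_checksum_errors(sample):
    print(row.name, row.cksum, row.note)
```

Splitting on whitespace means the same parser should cope with the indented output of a real terminal as well as the flattened copy pasted here.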

Am I wrong to try repairing this with a scrub? What are the generally recommended steps to repair/correct a small re-sync issue like this?

Peter Jakab · Feb 8, 2024

Problem solved.
The scrub repaired the problems. It just wrote the "too many errors (repairing)" message on the disk's line in the zpool status output. In the GUI that message is not visible, only the DEGRADED state on the disk's line, so you have to use the CLI to sort this out.
856 KB (227 checksum errors) repaired out of 2000 GB (2 TB disk) is not a bad ratio. That is the "price" paid for booting with one disk missing, even though the pool was not otherwise used.
Here is the status when the scrub ended:

zpool status -x
pool: TriSister
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 856K in 0 days 05:13:27 with 0 errors on Thu Feb 8 18:07:51 2024
config:

NAME STATE READ WRITE CKSUM
TriSister DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/6c76a843-d0eb-11e5-b759-6805ca39b31e ONLINE 0 0 0
gptid/83fab4d6-896f-11e5-adf3-6805ca39b31e ONLINE 0 0 0
gptid/0beb4ce8-71b6-11e9-a83a-0cc47a500de0 ONLINE 0 0 0
gptid/a53eaf0e-b55d-11ee-99fd-0cc47a500de0 DEGRADED 0 0 227 too many errors
gptid/e7efe02a-019f-11e6-a4cd-6805ca39b31e ONLINE 0 0 0
gptid/99a7568a-1b7f-11ea-9e7d-0cc47a500de0 ONLINE 0 0 0

errors: No known data errors
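
As a rough check of the ratio quoted above, the arithmetic can be written out as follows. This is only a sketch: it assumes the "856K" in the scan line means KiB and that the 2 TB disk is 2 * 10^12 bytes; the variable names are just for illustration.

```python
# Figures taken from the scrub result above; the unit interpretations are assumptions.
repaired_bytes = 856 * 1024   # "856K" as reported by zpool status, taken as KiB
disk_bytes = 2 * 10**12       # "2 TB" disk, taken as decimal terabytes
checksum_errors = 227         # CKSUM counter on the affected disk

ratio = repaired_bytes / disk_bytes
avg_per_error = repaired_bytes / checksum_errors

print(f"repaired fraction of the disk: {ratio:.2e} ({ratio * 100:.6f} %)")
print(f"average repair per checksum error: {avg_per_error:.0f} bytes")
```

So the repaired data comes to well under a millionth of the disk's capacity, a few kilobytes per checksum error on average.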

And here is the status after running the command
zpool clear TriSister

pool: TriSister
state: ONLINE
scan: scrub repaired 856K in 0 days 05:13:27 with 0 errors on Thu Feb 8 18:07:51 2024
config:

NAME STATE READ WRITE CKSUM
TriSister ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/6c76a843-d0eb-11e5-b759-6805ca39b31e ONLINE 0 0 0
gptid/83fab4d6-896f-11e5-adf3-6805ca39b31e ONLINE 0 0 0
gptid/0beb4ce8-71b6-11e9-a83a-0cc47a500de0 ONLINE 0 0 0
gptid/a53eaf0e-b55d-11ee-99fd-0cc47a500de0 ONLINE 0 0 0
gptid/e7efe02a-019f-11e6-a4cd-6805ca39b31e ONLINE 0 0 0
gptid/99a7568a-1b7f-11ea-9e7d-0cc47a500de0 ONLINE 0 0 0

errors: No known data errors

So my expectation that a scrub followed by a clear was the right process turned out to be correct.
Keep in mind that in my case the da02 disk was not failing, just out of sync, because its power cable was disconnected while the system was powered off.
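
The scrub-then-clear sequence described above could be scripted roughly like this. It is a sketch under assumptions, not a tested procedure: it polls `zpool status` for the "scrub in progress" text seen in the output above, and only runs `zpool clear` when the scan line says "with 0 errors". The helper names (`run_zpool`, `scrub_in_progress`, `scrub_then_clear`) are hypothetical, and running it for real needs root on the ZFS host.

```python
import subprocess
import time
from typing import Callable, List

def run_zpool(args: List[str]) -> str:
    """Run a zpool subcommand and return its stdout."""
    result = subprocess.run(["zpool", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout

def scrub_in_progress(status_text: str) -> bool:
    """True while the scan line still reports a running scrub."""
    return "scrub in progress" in status_text

def scrub_then_clear(pool: str,
                     run: Callable[[List[str]], str] = run_zpool,
                     poll_seconds: float = 60.0) -> str:
    """Start a scrub, wait for it to finish, then clear the error counters."""
    run(["scrub", pool])
    status = run(["status", pool])
    while scrub_in_progress(status):
        time.sleep(poll_seconds)
        status = run(["status", pool])
    # Reset the counters only if the scrub finished without unrecoverable errors.
    if "with 0 errors" in status:
        run(["clear", pool])
        status = run(["status", pool])
    return status

# Example call (commented out because it needs a real pool):
# print(scrub_then_clear("TriSister"))
```

The `run` parameter exists so the flow can be exercised with a stand-in function before pointing it at a real pool. If the scrub ends with errors, nothing is cleared and the last status text is returned for inspection.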

NugentS · Feb 8, 2024

You could have removed the disk from the pool and then replaced it with itself, which would have triggered a resilver.
