Raid unavailable after power failure during disk resilver

Joined
Jan 3, 2019
Messages
8
Hi,

We have a pool of 96 disks, split into 12 raidz2 vdevs of 8 disks each.

One disk failed, so we started the resilver process. During the resilver we had a power failure that lasted longer than our UPS capacity.
After the restart, the resilver process continued and is now finished with errors.

3 of the 8 disks in the affected vdev are now flagged as Faulted:
raidz2-3                                        UNAVAIL      0     0     0  insufficient replicas
  gptid/8466233c-602c-11e9-87aa-0cc47a2bae8e    FAULTED      6     0     0  too many errors
  gptid/2ed43b07-5e8c-11ed-a369-0cc47a2bae8e    ONLINE       0     0     0
  gptid/8f53438e-b507-11e9-8080-0cc47a2bae8e    ONLINE       0     0     0
  gptid/36fb7c60-9e11-11ec-ae70-0cc47a2bae8e    FAULTED      3     0     0  too many errors
  gptid/8aa8efd3-4eeb-11e9-925a-0cc47a2bae8e    ONLINE       0     0     0
  gptid/d23146fe-8004-11eb-a48f-0cc47a2bae8e    ONLINE       0     0     0
  gptid/eab1e913-6fc6-11ed-a369-0cc47a2bae8e    ONLINE       0     0     0
  gptid/9648d239-643e-11ed-a369-0cc47a2bae8e    FAULTED      3     0     0  too many errors

When I try to bring one of those disks online, I get the following errors in the log:
Dec 3 12:21:19 backupserver GEOM_MIRROR: Device mirror/swap0 launched (3/3).
Dec 3 12:21:19 backupserver GEOM_MIRROR: Device mirror/swap1 launched (3/3).
Dec 3 12:21:19 backupserver GEOM_MIRROR: Device mirror/swap2 launched (3/3).
Dec 3 12:21:19 backupserver GEOM_MIRROR: Device mirror/swap3 launched (3/3).
Dec 3 12:21:19 backupserver GEOM_MIRROR: Device mirror/swap4 launched (3/3).
Dec 3 12:21:19 backupserver GEOM_ELI: Device mirror/swap0.eli created.
Dec 3 12:21:19 backupserver GEOM_ELI: Encryption: AES-XTS 128
Dec 3 12:21:19 backupserver GEOM_ELI: Crypto: accelerated software
Dec 3 12:21:19 backupserver GEOM_ELI: Device mirror/swap1.eli created.
Dec 3 12:21:19 backupserver GEOM_ELI: Encryption: AES-XTS 128
Dec 3 12:21:19 backupserver GEOM_ELI: Crypto: accelerated software
Dec 3 12:21:19 backupserver GEOM_ELI: Device mirror/swap2.eli created.
Dec 3 12:21:19 backupserver GEOM_ELI: Encryption: AES-XTS 128
Dec 3 12:21:19 backupserver GEOM_ELI: Crypto: accelerated software
Dec 3 12:21:19 backupserver GEOM_ELI: Device mirror/swap3.eli created.
Dec 3 12:21:19 backupserver GEOM_ELI: Encryption: AES-XTS 128
Dec 3 12:21:19 backupserver GEOM_ELI: Crypto: accelerated software
Dec 3 12:21:19 backupserver GEOM_ELI: Device mirror/swap4.eli created.
Dec 3 12:21:19 backupserver GEOM_ELI: Encryption: AES-XTS 128
Dec 3 12:21:19 backupserver GEOM_ELI: Crypto: accelerated software
Dec 3 12:21:21 backupserver 1 2022-12-03T12:21:21.303597+01:00 backupserver.localdomain savecore 67193 - - error reading last dump header at offset 2147483136 in /dev/da131p1: Device not configured
Dec 3 12:21:21 backupserver 1 2022-12-03T12:21:21.352519+01:00 backupserver.localdomain savecore 67200 - - error reading last dump header at offset 2147483136 in /dev/da127p1: Device not configured
Dec 3 12:21:21 backupserver 1 2022-12-03T12:21:21.449793+01:00 backupserver.localdomain savecore 67214 - - error reading last dump header at offset 2147483136 in /dev/da124p1: Device not configured
Dec 3 12:21:23 backupserver 1 2022-12-03T12:21:23.103985+01:00 backupserver.localdomain savecore 67380 - - error reading last dump header at offset 2147483136 in /dev/da131p1: Device not configured
Dec 3 12:21:23 backupserver 1 2022-12-03T12:21:23.152728+01:00 backupserver.localdomain savecore 67387 - - error reading last dump header at offset 2147483136 in /dev/da127p1: Device not configured
Dec 3 12:21:23 backupserver 1 2022-12-03T12:21:23.248229+01:00 backupserver.localdomain savecore 67402 - - error reading last dump header at offset 2147483136 in /dev/da124p1: Device not configured

Can we still hope to resilver one of those disks, or is that no longer possible at all?

Kind regards,
Kevin
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
3 disks faulted - hope you have a good backup.
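
To spell out why three faulted disks matter here: a raidz2 vdev carries two disks' worth of parity, so it can lose at most two members. Below is a minimal Python sketch of that arithmetic, applied to the status lines posted above. The function name `summarize_vdev`, the parity table, and the set of states treated as unusable are assumptions made for illustration; this is not how ZFS itself evaluates vdev health.

```python
# Status lines copied from the original post (whitespace-separated columns:
# name, state, read errors, write errors, checksum errors, optional message).
STATUS = """\
raidz2-3 UNAVAIL 0 0 0 insufficient replicas
gptid/8466233c-602c-11e9-87aa-0cc47a2bae8e FAULTED 6 0 0 too many errors
gptid/2ed43b07-5e8c-11ed-a369-0cc47a2bae8e ONLINE 0 0 0
gptid/8f53438e-b507-11e9-8080-0cc47a2bae8e ONLINE 0 0 0
gptid/36fb7c60-9e11-11ec-ae70-0cc47a2bae8e FAULTED 3 0 0 too many errors
gptid/8aa8efd3-4eeb-11e9-925a-0cc47a2bae8e ONLINE 0 0 0
gptid/d23146fe-8004-11eb-a48f-0cc47a2bae8e ONLINE 0 0 0
gptid/eab1e913-6fc6-11ed-a369-0cc47a2bae8e ONLINE 0 0 0
gptid/9648d239-643e-11ed-a369-0cc47a2bae8e FAULTED 3 0 0 too many errors
"""

# Number of member failures each raidz level can tolerate.
RAIDZ_PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}

# Assumption for this sketch: members in these states contribute no data.
UNUSABLE_STATES = {"FAULTED", "UNAVAIL", "REMOVED", "OFFLINE"}


def summarize_vdev(status_text):
    """Count unusable members of one raidz vdev and compare with its parity."""
    rows = [line.split() for line in status_text.strip().splitlines()]
    vdev_name = rows[0][0]                 # e.g. "raidz2-3"
    level = vdev_name.split("-")[0]        # e.g. "raidz2"
    parity = RAIDZ_PARITY[level]
    members = rows[1:]
    unusable = [row[0] for row in members if row[1] in UNUSABLE_STATES]
    return {
        "vdev": vdev_name,
        "members": len(members),
        "parity": parity,
        "unusable": len(unusable),
        "survivable": len(unusable) <= parity,
    }


summary = summarize_vdev(STATUS)
print(summary)
```

With the posted output, the sketch counts three unusable members against a parity of two, which matches the "insufficient replicas" message on the vdev line.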

That aside - what hardware are we talking about (forum rules)?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, hardware configuration please.

Having 3 disks in a single 8 disk vDev die at once is pretty odd.

Did you schedule regular scrubs?
Like at least every 2-3 weeks?
And SMART tests too?


Standard response to power loss with ZFS:
ZFS was specifically designed to have zero data loss on unexpected power-offs
(aka crashes). The only data you can lose is data in flight (just like any
other file system). When a crash occurs during a write, either the full set of
data was written and is available afterwards, or none of the in-flight data is
available.
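
The all-or-nothing behaviour described above can be pictured with a toy copy-on-write model in Python. This is only a sketch under simplifying assumptions: the class `ToyCowStore` and its `crash_before_commit` flag are invented for illustration, and real ZFS transaction groups and uberblock updates are far more involved.

```python
class ToyCowStore:
    """Toy copy-on-write store: new data never overwrites live blocks."""

    def __init__(self):
        self.blocks = {}     # block id -> contents
        self.root = None     # id of the block the root pointer refers to
        self._next_id = 0

    def write(self, data, crash_before_commit=False):
        # Step 1: place the new data in a fresh block; the old block is untouched.
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data
        if crash_before_commit:
            # Simulated power loss: the root pointer was never updated,
            # so the in-flight data is simply not visible afterwards.
            return
        # Step 2: a single pointer update makes the new data visible.
        self.root = new_id

    def read(self):
        return None if self.root is None else self.blocks[self.root]


store = ToyCowStore()
store.write("version 1")
store.write("version 2", crash_before_commit=True)   # lost in flight
after_crash = store.read()                           # still the old data
store.write("version 3")
after_commit = store.read()
print(after_crash, "|", after_commit)
```

In this model a crash leaves the previous version fully intact, and a completed write is fully visible; there is no state in which half of a write is readable.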

There are exceptions to this, all involving bad hardware:

A storage device that lies about flushing its write cache
A drive that re-orders writes
A hardware RAID controller with its own write cache
Or potentially non-ECC RAM with errors

When Sun Microsystems designed and tested ZFS, they did not anticipate the
massive numbers of users on home & consumer hardware. Thus, those exceptions
generally don't apply to actual server grade hardware designed for NAS uses.
 