RAIDZ2-60-Disk-Pool Unavailable after a RAIDZ2 vdev Failed

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
How is it "mixed" with other disks?
How is it "removed" but still installed in the system?

If the correct drive, expected to contain the correct data, is already in the system and it still is not found or imported - then that light of hope is gone.

Edit,
well, maybe. Depending on exactly what happened to the drive.
Can you describe exactly what steps it went through when it was accidentally removed?

Thanks.

Let me explain.
Two disks had failed in vdev RAIDZ2-1, which is part of RAIDZ2-60-Disk-Pool, as shown in the picture I shared earlier.
But the GUI was showing both failed disks with the same disk name, so we thought it was a duplicate error and only one disk had failed.
We went to the 60-disk expansion shelf and found two disks with no activity on them; their activity LEDs were dark. We assumed these were the faulty disks from two of the vdevs in the pool, so we pulled them out to replace them. After pulling the disks it turned out that one of them was actually an active disk from vdev RAIDZ2-1, and that the two disks reported with the same name in the GUI were actually two different disks. That made three failed disks in the vdev, and the whole pool went down. We put the two disks back, but we are not sure whether they went into the exact same slots or were swapped. Is there any way to identify, from the logs or any other command, which serial number disk was actually removed from the vdev?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
We put the two disks back in the place but not sure if they were placed in the exact same place or the places were swapped.
Drive-bay position should not matter. Drives being swapped like that should not make a difference, at least not on TrueNAS CORE.
In SCALE, I don't think this should matter either, even if the drives were added using /dev/sdX rather than partuuid. I expect ZFS to be smart enough to recognize the available drives in the system in this scenario.


Is there any way to identify from the logs or any other command which serial number disk was actually removed from the vdev?
That I don't know.
Something has probably ended up in the logs, such as which /dev/sdX device or perhaps which UUID disappeared, but I doubt you'll find both, let alone the serial number. TrueNAS doesn't care much about serials; it prefers to track drives by UUID.
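If you want to dig for it, something along these lines might be a starting point. I haven't verified exactly what SCALE logs when a drive is pulled, so treat this as a sketch rather than a recipe; the date and /dev/sdX names are placeholders:

# kernel log around the time the drives were pulled (SCALE / Linux)
journalctl -k --since "2022-06-01" | grep -iE "detach|offline|remov"
# map the device names that are currently present to serial numbers and partuuids
lsblk -o NAME,SERIAL,PARTUUID,SIZE
# or per drive
smartctl -i /dev/sdX | grep -i serial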

One traditionally lacking feature of TrueNAS (I say this having recently returned to the forums after a few years' hiatus) has been a simple way for users to match UUIDs to serial numbers. Worse, once a drive faults it can no longer report its serial in the GUI, which is why reverse-engineering from notes, labels, or an index of drives kept outside the server itself is super handy.
Much has improved since - we now get a lot of information in the GUI - but working out which drive is missing, where it is located in the box, what serial it has, etc. still has to be managed manually.

edit:
There might be more problems related to disk location in the box, since some drives were previously added as plain device nodes, /dev/sde for example. GPTID is the way to create a robust pool.
If this pool can somehow be imported, I suggest you do a fresh search around the forums and start the process of "cleaning house" with the drives in each vdev.
Not to mention the need for proper alarms, scheduling, etc.
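For the "cleaning house" part, a quick way to see what currently backs each GPTID/partuuid (just a sketch; glabel is the CORE/FreeBSD side, the by-partuuid listing is the SCALE/Linux side):

# SCALE / Linux: which sdX currently sits behind each partuuid
ls -l /dev/disk/by-partuuid/
# CORE / FreeBSD: the gptid labels and their providers
glabel status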
 
Last edited:

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Any updates?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hi Dice.

No progress unfortunately.

I was not able to import the pool using any commands, and unless the pool is imported I don't see any way of attempting a recovery.
Alright.

Are you planning to rebuild/reuse the hardware again?
I'd consider some design changes in that case.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I can provide you remote access to the system in case you want to have a look and maybe suggest something.

Since I seem to have exhausted all options as of now.
Can't help you there, sorry.
I've run out of ideas on how to recover that situation.
It is not rocket science really: if a vdev is losing life, it will drag down the pool.

Too bad the last lifeline was cut off by user mistakes :/

I'm only guessing - did all resilvers / drive changes happen through the GUI during the lifetime of the box? I'm inclined to suspect some of them were CLI actions, causing drives to be registered by device name/location rather than by GPTID.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Alright.

Are you planning to rebuild/reuse the hardware again?
I'd consider some design changes in that case.

Yes, if all hope is lost then there is no option but to rebuild from scratch.

I am considering several design and operational changes based on lessons learned from this failure.

The following is what I would implement moving forward:

1) Stay away from TrueNAS SCALE for now and go back to a stable release of TrueNAS CORE.
2) Configure RAIDZ3 instead of RAIDZ2, as well as at least 2 spares for the storage.
3) Proactive monitoring of the storage. (If the storage had been monitored actively and the first disk, which failed almost 2 months before the second one, had been replaced, all of this could have been avoided.)
4) Maybe compromise on performance and configure a separate pool for each vdev, so that if a vdev fails only that specific pool is lost instead of the entire storage.
5) Label each disk caddy with the serial number of the installed disk. It is extremely difficult to identify a faulty disk in the storage, since TrueNAS gives no physical error indication on the expansion enclosure.

Any further suggestions you might have are also welcome. :smile:

Best regards,

Adeel Akram
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
I'm only guessing - did all resilvers / drive changes happen through the GUI during the lifetime of the box?
All resilvers / drive changes happened through GUI.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Yes, if all hope is lost then there is no option but to rebuild from scratch.
In my view it is, based on what I know and what you've tried.
Reading the complete storyline again, it really seems like you've run into a really bad bug in SCALE, and this is much less of a user error than I indicated previously.

It is really unfortunate, since the correct data drive is still in the system but somehow does not get picked up, or has somehow had its ZFS integrity 'damaged'.

My reasoning for this comment:
I'm only guessing - did all resilvers / drive changes happen through the GUI during the lifetime of the box? I'm inclined to suspect some of them were CLI actions, causing drives to be registered by device name/location rather than by GPTID.
was that I did not expect to see this variation and mixing of sdX and GPTID in the same list. That normally happens when a drive is replaced via the CLI and the user does not copy the partition table, but simply adds the new drive as one would on plain Linux.

From your assertive response:
All resilvers / drive changes happened through GUI.
That is ruled out, and points towards a bug, or unintended behavior, or something I don't understand.

[screenshot: pool status from the GUI]
From this screen, I understand it is the 46690272-2cfb-429a-84e5-c0f94a829906 that was mentioned here:
But the GUI was showing both failed disks with the same disk name, so we thought it was a duplicate error and only one disk had failed.
We went to the 60-disk expansion shelf and found two disks with no activity on them; their activity LEDs were dark. We assumed these were the faulty disks from two of the vdevs in the pool, so we pulled them out to replace them. After pulling the disks it turned out that one of them was actually an active disk from vdev RAIDZ2-1, and that the two disks reported with the same name in the GUI were actually two different disks. That made three failed disks in the vdev, and the whole pool went down. We put the two disks back, but we are not sure whether they went into the exact same slots or were swapped.

My interpretation is that the drive should still be in the system.
From the console, can you find the drive? blkid | grep 46690272-2cfb-429a-84e5-c0f94a849906 (hope I did the spelling right).
If this somehow returns a value, it would indicate the drive is ...actually there and that there is a SCALE issue in finding it.
In that case, there is some hope again.
(The general outline of such hope would be to reinstall TN13CORE, and attempt to import the pool.)
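To be concrete about what I'd try - only a sketch of the general idea, not a guarantee, and the uuid is shortened on purpose so a typo in my post doesn't matter:

# is the partition visible at all?
blkid | grep 46690272
# scan for importable pools using the partuuid links, without importing anything yet
zpool import -d /dev/disk/by-partuuid
# if the pool shows up there, a cautious first attempt would be a read-only import
zpool import -d /dev/disk/by-partuuid -o readonly=on -f RAIDZ2-60-Disk-Pool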
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
The following is what I would implement moving forward:

1) Stay away from TrueNAS SCALE for now and go back to a stable release of TrueNAS CORE.
2) Configure RAIDZ3 instead of RAIDZ2, as well as at least 2 spares for the storage.
3) Proactive monitoring of the storage. (If the storage had been monitored actively and the first disk, which failed almost 2 months before the second one, had been replaced, all of this could have been avoided.)
4) Maybe compromise on performance and configure a separate pool for each vdev, so that if a vdev fails only that specific pool is lost instead of the entire storage.
5) Label each disk caddy with the serial number of the installed disk. It is extremely difficult to identify a faulty disk in the storage, since TrueNAS gives no physical error indication on the expansion enclosure.

Any further suggestions you might have are also welcome. :smile:

I'm pleased to read this :)

I'll offer my opinion:

1) agree.
2) There are a few things to unpack here.
2a) Z3 vs Z2. If the case for choosing Z3 is to 'alleviate the problem when the admin does not replace failed drives in time', I don't necessarily agree. IMO, the primary case for Z3 is to enable higher space utilization (wider vdevs = more drives per vdev) while accepting slower performance. The secondary case would be to mitigate long resilver times due to large drives (which also depends somewhat on pool utilization characteristics, etc.).
2b) Case utilization. Your box carries 60 drives. I'd like to see a "non maxed out" configuration of vdevs, to leave room for spares in the system. I don't think hot spares make a lot of sense outside of mirrored vdev configurations. In my experience the hot spare tends to get put into use, sometimes prematurely. For example, a read/write error on a drive (a single bit) may kick off the entire resilvering process, when the failure is perhaps a single defect on a single drive that could keep working for years, with that "bit flip" corrected on the regular scheduled scrub. I have such a drive under monitoring in my system; I could have replaced it a year ago, yet it still works completely fine.
In your case, I do advocate having a few unassigned drives ready to go in the enclosure.
I'll come back to the RAIDZ2 vs Z3 configuration and vdev width.

3) YES!
There are a few things that once upon a time required manual setup, but these days come with somewhat sane defaults that can be tweaked.
3a) Check the Alert settings and provide an email address. I'd suggest being modest with the "warnings", so that the only alarms raised are ones you would actually act on. On top of that, I've done a little work on my email inbox filtering to make sure the emails get proper attention.
3b) Check the forums for the SMART status email scripts. There are a few variations; they are set up as a cron job through the GUI. I use one that basically pulls SMART data from every drive and dumps it into an email (a rough sketch of the idea follows after this list). The benefit lies in having a history of SMART data in your mailbox. I've set that email to "skip inbox" and only be saved to a tagged folder on Gmail, and I run it once every 2 weeks, I think. This helps to spot and track a drive that is getting worse on certain parameters, rather than relying on "not passing SMART tests". Neither TrueNAS CORE nor (I suspect) SCALE is really good enough at showing SMART data to the user in the GUI. This is unfortunate, but a reality that needs to be handled.
Particularly in your case, where the drives are getting old.
3c) Make sure all drives are covered by SMART tests. (Tasks -> SMART Tests)

4) I disagree. This would leave huge amounts of space "dead", since each pool ideally should not exceed roughly 80% utilization for performance reasons, or 90% for sanity...

5) YES!
5b) Also make sure the GPTID of each drive is recorded somewhere accessible. This is what you'll need to find the correct drives in case of failures, since the GUI will give you the GPTID of the missing drive. ...yeah, I guess you've discovered how this works by now, so no need to tell you the details :)
I mention it here to make the list/thoughts slightly more complete for the next reader :)
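To make 3b a bit more concrete, here is roughly the shape of such a script on SCALE/Linux - a bare-bones sketch only, the actual forum scripts are far more polished and the output path is just a placeholder:

#!/bin/sh
# dump 'smartctl -a' for every whole disk into one report file
OUT=/tmp/smart_report.txt
: > "$OUT"
for disk in $(lsblk -dn -o NAME,TYPE | awk '$2 == "disk" {print "/dev/" $1}'); do
    echo "===== $disk =====" >> "$OUT"
    smartctl -a "$disk" >> "$OUT" 2>&1
done
# the forum scripts then send $OUT through the email settings already configured in TrueNAS,
# scheduled from the GUI as a cron job (e.g. every two weeks)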


Pool layout idea/discussion.
If you continue with RAIDZ2: currently 10 drives wide, 6 vdevs (comparative figure from a calculator: usable ~110 TB).

I'd look for a more conservative layout width that also leaves a few slots inactive in the enclosure, where drives can live until needed (a sort of "passive warm spare") without being "automatically" replaced (that mechanism in TrueNAS is not as magic as it may seem and still requires manual intervention).

I see three routes; two of them include a minimum of 2-4 "warm spares".
First route:
Z2, 8 wide, 7 vdevs (56 drives), 4 warm spares (comparatively: usable ~98 TB).

Second route:
Z3, 11 wide, 5 vdevs (55 drives), 5 warm spares (comparatively: usable ~95 TB).

Third route:
If you consider the overall drive health to be rather good, there is also a nice fit in running 15 wide, 4 vdevs, RAIDZ3 (comparatively: usable ~112 TB).
This would probably align better with the general opinion on the forum: steer away from thinking about hot/warm spares on RAIDZ2 and focus on RAIDZ3 instead. The wider vdevs do mean the at-risk window during each resilver is prolonged: more drives are put under heavy stress, and further failures are likely to lurk.

The reason I'm putting forward the "warm spares" solution is that I assume the 3 TB drives have been in use for a fair while, and quite a few of them have already failed recently. I'd be more comfortable with narrower vdevs, and more of them, and with having a few "warm spares" ready.

My reasoning assumes that, given its age, you'd like this hardware to keep performing for a while longer rather than decommissioning it any time soon, and that you therefore accept a lower total capacity.

Hope this can help your thought process :)

Cheers
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Excellent points by @Dice all. Indeed raidz3 does not save you from "admin doesn't replace failing drives". While you are reconsidering options here, with this many drives, I'd also consider draid. Distributed spares, near-instantaneous rebuild (hours not days), and same performance as a pool of multiple raidz vdevs. You'd still need to replace failed drives in a timely manner so you have distributed spares again.
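For anyone curious what that looks like in practice: as far as I understand, the vdev spec on recent OpenZFS is written as draidP:Dd:Cc:Ss (parity, data disks per redundancy group, children, distributed spares). The numbers below are purely illustrative placeholders for a 60-bay shelf, not a recommendation, and the device list is elided:

# double parity, 6 data disks per redundancy group, 60 children, 4 distributed spares
zpool create tank draid2:6d:60c:4s <the 60 drive paths, ideally /dev/disk/by-id/...>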
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Excellent points by @Dice all. Indeed raidz3 does not save you from "admin doesn't replace failing drives". While you are reconsidering options here, with this many drives, I'd also consider draid. Distributed spares, near-instantaneous rebuild (hours not days), and same performance as a pool of multiple raidz vdevs. You'd still need to replace failed drives in a timely manner so you have distributed spares again.
Thanks Yorick.

Didn't know TrueNAS supported dRAID.

dRAID seems like a good option.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Excellent suggestions Dice.

I would surely incorporate all suggestions into the next deployment.

What do you think about dRAID, as Yorick suggested?
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
What do you think about dRAID, as Yorick suggested?
On paper it looks fantastic. Like in the OpenZFS manual.

The downside is that it is super new.
Reasons to get dRAID would really only apply to power users/testers who have rather large pools at hand (i.e., more than a few vdevs) and who happened to be ready to try it out right after TN13CORE was released.
Apart from a few threads of speculation, there is basically no activity on the forums around dRAID yet.

To give a sense of how new it is:
It's not even mentioned in the manual's section on vdev layouts yet.

I'd stay away from dRAID in any sort of production until the 'support supply lines' have thickened a bit.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38

Thanks. That really does make sense.
 

SnoppyFloppy

Explorer
Joined
Jun 17, 2021
Messages
77
I was thinking more of working through all 60 drives regardless of the state reported by ZFS. I fully expect there to be more problems than reported, and these will interfere with whatever recovery attempts may be taken.
I wonder, is there any chance that cloning an OFFLINE disk with SMART errors to a fresh drive could make TrueNAS succeed with the resilvering?
 
Joined
Jul 3, 2015
Messages
926
Using glabel to name your drives before building the pool is also not a bad idea; that way you will know which drive is located in which slot, avoiding the issue in the first place.
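A minimal sketch of that idea on CORE/FreeBSD - "bay01" and da5 are placeholder names for your own slot-numbering scheme and device:

# write a label onto the disk (glabel stores its metadata in the provider's last sector,
# so do this before the disk carries data)
glabel label bay01 da5
# the disk is then also available as /dev/label/bay01, and that name can be used
# when building the pool, so zpool status shows the physical slot directly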
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
I wonder, is there any chance that cloning an OFFLINE disk with SMART errors to a fresh drive could make TrueNAS succeed with the resilvering?

I don't know. "zpool clear" maybe, if the pool is not too far gone already. Maybe.
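If the cloning route is attempted, the usual tool is GNU ddrescue, which is not part of a stock TrueNAS install - so this is only a generic sketch, typically run from a separate Linux environment, with /dev/sdX (failing) and /dev/sdY (fresh, at least as large) as placeholders:

# copy what is readable first, then retry the bad areas up to 3 times, tracking progress in a map file
ddrescue -f -r3 /dev/sdX /dev/sdY /root/sdX_rescue.map
# with the clone back in the slot, one could then try clearing the errors and letting ZFS re-check
zpool clear RAIDZ2-60-Disk-Pool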
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
On paper it looks fantastic. Like in the OpenZFS manual.

The down side is that it is super new.
Reasons to get dRAID would only apply to power/testers that have rather large pools at hand (ie, more than a few vdevs), that happened to be ready to try this out after TN13CORE just got released.
A few threads of speculation, there is basically no action on the forums on dRAID yet.

To bring a "scale" to how new it is;
It's not even mentinoned in the manual for vdev layouts yet:

I'd stay away from dRAID in any sort of production until the 'support supply lines' have thickened a bit.

Hi Dice,

After a couple of reboots the machine was not booting at all, since it could not even find the boot-pool.

So I entered the two commands below to import the pools directly from the console in front of the server:

zpool import -f "boot-pool"
zpool import -f "RAIDZ2-60-Disk-Pool"

boot-pool was available immediately.
RAIDZ2-60-Disk-Pool took several hours to import. It is visible in the GUI once more, but now it shows lots of disks as degraded. This many disks can't go faulty so quickly; this looks like a bug in TrueNAS SCALE.

Below is the output of the zpool status command.

root@truenas[~]# zpool status
  pool: RAIDZ2-60-Disk-Pool
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from
        a backup source.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
  scan: scrub in progress since Tue Jul 5 06:22:29 2022
        15.7T scanned at 23.0G/s, 182G issued at 267M/s, 34.6T total
        0B repaired, 0.51% done, 1 days 13:30:47 to go
config:

        NAME STATE READ WRITE CKSUM
        RAIDZ2-60-Disk-Pool UNAVAIL 0 0 0 insufficient replicas
          raidz2-0 ONLINE 0 0 0
            sddo2 ONLINE 0 0 0
            sdct2 ONLINE 0 0 0
            sdci2 ONLINE 0 0 0
            sdbz2 ONLINE 0 0 0
            sdcn2 ONLINE 0 0 0
            bac37446-6f50-490c-a44f-c4e1ab8f7ef8 ONLINE 0 0 0
            5b1a6483-c445-4a24-a224-4d0456e45849 ONLINE 0 0 0
            sdbw2 ONLINE 0 0 0
            sddk2 ONLINE 0 0 0
            4b0d8331-a6b8-4c95-9b24-57d83637b40a ONLINE 0 0 0
          raidz2-1 UNAVAIL 0 0 0 insufficient replicas
            sddn2 DEGRADED 0 0 78 too many errors
            7943903d-1350-4525-8990-24f0af5f369e DEGRADED 0 0 78 too many errors
            8104194342383810658 FAULTED 0 0 0 was /dev/disk/by-partuuid/9ef29c28-f765-4437-bd47-3ed024c0d304
            9ef29c28-f765-4437-bd47-3ed024c0d304 FAULTED 0 0 78 corrupted data
            sdcu2 DEGRADED 0 0 78 too many errors
            824b295c-2005-47c9-a69b-736177172a3b DEGRADED 0 0 78 too many errors
            9cdf0691-7bc9-49c9-abb2-83c4ebf9360f DEGRADED 0 0 78 too many errors
            0b47dc37-fa92-46a4-9ceb-c75509fd3076 DEGRADED 0 0 78 too many errors
            5901721142178421157 UNAVAIL 0 0 0 was /dev/disk/by-partuuid/46690272-2cfb-429a-84e5-c0f94a849906
            sddj2 DEGRADED 0 0 78 too many errors
          raidz2-2 DEGRADED 0 0 0
            96f6cc99-3f3d-40a1-87fe-eb137b51e060 ONLINE 0 0 0
            7ac064ea-5541-4224-b011-5a4b857d3002 ONLINE 0 0 0
            7924a000-ead1-423d-90e2-4d34cc4f1256 ONLINE 0 0 0
            sdde2 ONLINE 0 0 0
            sddc2 ONLINE 0 0 0
            sddl2 ONLINE 0 0 0
            sdr2 ONLINE 0 0 0
            sddb2 ONLINE 0 0 0
            ef511943-e2a1-4d20-9d06-f8297552e06d DEGRADED 0 0 0 too many errors
            sddg2 ONLINE 0 0 0
          raidz2-3 DEGRADED 0 0 0
            sddq2 ONLINE 0 0 0
            sdcz2 ONLINE 0 0 0
            sddh2 ONLINE 0 0 0
            sde2 ONLINE 0 0 0
            e44b8008-259d-4523-9368-9cbb54dfee47 ONLINE 0 0 0
            839e152b-721c-4ca4-83f8-3d2abfe5ebe8 ONLINE 0 0 0
            sddm2 ONLINE 0 0 0
            sdda2 ONLINE 0 0 0
            sdf2 ONLINE 0 0 0
            d785063a-f5ee-455d-ae58-f0326863aef6 DEGRADED 0 0 0 too many errors
          raidz2-4 DEGRADED 0 0 0
            sdl2 DEGRADED 0 0 0 too many errors
            sdj2 DEGRADED 0 0 0 too many errors
            0b4ceca1-e8bb-4717-9c8c-2dec6e61b1f0 DEGRADED 0 0 0 too many errors
            262f80e0-e7da-4d34-8d76-4bd659c4a20d ONLINE 0 0 1
            de61ac8f-1011-41a6-aa97-bf0775d95c0e ONLINE 0 0 1
            sdn2 DEGRADED 0 0 0 too many errors
            sdi2 DEGRADED 0 0 0 too many errors
            sdg2 DEGRADED 0 0 0 too many errors
            207accf8-c935-4850-bf56-429536b4dd0a DEGRADED 0 0 1.33K too many errors
            b0a81c8a-5f74-4c4b-b540-d306a92338bb DEGRADED 0 0 0 too many errors
          raidz2-5 ONLINE 0 0 0
            sdv2 ONLINE 0 0 0
            sdu2 ONLINE 0 0 0
            sdt2 ONLINE 0 0 0
            sdx2 ONLINE 0 0 0
            sds2 ONLINE 0 0 0
            sdz2 ONLINE 0 0 0
            sdw2 ONLINE 0 0 0
            ac866713-0d8a-4123-9430-e59b1e6985b0 ONLINE 0 0 0
            sdy2 ONLINE 0 0 0
            de42b7b5-b53a-4507-af4f-e64c33183db3 ONLINE 0 0 1

errors: 937197 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:45 with 0 errors on Mon Jul 4 03:45:47 2022
config:

        NAME STATE READ WRITE CKSUM
        boot-pool ONLINE 0 0 0
          sda3 ONLINE 0 0 0

errors: No known data errors
root@truenas[~]# 2022 Jul 5 06:35:26 truenas Device: /dev/sdcy, new Self-Test Log error at hour timestamp 54600
 