RAIDZ2-60-Disk-Pool Unavailable after a RAIDZ2 vdev Failed

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Hi Guys,

I have a pool of 60 x 3TB disks.
The pool has 6 vDevs with 10 disks each.

I recently observed several disk failures: a disk failed in every vDev except one.
The second vDev had two disks in the "FAULTED" state and one disk in the "DEGRADED" state; the overall status of the vDev was "DEGRADED".

Below is a screenshot of the GUI. As per the GUI, two "FAULTED" disks have the same name, "sdcr", from which I assumed that this was the same disk and the GUI was just reporting it twice. So I went ahead and replaced the "DEGRADED" disk first with a new disk. It started the resilvering process, but it took the whole vDev offline and with it the entire pool. The resilvering took the whole night, but the replaced disk still shows as "UNAVAIL" and the vDev and the pool are still offline.

[Screenshot: 1654140899239.png]


Below is the output for zpool status:

Code:
root@truenas[/]# zpool status RAIDZ2-60-Disk-Pool
  pool: RAIDZ2-60-Disk-Pool
 state: UNAVAIL
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jun 1 18:24:49 2022
        20.0T scanned at 2.74G/s, 17.0T issued at 2.34G/s, 34.6T total
        54.6G resilvered, 49.33% done, 02:07:51 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        RAIDZ2-60-Disk-Pool                       UNAVAIL      0     0     0  insufficient replicas
          raidz2-0                                ONLINE       0     0     0
            sddo2                                 ONLINE       0     0     0
            sdct2                                 ONLINE       0     0     0
            sdci2                                 ONLINE       0     0     0
            sdbz2                                 ONLINE       0     0     0
            sdcn2                                 ONLINE       0     0     0
            bac37446-6f50-490c-a44f-c4e1ab8f7ef8  ONLINE       0     0     0
            5b1a6483-c445-4a24-a224-4d0456e45849  ONLINE       0     0     0
            sdbx2                                 ONLINE       0     0     0
            sddk2                                 ONLINE       0     0     0
            4b0d8331-a6b8-4c95-9b24-57d83637b40a  ONLINE       0     0     0
          raidz2-1                                UNAVAIL    260     2     0  insufficient replicas
            sddn2                                 DEGRADED     0     0 2.00M  too many errors
            7943903d-1350-4525-8990-24f0af5f369e  DEGRADED     0     0 2.00M  too many errors
            8104194342383810658                   FAULTED      0     0     0  was /dev/disk/by-partuuid/9ef29c28-f765-4437-bd47-3ed024c0d304
            9ef29c28-f765-4437-bd47-3ed024c0d304  FAULTED      0     0     0  corrupted data
            sdcu2                                 DEGRADED     0     0 2.00M  too many errors
            824b295c-2005-47c9-a69b-736177172a3b  DEGRADED     0     0 2.00M  too many errors
            9cdf0691-7bc9-49c9-abb2-83c4ebf9360f  DEGRADED     0     0 2.00M  too many errors
            0b47dc37-fa92-46a4-9ceb-c75509fd3076  DEGRADED     0     0 2.00M  too many errors
            46690272-2cfb-429a-84e5-c0f94a849906  UNAVAIL    266   428 2.00M
            sddj2                                 DEGRADED     0     0 2.00M  too many errors
          raidz2-2                                DEGRADED     0     0     0
            96f6cc99-3f3d-40a1-87fe-eb137b51e060  ONLINE       0     0     0
            7ac064ea-5541-4224-b011-5a4b857d3002  ONLINE       0     0     0
            7924a000-ead1-423d-90e2-4d34cc4f1256  ONLINE       0     0     0
            sdde2                                 ONLINE       0     0     0
            sddc2                                 ONLINE       0     0     0
            sddl2                                 ONLINE       0     0     0
            sddd2                                 UNAVAIL    265   723     0
            sddb2                                 ONLINE       0     0     0
            ef511943-e2a1-4d20-9d06-f8297552e06d  DEGRADED     0     0  116K  too many errors
            sddg2                                 ONLINE       0     0     0
          raidz2-3                                DEGRADED     0     0     0
            sddq2                                 ONLINE       0     0     0
            sdcz2                                 ONLINE       0     0     0
            sddh2                                 ONLINE       0     0     0
            sde2                                  ONLINE       0     0     0
            e44b8008-259d-4523-9368-9cbb54dfee47  ONLINE       0     0     0
            839e152b-721c-4ca4-83f8-3d2abfe5ebe8  ONLINE       0     0     0
            sddm2                                 ONLINE       0     0     0
            sdda2                                 ONLINE       0     0     0
            82bd5a27-c761-48ea-8f79-17b4f5533722  ONLINE       0     0     0
            d785063a-f5ee-455d-ae58-f0326863aef6  DEGRADED     0     0  118K  too many errors
          raidz2-4                                DEGRADED     0     0     0
            sdm2                                  ONLINE       0     0     0
            sdj2                                  ONLINE       0     0     0
            0b4ceca1-e8bb-4717-9c8c-2dec6e61b1f0  ONLINE       0     0     0
            262f80e0-e7da-4d34-8d76-4bd659c4a20d  ONLINE       0     0     0
            de61ac8f-1011-41a6-aa97-bf0775d95c0e  ONLINE       0     0     0
            sdn2                                  ONLINE       0     0     0
            sdi2                                  ONLINE       0     0     0
            sdg2                                  ONLINE       0     0     0
            10244600934143008752                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/207accf8-c935-4850-bf56-429536b4dd0a
            b0a81c8a-5f74-4c4b-b540-d306a92338bb  ONLINE       0     0     1
          raidz2-5                                DEGRADED     0     0     0
            sdv2                                  ONLINE       0     0     0
            sdu2                                  ONLINE       0     0     0
            sdt2                                  ONLINE       0     0     0
            sdx2                                  ONLINE       0     0     0
            sds2                                  ONLINE       0     0     0
            sdz2                                  ONLINE       0     0     0
            sdw2                                  ONLINE       0     0     0
            15727050119303651601                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/ac866713-0d8a-4123-9430-e59b1e6985b0
            sdy2                                  ONLINE       0     0     0
            de42b7b5-b53a-4507-af4f-e64c33183db3  ONLINE       0     0     0  (resilvering)

errors: 40322114 data errors, use '-v' for a list

Any suggestions on how to recover from this and make the pool available once again would be highly appreciated.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Not sure why the formatting is not being maintained for the zpool status output.
Attaching it as a text file.
 

Attachments

  • zpool status.txt (6.1 KB)

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
There have been some drive scalability issues with TrueNAS SCALE.
The RELEASE and U1 versions are progressively better, but U2 has some additional fixes - due later in June.
TrueNAS 13.0 is particularly solid in that regard. We have improved both TrueNAS and FreeBSD.
For SCALE 22.02.2 we are working through the differences between Linux and FreeBSD disk management.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Thanks for the suggestion, morganL.

I will surely consider the upgrade once I am able to resolve the current issue.

Any insights on how I can resolve the current situation?

I tried rebooting the server to see if that fixed the issue.

After the reboot, the pool is totally offline.

[Screenshot: 1654147984736.png]


zpool status now reports that there is no such pool, and zpool import does not import the pool since it has errors:

Code:
root@truenas[~]# zpool status RAIDZ2-60-Disk-Pool
cannot open 'RAIDZ2-60-Disk-Pool': no such pool
root@truenas[~]# zpool import
   pool: RAIDZ2-60-Disk-Pool
     id: 5957202920622349234
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        RAIDZ2-60-Disk-Pool                       UNAVAIL  insufficient replicas
          raidz2-0                                ONLINE
            sddo2                                 ONLINE
            sdct2                                 ONLINE
            sdci2                                 ONLINE
            sdbz2                                 ONLINE
            sdcn2                                 ONLINE
            bac37446-6f50-490c-a44f-c4e1ab8f7ef8  ONLINE
            5b1a6483-c445-4a24-a224-4d0456e45849  ONLINE
            sdbx2                                 ONLINE
            sddk2                                 ONLINE
            4b0d8331-a6b8-4c95-9b24-57d83637b40a  ONLINE
          raidz2-1                                UNAVAIL  insufficient replicas
            sddn2                                 ONLINE
            7943903d-1350-4525-8990-24f0af5f369e  ONLINE
            8104194342383810658                   FAULTED  corrupted data
            9ef29c28-f765-4437-bd47-3ed024c0d304  FAULTED  corrupted data
            sdcu2                                 ONLINE
            824b295c-2005-47c9-a69b-736177172a3b  ONLINE
            9cdf0691-7bc9-49c9-abb2-83c4ebf9360f  ONLINE
            0b47dc37-fa92-46a4-9ceb-c75509fd3076  ONLINE
            46690272-2cfb-429a-84e5-c0f94a849906  UNAVAIL
            sddj2                                 ONLINE
          raidz2-2                                ONLINE
            sdcy2                                 ONLINE
            7ac064ea-5541-4224-b011-5a4b857d3002  ONLINE
            7924a000-ead1-423d-90e2-4d34cc4f1256  ONLINE
            sdde2                                 ONLINE
            sddc2                                 ONLINE
            sddl2                                 ONLINE
            sds2                                  ONLINE
            sddb2                                 ONLINE
            ef511943-e2a1-4d20-9d06-f8297552e06d  ONLINE
            sddg2                                 ONLINE
          raidz2-3                                ONLINE
            sddq2                                 ONLINE
            sdcz2                                 ONLINE
            sddh2                                 ONLINE
            sde2                                  ONLINE
            e44b8008-259d-4523-9368-9cbb54dfee47  ONLINE
            sdd2                                  ONLINE
            sddm2                                 ONLINE
            sdda2                                 ONLINE
            82bd5a27-c761-48ea-8f79-17b4f5533722  ONLINE
            d785063a-f5ee-455d-ae58-f0326863aef6  ONLINE
          raidz2-4                                ONLINE
            sdm2                                  ONLINE
            sdj2                                  ONLINE
            sdl2                                  ONLINE
            262f80e0-e7da-4d34-8d76-4bd659c4a20d  ONLINE
            de61ac8f-1011-41a6-aa97-bf0775d95c0e  ONLINE
            sdn2                                  ONLINE
            sdi2                                  ONLINE
            sdg2                                  ONLINE
            207accf8-c935-4850-bf56-429536b4dd0a  ONLINE
            b0a81c8a-5f74-4c4b-b540-d306a92338bb  ONLINE
          raidz2-5                                ONLINE
            sdv2                                  ONLINE
            sdu2                                  ONLINE
            sdt2                                  ONLINE
            sdx2                                  ONLINE
            sdr2                                  ONLINE
            sdz2                                  ONLINE
            sdw2                                  ONLINE
            ac866713-0d8a-4123-9430-e59b1e6985b0  ONLINE
            sdy2                                  ONLINE
            de42b7b5-b53a-4507-af4f-e64c33183db3  ONLINE
root@truenas[~]#
 

Attachments

  • zpool status after reboot.txt (4.3 KB)

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
The situation looks grim for the pool, and this vdev in particular:
Code:
          raidz2-1                                UNAVAIL    260     2     0  insufficient replicas
            sddn2                                 DEGRADED     0     0 2.00M  too many errors
            7943903d-1350-4525-8990-24f0af5f369e  DEGRADED     0     0 2.00M  too many errors
            8104194342383810658                   FAULTED      0     0     0  was /dev/disk/by-partuuid/9ef29c28-f765-4437-bd47-3ed024c0d304  (awaiting resilver)
            9ef29c28-f765-4437-bd47-3ed024c0d304  FAULTED      0     0     0  corrupted data  (awaiting resilver)
            sdcu2                                 DEGRADED     0     0 2.00M  too many errors
            824b295c-2005-47c9-a69b-736177172a3b  DEGRADED     0     0 2.00M  too many errors
            9cdf0691-7bc9-49c9-abb2-83c4ebf9360f  DEGRADED     0     0 2.00M  too many errors
            0b47dc37-fa92-46a4-9ceb-c75509fd3076  DEGRADED     0     0 2.00M  too many errors
            46690272-2cfb-429a-84e5-c0f94a849906  UNAVAIL    266   428 2.00M
            sddj2                                 DEGRADED     0     0 2.00M  too many errors


This vdev is too far gone.
Probably your last reboot pushed the last light of hope for some of the drives into oblivion.
I can see your efforts in replacing and resilvering drives in the other vdevs.
Unfortunately, you've surpassed the fault tolerance on vdev raidz2-1. Losing one vdev loses you the pool.

To me, there is only a sliver of hope left for saving this particular pool:
that the "46690272-2cfb-429a-84e5-c0f94a849906 UNAVAIL" drive on raidz2-1 would magically wake up enough to be able to continue replacing drives.

Code:
          raidz2-1                                UNAVAIL  insufficient replicas
            sddn2                                 ONLINE
            7943903d-1350-4525-8990-24f0af5f369e  ONLINE
            8104194342383810658                   FAULTED  corrupted data
            9ef29c28-f765-4437-bd47-3ed024c0d304  FAULTED  corrupted data
            sdcu2                                 ONLINE
            824b295c-2005-47c9-a69b-736177172a3b  ONLINE
            9cdf0691-7bc9-49c9-abb2-83c4ebf9360f  ONLINE
            0b47dc37-fa92-46a4-9ceb-c75509fd3076  ONLINE
            46690272-2cfb-429a-84e5-c0f94a849906  UNAVAIL
            sddj2                                 ONLINE


Looking ahead, there are a few things to do.
1. Restoring from backup is the first (a sketch of what that could look like follows below).
2. REALLY look into the routines that allowed the situation to spiral this far out of control. (I'd very much like to hear your analysis of this - it ought to include valuable lessons.)
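For point 1, assuming the backup is a ZFS replica on another pool or host (an assumption on my part - adapt it to whatever your backup actually is), the restore onto a rebuilt pool would run along these lines; the host, dataset and snapshot names are placeholders:

Code:
# pull the latest recursive snapshot back from the backup host onto a freshly rebuilt pool
ssh backuphost zfs send -R backuppool/RAIDZ2-60-Disk-Pool@latest | zfs recv -F RAIDZ2-60-Disk-Pool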
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38

Thanks for your input, Dice.
Yes, some lessons learned - the hard way, unfortunately. :-(
I should have taken a backup before trying to resilver the vDevs, but it's too late now.
If anyone has any suggestions on things to try, I would give them a go.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
If anyone has any suggestions on things to try, I would give them a go.
You should be aware that certain commands have the potential to worsen the state to where it is irreversible. Therefore it is also important to work in sequence.

I emphasize that this advice is what I would have done to my own system. I do not assume any responsibility for the outcome of these commands on your system. They may cause harm.

We're out in long-shot territory. Either run the commands from a tmuxed session on the host itself, or from the GUI. Some of the commands may take a good while, and you'd like to monitor (and please share) the output of your progress.

I'd have another drive or two ready for resilvering - only run one at a time at this point. Preferably don't remove a bad drive just yet; add a new one and attempt a resilver. The fewer reboots the better.
Anything and everything hangs in the balance of saving that vdev.
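As for the tmuxed session mentioned above, a minimal sketch (assuming tmux is available on the host, as it normally is on SCALE) so a dropped SSH connection doesn't kill a long-running import:

Code:
# start a named session on the host itself and run the import commands inside it
tmux new -s rescue

# detach with Ctrl-b then d; reattach later to check progress
tmux attach -t rescue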

Then: what's the output of:
zpool import -fn RAIDZ2-60-Disk-Pool (remove -n if it doesn't accept -n)
If this works, continue with the resilver process.

Else:
zpool import -f -Fn RAIDZ2-60-Disk-Pool
Hardcore forcing, with -n for determining if it can happen. If it seems so, proceed to run the command without -n.

Else:
zpool import -f -F RAIDZ2-60-Disk-Pool
The recovery import itself (the dry run above, minus -n). If this works, continue with the resilver process.

Else:
zpool import -f -FXn RAIDZ2-60-Disk-Pool
Hardcore forcing on steroids. This has the potential to cause some harm, as I read the notes below. If it seems to accept, proceed to run the command without -n.
This will, as I understand it, conduct a little bit of a rollback to a state ZFS thinks it has a better chance of repairing.

If this works, continue with resilver process.

Else:
zpool import -fD RAIDZ2-60-Disk-Pool
Assumes the pool is destroyed, and gives a yolo import attempt.

Else:
.....
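If one of the imports above goes through, picking the resilver back up would look roughly like the sketch below - the old/new device names are placeholders, not values from this system; take the actual guid or device name from zpool status and the new disk's /dev/disk/by-id path:

Code:
# confirm the pool state and see which members are still faulted
zpool status -v RAIDZ2-60-Disk-Pool

# replace one faulted member at a time with a fresh disk (placeholders)
zpool replace RAIDZ2-60-Disk-Pool <old-guid-or-device> /dev/disk/by-id/<new-disk>

# watch the resilver
zpool status RAIDZ2-60-Disk-Pool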


Explanation from the zpool import manual:
-f Forces import, even if the pool appears to be potentially active.
-F Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.

-n Used with the -F recovery option. Determines whether a non-importable pool can be made importable again, but does not actually perform the pool recovery. For more details about pool recovery mode, see the -F option, above.

-X Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort.

-D Imports destroyed pool. The -f option is also required.


Good luck on this one.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
It is probably a good idea to sort out the mechanical state of the drives themselves. Shut down the server, pull out the drives one by one, and examine the SMART information. Have enough blank drives on hand, and whichever shows bad SMART data, clone it onto a blank. A good drive goes back into the server; anything less than good gets replaced with its clone. Make sure to verify that the clone is actually good after you make it.

We are quite deep into multiple-failure territory, after all.
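A rough sketch of how that survey could be scripted from the shell (this assumes smartmontools for the health check and GNU ddrescue for the cloning step; /dev/sdX and /dev/sdY are placeholders, not drives from this system):

Code:
# print serial number and overall SMART health verdict for every whole disk
for dev in $(lsblk -ndo NAME,TYPE | awk '$2=="disk"{print $1}'); do
    echo "=== /dev/$dev ==="
    smartctl -i -H /dev/$dev | egrep 'Serial Number|overall-health'
done

# clone a suspect drive onto a blank of equal or larger size, keeping a map file so the
# copy can be resumed; verify the clone afterwards before putting it into service
ddrescue -f /dev/sdX /dev/sdY /root/sdX_clone.map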
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Have enough blank drives on hand, and whichever shows bad SMART data, clone it onto a blank.
If there is enough indexing from the current information to do so, at least to pull the bad drives from raidz2-1, it is an idea.
Given how this scenario developed in the first place, I doubt there is information about drive location, matched with serials and TrueNAS UUIDs, to pick out the relevant drives.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
If there is enough indexing from the current information to do so, at least to pull the bad drives from raidz2-1, it is an idea.
Given how this scenario developed in the first place, I doubt there is information about drive location, matched with serials and TrueNAS UUIDs, to pick out the relevant drives.

I was thinking more of working through all 60 drives regardless of the state reported by ZFS. I fully expect there to be more problems than reported, and these will interfere with whatever recovery attempts may be taken.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I was thinking more of working through all 60 drives regardless of the state reported by ZFS. I fully expect there to be more problems than reported, and these will interfere with whatever recovery attempts may be taken.
I picked up on that idea.
My thinking was that such a maneuver may at this point upset the state of things.
Whatever is still working enough to import the pool is good news at this point. Plus it already survived a few resilvers, hm.
As long as the pool can be imported, then later steps would include the 'other procedures' and safety-nets.
Here I'm thinking of SMART evaluations, another scrub, and probably adding a few 'hot spares' for the future imminent failures of these probably quite old drives.
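Once (if) the pool imports again and the resilvers settle, the safety-net part might look something like this - a sketch only, with the new disk path as a placeholder:

Code:
# add a hot spare that ZFS can pull in automatically on the next failure
zpool add RAIDZ2-60-Disk-Pool spare /dev/disk/by-id/<new-disk>

# then verify the whole pool
zpool scrub RAIDZ2-60-Disk-Pool
zpool status -v RAIDZ2-60-Disk-Pool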
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
My thinking was that such a maneuver may at this point upset the state of things.

Yes, this is a valid point.

Whatever is still working enough to import the pool is good news at this point. Plus it already survived a few resilvers, hm.
As long as the pool can be imported, then later steps would include the 'other procedures' and safety-nets.

My understanding, however, is that the pool cannot now be imported, and it did not, in fact, survive a few resilvers. Which understanding may be wrong.

Well, anyway, this kind of decision cannot be made without factoring in the value of the data. How much money and effort would be okay to put into the attempts, what resources are available, are there any time constraints, and whatever other factors I can't even think about, so that's up to OP.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
Hi Dice,

Thanks for your comprehensive response and the suggestions on possible steps.

I have run all the zpool import commands in sequence, as per your recommendation.
But I am still unable to import the pool, since one of the Z2 arrays in it is unavailable due to 3 disks being unavailable.

[Screenshot: 1654944619022.png]


Below is the RAIDZ2 array that has insufficient replicas, because of which the pool cannot be imported.

Out of the 3 disks, the 2 faulted disks had actually failed previously, but the 3rd unavailable disk did not fail. Someone accidentally replaced this disk instead of the faulted disk. Is there any way to locate which disk was removed from this slot and put that disk back in place to recover the array?

[Screenshot: 1654945004787.png]
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Someone accidentally replaced this disk instead of the faulted disk. Is there any way to locate which disk was removed from this slot and put that disk back in place to recover the array?
There is hope!

Locate the drive you think contains the data, and insert it too.
Ideally, whenever replacing a drive, don't remove the faulty drive from the system before it has been "offlined" as per the manual.
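A sketch of how that drive could be tracked down from the shell while the pool is exported: map the partition UUIDs that ZFS reports to device names and serial numbers, and read the ZFS label off candidate data partitions to see which pool and vdev they last belonged to (the sdX names are placeholders):

Code:
# map partition uuids (as shown by zpool import) to /dev/sdX names
ls -l /dev/disk/by-partuuid/

# list device, size, serial and partuuid so the physical drive can be found in its bay
lsblk -o NAME,SIZE,SERIAL,PARTUUID

# read the ZFS label from a candidate data partition; it shows the pool name,
# the member guid and the vdev it belonged to
zdb -l /dev/sdX2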

A couple of years ago, the idea was to leave a partially faulted drive in the system so it could assist the pool in a non-critical manner during resilvering.
This behavior may have changed; someone may correct me here.

How the absolutely most critical drive 'accidentally' got removed is a great learning opportunity for improving administrative processes in the future.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Are you running TN13?
If still at TN12, it's fine.

Have a look at the release notes. There is a known bug affecting drive replacement via the GUI.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Never mind then; you've obviously already completed some resilvers successfully. I forgot about that for a second.
 

adeelleo

Dabbler
Joined
Mar 28, 2021
Messages
38
There is hope!

Locate the drive you think contains the data, and insert it too.
Ideally, whenever replacing a drive, don't remove the faulty drive from the system before it has been "offlined" as per the manual.

A couple of years ago, the idea was to leave a partially faulted drive in the system so it could assist the pool in a non-critical manner during resilvering.
This behavior may have changed; someone may correct me here.

How the absolutely most critical drive 'accidentally' got removed is a great learning opportunity for improving administrative processes in the future.
Thanks, Dice, for taking the time. Much appreciated.

The drive that was accidentally removed is also currently installed in the system, but it is mixed in with other disks. Is there any way to identify which disk it was from the command line? The GUI actually lists all the disks, with serial numbers, in each array that is part of the pool, but since the pool cannot be imported there is no way to get this information from the GUI.

If we are able to identify the disk and place it back, we might be able to recover this pool.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
How is it "mixed" with other disks?
How is it "removed" but still installed inthe system?

If correct drive, expected to contain the correct data, is already in the system, and it still did not import or was found - then there light of hope went away.

Edit,
well, maybe. Depending on exactly what happened to the drive.
Can you describe exactly what steps it went through when it was accidentally removed?
I
 