Missing Drive from Pool after replacement

ShadowChaser

Cadet
Joined
Jun 10, 2022
Messages
5
Hello, this is my first time using TrueNAS as a NAS and I am still unfamiliar with a lot of the tools and functionality presented to me.

Yesterday I simulated a complete drive failure by disconnecting a drive from my system. I am running RAIDZ2 on 5 drives and, as expected, the pool status became degraded.
I put in a different drive and tried to replace it, but ran into the problem described here: https://www.truenas.com/community/threads/replacing-disk-does-not-work-in-truenas-core-v13-0.101381/
I was able to successfully replace the disk with the CLI tool (the resilvering operation finished this morning), but the pool remains degraded: a separate drive has now been removed from the pool, so there are still only 4 operational drives in it. I am very confused about how this could have happened and have no idea how to put the wayward drive back into the pool. Below is an overview of what happened:

Drives | Original Setup | Drive Removed | After Resilvering
Bay 1  | ada1           | ada1          | Missing from pool, present in system/gui
Bay 2  | ada2           | ada2          | ada2
Bay 3  | ada3           | gptid/xxxxx   | Empty
Bay 4  | ada4           | ada4          | ada4
Bay 5  | ada5           | ada5          | ada5
Bay 6  | Empty          | ada3 (new)    | ada3
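
For reference, the replacement I did via the CLI boiled down to something like the following (gptid/xxxxx stands in for the real id reported by zpool status, and pool1 is my pool):

  # identify the gptid of the pulled disk (it shows as missing from the pool)
  zpool status pool1
  # resilver the missing member onto the new drive in bay 6
  zpool replace pool1 gptid/xxxxx ada3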

Any suggestions would be much appreciated.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Your table doesn't really help explain what's going on.

Can you run zpool list and provide the output here?
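
For example, from the Shell:

  zpool list      # one-line capacity/health summary per pool
  zpool list -v   # same summary, broken down per vdev and per disk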
 

ShadowChaser

Cadet
Joined
Jun 10, 2022
Messages
5
Your table doesn't really help explain what's going on.

Can you run zpool list and provide the output here?
[screenshot: zpool list output]

Is this what you are looking for? How do I list the disks in the pool?
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
Hi @Nick2253. You can use zpool status pool1 to get the status of the pool, which will show the disks.

It would be helpful if you gave a rough outline here of the steps you took and when. My best reading of your prior thread is something like
  1. You set up a 5-disk pool in RAIDZ1
  2. You pulled a disk and reformatted it to test drive replacement
  3. You ran into issues with 13.0 so you did drive replacement via the CLI
  4. After resilvering completed, you look at your pool and you now see it is still degraded somehow?
Are you certain that you resilvered the correct disk? Are you also certain that the connection for all disks is secure?

Also note that generally the bay and dev number, such as ada1, are not considered static. They can change. I'm not sure about 13.0, but I know in older systems (11.x for example) the gptid was a static identifier for a disk you could rely on. zpool status should give us the unique identifiers that zfs expects to be part of your pool.
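
If it helps, on CORE you should be able to map those gptids back to adaX device names with glabel:

  # FreeBSD: list label-to-device mappings (gptid/... -> adaXpY)
  glabel status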
 

ShadowChaser

Cadet
Joined
Jun 10, 2022
Messages
5
Hi @Nick2253. You can use zpool status pool1 to get the status of the pool, which will show the disks.

It would be helpful if you gave a rough outline here of the steps you took and when. My best reading of your prior thread is something like
  1. You set up a 5-disk pool in RAIDZ1
  2. You pulled a disk and reformatted it to test drive replacement
  3. You ran into issues with 13.0 so you did drive replacement via the CLI
  4. After resilvering completed, you look at your pool and you now see it is still degraded somehow?
Are you certain that you resilvered the correct disk? Are you also certain that the connection for all disks is secure?

Also note that generally the bay and dev number, such as ada1, are not considered static. They can change. I'm not sure about 13.0, but I know in older systems (11.x for example) the gptid was a static identifier for a disk you could rely on. zpool status should give us the unique identifiers that zfs expects to be part of your pool.
Thank you for your input, and I apologize for not knowing what relevant information to include; I couldn't find anything similar to reference through my searches.

Here's my amended order of operations:
  1. I set up a 5 disk RAIDZ2 pool
  2. I pulled a disk to test drive replacement
  3. I replaced the pulled drive via the CLI - I double-checked the uuid to be replaced and the dev number replacing it, as that is what the CLI utility expects
  4. A different drive (verified via serial #) was removed from the pool somehow
The missing drive was still seen by the system, just not assigned to the correct pool, and I was pulling my hair out trying to figure out how to add a singular drive back into a pool of (now) 4 drives without matching the vdev configuration.

I ultimately wiped it and added it in as a hot spare, onlined the missing gptid/uuid, and had it resilver overnight. I don't think this is the intended way to resolve this problem but it was the best I could do in the situation without taking a deeper dive into the CLI than I was comfortable with.
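
For what it's worth, the sequence boiled down to something like this (device name and gptid are placeholders; I pulled the real gptid from zpool status):

  # add the freshly wiped disk back in as a hot spare
  zpool add pool1 spare ada1
  # bring the "missing" pool member online again
  zpool online pool1 gptid/xxxxx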
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
how to add a singular drive back into a pool
There are two useful things to differentiate here. One is when you offline/online a drive, and another is when you resilver a drive.

To offline a device is to tell zfs to ignore it in the pool: new data will not be written to it. Doing this will likely cause your pool to become degraded. To online a drive is to tell zfs "hey, that disk you used to care about that I offlined, care about it again".

To resilver is to say "hey zfs, this one drive is bad. I'd like to replace it with this other drive". You may end up offlining the offending drive before starting the resilver, but just know that to resilver is not the same as to online a drive.
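
In command terms, the difference looks something like this (pool and device names are only examples):

  zpool offline pool1 gptid/xxxxx        # pool degrades; this member is ignored
  zpool online pool1 gptid/xxxxx         # the same member rejoins and catches up
  zpool replace pool1 gptid/xxxxx ada6   # resilver this member onto a different disk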

From your other post it looks like you're using 13.0, which has some bugs, so you're using the command line. Did you attempt to use the tool provided in your other thread, found at https://www.truenas.com/docs/core/corereleasenotes/#cli-disk-replacements?

In order to get unstuck, the output of zpool status {pool} would be very useful. So too would be the list of drives in your system, either via Pools -> Disks in the UI or via gpart list | grep -E "rawuuid|name:".
 

ShadowChaser

Cadet
Joined
Jun 10, 2022
Messages
5
@PhiloEpisteme
I was able to put the drive back into the pool using the hot spare method I wrote about above; still, I would like to know how I could have placed the disk back into the pool it was taken out of without needing to wipe it or match vdev layouts.

Regardless, here's the zpool status of the fixed pool in case you need it.
[screenshot: zpool status output for the repaired pool1]
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
Sorry, are you asking how to offline and then online the same disk? Or are you asking how to replace a disk with a new one?
 

ShadowChaser

Cadet
Joined
Jun 10, 2022
Messages
5
Sorry, are you asking how to offline and then online the same disk? Or are you asking how to replace a disk with a new one?
I don't know where the offline/online thing came from, but my original question was how to put back a disk that was removed from a pool - and I don't mean physically.

After using the CLI tool to replace a disk, a different disk was removed from the pool but was still present in software - it was just unassigned from pool1 or something, and I could not assign it to pool1 again.
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
I don't know where the offline/online thing came from, but my original question was how to put back a disk that was removed from a pool - and I don't mean physically.
Onlining/offlining a disk and resilvering are the two zfs operations that the ideas of "assigning" or "unassigning" a disk to a pool could map to.

Without more information, such as exactly which commands you ran, I cannot say with certainty what caused your issue. It could be related to the very fresh version of TrueNAS you're using. Alternatively, one issue I've seen before is that folks remove a disk and inadvertently bump another cable such that it becomes loose and the connection is unreliable.

Resilvering drive A with drive B should not impact drive C. Additionally, removing drive A physically from the system should not impact drive C.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
removing drive A physically from the system should not impact drive C.
Unless you get a little too heavy-handed and touch the cable(s) connected to drive C in a way which causes a disconnection (even if not obviously visible).

I see you did already mention that though...
 