TrueNAS 12 pool offline after failed drives removed

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
My Configuration:

TrueNAS 12.0-U1.1
Supermicro server w/ 9300-8i8e HBA controller (main machine)
-> Xeon SoC
-> 128GB ECC memory
-> 2x USB boot drives
-> 8x 8TB HDD
-> 2x 1TB SSD
(external) Supermicro JBOD enclosure (connected via the HBA controller in the main machine)
-> 13x 14TB HDD
-> 8x 16TB HDD

Pools:
SSD
-> 1 vdev + 0 hot spares
--> 2-disk mirror (1TB SSD)
Data
-> 4 vdevs + 3 hot spares
--> 8-disk RAIDZ2 (8TB) vdev1
--> 6-disk RAIDZ2 (14TB) vdev2
--> 6-disk RAIDZ2 (14TB) vdev3
--> 6-disk RAIDZ2 (16TB) vdev4

I had 2 disk failures in one of my vdevs (vdev2). The hot spares automatically replaced the failing disks, and the pool operated in a degraded state.

The 2 failed disks are under warranty, so I removed them and shipped them back to the manufacturer.

I then rebooted TrueNAS, and on boot I get a BTX halted error and TrueNAS does not boot.
[Screenshot: BTX halted error at boot]


Things I tried:
1. If I disconnect the (external) JBOD enclosure, TrueNAS boots correctly, but my Data pool is "offline".

2. I also tried connecting the JBOD enclosure while TrueNAS was booting (instead of before boot). TrueNAS boots and finds my disks, but the Data pool is still "offline", and I now see all of these disks under Storage -> Multipaths.

3. I turned on the JBOD enclosure after TrueNAS was fully booted. Similar experience to #2, but only the 2 hot spares show up under Multipaths.
Console Info:
[Screenshot: console output]
Multipath Info:
[Screenshot: Storage -> Multipaths view]
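
For completeness, this is what I can run from the console to look at the same multipath information (read-only commands, using the standard FreeBSD tools that TrueNAS CORE ships):

Code:
gmultipath status     # shows each multipath device and the providers behind it
gmultipath list       # more detail per multipath geom
camcontrol devlist    # raw list of every disk the HBA currently sees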



I assumed that with the hot spares in use (and the faulty drives removed) the Data pool would operate normally, but this does not appear to be the case.

Maybe it is because I didn’t mark the failed drives as offline?

How do I get TrueNAS to see my Data pool normally with the hot spares in use? Or do I need to wait until the replacement disks are added before things are back to "normal"?

Any help would be appreciated, and thanks in advance.

Regards,

Art
 

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
The latest status of the pool was:
(the UNAVAIL disks are the hot spares; the FAULTED ones were physically removed after this alert)

Code:
* Pool data state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.

The following devices are not healthy:

Disk ATA ST14000NE0008-2J ZHZ6A3L1 is UNAVAIL
Disk ATA ST16000NE000-2RW ZL2AM3GK is UNAVAIL
Disk 6177272037671306492 is FAULTED
Disk 6412481714084147987 is FAULTED


I am looking at the zpool CLI and am curious whether I can run zpool offline -f data 6177272037671306492 to force the removed disk offline. I do not have much experience with zpool troubleshooting (or repairing pools), so I am reluctant to touch anything until someone with more knowledge can chime in.
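
For reference, the sequence I am considering would be roughly this (only a sketch; the GUIDs are the ones from the alert above, and I have not run any of it yet):

Code:
zpool status -v data                        # confirm which devices show as FAULTED and note their GUIDs
zpool offline -f data 6177272037671306492   # -f force-faults the device rather than doing a plain offline
zpool offline -f data 6412481714084147987   # same for the second removed disk
zpool status -v data                        # re-check the pool state afterwards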

Thanks.
 

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
So I took a leap of faith... and got the pool back online in a degraded state.

Started TrueNAS.
Once it was fully booted, powered on the JBOD enclosure.
From the TrueNAS UI, disconnected the Data pool.
From the TrueNAS UI, imported the Data pool.

Now the pool is accessible again, but in a degraded state.
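
For anyone following along, the CLI equivalent of that disconnect/import dance is roughly the following (a sketch only; I went through the UI, since doing it on the CLI can leave the TrueNAS middleware out of sync):

Code:
zpool export data     # roughly what the UI "disconnect/export" step does (without destroying data)
zpool import          # list the pools that are currently visible and importable
zpool import data     # import the pool by name
zpool status data     # confirm the result (the pool should come back DEGRADED, not OFFLINE)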

Here is the status of the vdev with both spares in use.
[Screenshot: vdev status with both spares in use]


But now I see all the spares as unavailable.
[Screenshots: spares list showing the spares as unavailable]


How should I recover the 3rd spare correctly?
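
In case it helps with diagnosing this, here is how I am checking the spare state from the console (read-only):

Code:
zpool status -v data   # the "spares" section lists each spare as AVAIL, INUSE or UNAVAIL
gmultipath status      # check whether the 3rd spare is hiding behind a multipath device name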

Once the replacement hard drives arrive, I will replace them via the UI.
 

Art

Dabbler
Joined
Dec 30, 2015
Messages
22
Update:

I was unable to detach the drives from the UI (the UI threw Python null-pointer errors).
I had to detach them manually with zpool detach data <id> for both of the in-use drives.
After that, the 2 spares were available again.
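
Concretely, the manual detach looked roughly like this (<id> is just a placeholder, as in the step above; the real values are the device GUIDs shown by zpool status):

Code:
zpool status -v data    # note the GUID of the device to detach under each spare group in vdev2
zpool detach data <id>  # detach the first one (<id> = device GUID placeholder)
zpool detach data <id>  # repeat for the second one
zpool status -v data    # the 2 spares then show as AVAIL again, as described above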

As for the multipath disk:
I also manually destroyed multipath/disk7 with gmultipath destroy disk7, then removed the drive from the pool using the UI.
I then erased the disk using the UI.
Rebooted TrueNAS (as the disk still showed up under its multipath name).
After the reboot, the disk name was normal again.
Re-added the disk to the pool as a spare.
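
For anyone hitting the same multipath issue, the cleanup was essentially this (disk7 is the multipath name on my system; yours will differ):

Code:
gmultipath status          # confirm which multipath geom the disk is sitting behind
gmultipath destroy disk7   # tear down multipath/disk7 so the raw device name comes back
# then: remove the drive from the pool in the UI, wipe it in the UI,
# reboot, and re-add it to the pool as a spare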

All good and back to normal :)
 