HDD failed, offlined, now pool is gone

sgt_jamez

Explorer
Joined
Jul 30, 2021
Messages
88
This morning, I had an HDD tank on me. I offlined it, then shut down the box to swap out the drive. Now when I go into the Storage Dashboard and click Manage Devices for that pool, it lists no devices, and the Data VDEVs section says "Mixed Capacity VDEVs" even though all three devices in that pool are 8TB drives. I don't know how to replace the failed drive with the one I added, and I don't know how to find the long disk ID (like the ones shown below) for the new drive. What should I do?

zpool status shows
Code:
pool: main_vault
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 201M in 00:00:20 with 0 errors on Thu Sep 15 08:36:45 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        main_vault                                DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            3f02a7b3-3d25-4e98-999b-b6457ba6179e  OFFLINE      0     0     0
            b9ba47af-978e-4633-ac3e-fa8027ffab7e  ONLINE       0     0     0
            a024a392-428c-452f-854d-c3c8fcff905d  ONLINE       0     0     0
        cache
          1229538a-e846-4fcf-b556-9db79aa1f5db    ONLINE       0     0     0

errors: No known data errors
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694

Please describe the software version, when you upgraded,
and the hardware you are running on.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
The disk IDs are found in /dev/disk/by-uuid. You can match each one to its /dev/sd? device there as well to determine which is which. Your pool looks OK (other than the offline drive, but that's to be expected in this case) and ready for the drive to be replaced. I presume you shut down the box, took out the bad drive, and put in the good drive?

I've never used the TrueNAS UI, but surely it has a way to run the zpool replace command; someone else can comment on that. If for some reason the UI is broken (you say the pool doesn't show any devices!?), it can be done from the command line. The syntax is pretty simple. Just run:

man zpool-replace

and you will see the syntax.

To make things simpler, I put a small sticker with the last 4 or 5 characters of the ID on each drive. That way, when replacing a drive, it's obvious on sight which drive is which with the case open.
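For example, a small sketch like this lists each UUID symlink next to the device it resolves to, plus the last five characters for the sticker. (The directory is an assumption: on SCALE the pool members' IDs may actually live under /dev/disk/by-partuuid rather than by-uuid, so check both.)

```shell
#!/usr/bin/env bash
# Print every UUID symlink in a by-uuid style directory alongside the
# /dev/sd? device it currently resolves to, and the last five characters
# of the UUID (handy for labeling the physical drives).
list_disk_uuids() {
    local dir=${1:-/dev/disk/by-uuid}   # try /dev/disk/by-partuuid too
    local link uuid
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        uuid=$(basename "$link")
        printf '%s -> %s (sticker: %s)\n' \
            "$uuid" "$(readlink -f "$link")" "${uuid: -5}"
    done
}
list_disk_uuids
```

Run it as root on the NAS; the /dev/sd? side of the mapping is only valid until the next reboot, but the UUIDs are stable.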
 

sgt_jamez

Explorer
Joined
Jul 30, 2021
Messages
88
My current software version is 22.12-MASTER-20220915-034321, installed yesterday.
Hardware is all consumer grade stuff. A Frankenputer I built to replace my old Synology NAS with TrueNAS.

CPU: Intel Core i7-2600K
Motherboard: Asrock Z77 Extreme4
RAM: Corsair CML32GX3M4A1600C10 4x8GB
HBA: LSI SAS9211-8i
HDDs in pool: Seagate 8TB IronWolf NAS drive x3
 

sgt_jamez

Explorer
Joined
Jul 30, 2021
Messages
88
The /dev/sd? device IDs change between when the bad drive is in and when I shut down, replace it, and restart; it's not always the same ID. When I first put in the new drive, it came up as /dev/sdf. After going back to the degraded drive, the IDs changed, and going back to the good drive again, they were different again. I definitely do not understand how the /dev/sd? IDs get assigned.

I tried to run this command:
Code:
zpool replace -f 3f02a7b3-3d25-4e98-999b-b6457ba6179e /dev/disk/by-id/ata-ST8000VN004-2M2101_WSD8LNK7


But it gave me an error about needing to start with a letter.

I'm going to try to roll back to an older SCALE version and see if I can get this replaced there. The fact that the UI isn't showing my pool seems to indicate there's an issue in that version of the nightly build. I'll post back with my results.

Edit:
I dropped back to a version from late June, and the afflicted drive has been replaced with the new drive and resilvering is under way!
 
Last edited:

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Yes, the /dev/sd? IDs change, but that is not relevant to the task; the UUID does not change. The kernel hands out sd? names in whatever order it detects the drives at boot, so they can shuffle any time a drive is added or removed. I was merely saying you can obtain the sd ID for other purposes, such as running smartctl or matching up error messages in the log.
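A quick way to see the unstable and stable names side by side is something like this (a sketch; the output obviously varies per machine):

```shell
# Kernel names (sdX) vs. stable serial numbers for each whole disk.
# The NAME column can change across reboots; SERIAL never does.
lsblk -d -o NAME,SERIAL,SIZE 2>/dev/null || echo "lsblk not available"
```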

Your syntax is wrong: zpool replace expects the pool name as its first argument, so zpool read the old drive's UUID as a pool name, and a pool name must begin with a letter, hence the error. For the new disk you need the UUID from the /dev/disk/by-uuid directory. If the drives are already named by UUID, why not keep it that way for the replacement disk? You can use the by-id name, but I'd keep it consistent and use the UUID. So:

zpool replace -f main_vault 3f02a7b3-3d25-4e98-999b-b6457ba6179e whatevertheotheruuidiswithoutanyslashesordirectories
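Filled in with this thread's pool name, the full command would look something like the sketch below; the new disk's UUID is a placeholder you'd substitute from /dev/disk/by-uuid (or by-partuuid). Echoing the command first lets you double-check it before pasting it into a root shell:

```shell
# Build the replace command: pool name first, then the old member, then the
# new disk. NEW_UUID is a placeholder, not a real ID.
POOL=main_vault
OLD_UUID=3f02a7b3-3d25-4e98-999b-b6457ba6179e   # the OFFLINE member
NEW_UUID=new-disk-uuid-goes-here                # substitute the real UUID
echo "zpool replace -f $POOL $OLD_UUID $NEW_UUID"
# After running it for real, watch the resilver with: zpool status main_vault
```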
 

sgt_jamez

Explorer
Joined
Jul 30, 2021
Messages
88
Good information. I knew I was going wrong, just didn't know where to go next. I called it a night and went to bed! I do appreciate the reply. Hopefully I won't be replacing anything any time soon.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694

This is Bluefin BETA (really nightly) software... it's best to discuss Bluefin on that special subforum.

So, an HDD failed within a day of a software update?

It is fresh BETA-quality software, and there are quite a few changes to the pool management UI. It is very possible there are bugs. Go back to SCALE 22.02.3 if you are worried about your data.
 