
SOLVED drive faulted, can't offline it to replace it

digity

Member
Joined
Apr 24, 2016
Messages
126
I have a drive that has faulted and degraded the pool. In the web UI, I try to offline the disk, but nothing happens: the status still says "FAULTED". I tried several times, even logging out and back in, with no change. I then noticed the drive ID in the alerts and the drive status page are different (da26, da10, respectively), with the latter being its old ID from a pool export/import months ago. Both IDs point to the same drive (verified by the serial number). I'm assuming the ID issue is messing with TrueNAS' ability to properly offline the faulted drive...?

With these issues, how can I replace this failing drive?

P.S. - I haven't used the "Replace" function, because I don't have a free slot available until that faulted drive comes out.
 

sretalla

Wizened Sage
Joined
Jan 1, 2016
Messages
3,922
I'm assuming the ID issue is messing with TrueNAS' ability to properly offline the faulted drive...?

With these issues, how can I replace this failing drive?
Faulted is already a kind of offline status (the system isn't using it in the pool already).

You can just pull the drive, insert a suitable replacement, and run the replace once the old disk is shown as unavailable.

I then noticed the drive ID in the alerts and the drive status page are different (da26, da10, respectively),
Don't get hung up on those designations. They aren't used by ZFS to identify the right member disk of a pool/vdev. ZFS uses the gptid to identify a disk no matter where it appears in the device order.
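
For anyone who wants to see that mapping for themselves, here is a minimal sketch. It assumes a TrueNAS CORE (FreeBSD) shell where `glabel status` prints its usual Name / Status / Components columns. The helper name `gptid_map` is made up for illustration; it only filters text and changes nothing on the system.

```shell
# Hypothetical helper: read `glabel status` output on stdin and print
# "gptid-label device" pairs, skipping the header and non-gptid labels.
gptid_map() {
    awk '$1 ~ /^gptid\// { print $1, $3 }'
}

# Usage (read-only), assuming a FreeBSD-based TrueNAS CORE shell:
#   glabel status | gptid_map
# Compare the gptid/... names with the ones shown by `zpool status`.
```

The da number in the second column can change between boots or after an export/import, while the gptid label in the first column stays attached to the partition.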
 

digity

Member
Joined
Apr 24, 2016
Messages
126
Faulted is already a kind of offline status (the system isn't using it in the pool already).

You can just pull the drive, insert a suitable replacement, and run the replace once the old disk is shown as unavailable.


Don't get hung up on those designations. They aren't used by ZFS to identify the right member disk of a pool/vdev. ZFS uses the gptid to identify a disk no matter where it appears in the device order.
I pulled the faulted drive and inserted the replacement drive. It doesn't show up under a new disk ID but under the old one, da26, so I can't use the replace function (Storage -> Pools -> Pool -> Status) to get it into the pool. It has the old ID even though the make, model number and serial number reflect the new drive (Storage -> Disks).

Any ideas?
 

sretalla

Wizened Sage
Joined
Jan 1, 2016
Messages
3,922
so I can't use the replace function (Storage -> Pools -> Pool -> Status
Have you actually tried that? You can indeed replace a disk from there regardless of its identifier. Is it that the list is empty when you try the replace?
 

digity

Member
Joined
Apr 24, 2016
Messages
126
Have you actually tried that? You can indeed replace a disk from there regardless of its identifier. Is it that the list is empty when you try the replace?
Heh, it's working now... I think.

But yes, I did try the Replace function initially, and I got an error. It actually listed the drive in question as multipath/disk15 (Storage -> Pools -> Pool Status -> Replace). I thought the multipath thing was causing the issue, so I ran "gmultipath destroy disk15" and "gpart recover /dev/da26" (and "gpart recover /dev/da10"), but the gpart commands failed with an "invalid argument" error. I went back to try the Replace function anyway, and this time the drive wasn't listed as available at all (neither as da26/10 nor as multipath/disk15). I popped the drive out and back in, and it was still listed as da26/10 (Storage -> Disks). I got frustrated and left it at that.

I saw your reply and went to Replace to check whether it actually listed da26 (or da10) as available. It didn't, but there was now a new entry listed as multipath/disk1. I started to run the gmultipath and gpart commands again for this multipath drive, but figured I should try adding it as is (multipath/disk1) and BAM! It was added without an error and started resilvering.

I don't know what was wrong or why it's working now, but I'll take it.
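
In case it helps anyone retracing this, a read-only way to see which da devices sit behind a multipath entry is sketched below. It assumes `gmultipath status` prints the usual Name / Status / Components columns, with extra paths on continuation lines; the helper name `multipath_members` is hypothetical. If da10 and da26 turn out to be two paths to the same disk, that would be one possible explanation for the shared serial number, though nothing in this thread confirms it.

```shell
# Hypothetical helper: read `gmultipath status` output on stdin and print
# one "multipath-name device" pair per path. Assumes continuation lines
# (additional paths) start with the device name after leading whitespace.
multipath_members() {
    awk '
        $1 ~ /^multipath\// { name = $1; print name, $3; next }
        name != "" && NF >= 1 && $1 !~ /^Name$/ { print name, $1 }
    '
}

# Usage (read-only), assuming a FreeBSD-based TrueNAS CORE shell:
#   gmultipath status | multipath_members
```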

Thanks for your help
 