Disk replacement issue

Joined: Sep 2, 2019 · Messages: 4
I have a pool with five RAIDZ1 vdevs of three disks each, plus two hot spares. A disk failed and one of the hot spares took over; the failed disk has since been replaced, but the vdev is still using the hot spare. What would be the recommended next steps? I have attached the UI view of the vdev in question; at the moment it shows four disks online.

I am running FreeNAS-11.2-U5 on dual E5-2680 Xeons with a Supermicro board, 12 internal disks, and an MD1000 storage array with 14 disks on SAS.
 

Attachments

  • vdev.PNG (19.8 KB)

sretalla
Powered by Neutrality · Moderator
Joined: Jan 1, 2016 · Messages: 9,703
The full output of zpool status would help us to see what's happening here... the limited screenshot is not very helpful if we're going to comment based on factual information.
 
Joined: Sep 2, 2019 · Messages: 4
Attached the zpool status as an image; I can't seem to save a txt version.
 

Attachments

  • zpoolstatus.PNG (78 KB)
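
As an aside: if shell access is available, the full text can be captured without a screenshot by redirecting the command's output to a file. A minimal sketch, with the output path purely an example:

zpool status -v > /tmp/zpool-status.txt   # write the verbose pool status to a text file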

sretalla
Powered by Neutrality · Moderator
Joined: Jan 1, 2016 · Messages: 9,703
Where you can, you should follow the GUI process, but I think this will require some CLI work depending on where you want to end up.

https://www.ixsystems.com/documentation/freenas/11.2-U5/storage.html#replacing-a-failed-disk

For the CLI part, you need to decide whether you will permanently use the spare in the pool, or replace the disk in the pool and return the spare.

This article covers the process (although it's strictly Oracle ZFS rather than the OpenZFS we use on FreeNAS).
https://docs.oracle.com/cd/E19253-01/819-5461/gcvcw/index.html
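
In rough CLI terms, the two options described there look something like the sketch below. The pool name tank and the bracketed device names are placeholders rather than values from this pool, so treat it as an outline of the procedure, not commands to paste as-is.

# Option 1: keep the hot spare as the permanent member of the vdev
zpool detach tank <failed-disk>               # the spare assumes its place and drops off the spares list

# Option 2: put a new disk in the failed slot and return the spare
zpool replace tank <failed-disk> <new-disk>   # resilver onto the new disk
zpool detach tank <spare-disk>                # only needed if the spare is not released automatically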
 
Joined: Sep 2, 2019 · Messages: 4
The confusion at the moment is that it says it is replacing the failed disk with another (the replacing-0 section below) and that the spare is in use, but it has been in this state for about two weeks now and the replace just isn't completing. I have had disks fail before, did all the work through the UI, and it was fine.

spare-1                                          DEGRADED     0     0     0
  replacing-0                                    DEGRADED     0     0     0
    gptid/fdb0fe15-2085-11e8-92f9-001e673762cd   ONLINE       0     0     0
    16350047576090472210                         UNAVAIL      0     0     0  was /dev/gptid/80643263-c329-11e9-8acf-0025901f01a8
  gptid/3125381f-865f-11e8-8c89-001e673762cd     ONLINE       0     0     0
It is possible I made a bit of a mistake somewhere, but I still don't really know which commands will get me out of this pickle.
 
Joined: Sep 2, 2019 · Messages: 4
Thanks for the above. I think I now have a solution that looks to work in testing.

zpool remove 16350047576090472210 (this will remove the old artefact that was left after the replace.)

This leaves just the spare and the new disk.

zpool remove gptid/3125381f-865f-11e8-8c89-001e673762cd (this will remove the spare from the vdev and return it to the spares list, as all is back online now.)

The first step worked in testing and things are no longer degraded, but the spare is still in use, so the second step should remove the spare as well.

Does this sound right?
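
For comparison, the Oracle guide linked above (and the zpool man page) handle both of these steps with zpool detach rather than zpool remove, and in either form the pool name comes first. A sketch assuming the pool is called tank (the real pool name isn't shown in this thread):

zpool detach tank 16350047576090472210                        # drop the stale half of the replacing-0 vdev
zpool detach tank gptid/3125381f-865f-11e8-8c89-001e673762cd  # release the in-use spare back to the spares list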
 

sretalla
Powered by Neutrality · Moderator
Joined: Jan 1, 2016 · Messages: 9,703
Replacing is exactly what a spare should do... the resilvering is complete (although it was only a few MB according to the status output... don't read too much into that, as data distribution in the pool isn't necessarily even).

You now need to decide, as I mentioned, whether you want to replace the failed disk permanently with another one and return the spare to the spares list, or whether you just want the spare to permanently replace the failed disk... the procedures for those two options are in the links.
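
Either way, a quick check afterwards (sketched with the same assumed pool name, tank) should show the RAIDZ1 vdev with all three disks ONLINE and the hot spare listed as AVAIL rather than INUSE:

zpool status -v tank   # spare-1 / replacing-0 should be gone and the spare should read AVAIL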
 