Replace failed disk

Status
Not open for further replies.

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
I need some help with replacing a failed disk.

Setup:
Multipath connected disk array
21x 300GB SAS
2x hotspares

3x RAIDZ2 vDevs in zpool

When I noticed the failed disk, I took it offline so that a free spare would kick in and take over.
I then physically replaced the failed drive with a new one.
The new disk is seen as multipath disk19 (da11/da53).
I issued the command "zpool replace piggy 11006946617288866407 multipath/disk19" and resilvering started.
After the resilver, I detached the hot spare with "zpool detach piggy gptid/dfba57f5-f64c-...".
Now I'm still seeing both multipath disk19 and the offline disk, and I don't know what to do. I tried "zpool clear piggy 11006946617288866407", but that didn't work. I'm not sure what will happen if I run "zpool detach piggy 11006946617288866407".

This is all on a live system with VMs running, so I need to be careful. I do have backups, and I replicate every 2 hours to another, smaller FreeNAS server.

I really need some help figuring out how to get the pool out of its degraded state.
 

Attachments

  • piggy.png

Mr_N

Patron
Joined
Aug 31, 2013
Messages
289
If your hot spare replaced the failing disk, you just add another hot spare. Pulling the initial hot spare will just degrade the pool again...
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
Huh? Have you read my message? I did not pull the spare. I pulled the failed drive and replaced it with a new one.
The new disk is seen as multipath/disk19.
The old failed disk is also still seen as 110069..., so I ran "zpool replace <pool> 110069.... multipath/disk19".
The failed disk's entry was not removed, but a resilver did start on disk19.
This morning the spare and disk19 were both still part of the vdev, and so was the old failed drive's ID.

Because the spare was still part of the RAIDZ2, I issued "zpool detach <pool> <spare>". That worked.
So now you can see that disk19 and the old failed drive's ID are both part of the vdev, and the zpool is still in a degraded state.

I have no idea what to do next.

PS:
When I took the failed drive offline, the spare took over and resilvered. After the resilver, the zpool was still in a degraded state. I learned that the spare is only there temporarily, until the failed drive has been replaced.
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
So what I am thinking of is the command
"zpool detach <pool> 110069....", which detaches the old failed drive. That drive is currently offline and has been pulled from the array. It has been physically replaced with disk19.

I'm just not sure what will happen to disk19...
Please take a look at the screenshot!
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would try the detach. If disk19 has any issues, offline it and online it.
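
A rough sketch of that suggestion as a dry-run script, using the pool name, disk GUID and device name quoted earlier in the thread. The `run` wrapper, the `DRY_RUN` variable and the function names are made up for illustration; by default nothing touches the pool and the commands are only printed, so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
POOL="piggy"                       # pool name from this thread
OLD_GUID="11006946617288866407"    # GUID of the failed, already pulled disk
NEW_DISK="multipath/disk19"        # replacement disk as the pool sees it

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the stale entry for the old disk, checking the pool before and after.
detach_old_disk() {
    run zpool status "$POOL"
    run zpool detach "$POOL" "$OLD_GUID"
    run zpool status "$POOL"
}

# Only if the new disk misbehaves afterwards: take it offline and bring it back.
bounce_new_disk() {
    run zpool offline "$POOL" "$NEW_DISK"
    run zpool online "$POOL" "$NEW_DISK"
}

detach_old_disk
```

Running it with DRY_RUN=0 would execute the commands for real, so on a live pool it is worth reading the printed list first and waiting for any running scrub or resilver to finish.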
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
I think I will try that. I'm running a scrub right now, just to be sure nothing is wrong.
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
OK, so I detached the failed disk 11006946617288866407 and everything seems normal. The zpool is healthy.

Nonetheless, if you look at the screenshot, the multipath disk19 does not have a gptid and was added as multipath/disk19.

I've got a feeling I did something wrong. I don't think it has any impact, but I would still expect disk19 to get its own gptid label. That is not the case.
 

Attachments

  • piggy2.png

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
You are probably getting this output because you did things through the command line, behind the back of the GUI...
The GUI is supposed to be in charge of everything on your "appliance".
 

Mr_N

Patron
Joined
Aug 31, 2013
Messages
289
Isn't the point of hot spares to replace failed/failing drives?
You follow the manual for replacing a drive (using the hot spare), and then at some point after everything is back to normal you physically add another hot spare and remove the failed drive at the same time?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
If you want a drive to be added by gptid, you need to add it by gptid. Or use the GUI.

You can fix it by offlining, wiping and replacing the disk with itself, or with the other hot spare (from the GUI).
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
Well, that was the problem. I used the very buggy FreeNAS 9.10.2.U1. Never use this version, btw.
I was not able to use the GUI to replace the failed drive. The GUI got stuck multiple times. This is probably because I use multipath to the DAS.

The failed drive was replaced by the hot spare, but the whole zpool remained in a degraded state and the failed drive was in a "replacing-5" state. I think I should have left the spare in as the new disk and then just detached the failed drive. After that, everything should have been OK.

I will write down this procedure, just in case it happens again. Thank you all for the replies. I will leave it for now, because it's just a label that isn't right. The disk itself is running perfectly.
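
For the notes, here is a minimal dry-run sketch of that alternative procedure: let the spare finish resilvering, detach the failed disk so the spare keeps its slot, then add the newly inserted disk as the next hot spare. It assumes the new disk has not already been used in a "zpool replace" (unlike what happened in this thread). The `run` wrapper, `DRY_RUN` and the function name are illustrative only; the pool name, GUID and device name are the ones from this thread.

```shell
#!/bin/sh
# Dry-run sketch: commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
POOL="piggy"                         # pool name from this thread
FAILED_GUID="11006946617288866407"   # GUID of the failed disk
NEW_DISK="multipath/disk19"          # freshly inserted disk, not yet in the pool

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

promote_spare() {
    # Check that the spare has finished resilvering before going further.
    run zpool status "$POOL"
    # Detaching the failed disk leaves the in-use spare as the permanent member.
    run zpool detach "$POOL" "$FAILED_GUID"
    # The new physical disk then becomes the replacement hot spare.
    run zpool add "$POOL" spare "$NEW_DISK"
    run zpool status "$POOL"
}

promote_spare
```

Adding the spare as multipath/disk19 from the command line would again bypass the GUI's gptid labelling, so on FreeNAS the last step is probably better done through the GUI when it cooperates.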
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hm, so if I interpret this information right, a hot spare drive is not smoothly incorporated into a failing raidz vdev, in the sense that the system has understood and 'relabeled' the hot spare to become a 'native' drive in the raidz?
...but rather, the hot spare is a "temporary fix" that still requires manual replacement of the missing drive in the failing raidz. The system needs to be told that the hot spare is the <new expected drive> to fill this spot? Is this correct?
 