Replace failed disk

Status
Not open for further replies.

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
I need some help with replacing a failed disk.

Setup:
Multipath-connected disk array
21x 300 GB SAS
2x hot spares

3x RAIDZ2 vdevs in the zpool

When I noticed the failed disk, I took it offline so the free hot spare would kick in and take over.
I then physically replaced the failed drive with a new one.
The new disk is seen as multipath disk19 (da11/da53).
I issued the command "zpool replace piggy 11006946617288866407 multipath/disk19" and resilvering started.
After the resilver, I detached the hot spare: "zpool detach piggy gptid/dfba57f5-f64c-..."
Now I'm still seeing both multipath/disk19 and the offline disk in the vdev, and I don't know what to do. I tried "zpool clear piggy 11006946617288866407", but that doesn't work. I'm not sure what will happen if I do "zpool detach piggy 11006946617288866407".
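
For reference, this is roughly the sequence I ran, in order (pool name, GUID and device names are from my setup):

# take the failed disk offline so the hot spare kicks in
zpool offline piggy 11006946617288866407

# after physically swapping the drive, resilver onto the new multipath device
zpool replace piggy 11006946617288866407 multipath/disk19

# once the resilver finished, detach the hot spare
zpool detach piggy gptid/dfba57f5-f64c-...

# tried to clear the stale entry, but that changed nothing
zpool clear piggy 11006946617288866407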

This is all on a live system with VMs running, so I need to be careful. I do have backups and replicate every 2 hours to another, smaller FreeNAS server.

I really need some help figuring out how to get the pool out of its degraded state.
 

Attachments

  • piggy.png

Mr_N

Patron
Joined
Aug 31, 2013
Messages
289
If your hot spare has replaced the failing disk, you just add another hot spare; pulling the initial hot spare will just degrade the pool again...
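
Adding a spare back is a one-liner from the command line; a sketch, assuming the new drive shows up as another multipath device (the device name here is just an example):

# add the new drive to the pool as a hot spare
zpool add piggy spare multipath/disk20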
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
Huh? Have you read my message? I did not pull the spare; I pulled the failed drive and replaced it with a new one.
The new disk is seen as multipath/disk19.
The old failed disk is also still listed as 110069..., so I did "zpool replace <pool> 110069.... multipath/disk19".
That did not remove the failed disk's entry, but the resilver onto disk19 did start.
This morning the spare and disk19 were both still part of the vdev, and the old failed drive's ID was still part of it as well.

Because the spare was still part of the raidz2, I issued "zpool detach <pool> <spare>". That worked.
So now disk19 and the old failed drive's ID are both shown as part of the vdev, and the zpool is still in a degraded state.

I have no idea what to do next.

Ps.
When I took the failed drive offline, the spare took over and resilvered. After that resilver, the zpool was still in a degraded state. I learned that the spare is only there temporarily, until the failed drive has been replaced.
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
So what I am thinking of is the command
"zpool detach <pool> 110069....", which detaches the old failed drive. That drive is currently offline and has been pulled from the array; it was physically replaced with disk19.

I'm just not sure what will happen to disk19.....
Please take a look at the screenshot!
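
What I'm planning to run, with a status check before and after so I can compare (pool name and GUID as above):

# confirm the old GUID is still the OFFLINE entry under the raidz2 vdev
zpool status -v piggy

# then drop the stale entry for the pulled drive
zpool detach piggy 11006946617288866407

# and verify the vdev now only lists multipath/disk19
zpool status -v piggy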
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would try the detach. If disk19 has any issues, offline it and online it.
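
Roughly, with the names used in this thread:

# if disk19 misbehaves after the detach, take it out of service briefly
zpool offline piggy multipath/disk19

# then bring it back; ZFS resilvers whatever it missed while offline
zpool online piggy multipath/disk19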
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
I think I will try that. I'm running a scrub right now, just to be sure nothing is wrong.
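
For the record, what I'm running to check (pool name from my setup):

# full scrub of the pool
zpool scrub piggy

# watch progress and any checksum errors it turns up
zpool status -v piggy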
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
OK, so I detached the failed disk 11006946617288866407 and everything seems normal. The zpool is healthy.

Nonetheless, if you look at the screenshot, multipath disk19 does not have a gptid and was added as multipath/disk19.

I've got a feeling I did something wrong. I don't think it has any impact, but still, I would expect disk19 to get its own gptid label, and that is not the case.
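
This is how I'm checking the labels, in case it helps (da11/da53 are the two paths to disk19 on my box):

# list GEOM labels; a GUI-added disk would normally show a gptid entry here
glabel status | grep -i disk19

# show the partition table (or the lack of one) on the new disk
gpart show multipath/disk19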
 

Attachments

  • piggy2.png

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
You are probably getting this output because you did things through the command line, behind the back of the GUI...
The GUI is supposed to be in charge of everything on your "appliance".
 

Mr_N

Patron
Joined
Aug 31, 2013
Messages
289
Isn't the point of hot spares to replace failed/failing drives...
You follow the manual for replacing a drive (using the hot spare), and then at some point after everything is back to normal you physically add another hot spare and remove the failed drive at the same time?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
If you want a drive to be added by gptid, you need to add it by gptid. Or use the GUI.

You can fix it by offlining, wiping, and replacing it with itself (or with the other hot spare) from the GUI.
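
A rough CLI equivalent, in case the GUI hangs again. This is only a sketch, not the exact FreeNAS procedure: the 2 GiB swap partition and the partition indexes are assumptions based on what the 9.x GUI normally lays out, and the gptid at the end is a placeholder you have to look up yourself:

# take the whole-disk member out of the vdev
zpool offline piggy multipath/disk19

# wipe it and partition it the way the GUI would (swap + ZFS data)
gpart destroy -F multipath/disk19
gpart create -s gpt multipath/disk19
gpart add -i 1 -t freebsd-swap -s 2g multipath/disk19
gpart add -i 2 -t freebsd-zfs multipath/disk19

# look up the rawuuid of partition 2, then replace the member with it by gptid
# (may need -f if ZFS still sees old labels on the disk)
gpart list multipath/disk19 | grep rawuuid
zpool replace piggy multipath/disk19 gptid/<rawuuid-of-partition-2>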
 

remonv76

Dabbler
Joined
Dec 27, 2014
Messages
49
Well, that was the problem. I used the very buggy FreeNAS 9.10.2-U1 (never use this version, btw).
I was not able to use the GUI to replace the failed drive; the GUI got stuck multiple times, probably because I use multipath to the DAS.

The failed drive was replaced by the hot spare, but the complete zpool remained in a degraded state and the failed drive sat in a "replacing-5" state. I think I should have just left the spare in place as the new disk and then detached the failed drive; after that everything should have been OK.
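
For reference, a minimal sketch of that simpler path, assuming the spare shows up as "INUSE" under the raidz2 vdev in zpool status (the second device name is just an example):

# once the hot spare has finished resilvering, drop the failed member;
# the spare is then promoted to a permanent member of the raidz2
zpool detach piggy 11006946617288866407

# add the freshly installed disk back as the new hot spare
zpool add piggy spare multipath/disk19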

I will write down this procedure, just in case it happens again. Thank you all for the replies. I will leave it as it is for now, because it's only the label that isn't right; the disk itself is running perfectly.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hm, so if I interpret this information right, a hot spare drive is not smoothly incorporated into a failing raidz vdev, in the sense that the system understands it and 'relabels' the hot spare to become a 'native' drive in the raidz?
...but rather the hot spare is a "temporary fix" that still requires manually replacing the missing drive in the failing raidz, and the system needs to be told that the hot spare is the <new expected drive> to fill this spot? Is this correct?
 