SOLVED All disks online but array shows degraded.

Joined
Dec 31, 2013
Messages
13
So here's the back story -

I woke up this morning to find my array in a degraded state. Without thinking too much, I identified the faulty drive and pulled it from the chassis. I replaced it with another drive. When I noticed it didn't resilver right away, I started poking around. I used the GUI option to extend the array with the new disk and added it as a spare.

Immediately it started resilvering and eventually completed. I thought to myself "Yay! I didn't break it" but lo and behold it's still in a degraded state.

I'm sure it's actually fine and just needs some sort of metadata cleanup or something, because I botched a proper procedure somewhere. All shares are up and speed seems nominal. Can anybody advise me on what to do next to clean things up? I've attached a screengrab of what I'm looking at in the GUI. The offline disk is the one I removed, and for some reason the new one (da15p2) is now in two places.

Thanks for any help on this.

Ed
 

Attachments

  • 2020-08-17 18_30_34-Window.png
Joined
Oct 18, 2018
Messages
969
I'm sure it's actually fine and needs some sort of metadata cleanup or something because I botched a proper procedure somewhere.
In general, it is recommended that for any procedure which affects drives you check the User Guide and double-check your steps. In this case, you pulled a drive, but you had hot spares configured. The drive you pulled was then replaced by a hot spare; that is why it started resilvering.

When a hot spare replaces a drive due to failure, it is considered temporary until you either replace the originally failed drive, in which case the hot spare goes back to being a hot spare, or formally remove the failed disk from the pool, in which case the hot spare is promoted to a permanent member of the pool. Have you tried clicking the ... for the /dev/gptid/ . . . disk and removing it from the pool? Once that drive is removed from the pool, da15p2 should be promoted to a permanent member and the degraded status cleared.
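As a sketch of what this looks like from the shell, a pool with a hot spare standing in for a failed disk typically shows the spare twice in `zpool status`: once inside a temporary spare vdev next to the failed disk, and once under the spares section marked INUSE. That's why da15p2 appears in two places. The pool name `tank` and the gptids below are made up for illustration:

```
# Hypothetical layout while a hot spare is standing in for a pulled disk
$ zpool status tank
  pool: tank
 state: DEGRADED
config:
        NAME                  STATE     READ WRITE CKSUM
        tank                  DEGRADED     0     0     0
          raidz2-0            DEGRADED     0     0     0
            gptid/aaaa...     ONLINE       0     0     0
            spare-1           DEGRADED     0     0     0
              gptid/bbbb...   REMOVED      0     0     0   <- pulled disk
              da15p2          ONLINE       0     0     0   <- hot spare in use
            gptid/cccc...     ONLINE       0     0     0
        spares
          da15p2              INUSE     currently in use
```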
 
Joined
Dec 31, 2013
Messages
13
So if I'm reading what you're saying correctly, my array is in fact good for the moment. The spare has taken over for the dead drive but is designated as a 'spare in use' rather than a proper array member. I hope I understand that correctly.

In order to re-designate the drive, I have to remove the dead drive. Unfortunately, in the '...' menu I have no option to remove it: only Edit (which does nothing), Online, or Replace.

Also, my FreeNAS version is "FreeNAS-11.3-U3.2" if that helps. Do I need an update, or is there a console way of removing the dead drive?

Thanks
 
Joined
Oct 18, 2018
Messages
969
So if I'm reading what you're saying correctly, my array is in fact good for the moment. The spare has taken over for the dead drive but is designated as a 'spare in use' rather than a proper array member. I hope I understand that correctly.
More or less, yes. If you were running ZFS from the command line, you'd just remove the "failed" or removed drive from the pool, and that should fix it.
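For reference, from a shell on the NAS that would look roughly like the following; the pool name and gptid are placeholders, so substitute the real ones from `zpool status`:

```
# Detach the removed/failed disk from the pool. Once it is gone, the
# hot spare that replaced it (da15p2 in this thread) is promoted to a
# permanent pool member and the DEGRADED state should clear.
zpool detach tank gptid/bbbb...

# Confirm the pool is back to ONLINE and the spare slot is resolved
zpool status tank
```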

The bug you listed certainly seems like it could be related.

I also wanted to mention a few things about using hot spares. They are great if a drive outright fails or is removed; in those cases, as happened here, the spare steps in and replaces the bad drive. However, they only do that when a full failure has occurred. If you set up proper reporting (you may have already done this) and replace a drive as soon as it starts to show issues such as SMART errors, you can typically replace it before it outright fails; in those cases the spare is never used. IMO the most typical value of hot spares is in situations where you do not have ready access to the server to replace a failed disk.
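If you want to spot-check a drive's SMART state by hand, smartctl (from smartmontools, which FreeNAS ships) is the usual tool; the device name below is just an example:

```
# Overall health self-assessment for one drive
smartctl -H /dev/da15

# Dump SMART attributes; for early failure signs, watch
# Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
smartctl -A /dev/da15

# Kick off a long self-test; results show up later under the test log
smartctl -t long /dev/da15
smartctl -l selftest /dev/da15
```

FreeNAS can also schedule these self-tests and email you on errors, which is the "proper reporting" mentioned above.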
 
Joined
Dec 31, 2013
Messages
13
Thanks for all that info. Just wanted to say that the update was definitely needed. Once done, the detach option appeared and I was able to remove the bad drive. In my bumbling around I probably would have done that sooner, as it would have made more sense to me and avoided this whole ordeal. Whether you want to call it user error or the bug, I learned something and managed not to lose any data in the process. So I'm happy.

Thanks again.
 