Utilizing Hot Spare For Failing Drive

jbbender22

Dabbler
Joined
Aug 15, 2014
Messages
18
Hi All,

I'm having a little trouble understanding how the hot spare works in my FreeNAS 11.2-U6 server. I installed one a few months ago, since I had a drive on hand and space in my server. The pool is a RAIDZ2 of 6 x 4TB WD Reds, plus another 4TB WD Red as a hot spare. I woke up this morning to the following alert:

Device: /dev/ada5, Self-Test Log error count increased from 0 to 1

Since the drive hasn't straight up failed yet, I think I get why the hot spare didn't take over. I offlined the drive that had the errors, but the spare still doesn't seem to be taking over. Do I need to physically remove the offlined drive for the hot spare to kick in? I wanted to get some clarification before I do that. I also tried replacing the offline drive, but the only drive I can replace it with is itself; the hot spare doesn't show up in the list. At this point, would it be easier to just remove the spare from the pool and replace the failing drive?

If this is the case, I'm not sure what the hot spare is good for, unless you had a drive completely disappear.

Here's the output from zpool status:

Code:
  pool: datapool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 05:05:48 with 0 errors on Tue Oct  1 07:05:55 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        datapool                                        DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/289b97e7-4cf5-11e5-9fbc-0cc47a72093a  ONLINE       0     0     0
            gptid/1c564128-669f-11e7-8333-0cc47a72093a  ONLINE       0     0     0
            1117511553544088323                         OFFLINE      0     0     0  was /dev/gptid/29627420-4cf5-11e5-9fbc-0cc47a72093a
            gptid/29c47921-4cf5-11e5-9fbc-0cc47a72093a  ONLINE       0     0     0
            gptid/2a26c443-4cf5-11e5-9fbc-0cc47a72093a  ONLINE       0     0     0
            gptid/2a870bed-4cf5-11e5-9fbc-0cc47a72093a  ONLINE       0     0     0
        spares
          gptid/525c9f62-8e29-11e9-a608-0cc47a72093a    AVAIL

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:02:17 with 0 errors on Wed Oct  2 03:47:18 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors


Any help or other suggestions would be greatly appreciated.

Thanks in advance!
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,555
would it be easier to just remove the spare from the pool and replace the failing drive?
Yes
I'm not sure what the hot spare is good for, unless you had a drive completely disappear.
Exactly

You have a SMART report indicating an issue with the drive. ZFS isn't SMART-aware per se; what ZFS does is check blocks against their checksums. A hot spare will be put into production if a drive in a vdev completely fails.
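To make the distinction concrete, here is a minimal Python sketch that reads `zpool status` text the way it is pasted above and lists what ZFS itself reports: pool members that are not ONLINE, and spares still sitting at AVAIL. It is only an illustration. The function name `summarize`, the `SAMPLE` text (a trimmed copy of the listing in this thread), and the set of states treated as unhealthy are my own assumptions, not anything FreeNAS ships. Note that nothing in this output reflects the SMART self-test log.

```python
import re

# Trimmed sample shaped like the `zpool status` listing in this thread.
SAMPLE = """\
  pool: datapool
 state: DEGRADED
config:

        NAME                                            STATE     READ WRITE CKSUM
        datapool                                        DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/289b97e7-4cf5-11e5-9fbc-0cc47a72093a  ONLINE       0     0     0
            1117511553544088323                         OFFLINE      0     0     0  was /dev/gptid/29627420-4cf5-11e5-9fbc-0cc47a72093a
        spares
          gptid/525c9f62-8e29-11e9-a608-0cc47a72093a    AVAIL

errors: No known data errors
"""

# Assumption: these are the member states worth flagging in this sketch.
UNHEALTHY = {"DEGRADED", "OFFLINE", "FAULTED", "UNAVAIL", "REMOVED"}
# Grouping rows such as raidz2-0 or mirror-0 are vdevs, not disks.
VDEV_ROW = re.compile(r"^(raidz[123]?|mirror|spare|replacing)-\d+$")


def summarize(status_text):
    """Return (unhealthy_members, idle_spares) found in zpool status text."""
    unhealthy, idle_spares = [], []
    pool = None
    in_config = in_spares = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:":
            pool = fields[1]
            in_config = in_spares = False
            continue
        if fields[0] == "errors:":
            in_config = in_spares = False
            continue
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config:
            continue
        if fields == ["spares"]:
            in_spares = True
            continue
        if len(fields) < 2:
            continue  # other section headings such as "logs" or "cache"
        name, state = fields[0], fields[1]
        if in_spares:
            if state == "AVAIL":
                idle_spares.append((pool, name))
        elif name != pool and not VDEV_ROW.match(name) and state in UNHEALTHY:
            unhealthy.append((pool, name, state))
    return unhealthy, idle_spares


if __name__ == "__main__":
    bad, idle = summarize(SAMPLE)
    for pool, name, state in bad:
        print(f"{pool}: {name} is {state}")
    for pool, name in idle:
        print(f"{pool}: spare {name} is idle (AVAIL)")
```

On a live system the same function could be fed the real output, for example via `subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout`. Treat that as an assumption about how you would wire it up rather than a tested FreeNAS recipe.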
 

jbbender22

Thanks garm. I completely follow you here, and thanks for the reminder on how ZFS and SMART relate. This is exactly what I ended up doing. My new drive arrives on Thursday, and I'm thinking I'll add the new one as a hot spare in case something much worse happens. If I just want to replace a drive that's beginning to throw errors, I can remove the spare and use it to replace the failing drive, like I did today.

Does that seem like a wise decision, or would you advise against that for some reason?

Thank you for your help!
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,975
My new drive arrives on Thursday, and I'm thinking I'll add the new one as a hot spare
Is this a mission-critical server? Is it remotely located and/or not easily accessible for drive replacement?

For your average user, it would seem to me that a better option is to have a burned-in cold spare available as a replacement, rather than a hot spare that has been running for however long it's been installed in the server.
 

jbbender22

Is this a mission-critical server? Is it remotely located and/or not easily accessible for drive replacement?
No, it's not, even if I'd like to think it is sometimes! Its location isn't too terrible. I recently moved, so I have to shift a number of things out of the way to get to it, but it's manageable.

For your average user, it would seem to me that a better option is to have a burned-in cold spare available as a replacement, rather than a hot spare that has been running for however long it's been installed in the server.

That's fair. The fact that the drive would be running unused for an unknown amount of time was a cause for concern for me too. I just figured it was a little safer to have it all ready to go. I see your point though. When the new drive arrives, I'll do a burn-in test and store it in a safe, easy-to-grab place.

Thanks for all your input. I really appreciate it!
 