Hot Spare not detaching from pool. Help me, Obi-Wan Kenobi. You're my only hope.

drwoodcomb

Explorer
Joined
Sep 15, 2016
Messages
74
4 days ago I had a drive failure and the hot-spare kicked in. I replaced the defective drive and it resilvered. I read in another thread that the hot-spare should automatically go back to being a hot-spare once the replaced drive is resilvered. The resilvering finished 4 days ago with 0 errors, but the hot-spare is still in my pool. If I try to remove it I get the following error:
libzfs.ZFSException: Pool busy; removal may already be in progress

If I try and detach it I get the following error:
AttributeError: 'NoneType' object has no attribute 'type'

Please help me. I have 2 other drives throwing up SMART errors and would like to get my pool in top health before I lose data. Any help would be greatly appreciated. Here is my zpool status:
Code:
root@freenas[~]# zpool status
  pool: NASvolume
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: resilvered 644K in 00:00:04 with 0 errors on Thu Oct 14 03:35:58 2021
config:

        NAME                                              STATE     READ WRITE CKSUM
        NASvolume                                         ONLINE       0     0     0
          raidz2-0                                        ONLINE       0     0     0
            spare-0                                       ONLINE       0     0     0
              gptid/2b0f7b56-2b75-11ec-a0b9-0cc47a710628  ONLINE       0     0     0
              gptid/fea4b15e-c410-11ea-a06c-0cc47a710628  ONLINE       0     0     0
            gptid/0d02354d-c3eb-11ea-a06c-0cc47a710628    ONLINE       0     0     0
            gptid/d056aee7-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/d606c37c-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/dbf993cb-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/9b05a82e-0f5e-11eb-9d64-0cc47a710628    ONLINE       0     0     0
            gptid/e797fdc4-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/ed3fc731-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/f32cef26-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
            gptid/f8ffbe3e-5ffc-11e9-b7a3-0cc47a710628    ONLINE       0     0     0
        spares
          gptid/fea4b15e-c410-11ea-a06c-0cc47a710628      INUSE     currently in use

errors: No known data errors

  pool: freenas-boot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:01:19 with 0 errors on Tue Oct 12 03:46:19 2021
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada2p2      ONLINE       0     0     0

errors: No known data errors
root@freenas[~]#


I should also mention that some really weird stuff happened after the drive failure. The hot spare I installed had been thoroughly tested as a known good drive, and I was even running short and long SMART tests on it every month. When it kicked in to take over for the bad drive, it suddenly showed 559 READ errors (see picture below). After restarting the TrueNAS machine those errors were gone. I have no idea what's going on. I've had drives fail before, but this is the first time I've tried having a hot-spare. It seems like it's more headache than it's worth.
hot-spare read erros.png
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
After restarting the TrueNAS machine those errors were gone.
That's the same as doing a zpool clear (which sets the error counts to zero), so nothing unexpected there.

if you want to return the spare, zpool detach NASvolume gptid/fea4b15e-c410-11ea-a06c-0cc47a710628 should cover it.

You may then want to check whether that drive is really OK by looking at the SMART data (and running a long test if you haven't recently): smartctl -a /dev/ada3, and if you need to launch a long test: smartctl -t long /dev/ada3
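For anyone landing here later, a sketch of the full sequence, using the pool name and gptid from this thread (substitute your own pool, gptid, and device node; the device /dev/ada3 is an example from the post above, not necessarily yours):

```shell
# Confirm the spare is still attached: the output should show a
# "spare-0" vdev and the spares section with state INUSE, as above.
zpool status NASvolume

# Detach the in-use spare by its gptid; it should return to the
# "spares" list with state AVAIL.
zpool detach NASvolume gptid/fea4b15e-c410-11ea-a06c-0cc47a710628

# Verify the spare went back to AVAIL.
zpool status NASvolume

# Check SMART health on the drive (adjust the device node to match
# your system), then kick off a long self-test if due.
smartctl -a /dev/ada3
smartctl -t long /dev/ada3
```

Note that detach (not remove) is the right verb here: the spare is a member of a temporary mirror with the replaced disk, which is why `zpool remove` fails with "pool busy".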
 

drwoodcomb

Explorer
Joined
Sep 15, 2016
Messages
74
That's the same as doing a zpool clear (which sets the error counts to zero), so nothing unexpected there.

if you want to return the spare, zpool detach NASvolume gptid/fea4b15e-c410-11ea-a06c-0cc47a710628 should cover it.

You may then want to check whether that drive is really OK by looking at the SMART data (and running a long test if you haven't recently): smartctl -a /dev/ada3, and if you need to launch a long test: smartctl -t long /dev/ada3

Oh my god! Thank you! I've been losing sleep over this, and now I'm kicking myself that I didn't notice your response to my post until just now.

Thank you, it worked like a charm.
 
Joined
Jul 3, 2015
Messages
926
if you want to return the spare, zpool detach NASvolume gptid/fea4b15e-c410-11ea-a06c-0cc47a710628 should cover it.
This is helpful, but is there any reason why, over 12 months later, this still isn't fixed in the appliance? Hot-spares used to work in older versions of FreeNAS without needing this command to return your pool to normal, yet this command is still required in TrueNAS Core 13.0-U3.1.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
This is helpful but is there any reason why over 12 months later this still isn't fixed in the appliance?
I don't recall ever finding a way to deal with it in the GUI... can you specify which version that was (where it last worked) and I'll try to find how that worked and see if it can also work in the current version.
 
Joined
Jul 3, 2015
Messages
926
I'm fairly sure it worked in 11.2 but can't say exactly when it stopped working.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I'm fairly sure it worked in 11.2 but can't say exactly when it stopped working.
So when I go to 11.2 (legacy UI) with a spare configured and kill a disk to have the spare kick in, I see options to replace or detach the missing disk.

I guess that covers the option where you want the spare to become a permanent member of the pool. And for the pool to no longer have spares (until you add another one).

I don't see how to get to the other option from the GUI though (returning the spare to spares after replacing the original failed disk).

I'll give that a shot under 13...

On the way there via 12, I confirm 12 works the same as 11.2

And same for 13...

I'm not convinced there was ever a working cohesive handling of spares in the GUI for the (IMHO more useful) "return spare to spares" option.

In 12 and 13, the option is there to detach the spare, but it fails, claiming (incorrectly) that a top-level VDEV can't be removed.

In 11.2, the option wasn't there at all.
 
Joined
Jul 3, 2015
Messages
926
I'll need to test on 11.2, but it definitely worked in older versions. If you pulled a disk in a pool with a hot-spare, the spare would activate. If you then replaced the missing/failed drive, once the resilver completed the spare would return to being a spare.
 
Joined
Jul 3, 2015
Messages
926
Yes, you're right, it doesn't work on 11.2. Just trying 11.1-U7, but I can't find anything earlier on the download site. If this doesn't work, it must have been back in the version 9 days of old.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
In any case, 2 more things...

1. I was being an idiot and forgetting that of course detaching won't work in 12 or 13 if the pool was created on 11.2 and never upgraded, since that feature only came in 12.something... I'll try that again on 13 and see... actually nope, still the same even with an upgraded pool (it works from the CLI in both cases anyway).

2. Even if it did once upon a time work (I suspect maybe you're remembering HP Smart Array or Dell PERC modes of operation rather than TrueNAS somehow), the fact remains that there are 2 (some would say equally) valid ways to exit the spare in use situation, so I don't know why iX would have chosen to force one of them on the users.
 
Joined
Jul 3, 2015
Messages
926
Thanks for the info. I've only ever used LSI cards, so it won't have been an HP or Dell thing. As I haven't actually used hot-spares in production for the last 6-7 years, my guess would be it was a thing in 9.10 or 9.3, but as I can't find those versions anymore I can't test. I'm told the feature works in SCALE, although I haven't tried it yet; if so, it's just a BSD thing.
 