SOLVED Drive Swap With Spare

Joined
May 2, 2017
Messages
211
So I had a drive start throwing bad blocks. I offlined it in the GUI, my spare drive kicked in... Great.

I shut down, pulled the drive and plugged in another disk, and rebooted. Now I have this.

1.png


2.png


So where to from here? I've tried clicking the "/dev/gptid..." and selecting the replace command. It resilvers and comes back here, but now "replace" doesn't have a member disk to replace it with. Where is my "ada3p2" that the spare is supposed to be filling in for?
 
Joined
May 2, 2017
Messages
211
Tried a reboot.

Now "ada3p2" is back but my spare which used to be "ada5p2" is gone and it's currently replacing "ada4p2"? I have to say, the process to swap a drive is incredibly confusing. It seems like my spare drive is more of a hassle than it's worth. This should have been a "pop out failed and replace with new drive... reboot".
3.png
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have written this quite a few times in other posts (so searching for "replace spare" might have helped)...

There is more than one possible path to follow when a spare has been activated, so the system can't just pick one for you (since you might not have wanted that option)...

Either:

1. Keep the spare where it is and detach the failed drive (optionally adding a new spare afterwards), or

2. Replace the failed drive and return the spare to being a spare.

zpool detach is the command you will need to indicate which of the disks you want to either remove from the pool or return to spares.
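
As a rough sketch of the two paths above, the helper below only builds the command lines and does not run anything. The pool name HURTLOCKER and the device names ada3p2 and ada5p2 are taken from this thread; the function name `spare_commands` and the new-disk name are placeholders made up for illustration, and on a real system the members are usually listed as gptid/... labels, so check zpool status for the exact names first.

```python
def spare_commands(pool, failed, spare, new_disk=None, keep_spare=True):
    """Build (not run) the zpool command lines for one of the two paths."""
    if keep_spare:
        # Path 1: detaching the failed disk leaves the activated spare in
        # its place as a permanent member of the vdev.
        commands = [["zpool", "detach", pool, failed]]
        if new_disk is not None:
            # Optional: register the new disk as the pool's new hot spare.
            commands.append(["zpool", "add", pool, "spare", new_disk])
        return commands
    if new_disk is None:
        raise ValueError("replacing the failed drive needs a new disk")
    return [
        # Path 2: resilver onto the new disk in place of the failed one.
        ["zpool", "replace", pool, failed, new_disk],
        # Only needed if the spare still shows as in use once the
        # resilver has finished; detaching it returns it to the spares.
        ["zpool", "detach", pool, spare],
    ]


if __name__ == "__main__":
    # "new-disk" is a hypothetical placeholder, not a name from the thread.
    for cmd in spare_commands("HURTLOCKER", "ada3p2", "ada5p2",
                              new_disk="new-disk", keep_spare=False):
        print(" ".join(cmd))
```

Printing the commands before running them by hand makes it easier to double-check which disk each one targets.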
 
Joined
May 2, 2017
Messages
211
Well, I see lots of different scenarios when searching, and I would prefer not to break things in my specific case.

So, before I screw this up... I added another drive in the exact same slot and wiped it. It ends in "N9U" and at the moment is not part of my pool.

1.png


Even after a reboot, my spare did not kick in. Perhaps I'm misunderstanding the concept of a spare, but shouldn't it now use the disk shown above to resilver and go back to being a spare? It kicked in to automatically replace the offlined "ada3", but it doesn't know to rebuild after it's replaced? The documentation and GUI are not clear, which is why the forums are littered with these types of questions, I suppose.

Right now, the drive ending "536" below is the old disconnected drive.

2.png


What steps should I take right now to remove the missing "536" above and tell the spare to recover using the "N9U" I added? Should I detach the "536" so that the spare recovers automatically?

Thanks for the help.
Steve
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Click the 3 dots next to the removed drive, and select Replace. From the pulldown, select your new unassigned drive. This will return your spare drive to spare status, and start a resilver with the new drive.
 
Joined
May 2, 2017
Messages
211
Did that. It shows it's resilvering. I did this step before, and it didn't show the new drive as an option. It simply went from being a 5-disk pool to a 4-disk pool after booting. This is the part I think causes confusion. I'm never sure whether the spare has kicked in automatically when it really needs a reboot for the drive to show up. I think people do things thinking it's not working, when in fact it just isn't reflected properly in the GUI in some cases. Maybe another reboot did the trick.

Anyway, thanks for the help. I'll report back if it fails for some reason.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, the GUI assumes your system has a hot-swap drive backplane. If you don't, you end up with display issues like this.
 
Joined
May 2, 2017
Messages
211
So why, even after a reboot, does zpool status still show my spare in use? In use for what? The pool is now healthy, yet the spare still shows as if it's no longer a spare.

1.png


2.png
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Huh. What does zpool status -v HURTLOCKER show? You may need yet another reboot cycle.
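
To check the spare's state without reading the whole listing by eye, a small parser along these lines could pull the "spares" section out of the zpool status text. This is only a sketch: the `SAMPLE` text is a made-up illustration of the usual layout (it is not the output from this system), and the device names in it and the function name `spare_states` are hypothetical.

```python
# Hypothetical sample in the usual "zpool status" layout; not real output.
SAMPLE = """\
  pool: HURTLOCKER
 state: ONLINE
config:

    NAME          STATE     READ WRITE CKSUM
    HURTLOCKER    ONLINE       0     0     0
      raidz1-0    ONLINE       0     0     0
        disk-a    ONLINE       0     0     0
        disk-b    ONLINE       0     0     0
    spares
      disk-s      INUSE     currently in use

errors: No known data errors
"""


def spare_states(status_text):
    """Map each device listed under 'spares' to its state column."""
    states = {}
    in_spares = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            in_spares = False  # a blank line ends the config listing
            continue
        if fields[0] == "spares":
            in_spares = True   # device lines follow this header
            continue
        if in_spares and len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


if __name__ == "__main__":
    print(spare_states(SAMPLE))
```

A spare that has gone back to standby would be expected to show AVAIL in that column rather than INUSE.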
 
Joined
May 2, 2017
Messages
211
Same thing as above... Rebooted, no change.

4.png
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, try taking ada5p2 offline, and then rebooting again.
 
Joined
May 2, 2017
Messages
211
So now, the pool is degraded. Now, zpool status shows the spare as available.

11.png


But the pool is expecting that spare to be a part of the vdev now?

10.png
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, try clicking the 3 dots next to ada5p2 (the OFFLINE one, not the ONLINE one), and select Remove.
 
Joined
May 2, 2017
Messages
211
Samuel Tai said:
OK, try clicking the 3 dots next to ada5p2 (the OFFLINE one, not the ONLINE one), and select Remove.

Detach? There is no Remove option...
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, Detach.
 
Joined
May 2, 2017
Messages
211
And now this...

Error.png


Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 977, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 235, in detach
    self.__zfs_vdev_operation(name, label, lambda target: target.detach())
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 226, in __zfs_vdev_operation
    op(target, *args)
  File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 226, in __zfs_vdev_operation
    op(target, *args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 235, in <lambda>
    self.__zfs_vdev_operation(name, label, lambda target: target.detach())
  File "libzfs.pyx", line 2070, in libzfs.ZFSVdev.detach
AttributeError: 'NoneType' object has no attribute 'type'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 138, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1205, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1062, in detach
    await self.middleware.call('zfs.pool.detach', pool['name'], found[1]['guid'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1248, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1213, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1219, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1146, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1120, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
AttributeError: 'NoneType' object has no attribute 'type'
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Joined
May 2, 2017
Messages
211
It shows two? One is swap and one is zfs?

gpt1.png


gpt2.png


Which one? And should I do it with this drive in such a screwed-up state?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
It's the 2nd partition's raw UUID.
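
For anyone who would rather not pick the value out by eye, here is a small sketch that reads gpart list text and returns the rawuuid of the partition with index 2. The `SAMPLE` text and its UUIDs are invented placeholders in the usual gpart list layout (trimmed to a few fields), and the function names are hypothetical. Prefixing the result with gptid/ is an assumption based on the "/dev/gptid..." names the pool shows earlier in the thread.

```python
import re

# Hypothetical, trimmed sample in the usual "gpart list" layout.
SAMPLE = """\
Geom name: ada5
scheme: GPT
Providers:
1. Name: ada5p1
   rawuuid: 11111111-1111-1111-1111-111111111111
   type: freebsd-swap
   index: 1
2. Name: ada5p2
   rawuuid: 22222222-2222-2222-2222-222222222222
   type: freebsd-zfs
   index: 2
Consumers:
1. Name: ada5
   Mediasize: 1000
"""


def provider_fields(gpart_text):
    """Return one dict of 'key: value' fields per entry under Providers:."""
    providers = []
    in_providers = False
    for line in gpart_text.splitlines():
        stripped = line.strip()
        if stripped in ("Providers:", "Consumers:"):
            in_providers = stripped == "Providers:"
            continue
        if not in_providers:
            continue
        start = re.match(r"\d+\. Name: (\S+)", stripped)
        if start:
            providers.append({"Name": start.group(1)})
        elif providers and ": " in stripped:
            key, _, value = stripped.partition(": ")
            providers[-1][key] = value
    return providers


def gptid_for_partition(gpart_text, index=2):
    """Return 'gptid/<rawuuid>' for the partition with this index, or None."""
    for provider in provider_fields(gpart_text):
        if provider.get("index") == str(index):
            return "gptid/" + provider["rawuuid"]
    return None


if __name__ == "__main__":
    print(gptid_for_partition(SAMPLE))
```

Comparing the result against the member names in zpool status before using it in any command is a sensible sanity check.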
 
Joined
May 2, 2017
Messages
211
Okay. BTW, I appreciate all the help...

So we're on the same page, I put the offline disk back online and the pool is now healthy again... Right now, I am here.

1.png


I filed a bug, and I think I want to start a scrub to make sure we are good right now. All this makes me a bit nervous. LOL

Do you think the next step should be to use the command line to detach the drive? Right now, with both of the versions online as shown, the gpart list command still shows only two sections for swap and zfs as before. So the drive showing twice online didn't create a third entry. It's just shown twice which might be a middleware bug.

I'm going to do a scrub, and if it's still there after, I'll do whatever you think next. Maybe leave them both online and do the command line detach to see if that works? Will detach move it back as a spare if that works?
 