[TrueNAS 12.0-U1] Taking disk offline

tanjix

Dabbler
Joined
Dec 6, 2020
Messages
15
Hi Guys,

I have a little problem. On my TrueNAS 12.0-U1 I have a pool consisting of 16x Samsung PM1643a SSDs of 3.84 TB each.
One of these disks is shown as "REMOVED" in the pool status.
Now, when I try to take that disk offline, I get this rather long error message:

Code:
Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 137, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 1091, in offline
    await self.middleware.call('disk.swaps_remove_disks', [disk])
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/service.py", line 42, in l_fn
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 972, in nf
    args, kwargs = clean_and_validate_args(args, kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 930, in clean_and_validate_args
    value = attr.clean(args[args_index + i])
  File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 470, in clean
    raise Error(self.name, 'Item#{0} is not valid per list types: {1}'.format(index, found))
middlewared.schema.Error: [disks] Item#0 is not valid per list types: [disk] null not allowed


What can I do to take this disk offline?

Any help would be appreciated! Thanks a lot!
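
Edit: in case it helps, this is how I can dump the pool state from the shell; the gptid labels of all members (including the REMOVED one) show up there:
Code:
# show pool health and the gptid label of every member disk
zpool status -v HiPerfSSD45TB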
 

Attachments

  • ed375666-6c7e-44ba-a31c-dee7e7d9da7b.png (490.5 KB)
Joined
Jan 18, 2017
Messages
525
I may be mistaken, but I believe you will have to replace the removed drive before that entry goes away; the pool is degraded and will stay that way until you do.
 

Joined
Jan 18, 2017
Messages
525
This is a known bug and appears to be fixed in the next update.
For now, make sure you have a backup of the data, definitively identify which device has failed, shut down the machine (if your environment allows it), remove the failed drive, install the new, tested drive, and reboot the machine. You should have no problem following the rest of the replacement procedure after that.
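
For reference, the shell-level operation behind that last step is roughly the sketch below; the gptids are placeholders, and on TrueNAS the GUI replacement is still preferable because it also recreates the partition layout on the new disk.
Code:
# sketch only: <old-gptid> and <new-gptid> are placeholders, not real values
zpool replace HiPerfSSD45TB gptid/<old-gptid> gptid/<new-gptid>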
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
Try the command line:
Code:
zpool offline HiPerfSSD45TB gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c
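
If you first need to check which gptid belongs to which da device, glabel prints the mapping:
Code:
# list GEOM labels (including gptid/...) with their underlying devices
glabel status | grep gptid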
 

tanjix

Dabbler
Joined
Dec 6, 2020
Messages
15
Try the command line:
Code:
zpool offline HiPerfSSD45TB  gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c

That worked, the disk is now shown as "OFFLINE". :wink:
Can I do a rescan of the controller now and try to re-add that disk to the pool? If so, how?
The reason I ask is that I do not think the disk is defective. All the disks were bought three weeks ago, were tested, and worked fine.
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
Easier version
Code:
reboot

More difficult version
Code:
camcontrol rescan all
zpool online HiPerfSSD45TB  gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c


If something goes wrong, then check if the disk is really present or not with disklist.pl
Code:
disklist.pl -all
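
You can also cross-check what the controller itself currently sees:
Code:
# list every CAM-attached device with its bus/target/lun and model string
camcontrol devlist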
 

tanjix

Dabbler
Joined
Dec 6, 2020
Messages
15
Hi,

thanks for your input!

Easier version
Code:
reboot

That's unfortunately not possible as it's a production server.

More difficult version
Code:
camcontrol rescan all
zpool online HiPerfSSD45TB  gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c

Code:
root@iscsi01[~]# camcontrol rescan all
Re-scan of bus 0 was successful
Re-scan of bus 1 was successful
Re-scan of bus 2 was successful
Re-scan of bus 3 was successful
Re-scan of bus 4 was successful
Re-scan of bus 5 was successful
Re-scan of bus 6 was successful
Re-scan of bus 7 was successful
Re-scan of bus 8 was successful
Re-scan of bus 9 was successful
Re-scan of bus 10 was successful
Re-scan of bus 11 was successful
Re-scan of bus 12 was successful
Re-scan of bus 13 was successful
Re-scan of bus 14 was successful
root@iscsi01[~]# zpool online HiPerfSSD45TB gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c
warning: device 'gptid/2034c4ae-42f8-11eb-8ec3-90e2ba48ff6c' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
root@iscsi01[~]#

If something goes wrong, then check if the disk is really present or not with disklist.pl
Code:
disklist.pl -all

That indeed does not show the disk; I only see 15 of the 3.84 TB disks instead of 16, so it might really be faulty.
Code:
root@iscsi01[~]# ./disklist.pl -all
partition fs label zpool zpool-location zpool-mount device sector disk size type serial rpm
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
da4p2 freebsd-zfs gptid/16b82a80-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da4 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910356 0
da5p2 freebsd-zfs gptid/1ec4df3b-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da5 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910424 0
da6p2 freebsd-zfs gptid/27d23be5-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da6 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910475 0
da7p2 freebsd-zfs gptid/1ca7069c-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da7 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N909989 0
da12p2 freebsd-zfs gptid/21e128c0-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da12 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910471 0
da13p2 freebsd-zfs gptid/2485675e-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da13 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910466 0
da14p2 freebsd-zfs gptid/23271678-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da14 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910417 0
da15p2 freebsd-zfs gptid/21977da4-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da15 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910359 0
da16p2 freebsd-zfs gptid/283b7276-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da16 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0NA00187 0
da17p2 freebsd-zfs gptid/28d5ff1c-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da17 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910474 0
da18p2 freebsd-zfs gptid/28fa30b1-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da18 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910363 0
da19p2 freebsd-zfs gptid/26bade72-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da19 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0NA00057 0
da20p2 freebsd-zfs gptid/1c1cc716-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da20 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910476 0
da21p2 freebsd-zfs gptid/24fb1633-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da21 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0NA00188 0
da22p2 freebsd-zfs gptid/25a2c1c3-42f8-11eb-8ec3-90e2ba48ff6c HiPerfSSD45TB HiPerfSSD45TB/raidz2-0 /mnt/HiPerfSSD45TB da22 512 SAMSUNG MZILT3T8HBLS/007 3840 SSD S5G0NE0N910467 0
da0p2 freebsd-zfs gptid/28c33eff-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da0 512 ATA CT2000MX500SSD1 2000 SSD 1948E229C934 0
da1p2 freebsd-zfs gptid/2902845b-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da1 512 ATA CT2000MX500SSD1 2000 SSD 1948E229C94C 0
da2p2 freebsd-zfs gptid/28ec05ec-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da2 512 ATA CT2000MX500SSD1 2000 SSD 1948E229CA3F 0
da3p2 freebsd-zfs gptid/27e9c095-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da3 512 ATA CT2000MX500SSD1 2000 SSD 1948E229C21D 0
da8p2 freebsd-zfs gptid/2845d5ef-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da8 512 ATA CT2000MX500SSD1 2000 SSD 1948E229B673 0
da9p2 freebsd-zfs gptid/28ccc586-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da9 512 ATA CT2000MX500SSD1 2000 SSD 1948E229CA49 0
da10p2 freebsd-zfs gptid/28549c86-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da10 512 ATA CT2000MX500SSD1 2000 SSD 1948E229B5E0 0
da11p2 freebsd-zfs gptid/284638fb-0b33-11eb-94e6-90e2ba48ff6c Pool Pool/raidz2-0 /mnt/Pool da11 512 ATA CT2000MX500SSD1 2000 SSD 1948E229B681 0
ada0p2 freebsd-zfs gptid/150241a7-455d-11eb-bcfc-90e2ba48ff6c boot-pool boot-pool/mirror-0 ada0 512 SSDSC2BB120G7R 120 SSD PHDV804600D3150MGN 0
ada1p2 freebsd-zfs gptid/152760da-455d-11eb-bcfc-90e2ba48ff6c boot-pool boot-pool/mirror-0 ada1 512 SSDSC2BB120G7R 120 SSD PHDV808502AM150MGN 0

25 selected disk(s)

Is there a way to identify which disk slot the faulty one is in? I am asking because the da devices have quite obviously been renumbered: da0-da7 used to be the 2 TB Crucial SSDs (before the 16x 4 TB devices were added), but now that those 16 additional SSDs are installed, da4-da7 are mapped to the 4 TB devices.
So, can I enable something that lights up the respective drive bay, so I know which bay contains the faulty disk? I can no longer rely on the da numbering since it has changed...
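
Worst case, I guess I could match the serial numbers from the disklist.pl output above against the stickers on the drive carriers, with something like this (smartctl ships with TrueNAS; the device list is just an example):
Code:
# print the reported serial number of each suspect disk
for d in da4 da5 da6 da7; do
  echo -n "$d: "
  smartctl -i /dev/$d | grep -i serial
done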
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
On my Supermicro servers, disklist.pl displays the disk slot number in the last column, like this
Code:
da1p2      gptid/0d8fbbd8-[...]  SAS3008(0):2#1

so I can then use sas3ircu LOCATE to illuminate the bay.

On your server that column is not populated, so you can try sesutil instead:
Code:
sesutil locate da0 on
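
You can also dump the slot mapping to verify the right bay is being addressed, and switch the LED off again afterwards:
Code:
# show which enclosure slot each device sits in
sesutil map
# turn the locate LED off again when done
sesutil locate da0 off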
 

tanjix

Dabbler
Joined
Dec 6, 2020
Messages
15
I was now able to replace the disk, as I was on-site.
However, while the sesutil command ran without errors, no light indication was visible on the server.

Still, I was able to identify the failed disk and replace it; replacing it through the GUI afterwards was no problem.
My pool is now in a good state again, after the resilvering process finished.
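
For anyone finding this thread later: I simply watched the resilver with a plain status call until the pool reported ONLINE again.
Code:
# shows resilver progress and the overall pool state
zpool status HiPerfSSD45TB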

Thanks for your help!
 