Problem with importing pools on restart TrueNAS-SCALE-22.02-RC.1-2

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
Hi, I am new to TrueNAS and ZFS, so I apologise if I am talking nonsense.

I have a test server up and running with SCALE. I added a 5-drive RAIDZ2 pool and started creating some shared folders etc. Everything was working well apart from a problem with Avahi and shares not showing up on my Macs. I copied a few TB of files across without issue from both Macs and Windows machines using SMB, and then I decided to do a planned restart of the server.

After the reboot I was stuck on the 'importing zpool' step with seemingly no way forward. I read an article that suggested booting without the pool attached; I tried this and it worked for me. The pool 'pool1' was showing as offline in the GUI. I reconnected the disks and ran zpool list in the shell and could see the pool, but it was still offline in the GUI. I ran a zpool export from the shell, which completed without error, and I also ran the export from the GUI. The pool is now gone:

Code:
root@truenas[~]# zpool list       
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
boot-pool   216G  16.5G   199G        -         -     0%     7%  1.00x    ONLINE  -
test        236G  31.0G   205G        -         -     2%    13%  1.00x    ONLINE  /mnt


If I go to the web GUI -> Storage and try to import a pool, it finds the pool:

pool1 | 16109214946869334733

If I then run the import, I get an error:

[ENOENT] Pool 16109214946869334733 not found.

and nothing shows up in the GUI.

If I go back to the shell the pool is there and I can see my data:

Code:
root@truenas[~]# zpool list
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
boot-pool   216G  16.5G   199G        -         -     0%     7%  1.00x    ONLINE  -
pool1      18.2T  4.25T  13.9T        -         -     0%    23%  1.00x    ONLINE  /mnt
test        236G  31.0G   205G        -         -     2%    13%  1.00x    ONLINE  /mnt
root@truenas[~]# 


I am now trying a reboot with the pool online to see if that helps... it didn't. The pool is still missing from the GUI and missing from zpool list.

I can delete the pool and start again, but I would really like to understand what I have done wrong.
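
In case it helps anyone following along, this is roughly what I have been checking from the shell (a rough sketch only - the pool name is the one from above, and the midclt call is my guess at how to see what the middleware, and therefore the GUI, thinks is present):

Code:
# Pools the kernel can see but that are not currently imported
zpool import

# What the middleware believes exists (the GUI works off this)
midclt call pool.query | python3 -m json.tool

# Last-resort manual import by name from the shell; the GUI may still
# not show the pool until the middleware is aware of it
zpool import -f pool1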
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
It's not something that would be expected to happen. How did you reboot the system after setup?
Please document your hardware setup.
 

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
It was a standard reboot triggered from the web GUI. I had stopped all external activity to the NAS, i.e. no file access.

Hardware:

Gigabyte B550M AORUS PRO MB
AMD Ryzen 5 3600 6-Core Processor
2 x 16GB Crucial DDR4-3200 ECC UDIMM
2 x M2 SSD Boot Mirror
1 x SATA SSD test pool
IBM SAS controller (reflashed)
NVIDIA Corporation G92 [GeForce 8800 GT] (rev a2)
5 x WD 4TB RED CMR SATA drives connected to IBM SAS

There have been no power outages. I do have a bunch of core files being reported from around the time of the reboot; I am not sure if they are relevant, or what to do with them.
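
If it helps, this is where I am assuming the core files end up (the path is a guess based on the system dataset layout):

Code:
# Guess: core dumps collected on the system dataset
ls -lh /var/db/system/cores

# Rough way to see which binaries produced the cores (path is assumed)
file /var/db/system/cores/*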

Code:
root@truenas[~]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. A2000 NVMe SSD (rev 03)
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ee
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43e9
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43ea
04:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
05:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. A2000 NVMe SSD (rev 03)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 16)
07:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 8800 GT] (rev a2)
08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
09:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
09:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Apart from SMB, were there any Apps running?
Unless someone sees a potential cause in the next day, I'd recommend you "report a bug".
 

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
Thanks for your feedback...

I have some Apps running: Plex, Traefik and Nextcloud.

I will give it a day or so, then maybe slash and burn the existing setup and start over with a new pool - this is still very much a test and learning exercise. My Synology is still serving the main 'production' data.
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
We can confirm this issue happens and is, in fact, a known bug that appears from time to time.
Be sure to file a bug report on Jira and attach a debug dump as well.

There does not seem to be a correlation between specific Apps and this "[ENOENT] Pool 16109214946869334733 not found." error.

K.S.
 

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
I will file a ticket.

What is the recommendation on next steps if any?

Adrian
 

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
How long should I wait after opening a ticket before giving up on my current setup and trying again? I want to be helpful to the developers, but I also want to get on with my own testing.
 

waqarahmed

iXsystems
iXsystems
Joined
Aug 28, 2019
Messages
136
@cannfoddr can you please check if you have key-encrypted datasets? There was a known issue where importing pools on boot would fail and the system would be stuck indefinitely. If not, can you please share a debug of your system and I can go through it and see what might be going on (kindly email me the debug at waqar at ixsystems.com). Thanks!
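
Something like the following should show whether any dataset in the pool uses key-based encryption (a minimal sketch; adjust the pool name as needed):

Code:
# Encryption-related properties for every dataset in the pool
zfs get -r encryption,keyformat,keystatus pool1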
 

cannfoddr

Dabbler
Joined
Nov 28, 2021
Messages
12
@waqarahmed To the best of my knowledge there are no encrypted datasets in this pool - I've not got around to testing this yet :smile:
I have reported a bug on this: NAS-113532. The Debug file is attached to this ticket.
 

mattyv316

Dabbler
Joined
Sep 13, 2021
Messages
27
I am having a similar issue as @cannfoddr.
I used this tutorial to install and have two mirrored 400 GB SSD drives partitioned with a boot pool and an application pool using the rest of the space. The power went out, and when SCALE came back up the app pool, called ssd-storage, was not connected. It still shows up if I try to import it, but I get an 'Error importing pool' message with the following detail. Is there a way to recover, or is this similar enough to the above problem that I just need to wait for a fix?

A little more info:
Pools are not encrypted.
Setup is a Dell R730xd with 128 GB RAM.
The mirrored pair of drives are in the rear two SFF bays.
Let me know if anyone feels I am missing valuable info.
Thanks


Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 97, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1267, in nf
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 393, in import_pool
    self.logger.error(
  File "nvpair.pxi", line 404, in items
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 387, in import_pool
    zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
  File "libzfs.pyx", line 1105, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1133, in libzfs.ZFS.__import_pool
libzfs.ZFSException: one or more devices is currently unavailable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 382, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 418, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1263, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1131, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 1489, in import_pool
    await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1310, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1275, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1281, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1208, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1182, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('one or more devices is currently unavailable',)
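
For what it's worth, this is roughly what I have been running to see which device the import thinks is unavailable (a rough sketch; ssd-storage is my pool name, and the by-id path is an assumption about where the partitions show up):

Code:
# Show importable pools and the state of each member device
zpool import

# Confirm the partitions that should back the pool are actually present
lsblk -o NAME,SIZE,TYPE,PARTUUID

# Retry the import while pointing ZFS at stable device paths
zpool import -d /dev/disk/by-id -f ssd-storage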
 

mattyv316

Dabbler
Joined
Sep 13, 2021
Messages
27
Updated today to TrueNAS-SCALE-22.02-RC.2, but I am still getting the error above. Not sure if anyone has any advice on how to get this pool back with my apps on it.
 

mattyv316

Dabbler
Joined
Sep 13, 2021
Messages
27
@waqarahmed, I have re-installed since this issue, but I ran into a different problem that looks to be resolved by cross-flashing my H330 Mini HBA to IT mode. My post is here.
I am wondering if my HBA might have been causing this issue as well. I might re-load and re-test this config again to see.
 

enjoywithme

Dabbler
Joined
Dec 23, 2014
Messages
13
I'm using TrueNAS-SCALE-22.02.3 (migrated from FreeNAS). If the host is shut down completely, it fails to import the ZFS pool on the first start. I have to restart it again, and then the ZFS volume imports without a problem.
Almost every time after a full shutdown, I have to restart once more.
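
When it happens again I plan to check the previous boot's logs to see what failed on the first import attempt (just a sketch of what I'd grep for):

Code:
# Messages from the previous boot (-b -1), filtered for ZFS/import activity
journalctl -b -1 | grep -iE 'zfs|import' | less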
 