Only one dataset in pool is mounting correctly and 214% scrub progress

Cydget

Cadet
Joined
Dec 11, 2023
Messages
7
Alright guys, I'm probably the stereotype for how not to set up a TrueNAS configuration (specs at the bottom of this post), but it was a first-time learning experience. My system is hosted on a consumer desktop running Proxmox, with TrueNAS virtualized and an external HBA card connecting all my drives. The system has served me well for the last year and a half, so I figured why fix something that isn't broken. Then it broke.

About two weeks ago, I noticed the scrub on the main pool, MainRaidZ2, was taking a while (5 days in and only 45% complete), but from what I read online, long scrubs seemed to be expected. On the 6th day the server locked up and I had to power cycle the computer. Proxmox boots like normal, and I launch TrueNAS inside Proxmox like normal, but TrueNAS now takes about 30 minutes to an hour before the web interface is active. The MainRaidZ2 pool is mounted, but only one of the datasets is accessible at /mnt/MainRaidZ2/rootdataset. The other two pools do not mount, although the disks are present. If I release them in the GUI (Storage -> Pools), I can reimport them, and those two pools (PurplePool/VMssd) work again. I also notice a task called pool.on_boot.import (something like that) that never finishes, even after waiting a day.

Next, I try powering off the system, pulling all the drives from the MainRaidZ2 pool, and booting Proxmox and TrueNAS back up. TrueNAS boots nearly instantly this time, and both of the other pools (PurplePool/VMssd) are present. I then reinsert the MainRaidZ2 drives, and TrueNAS detects all 11 of them. I export the entire pool and was going to try reimporting it the same way I did with the PurplePool/VMssd pools. This time it doesn't import on the first try: it says /mnt/MainRaidZ2/some sub folders already exist and that I need to use the force option. I enable force, and it mounts in a locked state. I realize the scrub is still at 45% and is now going much faster, so I decide to wait until it finishes before trying to unlock the pool. I come back the next day, and the pool is at 214% scrubbed.

[attached screenshot: Capture.PNG - pool status showing the scrub at 214%]
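For anyone following along from a shell, here is a rough sketch of the CLI equivalent of the export and forced reimport I did through the GUI. The pool name is mine; the -f and -R /mnt flags are my assumption of what the GUI's force option maps to, not something I pulled out of the middleware.

Code:
# export the pool so it can be reimported cleanly
zpool export MainRaidZ2

# the leftover /mnt/MainRaidZ2/* directories are what triggered the
# "already exists" complaint; -f forces the import and -R /mnt keeps
# the mountpoints under /mnt (my assumption of what "force" does)
zpool import -f -R /mnt MainRaidZ2

# the pool comes back locked; the keys still have to be loaded for the
# encrypted datasets before anything under it will mount
zfs load-key -r MainRaidZ2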


I decide to try unlocking the pool now. It asks me for my key file and I supply it. All seems well, but I'm back in the same state as before, where one of the three datasets in the MainRaidZ2 pool doesn't mount to /mnt/MainRaidZ2/Rootdataset/PlexDrive.

[attached screenshot: Capture2.PNG]

[attached screenshot: 1702348976899.png]

Looking at the /mnt/ folder, I'm able to see files in the /mnt/MainRaidZ2/Rootdataset/Sharednetworkdrive dataset, but there are no files in the /mnt/MainRaidZ2/Rootdataset/PlexDrive folder.

There are also some other folders in /mnt, but they do not have files in them. (One of them actually has folders that should be in the PlexDrive folder, but those folders are empty.)
[attached screenshot: 1702349049053.png]
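As a sanity check that those empty folders are just stale mountpoint directories rather than the dataset's actual contents, this is the sort of thing that can be run from a shell (dataset and path names are from my setup):

Code:
# "mounted = no" means the empty folder is just the mountpoint directory,
# not the dataset's real contents
zfs get -r mounted,mountpoint,keystatus MainRaidZ2

# shows whether anything is actually mounted at the PlexDrive path
findmnt /mnt/MainRaidZ2/Rootdataset/PlexDrive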

  • Motherboard make and model
    • B450 chipset, I think
  • CPU make and model
    • Ryzen 2700X
  • RAM quantity
    • 64GB, but only 32GB allocated to TrueNAS in Proxmox
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives
    • 3x 14TB WD Elements (shucked) + 8x 14TB Seagate Exos X16 hard drives, all together in one RAIDZ2 pool named MainRaidZ2 (encryption enabled; I have the key file) (this pool is about 24% full, 27TB used)
    • 1x 256GB SSD for TrueNAS SCALE apps (pool name: VMssd) (no encryption) (this pool is about 3% full)
    • 1x 2TB WD Purple (pool name: PurplePool) (no encryption) (this pool is about 5% full)
    • 1x 50GB virtual drive from Proxmox (boot drive) (no encryption); the underlying physical drive is a 256GB SSD
  • Hard disk controllers
    • Dell 012DNW H200 HBA flashed to LSI IT mode (took a long time to get it flashed)
    • EMC KTN-STL3 15-bay enclosure used as a JBOD (holds all the drives listed above except the virtual boot drive)
  • Network cards
    • Virtual network card (VirtIO)
 

Cydget

Cadet
Joined
Dec 11, 2023
Messages
7
When I click on PlexDrive in the Datasets dropdown, I get the error:
[EFAULT] Failed retreiving USER quotas for MainRaidZ2/RootDataset/PlexDrive

Code:
 Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
    quotas = resource.userspace(quota_props)
  File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 760, in get_quota
    quotas = resource.userspace(quota_props)
  File "libzfs.pyx", line 3532, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for MainRaidZ2/RootDataset/PlexDrive: dataset is busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 762, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for MainRaidZ2/RootDataset/PlexDrive
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 204, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1344, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1378, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 4112, in get_quota
    quota_list = await self.middleware.call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1395, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1352, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1358, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1273, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1258, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for MainRaidZ2/RootDataset/PlexDrive
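For what it's worth, the failing call is the middleware asking libzfs for per-user space accounting on that dataset. The same query can be run directly from a shell, which should show whether the "dataset is busy" error also happens outside the middleware (a sketch, using my dataset name):

Code:
# the same userspace accounting query the traceback shows failing
zfs userspace MainRaidZ2/RootDataset/PlexDrive

# dataset state at the time of the error: is the key loaded, is it mounted?
zfs get keystatus,mounted,encryption MainRaidZ2/RootDataset/PlexDrive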
 

Cydget

Cadet
Joined
Dec 11, 2023
Messages
7
I've tried quite a few things.
  • Removed other pools/drives from the system (removed VMssd and PurplePool)
  • Dumped logs and then reverted to an old TrueNAS config (from 2 months ago)
  • Rolling back to the previous config on TrueNAS ->
    • Same issue where all other datasets on the MainRaidZ2 pool load except for PlexDrive
    • Made me think the issue is not tied to TrueNAS, but possibly the pool itself
  • Tried not using TrueNAS and just mounting the drives in Proxmox directly (commands as I ran them):
    • zpool import MainRaidZ2
    • zfs load-key MainRaidZ2
    • zfs set mountpoint=/mnt/MainRaidZ2 MainRaidZ2
    • zfs import -f /mnt MainRaidZ2
    • zfs mount -a (the Proxmox web shell stays stuck here; opened a new tab and was able to see other folders mounted)
  • [attached screenshots: 1702444034906.png, 1702444068844.png]
Essentially, I think this shows that the PlexDrive dataset inside that pool is having issues mounting/unlocking, since it should have 24TB of files in that folder but the folder appears empty. I don't believe this is a permission issue, but I'm open to suggestions on how to proceed; the sketch below is roughly what I intend to check next.
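To separate "the dataset refuses to mount" from "nothing ever asked it to mount," the per-dataset state seems like the thing to look at. A rough sketch of those checks, under the assumption that importing with -N (no automatic mounting) and -R /mnt is safe on the Proxmox host:

Code:
# import without mounting anything, keeping mountpoints under /mnt
zpool import -f -R /mnt -N MainRaidZ2

# load keys for the whole encryption hierarchy, then look at per-dataset state
zfs load-key -r MainRaidZ2
zfs list -r -o name,used,mounted,keystatus,mountpoint MainRaidZ2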
 

Cydget

Cadet
Joined
Dec 11, 2023
Messages
7
[attached screenshot: 1702444638222.png]

Confirmed that it did not mount. I guess I need to figure out how to make ZFS mount that dataset.
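One way to narrow that down would be to mount just this one dataset by name instead of relying on zfs mount -a, so whatever error comes back is reported for PlexDrive specifically. A sketch, assuming the keys are already loaded:

Code:
# check whether the key is loaded and whether the dataset thinks it is mounted
zfs get keystatus,mounted MainRaidZ2/RootDataset/PlexDrive

# try mounting only this dataset; any error (busy, key not loaded, etc.) is printed directly
zfs mount MainRaidZ2/RootDataset/PlexDrive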
 

Cydget

Cadet
Joined
Dec 11, 2023
Messages
7
Holy cow. I came back today after work, and it looks like the pool/MainRaidZ2 dataset actually did mount in Proxmox.
[attached screenshots: 1702511431314.png, 1702511481210.png - the two Proxmox shell sessions]

^ The above pictures are the two shells I had open to Proxmox. I entered the commands yesterday; both shells appeared to have hung after entering the last command.
[attached screenshot: 1702509115493.png]

^ This picture shows files present in the drive (this is the first shell instance I had open to Proxmox; it hung, so I opened the other ones).

Long story short: if you run into the same symptoms as me, with one of your datasets not mounting in TrueNAS SCALE, try using a different OS (Proxmox in my case). I plan on backing up the 24TB to another pool and then destroying and recreating the MainRaidZ2 pool, but without any encryption/deduplication settings enabled, to prevent any future related issues.
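For the backup step, my rough plan is a recursive snapshot followed by a full replication stream into the other pool before MainRaidZ2 gets destroyed. A sketch under a couple of assumptions: BackupPool and the snapshot name are placeholders (that pool doesn't exist yet), and -w sends the encrypted data raw, so the copy stays encrypted and still needs the same key file.

Code:
# recursive snapshot of everything under the pool's root dataset
zfs snapshot -r MainRaidZ2@migrate-backup

# raw (-w) replication stream (-R) of the whole hierarchy to the backup pool;
# BackupPool/MainRaidZ2-copy is a placeholder destination
zfs send -R -w MainRaidZ2@migrate-backup | zfs receive -F BackupPool/MainRaidZ2-copy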
 