Rescue Full Pool

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
My server is currently running Core 12.2 and my primary pool is offline. Unfortunately, while waiting on a new storage server, our existing one was filled to more than 95% capacity. After some searching I've discovered (as others have) that high pool utilization is not just less than ideal but potentially catastrophic. That said, I've seen mention of the pool going into a "read-only" state, but that never appeared to happen for us. We stopped adding new data to the server a while ago while waiting for the new hardware to arrive, and we went from fully functioning to "offline" seemingly overnight, with no clear indication of what changed.

When viewing pools in the GUI, my pool is listed but is shown as "Offline" with the only option available being to "Export/Disconnect". When running "zpool status" only the boot drive is shown and my primary pool is not listed. When running "zpool import" I get the following:

   pool: Primary
     id: 9802...
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
         some features will not be available without an explicit 'zpool upgrade'.
 config:

        Primary          ONLINE
          raidz2-0       ONLINE
            gptid/cf...  ONLINE
            gptid/d1...  ONLINE
            gptid/d2...  ONLINE
            gptid/d3...  ONLINE
            gptid/d3...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d5...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE
            gptid/d4...  ONLINE

When trying to run "zpool import Primary" I receive the error: "cannot import 'Primary': I/O error; Destroy and re-create the pool from a backup source."

My interpretation of the situation is that the pool got so full that TrueNAS can no longer import it. Presumably there's no way to make changes to the datasets without importing the pool, so clearing space is not an option. Similarly, expanding the existing vdev/pool is presumably not an option while the pool is offline.

I have our new server on hand and ready to go, with double the capacity of our previous one. Is there any way at all to rescue our existing pool? Presumably this is rooted in ZFS being copy-on-write and needing some amount of free space on the vdev to perform filesystem operations. Is there any way to add a cache disk to be used for this purpose, so that the pool becomes functional and can be operated on?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If it's really full and that's the issue, you can import it this way to read the data off:
zpool import -o readonly=on poolname

If not, you may need to try some other recovery options like -fF, potentially losing the last few transactions, to get to a pool that will import.
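For reference, a rough escalation of import attempts might look like the following, using the pool name "Primary" from the output above (run as root; note that a -F rewind import can discard the most recently written data):

```shell
# 1) Read-only import: writes nothing to the pool, so it is the safest
#    way to get data off a completely full pool.
zpool import -o readonly=on Primary

# 2) Dry run of a rewind import: reports whether discarding recent
#    transactions would allow the import, without changing anything.
zpool import -Fn Primary

# 3) Forced rewind import: -F discards the last few transactions,
#    -f overrides the "pool was last used by another system" check.
#    Data written in the final moments before failure may be lost.
zpool import -fF Primary
```

See the zpool-import man page for the full set of recovery flags.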
 

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
Thanks for the suggestions! Unfortunately I'm getting the same "I/O error" as before. I tried the "-o readonly=on" option on its own, the "-fF" options on their own, and both together, with all of them returning the same error.

From what I can tell, all of the hardware is functioning properly: I'm still seeing all of the disks as expected and don't have any other errors on the server. I'm assuming the problem is rooted in the pool being full, but if that were the real problem, wouldn't you expect those import options to help? Perhaps there's another issue causing problems.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I don't think that it is a full pool issue.

Please describe all your hardware completely, especially include disk make, model and how connected to the system board.


One note: 16 disks in a RAID-Zx vdev is a bit too many. Not as bad as we've seen on some other users' NASes, but generally 10 to 12 is considered the maximum for good speed in most cases. It's doubtful that's what is blocking the import, though; something else is likely hanging it up.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Thanks for the suggestions! Unfortunately I'm getting the same "I/O error" as before. I tried the "-o readonly=on" option on its own, the "-fF" options on their own, and both together, with all of them returning the same error.

From what I can tell, all of the hardware is functioning properly: I'm still seeing all of the disks as expected and don't have any other errors on the server. I'm assuming the problem is rooted in the pool being full, but if that were the real problem, wouldn't you expect those import options to help? Perhaps there's another issue causing problems.
I am confused by your statements. Is the pool online or offline?
I suspect that maybe the pool is already imported but doesn't appear properly in the GUI or something like that.
You could try rebooting with an earlier boot environment. This way the details about the pool could still be there without having to re-import the pool.
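On Core (FreeBSD-based), boot environments can also be listed and switched from the shell; a minimal sketch, assuming the standard beadm tool is present, where the environment name is a placeholder you'd replace with one from your own list:

```shell
# List the boot environments TrueNAS has created (typically one per update),
# showing which is active now and which will be active on reboot.
beadm list

# Activate an earlier environment and reboot into it.
# "12.0-U8" is a hypothetical name; use one from the beadm list output.
beadm activate 12.0-U8
shutdown -r now
```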
 

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
I am confused by your statements. Is the pool online or offline?
I suspect that maybe the pool is already imported but doesn't appear properly in the GUI or something like that.
You could try rebooting with an earlier boot environment. This way the details about the pool could still be there without having to re-import the pool.
GUI shows pool as being offline:

The pool is definitely offline/unavailable.

I'm not sure what you mean by "rebooting with an earlier boot environment". I didn't change the OS or "boot environment" at all, so there's nothing "earlier".
 

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
I don't think that it is a full pool issue.

Please describe all your hardware completely, especially include disk make, model and how connected to the system board.


One note, 16 disks in a RAID-Zx vDev is a bit too many. Not as bad as we have seen on other user's NASes. But, generally 10 to 12 is considered the maximum for good speed in most cases. It is doubtful that is a problem for importing the pool. Something else is likely hanging it up.
Specs:
Motherboard: X9DRi-LN4+
CPU: 2 x Xeon E5-2660v1
RAM: 64GB
HBA: LSI SAS 9211-4i in IT mode
Drives: 16 x ST33000650SS

When running "dmesg | grep mps" I get the following info:
mps0: <Avago Technologies (LSI) SAS2004> port 0xf000-0xf0ff mem 0xfbec0000-0xfbec3fff,0xfbe80000-0xfbebffff irq 50 at device 0.0 numa-domain 1 on pci11
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
ses1 at mps0 bus 0 scbus8 target 24 lun 0
da0 at mps0 bus 0 scbus8 target 8 lun 0
da1 at mps0 bus 0 scbus8 target 9 lun 0
da3 at mps0 bus 0 scbus8 target 11 lun 0
da4 at mps0 bus 0 scbus8 target 12 lun 0
da5 at mps0 bus 0 scbus8 target 13 lun 0
da2 at mps0 bus 0 scbus8 target 10 lun 0
da7 at mps0 bus 0 scbus8 target 15 lun 0
da6 at mps0 bus 0 scbus8 target 14 lun 0
da9 at mps0 bus 0 scbus8 target 17 lun 0
da10 at mps0 bus 0 scbus8 target 18 lun 0
da11 at mps0 bus 0 scbus8 target 19 lun 0
da15 at mps0 bus 0 scbus8 target 23 lun 0
da13 at mps0 bus 0 scbus8 target 21 lun 0
da12 at mps0 bus 0 scbus8 target 20 lun 0
da8 at mps0 bus 0 scbus8 target 16 lun 0
da14 at mps0 bus 0 scbus8 target 22 lun 0
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
GUI shows pool as being offline:

The pool is definitely offline/unavailable.

I'm not sure what you mean by "rebooting with an earlier boot environment". I didn't change the OS or "boot environment" at all so there's nothing "earlier".
Go to the "System" => "Boot" section, as seen in my screenshot.

You should have something similar.
Just pick an earlier version, make it active and reboot.

Another thought, your primary pool is also used as the "System Dataset".
Can you relocate the "System Dataset" to another pool or the boot pool?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Looks like your hardware is reasonable, (at least from my perspective).

It may be time to start investigating the various zpool import options. But I have little skill in that area, as I have not needed to roll back transactions or use extreme measures. Check the manual page for zpool-import for details (that is where I get my information from).
 

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
Go to the "System" => "Boot" section, as seen in my screenshot.
You should have something similar.
Just pick an earlier version, make it active and reboot.

Another thought, your primary pool is also used as the "System Dataset".
Can you relocate the "System Dataset" to another pool or the boot pool?
Ah, I've never looked into this menu. My currently active "boot environment" was created in 2021, when I upgraded from FreeNAS 11 to TrueNAS Core. I made that one active and attempted to boot, but it never successfully came up. I pulled up the console, reverted to the most recent boot environment, and was able to get the system to boot again.

During the boot I kept an eye on the terminal and saw the error TrueNAS encounters while attempting to load the pool during startup:
"spa_load_verify" failed and found 1 metadata error. I'm not sure what this metadata would be or how to go about fixing it. Searching for the error turned up some posts, but nothing conclusive so far.
 

davidmyers

Cadet
Joined
Aug 14, 2017
Messages
6
The "spa_load_verify" error led me to a solution! I discovered this post describing a similar problem: https://www.truenas.com/community/threads/zpool-import-fail.112785/post-787511

The solution was essentially to disable the "spa_load_verify" metadata check and then reattempt the read-only import. That worked for me, and my pool is back online. My original assumption that the issue was caused by the pool being completely full still seems plausible, as there is only ~30MB free out of the total ~38TB.
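For anyone hitting the same wall, the workaround from that thread boils down to something like the following. The sysctl names here follow the OpenZFS spa_load_verify_* tunables as exposed on FreeBSD-based Core, so treat them as an assumption and confirm them on your own system first (e.g. with "sysctl -a | grep load_verify"):

```shell
# Skip the data/metadata traversal that spa_load_verify performs during
# import, so the single metadata error no longer aborts the import.
sysctl vfs.zfs.spa.load_verify_metadata=0
sysctl vfs.zfs.spa.load_verify_data=0

# Then retry a read-only import so nothing is written to the 100%-full pool.
zpool import -o readonly=on Primary
```

Once the pool is imported read-only, the data can be copied off to the new server before destroying and recreating the pool.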

Thank you for all of your help!
 