Pool not found, unknown status. Tried Recovering, ended with a Bootloop. Please Help!

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
So I've been working way too many hours at work, and finally discovered my TrueNAS CORE server was not detecting its pools. Going through the shell, it said it would be possible to recover my pool if I entered zpool import -F "pool_name". Well, now it's stuck in a boot loop and I'm not sure what options I've got at this point. When booting up it prints the following and then restarts, only to display the same output again:

Waiting on devd connection
Enumerating system disks
Enumerating geom disk XML information
Enumerating disk information from database
Syncing disk 1/5
Syncing disk 2/5
Syncing disk 3/5
Syncing disk 4/5
Syncing disk 5/5
Syncing all disks complete!
Alarm clock
Starting file system checks:
Mounting local filesystems: .
Beginning pools import
Importing "pool_name"
spa.c:8367:spa_async_request(): spa=$import async request task=2048
spa_misc.c:419:spa_load_note(): spa_load($import, config trusted): LOADED
spa_misc.c:419:spa_load_note(): spa_load($import, config trusted): UNLOADING
spa.c:6107:spa_import(): spa_import: importing "pool_name"
spa_misc.c:419:spa_load_note(): spa_load("pool_name", config trusted): LOADING
vdev.c:161:vdev_dbgmsg(): disk vdev '/dev/gptid/3eb784a9-0cfd-11ec-a203-4ccc6a070f4a': best uberblock found for spa "pool_name". txg 12169827
spa_misc.c:419:spa_load_note(): spa_load("pool_name", config untrusted): using uberblock with txg=12169827
spa_misc.c:419:spa_load_note(): spa_load("pool_name", config trusted): read 14 log space maps (14 total blocks - blksz = 131072 bytes) in 28 ms
panic: VERIFY3(range_tree_space(smla->smla_rt) + sme->sme_run <= smla->smla_sm->sm_size) failed (17197940736 <= 17179869184)
cpuid = 3
time = 1692834364
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0167c2c830
vpanic() at vpanic+0x3a/frame 0xfffffe0167c2c880
spl_panic() at spl_panic+0x3a/frame 0xfffffe0167c2c8e0
space_map_load_callback() at space_map_load_callback+0x7b/frame 0xfffffe0167c2c900
space_map_iterate() at space_map_iterate+0x20e/frame 0xfffffe0167c2c9b0
space_map_load_length() at space_map_load_length+0x5f/frame 0xfffffe0167c2ca00
metaslab_load() at metaslab_load+0x38c/frame 0xfffffe0167c2cad0
metaslab_activate() at metaslab_activate+0x2f/frame 0xfffffe0167c2cb20
metaslab_alloc_dva() at metaslab_alloc_dva+0x8fa/frame 0xfffffe0167c2cc50
metaslab_alloc() at metaslab_alloc+0x19e/frame 0xfffffe0167c2ccf0
zio_dva_allocate() at zio_dva_allocate+0xe5/frame 0xfffffe0167c2ce00
zio_execute() at zio_execute+0x9f/frame 0xfffffe0167c2ce40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe0167c2cec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0167c2cef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0167c2cf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0167c2cf30
--- trap 0x80af7244, rip = 0x332200000000000, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100641 ]
Stopped at kdb_enter+0x37: movq $0,0x141d87e(%rip)
db:0:kdb.enter.default> write cn_mute 1
cn_mute 0 = 0x1
db:0:kdb.enter.default> textdump dump
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU (the CPU number seems to be different with each reboot, e.g. 3)

I'm desperate to salvage my 10TB of data. Yes, I know, RAID is not a backup and I should have had a backup solution; I just turned my back for a few days while slammed at work and came back to this. I would be eternally grateful for a solution.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
1st: STOP EVERYTHING. power it off and leave it off. every reboot has a chance of making whatever is wrong worse, and something is already very wrong.

2nd: read the forum rules, particularly the parts about posting your hardware and zpool topology (raidz, mirrors, etc).

3rd: you need somewhere to put the data if you can get back to it; even if you can get the pool up, you might not be able to keep it up, and you'll want to copy everything off ASAP rather than scrambling to find disks. a single 10TB+ disk would be better than nothing.

4th: hopefully, once you've posted the information required to begin helping, one of the members who might be able to help will be willing to try (I know a decent amount, but trying to help rescue this over a forum is beyond me).

5th: you can unplug the disks and try booting the system without them, to start isolating whether there is a hardware fault. as you haven't posted your hardware, it is difficult to give anything but very generic advice.
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Server's been powered down for a good bit now, so I'm glad I did that. Very sorry about not reading the rules. The hardware I'm using was repurposed from an old gaming PC and is the following:

Processor: Intel i7-6700K
MoBo: MSI Z170I Gaming Pro AC mini-ITX board
RAM: 32GB. I don't remember the exact sticks, but they are not server grade or anything; Corsair Vengeance, maybe.
Video card: Nvidia GTX 1070 Ti
OS drive: Sabrent 512GB M.2 2242 NVMe, DRAM-less, low power
Data drives: Four 4TB Seagate IronWolf Pro NAS HDDs
PSU: EVGA SFX 550W

I know I configured my server to be able to operate with one drive failure, so I believe it was RAID-Z1. I ended up with 10TB of usable space in the end.

I don't believe I encrypted it, if that's optional, since I don't encrypt drives when first testing a new OS so I can recover files if everything goes to hell. My immediate thought was to create a new TrueNAS installation of the same version on a removable USB drive and then see if I could import the drive pool to access the files. If that works, I'd replicate the configuration on the SSD and get everything up and running to a baseline level, then boot Clonezilla to clone the OS drive in case this happens again and store that image on a separate drive. Unless I can pull up a shell before entering the boot loop (or use a Linux live drive to explore the OS drive), I'm not sure I'll be able to pull any log files.
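If I go that route, I assume the first steps from the fresh install's shell would look roughly like this (pool name assumed; a bare zpool import just lists importable pools without touching them):

    # list pools the system can see, without importing anything
    zpool import
    # then attempt a read-only import so nothing gets written to the disks
    zpool import -o readonly=on "pool_name"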
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Just as a comment - you do not need to clone the boot disk. Make sure you keep a copy of the config file (not on the NAS) and you can rebuild the whole NAS very quickly. See @joeschmuck's multi-report script for one way of automating this process
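As a rough sketch of the manual route (assuming CORE, where the config database normally lives at /data/freenas-v1.db; the GUI equivalent is System -> General -> Save Config):

    # copy the live config database somewhere temporary (path assumed for CORE)
    cp /data/freenas-v1.db /tmp/truenas-config-backup.db
    # push it off the NAS; "user@desktop" is a placeholder for your own machine
    scp /tmp/truenas-config-backup.db user@desktop:~/backups/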

I would want to test the hardware. If you have any spare disks (even just one) and a USB flash drive, then pull all the disks from the machine.

Boot memtest and test the memory for 24 hours. Deal with any issues found.

Then use a test disk (for a test pool) and a USB thumb drive: build TrueNAS on the hardware, build a new pool on the test disk, and make sure everything is working.

Only then put the original HDDs back in and try the import on the test TrueNAS install.

My 2p worth
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
ok, that hardware and the general kernel panic support my gut suspicion: your hardware is likely the problem, not the pool. quite likely RAM.
this leaves you 2 general options:
  1. diagnose the current hardware and try to replace only what is borked. if it's the motherboard, you will pretty much have to replace the whole thing.
  2. skip the diagnostics and replace the whole thing.
my recommendation would be to get a new base system: mobo/RAM/CPU. the parts you have are recommended against due to limited reliability.
an example baseline build that is 100% on the recommendations list and can usually be found sub-$300 on ebay: X9SCM / E3-1230v2 / 32GB ECC.

now, on to some basic troubleshooting. at a minimum, you need to get memtest86+ running. this should tell you pretty quickly if RAM errors are the source of your current woes.

another likely possibility is your disk controller, as the kernel panic appears to occur while importing storage.

the PSU can also be a source of this, but the one you have looks like an OK size and EVGA PSUs are usually decently reliable; it's still a possibility though.

you have a mITX mobo, but haven't posted the case; is the chassis mITX-only?
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Oh yeah, didn't think the case was of note. It's a Fractal Node 304. I was thinking I should probably get something more reliable because of this, so I'd appreciate any hardware recommendations. I do use my server as a Plex server, so should I be pairing these parts with a specific GPU for 4K transcoding? My other concern: looking at the board, I only see 4 SATA ports, so I'm not sure where my boot drive is going to live unless it has an M.2 slot I'm not seeing.

I luckily found an old configuration backup (truenas....tar file), so I'm hopeful I can restore using that. I'd assume I wouldn't be able to move from TrueNAS CORE to SCALE with my tar file since it's FreeBSD vs. Linux, but if it is possible, that would be neat.

I just started memtest86 and I'm seeing a few errors already, so it seems like you're right about the memory being an issue.
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Looking at hardware now: should I look at a Dell PowerEdge server blade as better server hardware? With my current hardware I'd have to move all those files to 2.5" drives, so I'd need to fix my issue first and then build a second server, I guess. I suppose I could resell the parts after the transfer.

Honestly, I had these parts left over after an upgrade and thought it was time to experiment with TrueNAS, but I got lazy and put off building a permanent solution. Ideally I'd like hardware that is reliable and powerful enough for 4K transcoding and running VMs.

I'm looking for buying guides, and the first one just highlighted the PowerEdge R630, but I'll keep looking for others, including ones that might use the parts previously recommended.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
You don't want 2.5" drives - which are some off vastly expensive, low space, SMR or other issues. Great for SSD's though

With Plex you need either an Nvidia GPU or use CPU transcoding. For the first - the world is your oyster. Some have difficulties getting it to work. For GPU transcoding you need a recently recent i3,5,7,9 - but preferably not bleeding edge. Xeon will not do (there might be some workstation Xeon's that might do). Some of the i3's support ECC

Memory issues - with luck thats an easy fix
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
You don't want 2.5" drives - which are some off vastly expensive, low space, SMR or other issues. Great for SSD's though

With Plex you need either an Nvidia GPU or use CPU transcoding. For the first - the world is your oyster. Some have difficulties getting it to work. For GPU transcoding you need a recently recent i3,5,7,9 - but preferably not bleeding edge. Xeon will not do (there might be some workstation Xeon's that might do). Some of the i3's support ECC

Memory issues - with luck thats an easy fix
I'd like to at least reuse my GTX 1070 Ti if possible. I'll need to find a Xeon that can use ECC and a board with a PCIe slot for my GPU. The X9SCM board seems to lack PCIe.
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Sorry for the spamming. I did find this on eBay: a Supermicro X9SCI-LN4F board with a Xeon E3-1230v2 and 32GB of DDR3 ECC included for $65 + shipping. The board also includes PCIe 2.0, so I can at least connect my GTX 1070 Ti for my Plex transcoding. If no one has any objections I'll grab that and throw it in an Antec P101 ATX mid-tower case to keep in my office closet. Not the sleekest, but it should be a decent replacement for my current hardware. Probably going to reuse my SFX PSU because I'm pretty positive the PSU is good.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
Dell PowerEdge server blade
a blade server requires a chassis, so no.
The X9SCM board seems to lack PCIe.
it has 4 PCIe slots. unless you mean the lack of an x16 slot? if so, yes, that is true. that's why I said baseline: anything similar or better.
Supermicro X9SCI-LN4F
an x10 board would be desirable, as the html5 ipmi console is much less of a pain to deal with than the java console (it has both anyway). but as this is a rescue build, whatever fits your budget and purposes is fine.
includes PCIe 2.0
even PCIe 1.0 should be fine for transcoding. even high-end gaming video cards struggle to saturate a PCIe 2.0 x16 or PCIe 3.0 x16 link, and the 1070 is not a high-end video card.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I luckily found an old configuration backup (truenas....tar file)
I greatly suspect you can just use the existing truenas install on the new hardware.
additionally, if you can sort out its problems, you can repurpose the old build as a backup server, because having no backups... sucks.
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Yeah, I'm going to be practicing much better backup hygiene in the future.

I finished the memtest86 run, or at least had it running for 30 hours. I'm not an expert on memory tests, but this doesn't look fantastic.
[attached screenshot: signal-2023-08-30-12-14-15-753.jpg - memtest86 results showing errors]
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
It's not good. You say you have multiple sticks, so:
  1. Remove all sticks and clean the edge connectors with isopropyl alcohol.
  2. Insert one stick and test it in the 1st slot.
    1. If good - test the same stick in each slot; this will hopefully prove out the slots.
    2. If it fails - test the second stick in the first slot.
The objective is to find a good stick that tests clean in every slot. Once you know the M/B is good, you can test the rest of the sticks until you find the bad stick (or sticks).
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
I can try doing that, though it sounds like I should be buying new parts anyway so I can get a board and chip that support ECC memory. I don't really have a use for this board other than as a NAS, so should I try reviving the hardware or just move my disks to something more appropriate?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Depends on budget, how much time you have, and what you want to achieve.
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Try mounting it read-only, as you're getting the panic from a DDT update:

zfs mount -o ro Storage

If you can do this, you'll want to copy the files to an entirely separate pool (not just a dataset)
Really dumb question: I'm having a similar problem, but I'm unsure where and how to enter that command. Is there a shell I can access before my server tries to boot up, or do I need to do a fresh install of TrueNAS and enter it into the shell before importing the pool? And is "Storage" the pool name, so I'd substitute my own?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hey @Jaxseven

The OP in the linked post was having stalls on deduplication table updates, but from the looks of your tracebacks you're having spacemap corruption, which is a different issue - possibly brought on by the memory errors discussed above in this thread. That can be difficult to recover from and may require a rollback.

You can try to boot TrueNAS in single-user mode, or you can do a fresh install (on new media) and attempt to import read-only with zpool import -o readonly=on. You've already tried the first-level recovery (-F), so the next level is -FX. I would recommend doing this first with -FXn, where the trailing n character means to attempt the rewind but not actually perform the import. If it succeeds, remove the n and re-run the command to import the pool.
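Roughly, the sequence from the shell would look like this (pool name assumed; the readonly option can be combined with the rewind flags, which should be the safest way to attempt it):

    # safest first: plain read-only import
    zpool import -o readonly=on "pool_name"
    # dry-run the extreme rewind recovery; the trailing n means "don't actually import"
    zpool import -FXn "pool_name"
    # if the dry run reports success, re-run without the n (read-only again for safety)
    zpool import -FX -o readonly=on "pool_name"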
 

Jaxseven

Dabbler
Joined
Sep 13, 2021
Messages
11
Thank you for replying to me in my own thread. Am I running the commands you mentioned in the shell interface of the web GUI, or do I need to boot a specific shell on the metal? And is n just a numeric value, like 1 for the first attempt? I honestly bought all new hardware at this point and was deciding how I was going to design my home lab, but I wanted to just copy my files over before nuking it.
 