Kernel panic - not syncing: VERIFY3

CaCTuCaTu4ECKuu · Mar 2, 2023

I'm at a bit of a loss; I can't find any mention of this problem.
Let's start with the problem itself.

Code:

VERIFY3(rs_get_end(rs, rt) >= end) failed (10359540006912 >= 10359534558144)
PANIC at range_tree.c:482:range_tree_remove_impl()
Kernel panic - not syncing: VERIFY3(rs_get_end(rs, rt) >= end) failed (10359540006912 >= 10359534558144)
CPU: 2 PID: 5553 Comm: z_wr_iss Tainted: P              IE         5.15.79+truenas #1
Hardware name: xxxxx
Call Trace:
<TASK>
dump_stack_lvl+0x46/0x5e
panic+0xf3/0x2bf
spl_panic+0xcc/0xe9 [spl]
? bt_grow_leaf+0xdc/0xe0 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
? pn_free+0x30/0x30 [zfs]
? pn_free+0x30/0x30 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
? pn_free+0x30/0x30 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
range_tree_remove_impl+0x39d/0x460 [zfs]
space_map_load_callback+0x22/0x90 [zfs]
space_map_iterate+0x1a6/0x3f0 [zfs]
? rs_get_start+0x20/0x20 [zfs]
space_map_load_length+0x61/0xe0 [zfs]
metaslab_load_impl+0xc8/0x4e0 [zfs]
? gethrtime+0x1c/0x50 [zfs]
? metaslab_should_allocate+0x82/0xd0 [zfs]
? find_valid_metaslab+0x148/0x240 [zfs]
? arc_all_memory+0xa/0x20 [zfs]
? metaslab_potentially_evict+0x44/0x260 [zfs]
metaslab_load+0x6a/0xd0 [zfs]
metaslab_activate+0x44/0x100 [zfs]
metaslab_group_alloc_normal+0x1bb/0x610 [zfs]
metaslab_group_alloc+0x30/0xd0 [zfs]
metaslab_alloc_dva+0x266/0x690 [zfs]
metaslab_alloc+0xcc/0x210 [zfs]
zio_dva_allocate+0xbe/0x380 [zfs]
zio_execute+0x90/0x90 [zfs]
taskq_thread+0x1ff/0x3c0 [spl]
? wake_up_q+0x90/0x90
? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
? taskq_thread_spawn+0x60/0x60 [spl]
kthread+0x127/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Kernel Offset: 0x19c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 10 seconds
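
To put a number on the two offsets in the VERIFY3 line, here is a small sketch that pulls them out of the message and computes the gap. The function name parse_verify3 is made up for illustration; it only does arithmetic on the text as typed above and says nothing about what ZFS does internally.

```python
import re

def parse_verify3(line):
    """Extract (left, operator, right, left - right) from a VERIFY3 panic line.

    Returns None if the line has no "failed (A op B)" part.
    """
    m = re.search(r"failed \((\d+) (\S+) (\d+)\)", line)
    if m is None:
        return None
    left, op, right = int(m.group(1)), m.group(2), int(m.group(3))
    return left, op, right, left - right

# The panic line as typed in this post.
line = "VERIFY3(rs_get_end((rs, rt) >= end) failed (10359540006912 >= 10359534558144)"
left, op, right, gap = parse_verify3(line)
print(f"{left} {op} {right}: gap = {gap} bytes")
```

With the values as typed, the gap is 5448768 bytes (about 5.2 MiB), and the left value is the larger one.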

The background is that I had some kind of weird metaslab problem: from time to time it got corrupted and I had to "repair" it. The procedure looked like this:

Shut down server
Disconnect drives
Start server
Set recovery kernel parameters
Connect drives
Import pool
Leave it running for a few days
Disable recovery kernel parameters
Reboot
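
The "set" and "disable" steps above can be sketched roughly as follows. This is only a sketch under assumptions: I'm assuming the recovery parameters are zfs_recover and zil_replay_disable (OpenZFS module tunables exposed under /sys/module/zfs/parameters), and the helper name set_recovery_params and its param_dir argument are made up for illustration. Writing to the real directory needs root and a loaded zfs module.

```python
from pathlib import Path

# Assumption: these are the tunables meant by "recovery kernel parameters".
RECOVERY_PARAMS = ("zfs_recover", "zil_replay_disable")

def set_recovery_params(enabled, param_dir="/sys/module/zfs/parameters"):
    """Write 1 (enabled) or 0 (disabled) to each recovery tunable.

    Returns the names that were written. A name whose file does not
    exist under param_dir is skipped rather than created.
    """
    value = "1" if enabled else "0"
    written = []
    for name in RECOVERY_PARAMS:
        path = Path(param_dir) / name
        if not path.exists():
            continue
        path.write_text(value + "\n")
        written.append(name)
    return written

# Usage (as root, before importing the pool):
#   set_recovery_params(True)
# and after the pool has been running for a few days:
#   set_recovery_params(False)
```

Values written this way do not survive a reboot, which is why the order of the steps matters.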

If the metaslab recovers, everything is OK. If not, TrueNAS will start up, but anything that touches the pool will not work: most UI pages will not load content, an alert shows pool.import_on_boot stuck at 80% (waiting for a week doesn't help), and any zpool command freezes the shell.

It could work if the pre-init script that sets the recovery kernel parameters ran before the pool import, but it doesn't, so if a step fails you have to start over from the beginning.

To solve the previous problem I found a solution in GitHub issues #13963 and #13483 (the latter has a solution).
I didn't notice it at first, or thought it was unrelated, but eventually I noticed that the system was rebooting by itself and found some mentions that this may be caused by RAM. To be honest I'm not sure: there is the question of motherboard and RAM compatibility, and while it looks OK I'm not sure about a lot of things. In short, I changed it, but I guess it's already too late now, if that was even the cause.

I don't have a backup, of course, since I didn't expect this could happen, so I at least need a way to mount the pool somehow. Please help.

My system is a J1900 motherboard + 4-port SATA PCIe card
Pool is RAIDZ2, 6x 2 TB drives
Currently running TrueNAS-SCALE-22.12.0 (the problem occurred while I was on 22.02.4)

For now I'm wondering whether I can disable metaslabs altogether, then install TrueNAS CORE and try to import the pool from there, but because there is no info about this exact problem I'm in desperate need of help.
