Kernel panic - not syncing: VERIFY3

I'm at a bit of a loss; I can't find any mention of this problem anywhere.
Let's start with the panic itself:
Code:
VERIFY3(rs_get_end(rs, rt) >= end) failed (10359540006912 >= 10359534558144)
PANIC at range_tree.c:482:range_tree_remove_impl()
Kernel panic - not syncing: VERIFY3(rs_get_end(rs, rt) >= end) failed (10359540006912 >= 10359534558144)
CPU: 2 PID: 5553 Comm: z_wr_iss Tainted: P              IE         5.15.79+truenas #1
Hardware name: xxxxx
Call Trace:
<TASK>
dump_stack_lvl+0x46/0x5e
panic+0xf3/0x2bf
spl_panic+0xcc/0xe9 [spl]
? bt_grow_leaf+0xdc/0xe0 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
? pn_free+0x30/0x30 [zfs]
? pn_free+0x30/0x30 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
? pn_free+0x30/0x30 [zfs]
? zfs_btree_find_in_buf+0x59/0xb0 [zfs]
range_tree_remove_impl+0x39d/0x460 [zfs]
space_map_load_callback+0x22/0x90 [zfs]
space_map_iterate+0x1a6/0x3f0 [zfs]
? rs_get_start+0x20/0x20 [zfs]
space_map_load_length+0x61/0xe0 [zfs]
metaslab_load_impl+0xc8/0x4e0 [zfs]
? gethrtime+0x1c/0x50 [zfs]
? metaslab_should_allocate+0x82/0xd0 [zfs]
? find_valid_metaslab+0x148/0x240 [zfs]
? arc_all_memory+0xa/0x20 [zfs]
? metaslab_potentially_evict+0x44/0x260 [zfs]
metaslab_load+0x6a/0xd0 [zfs]
metaslab_activate+0x44/0x100 [zfs]
metaslab_group_alloc_normal+0x1bb/0x610 [zfs]
metaslab_group_alloc+0x30/0xd0 [zfs]
metaslab_alloc_dva+0x266/0x690 [zfs]
metaslab_alloc+0xcc/0x210 [zfs]
zio_dva_allocate+0xbe/0x380 [zfs]
zio_execute+0x90/0x90 [zfs]
taskq_thread+0x1ff/0x3c0 [spl]
? wake_up_q+0x90/0x90
? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
? taskq_thread_spawn+0x60/0x60 [spl]
kthread+0x127/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Kernel Offset: 0x19c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 10 seconds

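To make the two numbers in the VERIFY3 line easier to compare, here is a minimal Python sketch that pulls them out of the message and evaluates the comparison as printed. The regular expression and the `parse_verify3` helper are illustrative names of my own, not part of ZFS or TrueNAS.

```python
import operator
import re

# Comparison operators that can appear in a VERIFY3-style message.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}

# Matches the trailing "failed (LEFT OP RIGHT)" part of the panic line.
FAILED_RE = re.compile(r"failed \((\d+) (>=|<=|==|!=|>|<) (\d+)\)")


def parse_verify3(line):
    """Return (left, op, right, holds) for a VERIFY3 panic line, or None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    left, op, right = int(match.group(1)), match.group(2), int(match.group(3))
    return left, op, right, OPS[op](left, right)


line = ("VERIFY3(rs_get_end(rs, rt) >= end) failed "
        "(10359540006912 >= 10359534558144)")
left, op, right, holds = parse_verify3(line)
print(f"left - right = {left - right} bytes")
print(f"{left} {op} {right} holds as printed: {holds}")
```

With the values as transcribed here, the left-hand number is the larger one, so the comparison as printed would actually hold. That may mean a digit or the operator was mistyped when the trace was copied, so it is worth re-checking against the console output.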

The backstory: I've had a recurring problem where a metaslab gets corrupted from time to time and I have to "repair" it. The procedure looks like this:
  • Shut down server
  • Disconnect drives
  • Start server
  • Set recovery kernel parameters
  • Connect drives
  • Import pool
  • Leave it running for a few days
  • Disable recovery kernel parameters
  • Reboot
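The "set" and "disable recovery kernel parameters" steps can be sketched roughly as below. This is only a sketch under assumptions: the post does not name the parameters, so `zfs_recover` and `zil_replay_disable` (both real OpenZFS module parameters) are a guess at what was used, and `set_zfs_params` is a hypothetical helper. The demo writes to a scratch directory so that it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# ASSUMPTION: the post does not say which parameters were set.
RECOVERY_PARAMS = {"zfs_recover": "1", "zil_replay_disable": "1"}


def set_zfs_params(values, param_dir="/sys/module/zfs/parameters"):
    """Write each value to its parameter file and return the previous values."""
    previous = {}
    for name, value in values.items():
        path = Path(param_dir) / name
        previous[name] = path.read_text().strip()
        path.write_text(f"{value}\n")
    return previous


# Dry run against a scratch directory that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in RECOVERY_PARAMS:
        (Path(demo_dir) / name).write_text("0\n")
    old = set_zfs_params(RECOVERY_PARAMS, param_dir=demo_dir)  # "set" step
    print("previous values:", old)
    set_zfs_params(old, param_dir=demo_dir)                    # "disable" step
```

On the real system the call would be `set_zfs_params(RECOVERY_PARAMS)` as root, with the zfs module loaded and before the pool is imported. Passing the returned dictionary back in restores the earlier values.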
If the metaslab recovers, everything is OK. If not, TrueNAS still starts up, but anything that touches the pool fails: most UI pages don't load their content, an alert shows pool.import_on_boot stuck at 80% (waiting a week doesn't help), and any zpool command freezes the shell.

This would be easier if the pre-init script that sets the recovery kernel parameters ran before the pool import, but it doesn't, so if any step fails you have to start over from the beginning.
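On a generic Linux system, one way to apply module parameters before any pool import is an `options` line in a modprobe.d file, which takes effect when the module is loaded. The sketch below only renders such a line. The parameter names are again an assumption, `modprobe_options` is a hypothetical helper, and I have not confirmed whether TrueNAS SCALE honours or preserves a file like `/etc/modprobe.d/zfs.conf` (if the module is loaded from the initramfs, that would need regenerating too).

```python
def modprobe_options(module, params):
    """Render one modprobe.d 'options' line for the given module parameters."""
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


# ASSUMPTION: parameter names are not given in the post.
print(modprobe_options("zfs", {"zfs_recover": 1, "zil_replay_disable": 1}))
```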

To deal with that earlier problem, I found a fix in GitHub issues #13963 and #13483 (the latter contains a solution).
At first I either didn't notice or thought it was unrelated, but eventually I realized the system was rebooting by itself, and I found some mentions that this can be caused by RAM. To be honest, I'm not sure: the motherboard and RAM look compatible, but there's a lot I'm unsure about. In short, I changed the RAM, but I guess it's already too late, if that was even the cause.

I don't have a backup, of course, since I never expected this could happen, so I at least need some way to mount the pool. Please help.

My system is a J1900 motherboard with a 4-port SATA PCIe card.
The pool is RAIDZ2 with 6x 2 TB drives.
Currently running TrueNAS-SCALE-22.12.0 (the problem occurred while I was on 22.02.4).

For now I'm wondering whether I can disable metaslabs altogether, then install TrueNAS CORE and try to import the pool from there, but since there's no information about this exact problem, I'm in desperate need of help.
 