TrueNAS 12U1 Crash Reboot after import pool

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
I have also opened this in Jira: https://jira.ixsystems.com/browse/NAS-108783

I am posting here in case anyone has the same problem and checks here rather than in Jira.

System was running fine. After a few days I found it to be unreachable.

Console showed it hung just after listing the drives and USB ports.

A forced reset appeared to boot up properly, but as soon as it imports the disks/pool it hits a kernel panic and reboots itself, every time.

It looks similar to NAS-108257 and NAS-107953, but I don't have an encrypted pool. I also thought the issue might be caused by a bunch of client NFS machines all hammering it as soon as it becomes available again, causing it to crash immediately, but unplugging the network cables makes no difference.

If I unplug all the disks of the pool, then it boots up fine.

The crash message every time is attached.

For searching purposes (I used OCR on a screenshot rather than typing this out manually; I think it's correct) it includes:

Code:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d6c5910
vpanic() at vpanic+0x17b/frame 0xfffffe015d6c5960
spl_panic() at spl_panic+0x3a/frame 0xfffffe015d6c59c0
avl_add() at avl_add+0x156/frame 0xfffffe015d6c5a00
dsl_livelist_iterate() at dsl_livelist_iterate+0xb0/frame 0xfffffe015d6c5a60
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0xff/frame 0xfffffe015d6c5b00
bpobj_iterate_impl() at bpobj_iterate_impl+0x14d/frame 0xfffffe015d6c5bd0
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfffffe015d6c
spa_livelist_condense_cb() at spa_livelist_condense_cb+0xa1/frame 0xfffffe015d6c
zthr_procedure() at zthr_procedure+0x94/frame 0xfffffe015d6c5cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe015d6c5d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015d6c5d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 38 tid 102647 ]
Stopped at kdb_enter+0x37: movq $0,0x164afc6(%rip)
db:0:kdb.enter.default> write cn_mute
cn_mute 0 0x1


Hardware is a Dell R710 with 128GB RAM (memtested fine) and 2x X5670 CPUs. Ubuntu 20.04 is installed on the metal, and TrueNAS runs in a KVM VM that boots from raw access to partitions on an SSD, with PCI passthrough for 2 NICs (which are an LACP bond) and an LSI HBA controller with 5x 3TB disks attached.

It had been working fine for the past few weeks, and now it's just stuck in this boot loop.

The only thing I changed recently was to set sync=disabled on the pool, as the disk performance was abysmal (2MB/sec writes when trying to copy ISOs etc). (Edit: Maybe sync=disabled is a bad idea.. however the server is on a UPS, and I gave it a 1.2GB SSD SLOG device in front of the 3TB rust drives, so I figured that was fairly safe..)
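
For reference, the change amounted to something like this (pool name assumed; the property can be flipped back the same way):

Code:
# disable synchronous writes on the pool (the change I made)
zfs set sync=disabled pool
# check the current value
zfs get sync pool
# revert to the default behaviour
zfs set sync=standard pool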

I tried booting up a different VM, but using the same disks, and the result was the same.
 

Attachments

  • Screenshot at 2020-12-22 14-34-30.png
    38.1 KB · Views: 457
Last edited:
Joined
Oct 22, 2019
Messages
3,641
I'm no expert in this regard, but to try to extract some more information that others might find helpful for diagnosing and troubleshooting: is it a particular pool that triggers this upon re-importing? How many pools automatically import upon boot?

The only thing I changed recently was to set the pool to have sync disabled, as the disk performance was abysmal (2MB/sec writes when trying to copy ISOs etc).
To confirm, prior to changing this, the problem never occurred? Did you ever have a successful import / reboot since making that change?
 

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
There's only the one pool in the machine. It doesn't crash if I remove all the drives and/or the RAID controller, which is how I isolated it to being an issue with the pool. The one pool is configured to auto-import.

Correct, prior to disabling sync I had never had any issues or random reboots of the machine. It had been rebooted several times, both properly and improperly (due to power issues), and it always came back up fine.

I had not rebooted the machine since disabling sync. I disabled it, and a couple of days later found the machine unreachable. Checking the VM console was when I found it hung during the boot process, which showed it had rebooted itself (probably many, many times, given that it just crashes and reboots every time) before getting stuck, and that's when I noticed it was down.

I extracted a debug dump from it which I have uploaded to my Jira ticket.

I looked through the dump and found "panic: VERIFY(avl_find(tree, new_node, &where) == NULL) failed", but the only reference to that I can find is in the source code of the AVL code from OpenSolaris.. http://web.mit.edu/freebsd/head/sys/cddl/contrib/opensolaris/common/avl/avl.c
 
Last edited:

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
I booted up the VM without the HBA/disks attached, then attached the HBA. It was detected, and the console shows the controller:

Code:
Dec 23 19:26:12 truenas mps0: <Avago Technologies (LSI) SAS2008> irq 10 at device 9.0 on pci0
Dec 23 19:26:12 truenas mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Dec 23 19:26:12 truenas mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>


In the VM console, all my drives appear.. (screenshot).

However, I cannot seem to attach or import the disks or the pool.

The disks show up, but the pool shows as offline, and there's no option to online it.

Trying to add the pool via the import wizard doesn't work either, as it's not detected; I'm just stuck on the screen asking which pool to import, with no options.

I tried importing it from the command line ("zpool import pool"), which resulted in the kernel crashing and the machine rebooting..
 

Attachments

  • screenshot_11.png
    13.2 KB · Views: 276
  • screenshot_12.png
    13.7 KB · Views: 263

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
Yes, I saw your Jira ticket immediately after I created mine, and it did sound similar. My instance has (had?) about 15 diskless clients with their root filesystems mounted via NFSv4 exports, using ZFS snapshots as the datasets being exported.

I had this exact same config running off FreeNAS for at least a year, maybe 18 months, and I never saw any issues like this. The FreeNAS box had I/O that was as slow as a wet weekend for some reason, and the disks were always pinned, which is why I built the TrueNAS box. I then snapshotted the exports, zfs send|recv'd them over from FN to TN, and rebooted each host to mount from the TN box.

It worked, but I had the same slow I/O, though not as slow (which I really can't understand when the box has 100GB of RAM and a 1.2GB SLOG, and the raw space is only 9T with about 2T used, albeit with compression and deduplication enabled).
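
As an aside, for anyone chasing similar slowness with dedup in the mix, the dedup table and ratios can be checked with something like this (pool name assumed):

Code:
# dedup table (DDT) histogram and summary
zpool status -D pool
# overall ratios for the pool and datasets
zpool list -o name,size,alloc,dedupratio pool
zfs get compression,compressratio,dedup pool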

I'm currently waiting to see if there's any hope for recovering my pool, or if it's just toast, and I get to start again. I have copies of everything I care about; the client hosts all run BOINC/Rosetta@home, and currently I expect I have just lost a bunch of processing, and as the TrueNAS box has been dead for a few days now, the jobs are all going to miss their deadlines anyway.

The frustration comes from needing to go through and redo all the snapshots, copy them all again, configure all the exports again, and reboot to get everything running.. and then what? Wait a few days for the whole thing to corrupt itself and crash again, because nothing has changed, and doing the same thing while expecting a different result is the definition of insanity?
 

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
Oh, I just remembered there was another change I made.. I set the MTU to 9000 on the lagg interface, as I have jumbo frames configured on my switch ports/laggs and on FreeNAS. Not sure that could have any effect though..

However, I was just trying to copy some stuff over: I temporarily replaced all the disks in the TrueNAS box (in case my original pool is not completely trashed) and created a new pool, but I couldn't copy anything over. Changing the MTU back to 1500 fixed this issue..

Code:
freenas% sudo zfs send pool/nfsroots/ubuntu2004@v1.5 | sudo ssh truenas zfs recv pool2/nfsroots/ubuntu2004
Fssh_packet_write_wait: Connection to 192.168.13.13 port 22: Broken pipe
warning: cannot send 'pool/nfsroots/ubuntu2004@v1.5': signal received


tried again..

Code:
freenas% sudo zfs send pool/nfsroots/ubuntu2004@v1.5 | sudo ssh truenas zfs recv pool2/nfsroots/ubuntu2004
Fssh_packet_write_wait: Connection to 192.168.13.13 port 22: Broken pipe
warning: cannot send 'pool/nfsroots/ubuntu2004@v1.5': signal received


set the MTU back to 1500..

Code:
freenas% sudo zfs send pool/nfsroots/ubuntu2004@v1.5 | sudo ssh truenas zfs recv pool2/nfsroots/ubuntu2004
freenas%


I ran iperf3 after changing the MTU back to 1500 (on truenas), and there are a lot of retries for some reason between FN and TN..

Code:
truenas% sudo iperf3 -c freenas
Password:
Connecting to host freenas, port 5201
[  5] local 192.168.13.13 port 57207 connected to 192.168.13.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   104 MBytes   876 Mbits/sec   10    353 KBytes     
[  5]   1.00-2.00   sec   113 MBytes   944 Mbits/sec    0    672 KBytes     
[  5]   2.00-3.00   sec   109 MBytes   911 Mbits/sec  308    302 KBytes     
[  5]   3.00-4.00   sec  84.8 MBytes   711 Mbits/sec  515    180 KBytes     
[  5]   4.00-5.00   sec   109 MBytes   915 Mbits/sec   66    359 KBytes     
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec   34    511 KBytes     
[  5]   6.00-7.00   sec   111 MBytes   930 Mbits/sec  526    541 KBytes     
[  5]   7.00-8.00   sec  96.1 MBytes   807 Mbits/sec  322    223 KBytes     
[  5]   8.00-9.00   sec  82.5 MBytes   690 Mbits/sec  141    260 KBytes     
[  5]   9.00-10.00  sec  64.6 MBytes   543 Mbits/sec   92    452 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   985 MBytes   826 Mbits/sec  2014             sender
[  5]   0.00-10.00  sec   984 MBytes   825 Mbits/sec                  receiver

iperf Done.


subsequent retests had 4246 and 3127 retries..

I changed the TN MTU back to 9000 and tried again..

Code:
truenas% sudo iperf3 -c freenas
Connecting to host freenas, port 5201
[  5] local 192.168.13.13 port 54245 connected to 192.168.13.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.07   sec  35.0 KBytes   269 Kbits/sec    2   8.74 KBytes     
[  5]   1.07-2.07   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   2.07-3.06   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   3.06-4.05   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   4.05-5.05   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   5.05-6.05   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   6.05-7.05   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   7.05-8.05   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   8.05-9.05   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   9.05-10.05  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.05  sec  35.0 KBytes  28.5 Kbits/sec    5             sender
[  5]   0.00-10.05  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.


Wow, that's awesome..

Disabling hardware offloading makes no difference.
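
(For anyone wondering what that involves: TrueNAS has a per-interface "Disable Hardware Offloading" option, and the rough CLI equivalent on FreeBSD would be something like the following. Interface name assumed; on a lagg the member NICs may need the same flags.)

Code:
# turn off checksum offload, TSO and LRO on the interface
ifconfig lagg0 -txcsum -rxcsum -tso -lro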

From my Ubuntu desktop machine, with the MTU set to 1500, running an iperf3 to freenas (mtu 9000)..

Code:
squigley@squigley-server:~$ iperf3 -c freenas
Connecting to host freenas, port 5201
[  5] local 192.168.13.11 port 38382 connected to 192.168.13.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   956 Mbits/sec    0    452 KBytes     
[  5]   1.00-2.00   sec   111 MBytes   933 Mbits/sec    0    510 KBytes     
[  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec    0    617 KBytes     
[  5]   3.00-4.00   sec   110 MBytes   923 Mbits/sec  372    358 KBytes     
[  5]   4.00-5.00   sec   112 MBytes   944 Mbits/sec    0    389 KBytes     
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec    0    409 KBytes     
[  5]   6.00-7.00   sec   111 MBytes   933 Mbits/sec    0    423 KBytes     
[  5]   7.00-8.00   sec   111 MBytes   933 Mbits/sec    0    433 KBytes     
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec    0    450 KBytes     
[  5]   9.00-10.00  sec   111 MBytes   933 Mbits/sec    0    457 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   936 Mbits/sec  372             sender
[  5]   0.00-10.00  sec  1.09 GBytes   932 Mbits/sec                  receiver

iperf Done.


and running against truenas (mtu 9000):

Code:
squigley@squigley-server:~$ iperf3 -c truenas
Connecting to host truenas, port 5201
[  5] local 192.168.13.11 port 55706 connected to 192.168.13.13 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   951 Mbits/sec    0    134 KBytes     
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    177 KBytes     
[  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec    0    247 KBytes     
[  5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec    0    314 KBytes     
[  5]   4.00-5.00   sec   111 MBytes   933 Mbits/sec    0    376 KBytes     
[  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0    376 KBytes     
[  5]   6.00-7.00   sec   111 MBytes   934 Mbits/sec    0    431 KBytes     
[  5]   7.00-8.00   sec   111 MBytes   933 Mbits/sec    0    496 KBytes     
[  5]   8.00-9.00   sec   111 MBytes   933 Mbits/sec    0    522 KBytes     
[  5]   9.00-10.00  sec   111 MBytes   933 Mbits/sec    0    522 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   937 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   934 Mbits/sec                  receiver

iperf Done.


I tried setting my desktop machine to MTU 9000, though it's connected via an unmanaged 8-port desktop gigabit switch, so who knows if it even supports jumbo frames properly, and it had the same issue to both hosts:

Code:
squigley@squigley-server:~$ iperf3 -c truenas
Connecting to host truenas, port 5201
[  5] local 192.168.13.11 port 58972 connected to 192.168.13.13 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   315 KBytes  2.58 Mbits/sec    5   8.74 KBytes     
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   315 KBytes   258 Kbits/sec    8             sender
[  5]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.
squigley@squigley-server:~$ iperf3 -c freenas
Connecting to host freenas, port 5201
[  5] local 192.168.13.11 port 42824 connected to 192.168.13.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   332 KBytes  2.72 Mbits/sec    3   8.74 KBytes     
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes     
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   332 KBytes   272 Kbits/sec    6             sender
[  5]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.


But this is all probably off topic and unrelated to my corrupt pool. It may have something to do with the abysmal speeds I was seeing when copying data, though.

I am using an Extreme Networks gigabit switch; the relevant config is:

Code:
configure jumbo-frame-size size 9000
enable jumbo-frame ports 1 [to 26]

enable sharing 3 grouping 3-4 algorithm address-based L2 lacp
enable sharing 6 grouping 6-7 algorithm address-based L2 lacp
enable sharing 9 grouping 9-10 algorithm address-based L2 lacp
configure vlan Default add ports 1-3, 5-6, 8-9, 11-26 untagged
configure vlan default delete ports 4, 7, 10


There's pretty much nothing else configured in there.

The switch set up those "delete ports" lines excluding the 3 ports itself, as a result of me telling it to add all ports to the default VLAN ("configure vlan Default add ports all untagged") while those 3 ports were LACP-bonded; it's not like I did something funky by excluding them that would cause retries when the secondary port is being used.

Anyway, as I said, this is off topic with regard to the pool getting corrupted and causing the TrueNAS box to reboot itself constantly; this is just the fun I am having trying to get the thing usable again. I'm thinking it might just be time to install FreeNAS on here instead.

Update: I fixed the network problem.. I remembered I had wasted a ton of time on this before with an Extreme Networks 10GbE switch.. There's something screwy with the way they handle jumbo frames, and as soon as I turned down the frame size on my Ubuntu desktop machine slightly, it would start working.. Presumably the switch's jumbo-frame-size counts the whole Ethernet frame, so a 9000-byte IP MTU needs 9000 + 14 bytes of Ethernet header + 4 bytes of FCS = 9018. I set the jumbo frame size to 9018 in the switch, and now that's all working properly between FN and TN, with both set to 9000.
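
A quick way to verify the jumbo path end to end is a don't-fragment ping at the full payload size (hostnames as used above; 8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

Code:
# from the Ubuntu box (Linux ping): -M do sets don't-fragment
ping -M do -s 8972 -c 3 truenas
# from the TrueNAS/FreeBSD side: -D sets don't-fragment
ping -D -s 8972 -c 3 freenas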
 
Last edited:

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
Yes, I saw your Jira ticket immediately after I created mine, and it did sound similar. My instance has (had?) about 15 diskless clients with their root filesystems mounted via NFSv4 exports, using ZFS snapshots as the datasets being exported.

I had this exact same config running off FreeNAS for at least a year, maybe 18 months, and I never saw any issues like this. The FreeNAS box had I/O that was as slow as a wet weekend for some reason, and the disks were always pinned, which is why I built the TrueNAS box. I then snapshotted the exports, zfs send|recv'd them over from FN to TN, and rebooted each host to mount from the TN box.

It worked, but I had the same slow I/O, though not as slow (which I really can't understand when the box has 100GB of RAM and a 1.2GB SLOG, and the raw space is only 9T with about 2T used, albeit with compression and deduplication enabled).

I'm currently waiting to see if there's any hope for recovering my pool, or if it's just toast, and I get to start again. I have copies of everything I care about; the client hosts all run BOINC/Rosetta@home, and currently I expect I have just lost a bunch of processing, and as the TrueNAS box has been dead for a few days now, the jobs are all going to miss their deadlines anyway.

The frustration comes from needing to go through and redo all the snapshots, copy them all again, configure all the exports again, and reboot to get everything running.. and then what? Wait a few days for the whole thing to corrupt itself and crash again, because nothing has changed, and doing the same thing while expecting a different result is the definition of insanity?

After the stress of Christmas I tried to relax and look at the issue a little deeper. This time it's not the same issue, and thankfully I haven't lost my pool. It is/was a race condition in the LAGG implementation, and I opened an issue here: https://jira.ixsystems.com/projects/NAS/issues/NAS-108810

One thing that's clear to me is that 12.0-RELEASE/U1 isn't production-ready yet... I'm still extremely worried about my data.
 
Last edited:

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
Oh great..

1609099149655.png


I open up the VM monitor, and what do I see? The VM is rebooting over and over again..

This is with 5 new disks and a brand new pool, and I had attached one NFS client to it.

I think I am done with TrueNAS, and I'll go back to FreeNAS, which I've not had any issues like this with.
 

nromyn

Cadet
Joined
Dec 28, 2020
Messages
5
I have the same problem - at boot, the system tries to import a pool, which causes a panic.

panic: VERIFY(avl_find(tree, new_node, &where) == NULL) failed

I recently upgraded a pool with the latest feature flags, and after a reboot due to a crash (cause unknown; it happened during an iocage destroy <jail>) I got stuck in a reboot loop. I run my jails in a pool called 'Jails' on a dedicated SSD, and my data is in a separate pool. After much screwing around, I rebooted into FreeNAS-11.3-U5 and everything is fine; I can import the pool read-only, but due to the feature flags I cannot mount it read-write, most unfortunately.

I have been changing around the networking configuration of the server; that is the only change I have made to the server itself. Everything else had to do with jails.

I am not very skilled in the area of ZFS and FreeBSD; however, I'm open to any suggestions for how I can provide data that may help resolve this. I'm going to start working on building a new pool and migrating the data, though I have no desire to do so.
 

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
In my second case, I replaced the original drives in case they are recoverable..

I created a pool, created a dataset, then did a zfs send | recv from a FreeNAS box. I shared this dataset via NFS, cloned it a couple of times, and shared those clones.
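
Roughly, the steps were along these lines (pool, dataset, and snapshot names here are illustrative placeholders rather than my exact ones; the shares themselves were set up in the UI):

Code:
# on the FreeNAS box: snapshot the dataset and send it to the TrueNAS box
zfs snapshot pool/nfsroots/ubuntu2004@v1
zfs send pool/nfsroots/ubuntu2004@v1 | ssh truenas zfs recv pool2/nfsroots/ubuntu2004
# on the TrueNAS box: clone the received snapshot for each additional client
zfs clone pool2/nfsroots/ubuntu2004@v1 pool2/nfsroots/client01
# the dataset and clones were then shared via the NFS sharing screen in the UI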

I mounted the shares on a couple of clients.

I didn't change anything else; all ZFS settings (sync etc.) were left at their defaults.

A couple of days later, it was stuck in the reboot loop again.

I have now reinstalled the VM with FreeNAS, and I created a new ZFS pool on those temporary drives, created the dataset, copied the data, shared it etc. 3 days later it's still running fine, so far..

Maybe I'll try TrueNAS 13 :smile:
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
I had the exact same case in TrueNAS 12.0-U1.1 as you, but just with 2 x 4TB drives in a RAID1 (mirror) pool called RaidPool. I actually just started a few weeks ago, had everything running, and (theoretically) out of nowhere came the issue you described here and in the Jira ticket.

My first priority is to rescue my data, so I have already tried the following:
- Installed the latest FreeNAS 11.3-U5 (on a separate drive) and tried to import the pool (which is visible for import): after the import, the NAS restarts and the reboot loop starts again.
- Installed a fresh TrueNAS 12.0-U1.1 and tried to import manually with "zpool import -o readonly=on RaidPool" and also "zpool import -o readonly=on -f RaidPool": the NAS restarts but still no pool is imported.
- Created a new pool on a separate drive and tried to import the data into it, with no luck (admittedly a shot in the dark; I realised afterwards that this doesn't make sense).

Before I open a new post about "how to rescue data on my damaged pool" referencing your post: any simple ideas on how to rescue my data first, before doing anything else? I researched for some hours but didn't find any other options that could help, as far as I could tell.
 

Squigley

Dabbler
Joined
May 13, 2020
Messages
19
There's no requirement for a restart when importing a pool. If it's restarting as soon as you try to import it, then the import is causing the crash/reboot. Are you watching the console when you do it?

I don't know how to recover the data yet. My original crashed pool's disks are still sitting on the bench out of the server, and I used some spare drives temporarily while I waited to find out if there's a way to recover them. The pool on those spare drives ended up crashing as well, at which point I reinstalled the machine with FreeNAS instead of TrueNAS and reused the spare drives.

Someone suggested mounting the pool read-only, possibly using FreeNAS, even though it doesn't understand the newer ZFS feature flags. I think I tried this once, but without forcing read-only, and I think it just crashed and rebooted, so I gave up.
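
If I do attempt it again, the read-only route from the CLI would look roughly like this (pool name assumed; -F is the standard zpool import recovery/rewind option, -n makes it a dry run first, and nobody in this thread has confirmed it works on these pools):

Code:
# see which pools are visible without importing anything
zpool import
# dry-run a recovery import to see whether a rewind would succeed
zpool import -o readonly=on -F -n pool
# if that looks sane, do the actual read-only import
zpool import -o readonly=on -F pool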

I am still running FreeNAS on the spare drives, and it's been running fine for a few weeks now. I suspect I will just cut my losses and rebuild it all on FreeNAS using the original drives at some point.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
Exactly as you say, the restart comes automatically when the import starts. And yes, I was watching the console while trying it this time with readonly=on within FreeNAS (a combination I hadn't tried yet). The shell gives no feedback; the restart is triggered with the following:

Code:
traverse_visitbp() at traverse_visitbp+0x703/frame 0xfffffe0228cec3e0
traverse_impl() at traverse_impl+0x317/frame 0xfffffe0228cec500
traverse_pool() at traverse_pool+0x149/frame 0xfffffe0228cec5c0
spa_load() at spa_load+0x141a/frame 0xfffffe0228cec720
spa_load_best() at spa_load_best+0x65/frame 0xfffffe0228cec770
spa_import() at spa_import+0x27b/frame 0xfffffe0228cec830
zfs_ioc_pool_import() at zfs_ioc_pool_import+0x163/frame 0xfffffe0228cec880
zfsdev_ioctl() at zfsdev_ioctl+0x715/frame 0xfffffe0228cec920
devfs_ioctl_f() at devfs_ioctl_f+0x126/frame 0xfffffe0228cec980
kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe0228cec9f0
sys_ioctl() at sys_ioctl+0x15b/frame 0xfffffe0228cecac0
amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe0228cecbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0228cecbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80120106a, rsp = 0x7fffffffbc68, rbp = 0x7fffffffbce0 ---
KDB: enter: panic
[ thread pid 1787 tid 100557 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db:0:kdb.enter.default> write cn_mute 1
cn_mute
db:0:kdb.enter.default> textdump dump
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 2



(I tried to correct the OCR as best I could.)

Since I am new to TrueNAS: you are saying someone tried to mount it read-only. I understand mounting is the same as importing in FreeNAS, right? No different procedure, I guess.

Anyway, thanks for your answer. I think I will open a new post hoping to find some help with the rescue. I'll link it here later.
 