Bootloop since this morning

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
Hello,

My TrueNAS CORE system has been boot-looping since this morning:
[screenshot of the boot loop attached]

I tried the previous versions back to 12.0-U2.1, but it's the same story.
I also unplugged the SAS disks, and then the boot goes fine. Any idea why, and how to resolve this if possible?

OS Version:
TrueNAS-12.0-U4.1
Model:
B450 I AORUS PRO WIFI
Memory:
30 GiB

7 × 4 TB SAS disks in RAIDZ2

Thanks.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I have seen this several times around the forum, and all of those cases seemed to end with the pool being completely unrecoverable and a restore from backup (if there was one) being necessary.

I did find a thread with similar symptoms on the FreeBSD forums which ended a bit better for the person reporting it...

You will need to research carefully to understand whether that case applies to you and see what you can get from it.

In all the cases like this that I've seen, I have never seen the root cause pinned down, but unreliable hardware does feel like something that might be part of it. (I note that your system board is not a server-class board and you're not using ECC memory; even worse, it's overclocking/gaming memory. Possibly not the cause, but it points to hardware that may not be up to the task.)
 

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
Thanks for the link. smartctl is fine on all disks, and there are no memory errors. I was able to boot into single-user mode and run
zpool import -o readonly=on -N -F -f Pool01
The pool is online. If I remove the readonly option, it panics.
I can also run zfs list successfully on the read-only pool.
Is there any way to recover data from a pool imported read-only?
Is there an option to boot normally but without the pool being imported, so I could then try mounting the pool?
Thanks.
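For the data-recovery question: a read-only import is usually enough to copy the files off at the file level. A minimal sketch, building on the import command above (the destination host and paths are placeholders; check the real mountpoints with zfs list first):

Code:
zfs mount -a                                        # mount the datasets of the read-only imported pool
rsync -avh /mnt/Pool01/ backuphost:/backup/Pool01/  # adjust the source to the actual mountpoint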
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
With the pool imported, you should just be able to run zfs mount -a and it will mount what it can, so you can look at the data.

It's also possible that, since you didn't specify an altroot (-R) when you imported, it's actually already mounted under /.

Have a look and see if you find /Pool01.
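A quick way to check where (and whether) the datasets are mounted, as a sketch:

Code:
zfs get -r mounted,mountpoint Pool01    # shows the mount state and target of each dataset
ls /Pool01 /mnt/Pool01 2>/dev/null      # look in both likely locations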
 

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
zfs list worked nicely, but I couldn't mount anything.
I'm now using https://github.com/nchevsky/systemrescue-zfs and I was able to mount the pool and view all my files (rough steps sketched below). I have reached the point where I will fully re-install TrueNAS and restore a config backup to see what happens once TrueNAS boots cleanly.
I'll keep you posted. Thanks for your quick answers!
EDIT: fresh install done, but the issue occurred again when importing, without too much surprise...
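Roughly, the rescue-environment steps look like this (a sketch; the pool name follows this thread and the /mnt altroot is an assumption):

Code:
# from the systemrescue-zfs live environment
zpool import -o readonly=on -R /mnt -f Pool01
zfs mount -a
ls /mnt/Pool01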
 
Last edited:

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
Here is where I am now after a fresh install:
Code:
root@truenas[~]# zpool import -o readonly=on -R /mnt -f Pool01_7HDD
root@truenas[~]# zpool status Pool01_7HDD
  pool: Pool01_7HDD
state: ONLINE
  scan: scrub repaired 0B in 1 days 18:26:48 with 0 errors on Fri Jul  2 09:27:01 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool01_7HDD                                     ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/74d0c669-85b4-11eb-91c3-b42e99d25cfb  ONLINE       0     0     0
            gptid/89bdb27c-8434-11eb-be79-b42e99d25cfb  ONLINE       0     0     0
            gptid/11e6c8b6-a90f-11eb-a303-b42e99d25cfb  ONLINE       0     0     0
            gptid/a167b8d5-88b1-11eb-9404-b42e99d25cfb  ONLINE       0     0     0
            gptid/49487399-8738-11eb-9ceb-b42e99d25cfb  ONLINE       0     0     0
            gptid/c8889518-8a52-11eb-8606-b42e99d25cfb  ONLINE       0     0     0
            gptid/8e0bc690-8bde-11eb-9bce-b42e99d25cfb  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# zfs list
NAME                                                         USED  AVAIL     REFER  MOUNTPOINT
Pool01_7HDD                                                 10.5T  5.92T      256K  /mnt/Pool01_7HDD
Pool01_7HDD/Arrivage                                         636G  5.92T      627G  /mnt/Pool01_7HDD/Arrivage
Pool01_7HDD/Data                                            4.75T  5.92T     32.7M  /mnt/Pool01_7HDD/Data
Pool01_7HDD/Data/Apps_Tools                                 46.4G  5.92T     46.4G
.......
root@truenas[~]#


I also got this in the logs while importing:
Code:
Jul 26 07:16:28 truenas 1 2021-07-26T07:16:28.086105-07:00 truenas.thinselin.local savecore 5166 - - /dev/ada0p3: Operation not permitted

I'll start copying the data over to another NAS.
Any idea what to do next to get write mode back, if possible?

Thanks
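If snapshots already exist on the pool, the copy can also be done at the dataset level while it stays read-only (new snapshots cannot be created on a read-only pool). A sketch with hypothetical snapshot, host, and target-pool names:

Code:
zfs list -t snapshot -r Pool01_7HDD                # check which snapshots are already there
zfs send -R Pool01_7HDD/Data@some-existing-snap \
  | ssh backupnas zfs receive -u BackupPool/Data   # -u: do not mount on the receiving side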
 
Last edited by a moderator:

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,398
Try exporting and then importing read-write.
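For example, a sketch of the suggested sequence (keeping the /mnt altroot used above):

Code:
zpool export Pool01_7HDD
zpool import -o readonly=off -R /mnt Pool01_7HDD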
 

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
Some more attempts:

Code:
root@truenas[~]# zpool import -o readonly=on Pool01_7HDD -R /mnt
root@truenas[~]# zpool status
  pool: Pool01_7HDD
 state: ONLINE
  scan: scrub canceled on Mon Jul 26 09:30:08 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool01_7HDD                                     ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/74d0c669-85b4-11eb-91c3-b42e99d25cfb  ONLINE       0     0     0
            gptid/89bdb27c-8434-11eb-be79-b42e99d25cfb  ONLINE       0     0     0
            gptid/11e6c8b6-a90f-11eb-a303-b42e99d25cfb  ONLINE       0     0     0
            da3p2                                       ONLINE       0     0     0
            gptid/49487399-8738-11eb-9ceb-b42e99d25cfb  ONLINE       0     0     0
            gptid/c8889518-8a52-11eb-8606-b42e99d25cfb  ONLINE       0     0     0
            gptid/8e0bc690-8bde-11eb-9bce-b42e99d25cfb  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# smartctl -a /dev/da3
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST4000NM0023
Revision:             GE13
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50085952e07
Serial number:        Z1ZB50GN
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Jul 27 23:47:56 2021 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     43 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 33835:43
Manufactured in week 24 of year 2016
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  121
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1568
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2246389618
  Blocks received from initiator = 1956880549
  Blocks read from cache and sent to initiator = 2699075219
  Number of read and write commands whose size <= segment size = 3982387852
  Number of read and write commands whose size > segment size = 123092

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 33835.72
  number of minutes until next internal SMART test = 1

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2003687825        5         0  2003687830          5    1419917.411           0
write:           0        0        48          48         48     175839.534           0
verify: 2488830082        0         0  2488830082          0       4000.787           0

Non-medium error count:  3107114

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  64   33819                 - [-   -    -]
# 2  Background short  Completed                  64   33795                 - [-   -    -]
# 3  Background short  Completed                  64   33771                 - [-   -    -]
# 4  Background short  Completed                  64   33747                 - [-   -    -]
# 5  Background short  Completed                  64   33723                 - [-   -    -]
# 6  Background short  Completed                  64   33699                 - [-   -    -]
# 7  Background short  Completed                  64   33675                 - [-   -    -]
# 8  Background short  Completed                  64   33651                 - [-   -    -]
# 9  Background short  Completed                  64   33627                 - [-   -    -]
#10  Background short  Completed                  64   33603                 - [-   -    -]
#11  Background short  Completed                  64   33579                 - [-   -    -]
#12  Background short  Completed                  64   33555                 - [-   -    -]
#13  Background short  Completed                  64   33531                 - [-   -    -]
#14  Background short  Completed                  64   33507                 - [-   -    -]
#15  Background short  Completed                  64   33483                 - [-   -    -]
#16  Background short  Completed                  64   33459                 - [-   -    -]
#17  Background short  Completed                  64   33435                 - [-   -    -]
#18  Background short  Completed                  64   33411                 - [-   -    -]
#19  Background short  Completed                  64   33387                 - [-   -    -]
#20  Background short  Completed                  64   33363                 - [-   -    -]

Long (extended) Self-test duration: 32700 seconds [545.0 minutes]

root@truenas[~]#


What surprises me is the da3 disk's status in the pool. Why is it displayed differently from the others?
The error counter log also worries me: all disks show high numbers.

Then:
Code:
root@truenas[/]# zfs mount -a
root@truenas[/]# zfs list
NAME                          USED  AVAIL     REFER  MOUNTPOINT
Pool01_7HDD                  10.5T  5.89T      256K  /mnt/Pool01_7HDD
Pool01_7HDD/Arrivage          664G  5.89T      653G  /mnt/Pool01_7HDD/Arrivage
Pool01_7HDD/Data             4.75T  5.89T     32.7M  /mnt/Pool01_7HDD/Data
Pool01_7HDD/Data/Apps_Tools  46.4G  5.89T     46.4G  /mnt/Pool01_7HDD/Data/Apps_Tools
Pool01_7HDD/Data/Archives    57.4G  5.89T     57.4G  /mnt/Pool01_7HDD/Data/Archives
Pool01_7HDD/Data/Backups     1.07T  5.89T     1.05T  /mnt/Pool01_7HDD/Data/Backups
Pool01_7HDD/Data/Download    11.2G  5.89T     11.2G  /mnt/Pool01_7HDD/Data/Download


All data is available.
Then:

Code:
root@truenas[/]# zpool export Pool01_7HDD
root@truenas[/]# zpool import -o readonly=off Pool01_7HDD -R /mnt


TrueNAS then panics...

Any idea why I can mount in write mode using the Ubuntu rescue disk? How can I troubleshoot further the issue that makes TrueNAS complain?
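One way to probe further without risking the pool is a dry-run rewind check, plus a look at the panic information after a reboot; a sketch (the crash-dump locations are assumptions and vary by setup):

Code:
zpool export Pool01_7HDD
zpool import -nF Pool01_7HDD            # -n with -F: report whether a rewind import would work, without doing it
dmesg | tail -n 50                      # after a panic reboot, look for the panic message
ls /var/crash /data/crash 2>/dev/null   # possible crash-dump locations (assumed)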
 
Last edited:

QonoS

Explorer
Joined
Apr 1, 2021
Messages
87
Try importing the pool with the "-d /dev/gptid" switch. That way it should discover the missing gptid that was somehow replaced with "da3p2".
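A sketch of that suggestion, keeping the pool read-only as a precaution:

Code:
zpool export Pool01_7HDD
zpool import -d /dev/gptid -o readonly=on -R /mnt Pool01_7HDD
zpool status Pool01_7HDD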
 

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
Thanks, the pool status looks fine now.
But I'm still stuck, as I keep getting panics when mounting in write mode with TrueNAS. Write mode with Ubuntu is fine, so I can still access my data. Any way to debug/correct the situation?
 

vthinsel

Dabbler
Joined
May 18, 2021
Messages
19
So I finally sent all the datasets of the pool to another TrueNAS I had for backup purposes. I took the opportunity to upgrade the hardware to a more robust motherboard with ECC support, as well as a new power supply. I re-installed TrueNAS CORE and started the pool sync the other way; it is progressing nicely. Just a quick question: once all the data is back on the new pool, could I restore a TrueNAS config backup I made some weeks ago, before I applied the last upgrade? That would speed up the configuration a bit.
 