TrueNAS 12 keeps rebooting constantly

videopete

Dabbler
Joined
Sep 5, 2021
Messages
14
The system was running fine for weeks. Now it seems to get through most of the boot sequence, and then it just powers down and tries to power back on every 3-5 seconds. I checked the memory with memtest and it ran with no problems. If I enter the BIOS, the system is fully stable. If I run TrueNAS in safe mode, it seems to boot fully and stay up, so I don't think it's hardware related.
Where can I look for clues about what could be causing this, since only safe mode works now? Are there any logs I can look at while in safe mode to see what happened during the earlier boots? The logs under /var/log seem to cover only the last boot.
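On FreeBSD-based TrueNAS CORE, newsyslog normally rotates /var/log/messages into compressed copies such as messages.0.bz2, messages.1.bz2, so earlier boots may survive in those archives. This assumes rotated copies exist and are kept on persistent storage on your install, which the thread does not confirm. The sketch below searches the live file plus any rotated .bz2/.gz copies for a pattern, newest file first. The directory, base name and example pattern ("panic" / "Fatal trap", typical FreeBSD kernel panic wording) are assumptions to adapt.

```python
import bz2
import gzip
import re
from pathlib import Path


def open_log(path: Path):
    """Open a plain, bzip2- or gzip-compressed log file as text."""
    if path.suffix == ".bz2":
        return bz2.open(path, "rt", errors="replace")
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return path.open("r", errors="replace")


def rotation_index(path: Path) -> int:
    """Sort key: live file first (-1), then messages.0.*, messages.1.*, ..."""
    parts = path.name.split(".")
    return int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else -1


def search_logs(log_dir: str, base: str, pattern: str):
    """Yield (file name, line) for every line matching pattern, newest file first."""
    regex = re.compile(pattern)
    files = sorted(Path(log_dir).glob(base + "*"), key=rotation_index)
    for path in files:
        with open_log(path) as fh:
            for line in fh:
                if regex.search(line):
                    yield path.name, line.rstrip("\n")
```

For example, `for name, line in search_logs("/var/log", "messages", r"panic|Fatal trap"): print(name, line)`. Keep in mind that a hard power-off may lose the last lines that had not been flushed to disk yet.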

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
videopete said: "Are there any logs I can look into while in safe mode to see what happens for the earlier boots?"
/var/log/...

I don't think you're going to get much help until you read the forum rules and share some of your hardware details. Based on your symptoms, it's likely to be hardware, in the sense of either the driver for a poorly supported NIC (Realtek) causing a kernel panic under load (which wouldn't happen in safe mode), or the wrong power management settings on Ryzen. Both have been covered extensively on the forums already.

videopete

The system has been running fine for months, which is why I don't think it would suddenly develop a hardware issue. Nothing was changed or added. These log files don't seem to contain any history, only the info from the last boot. I'm looking for historical logs that can help me pinpoint the actual reason it is failing, or where it failed. I have attached the output of lspci -v. It was run while most of the memory was removed (I was troubleshooting, so it only had 2GB at that point).

Attachments

  • hardwarelist.txt (10.6 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Other than indicating you have a Gigabyte board, this output doesn't really help us. Please describe your system in human-readable terms.

videopete

Gigabyte GA-EP45C-DS3R board
Intel Q9950 quad-core CPU
8GB Kingston Hyper RAM
LAN: Realtek 8111C chips (10/100/1000 Mbit)

This is a backup system I've been using for several months now.

Samuel Tai

8 GB is the bare minimum to boot the OS. Are you trying to run a ton of plugins and VMs? How many shares do you have defined? This appears to be a simple case of memory pressure.

videopete

There are only two shares and no VMs. It's a very simple system and, as I've mentioned, it was running perfectly for months. I have a second, similar system, also with 8GB, that has zero issues. I don't use these extensively, especially this one; it's only used to back up the other one. My guess is that a module has a problem at startup, so I'm looking for where to find the startup information to pinpoint which module is failing. The /var/log directories seem to hold only the last boot, not past ones. If I take a video of the verbose console as it starts, will that help?

videopete

I managed to start the system in normal mode (not safe mode), and it's up and running (this time). I grabbed the console logs and attached them here. If anyone notices something wrong, I would be forever in your debt. Thank you.

Attachments

  • console_log.txt (30.4 KB)

Samuel Tai

What's the output of zpool status -v? The console log indicates:

Code:
Apr 22 14:17:36 truenas spa_misc.c:416:spa_load_note(): spa_load(NASDATA, config trusted): spa_load_verify found 0 metadata errors and 1 data errors
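The line above is printed by spa_load_verify while NASDATA is being imported. If you end up grepping several boots' worth of console output for it, a small parser like this one pulls out the pool name and the two counters. It is a sketch: the regex is written against the exact line quoted here and may need loosening for other OpenZFS versions.

```python
import re

# Fitted to the spa_load_verify summary line quoted in this thread.
SPA_VERIFY = re.compile(
    r"spa_load\((?P<pool>[^,]+),.*spa_load_verify found "
    r"(?P<meta>\d+) metadata errors and (?P<data>\d+) data errors"
)


def parse_spa_verify(line: str):
    """Return (pool, metadata_errors, data_errors), or None if the line doesn't match."""
    m = SPA_VERIFY.search(line)
    if m is None:
        return None
    return m["pool"], int(m["meta"]), int(m["data"])
```

On the quoted line it returns ("NASDATA", 0, 1): no metadata errors and one data error, which is what prompted the request for zpool status -v.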

videopete


Code:
root@truenas[/var/log]# zpool status -v
  pool: NASDATA
 state: ONLINE
  scan: scrub canceled on Thu Apr 21 10:54:45 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        NASDATA                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/78b3ad6d-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0
            gptid/78c94749-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0
            gptid/78ff1331-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
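Reading zpool status -v output by eye works for three disks, but the config table can also be checked mechanically. This sketch scans every config table in the output and reports any device whose READ, WRITE or CKSUM column is not "0". The counters are kept as strings because zpool may abbreviate large counts (e.g. "1.2K"); the parsing assumes the column layout shown above.

```python
def device_errors(status_text: str) -> dict:
    """Map each device in zpool status config tables to its (READ, WRITE, CKSUM)
    counters, keeping only devices where at least one counter is not "0"."""
    bad = {}
    in_table = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("NAME") and line.endswith("CKSUM"):
            in_table = True   # header row of a config table
            continue
        if not line or line.startswith("errors:"):
            in_table = False  # a blank line or the errors: summary ends the table
            continue
        if in_table:
            fields = line.split()
            if len(fields) >= 5:
                # Columns: NAME STATE READ WRITE CKSUM [notes], kept as printed
                counts = tuple(fields[2:5])
                if any(c != "0" for c in counts):
                    bad[fields[0]] = counts
    return bad
```

Against either zpool status -v output in this thread it returns an empty dict, since every counter is 0.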

Samuel Tai

Try scrubbing your pool, and letting it run to completion.

videopete

Scrub completed. I don't think it corrected anything.
I did notice errors in the /var/log/messages file related to vmx_init and avahi-daemon, but I'm not sure whether they are related to my issue.


Code:
Apr 22 14:17:36 truenas vmx_init: processor does not support desired primary processor-based controls
Apr 22 14:17:36 truenas module_register_init: MOD_LOAD (vmm, 0xffffffff8331ce60, 0) error 22
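The number after "error" in a MOD_LOAD line is an errno value. 22 is EINVAL ("Invalid argument") on FreeBSD, the same as on Linux. vmm is bhyve's kernel module, so together with the vmx_init line this reads as the hypervisor module declining to load on this CPU. That would presumably affect VM support rather than booting, though this interpretation was not confirmed in the thread. The sketch decodes such lines with os.strerror; the regex is an assumption fitted to the line above.

```python
import os
import re

# Fitted to lines like "MOD_LOAD (vmm, 0xffffffff8331ce60, 0) error 22".
MOD_LOAD = re.compile(r"MOD_LOAD \((?P<mod>\w+), [^)]*\) error (?P<err>\d+)")


def explain_mod_load(line: str):
    """Return (module, errno, description) for a failed MOD_LOAD line, or None."""
    m = MOD_LOAD.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return m["mod"], err, os.strerror(err)
```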


Code:
Apr 22 20:53:46 truenas 1 2022-04-22T20:53:46.724085-04:00 truenas79.local avahi-daemon 1166 - - IP_DROP_MEMBERSHIP failed: Can't assign requested address


Code:
root@truenas[/var/log]# zpool status -v
  pool: NASDATA
 state: ONLINE
  scan: scrub repaired 0B in 04:26:06 with 0 errors on Fri Apr 22 20:48:07 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        NASDATA                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/78b3ad6d-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0
            gptid/78c94749-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0
            gptid/78ff1331-1039-11ec-a6a4-e03f49ea0478  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
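The first zpool status -v showed "scrub canceled", so nothing had been fully verified; this one shows a finished scrub with 0 errors. The sketch below tells those cases apart from the first "scan:" line in the output (i.e. the first pool listed). It assumes the scan-line wording shown in this thread; other OpenZFS versions may phrase it differently, in which case it reports "other".

```python
import re

SCRUB_DONE = re.compile(
    r"scrub repaired (?P<repaired>\S+) in (?P<elapsed>[\d:]+) with (?P<errors>\d+) errors"
)


def scrub_result(status_text: str) -> dict:
    """Classify the first 'scan:' line of zpool status output."""
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line.startswith("scan:"):
            continue
        if "scrub canceled" in line:
            return {"state": "canceled"}
        m = SCRUB_DONE.search(line)
        if m:
            return {"state": "completed", "repaired": m["repaired"],
                    "elapsed": m["elapsed"], "errors": int(m["errors"])}
        return {"state": "other", "line": line}
    return {"state": "none"}  # e.g. a pool that has never been scanned
```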

sretalla

videopete said: "LAN: Realtek 8111C chips (10/100/1000 Mbit)"
I'd be pointing squarely at this as the source of any issues until we prove otherwise.

Realtek drivers are known to cause kernel panics under heavy load.

Your machine finished a full scrub, so the problem is unlikely to be pool/disk/SATA related. That leaves the network or other miscellaneous flaws, such as hairline cracks in the board.

Check the logs, but I don't think there will be much to go on if it's the hardware. You may find some core dumps, depending on exactly how the system fails.
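On FreeBSD, savecore(8) writes kernel crash dumps to /var/crash by default, as vmcore.N next to info.N, plus core.txt.N when crashinfo(8) runs. Whether a dump device is configured on a given TrueNAS install is an assumption here. This sketch groups those files by dump number, highest first. Note that a sudden loss of power leaves no dump at all, because the kernel never gets to write one, so an empty directory does not rule out hardware.

```python
from pathlib import Path


def list_crash_dumps(crash_dir: str = "/var/crash"):
    """Return [(N, [files for dump N])] sorted by dump number, highest first."""
    dumps = {}
    for path in Path(crash_dir).iterdir():
        stem, _, suffix = path.name.rpartition(".")
        # Only numbered dump artifacts; skips helper files like "bounds" or "vmcore.last"
        if stem in ("vmcore", "info", "core.txt") and suffix.isdigit():
            dumps.setdefault(int(suffix), []).append(path.name)
    return [(n, sorted(names)) for n, names in sorted(dumps.items(), reverse=True)]
```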

videopete

After some more extensive troubleshooting, it turned out to be the power supply itself. Outside of the case, with the power-on pins shorted with a paper clip so it would turn on, it worked fine and stayed on, but as soon as I connected just the motherboard and CPU power connectors, it would start cycling on and off again. I checked the DC voltages on the PSU pins and they all seemed OK on the meter, so it's a bit confusing. I then made no other changes except replacing the PSU with a spare one I had, and the problem immediately went away. So somehow the old PSU was failing when the motherboard drew power from it. Thanks to everyone for their help on this. I hope this may help others in the future.
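The meter readings were presumably taken with the supply lightly loaded. A supply can show normal rails at no load and still sag, or trip its protection, once the board and CPU draw real current; that is one plausible reading of the symptoms above, not a confirmed diagnosis. If you do want to sanity-check readings, the ATX12V power supply design guide allows ±5% on +12V, +5V, +3.3V and +5VSB and ±10% on -12V. The sketch flags readings outside those windows; the readings are hypothetical inputs, and a pass at no load proves little.

```python
# Rail name -> (nominal volts, allowed relative deviation), per ATX12V tolerances.
ATX_TOLERANCE = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def out_of_spec(readings: dict) -> dict:
    """Return {rail: (measured, low, high)} for readings outside the ATX window."""
    bad = {}
    for rail, measured in readings.items():
        nominal, tol = ATX_TOLERANCE[rail]
        # sorted() keeps low <= high for the negative -12V rail too
        low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
        if not low <= measured <= high:
            bad[rail] = (measured, round(low, 3), round(high, 3))
    return bad
```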