Unscheduled Reboots

Ignasi

Cadet
Joined
Jun 5, 2021
Messages
5
I'm running TrueNAS-SCALE-22.02.2.1, and before that I was running the BETA.
Everything worked fine for months, but lately random reboots started, and now I get 3 in a row every single day.

No idea what is going on, any advice?

I'm only running SMB shares.
And one single VM with Ubuntu.

Temperature hardly exceeds 45 °C.
The CPU load is always low, peaking at 20%.

AMD EPYC 7352, 24 cores.
128GB ECC RAM.

The pools look fine. The disk tests are also fine.
The power supply is more than enough. I checked, and the system draws only 120W while the PSU is rated at over 650W.

While working during the day I can move large amounts of files over the Mellanox 100GbE at around 3500MB/s peak, and it doesn't crash.
I get the crashes mainly at night, when there is very little load.

What else can I check?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Did you check the boot pool?
Are those reboots regular, or do they happen at random times?
 

WN1X

Explorer
Joined
Dec 2, 2019
Messages
77
And is there any indication of the reason for the reboots in the logs?
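One way to look for that, as a rough sketch: on a systemd-based system such as SCALE, `last -x` lists boot and shutdown records newest first, so a boot that is not preceded by a shutdown record points to a crash or power loss rather than a clean restart. The helper name `unclean_boots` is made up for illustration, and this assumes wtmp still holds the relevant entries.

```shell
#!/bin/sh
# Hypothetical helper: reads `last -x` output (newest entry first) on stdin
# and prints every "reboot" record whose next older boot/shutdown record
# is another "reboot", i.e. a boot with no clean shutdown before it.
unclean_boots() {
    awk '
        $1 == "reboot" || $1 == "shutdown" {
            # prev_* holds the newer record seen just above this one
            if (prev_type == "reboot" && $1 == "reboot")
                print prev_line
            prev_type = $1
            prev_line = $0
        }
    '
}

# Usage (not run here): last -x | unclean_boots
```

Run it as `last -x | unclean_boots`. If the journal is persistent (an assumption; it may not be on every install), `journalctl --list-boots` and `journalctl -b -1 -e` show the tail of the previous boot's log.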
 

Ignasi

Cadet
Joined
Jun 5, 2021
Messages
5
The boot pool is 2 mirrored SSDs, Samsung 970 EVO 500GB (a waste of space, but that is another thing...).

Sadly, I can't say whether they are regular: I can't check the last three because I cleared the alert history by mistake. Before that, though, they happened at different times, once even while I was working, and all my network drives disconnected for a while.

I haven't checked the logs yet; I should take a look. Where is the best place to find the logs related to this? Or is there only one log file I should be checking? Thank you!

I'll keep investigating. Thank you for the tips!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
All worked fine for months,
Makes me think that you must have already applied the C-state and Cool'n'Quiet setting changes in the BIOS that are required for stable operation on AMD... but

but lately random reboots started, now I got 3 on a row every single day.
So I recommend checking your BIOS to confirm those are off, since you don't mention having already done it and the problems aren't happening under load.
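To see what the kernel itself reports, here is a small sketch: Linux exposes the idle states it knows about under `/sys/devices/system/cpu/cpu0/cpuidle/`. The function name `list_cstates` is just illustrative, and this only shows the OS view; it does not replace checking the BIOS.

```shell
#!/bin/sh
# Hypothetical helper: lists the idle (C-) states the kernel exposes for cpu0,
# with how often each was entered and whether it is disabled in the OS.
# The optional argument overrides the sysfs root (handy for testing).
list_cstates() {
    root=${1:-/sys/devices/system/cpu}
    for state in "$root"/cpu0/cpuidle/state*; do
        # Skip if the glob matched nothing or the entry is unreadable
        [ -r "$state/name" ] || continue
        printf '%s usage=%s disabled=%s\n' \
            "$(cat "$state/name")" \
            "$(cat "$state/usage" 2>/dev/null)" \
            "$(cat "$state/disable" 2>/dev/null)"
    done
}

# Usage (not run here): list_cstates
```

State names vary by platform, but if the deeper states still show a growing usage count, that would suggest they are not actually turned off.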
 

Ignasi

Cadet
Joined
Jun 5, 2021
Messages
5
I checked the logs with

grep -E -i -r 'err|warn|panic' /var/log/ | grep -E ' 07:0'

because I knew that one of the crashes happened at 7:00 am.
It is not consistent: every day it's a bit different, between 7 and 9:00 am, and it's not even when the backup runs, which is around 2 or 3 am.
I found nothing at all.
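A sketch of a slightly wider search, assuming plain-text logs with `HH:MM:SS` timestamps: it covers the whole 07:00 to 09:59 window instead of only 07:0x. The function name `search_window` is hypothetical, and rotated `.gz` logs are not searched (they would need `zgrep`).

```shell
#!/bin/sh
# Hypothetical helper: recursively search plain-text logs for error-like
# lines, then keep only those stamped within the given hours.
#   $1  log directory, e.g. /var/log
#   $2  ERE alternation of two-digit hours, e.g. '07|08|09'
search_window() {
    dir=$1
    hours=$2
    # -r recurse, -E extended regex, -i ignore case, -I skip binary files
    grep -rEiI 'err|warn|panic' "$dir" 2>/dev/null |
        grep -E "(^|[ T])($hours):[0-5][0-9]:[0-5][0-9]"
}

# Usage (not run here): search_window /var/log '07|08|09'
```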

I peeked at syslog, messages, and so on, and found nothing relevant either...
What is going on.....!! :S

I'll check the Cool'n'Quiet setting. Does EPYC have that too?
 

Ignasi

Cadet
Joined
Jun 5, 2021
Messages
5
[Attached image: 1659867298174.png]

Possibly this? ... but not much more information there...
 