Unscheduled System Reboot

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You could take that case and build yourself a nice firewall PC with it, low power requirements :smile:

I think you are making a good decision since you desire to add more drives anyway.
 

IronSheepdog

Dabbler
Joined
May 27, 2020
Messages
25
So I first replaced the power supply with a 750 watt one. That didn't make a difference. I started to pay attention to the temperatures under heavy use. When I would start to run Windows 10 in a VM, I noticed the resources (CPU mainly) at 100% and the temperatures start to go up on the drives and CPU. Eventually, that would trigger a reboot. I'm thinking it is the case (after all of this time). I just purchased a used Dell PowerEdge R720 server and will be moving my drives over to that. Then I'll have way more power than I'll probably ever need.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
This is where you would have identified the thermal problem during a proper burn-in period. It takes a few days, but in the long run it is well worth it. If it's just case cooling, a new case would solve the problem, but that assumes something is simply overheating. A component could also be failing in a way that cooling won't fix, or failing just from the stress being placed on it.

Good luck with the new server.
 

IronSheepdog

Dabbler
Joined
May 27, 2020
Messages
25
I don't believe a "proper burn in" was the problem. This system has been running (and running hard) for a year or more with no problems at all. So yes, it is possible heat could have damaged something. Who knows.
 

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
Here to report I've been having the same issue since updating to 12.0-U8.

Dell Precision Rack 7910
2xE5-2630L-v3
256GB ECC RAM
LSI 9201-16e HBA controller to a 24-bay disk shelf
SATA SSD boot drive (in CDROM slot)
5kVA 240V UPS running in online mode (~1.5kVA used), APC power strip reports constant voltage

System has been in service for multiple years with no previous issues.
The latest version reboots unexpectedly anywhere from once a day to once every couple of weeks. If I drop back to the previously running version (12-U5.1), I haven't seen a reboot.

Nothing useful in /var/log/messages. Goes from a config reload straight to syslog restarting after the reboot.
I have iDRAC Enterprise on this thing. No alerts or anything logged by the iDRAC to indicate an issue either. If it were hardware, I'd expect to see the iDRAC complain.

Any ideas of what else to try? Considering just waiting until 13.0 releases and crossing my fingers at this point...
 

Krautmaster

Explorer
Joined
Apr 10, 2017
Messages
81
As a test, disable C-states in the BIOS. They caused the same problem on Bay Trail.

Anyway, I have random crashes on version 13, and U8 does not work for me because a kernel patch stopped my HBA from booting in U8...
 

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
Disabling C-states didn't fix it for me. It rebooted again within 36 hrs of setting it. It seems to always happen in the early hours of the day, which is when I have snapshots and replication set up to run, but it's not 100% consistent with when those are started or running. This is with U8.1 now. Guess I'll be going back to U5.1.
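
One way to check how consistently the reboots line up with the snapshot and replication schedule is to compare each reboot time against the daily task start times. The following is only a minimal Python sketch of that comparison: the function names, the 120-minute window, and the sample timestamps are all made up for illustration and are not taken from this system.

```python
from datetime import datetime, time

MINUTES_PER_DAY = 24 * 60


def minutes_since_task(reboot: datetime, task_start: time) -> int:
    """Minutes elapsed between the most recent daily task start and the reboot.

    Wraps around midnight, so a 00:30 reboot is 45 minutes after a 23:45 task.
    """
    reboot_min = reboot.hour * 60 + reboot.minute
    task_min = task_start.hour * 60 + task_start.minute
    return (reboot_min - task_min) % MINUTES_PER_DAY


def reboots_near_tasks(reboots, task_starts, window_minutes=120):
    """Return the reboots that happened within window_minutes after any task start."""
    return [
        r for r in reboots
        if any(minutes_since_task(r, t) <= window_minutes for t in task_starts)
    ]


if __name__ == "__main__":
    # Hypothetical data: replace with real reboot times and real task schedules.
    reboots = [
        datetime(2022, 6, 1, 3, 40),
        datetime(2022, 6, 9, 14, 5),
        datetime(2022, 6, 12, 0, 30),
    ]
    task_starts = [time(3, 0), time(23, 45)]
    hits = reboots_near_tasks(reboots, task_starts)
    print(f"{len(hits)} of {len(reboots)} reboots fell within the window")
```

If most reboots land inside the window, the scheduled tasks are a reasonable suspect. If they are spread evenly across the day, the timing is more likely a coincidence.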
 

Fabrik872

Cadet
Joined
Dec 2, 2022
Messages
1
Hello, I am new here, but I have been using TrueNAS for a few years on my little server:

case: Eolize SVD NC11 4 Mini ITX
MB: ASRock J4105-ITX
32GB non-ECC RAM
350W PSU

Recently I have also been experiencing random restarts, often multiple times a day. I swapped the PSU for a fairly expensive one of the same 350W rating (SilverStone SST-FX350-G 350W), and since then the random restarts happen only about once a month. So it looks like that helped a bit, but the issue is still not completely solved. I don't think I have an overheating issue: under synthetic load my CPU stays under 70C, in normal usage it is around 45C, and all disks are under 35C. I have 2 HDDs and 1 SSD for TrueNAS, so I don't think a 10W CPU and 3 drives are overloading the PSU either. There is nothing useful in the logs, only the boot start sequence. Any idea how I can fix this? I have a feeling that the motherboard might be dying. It has over 3 years of service, and I have read that this model has issues with a watchdog memory chip that wears out quite fast, but I don't want to spend more money unless I am sure that is the cause.
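
The reasoning about the PSU not being overloaded can be checked with a rough power budget. This is only a back-of-the-envelope Python sketch: the 10W CPU figure, the drive counts, and the 350W rating come from the post above, while the per-drive and board wattages are illustrative assumptions, not measurements of this hardware.

```python
def estimated_peak_watts(cpu_w, hdd_count, ssd_count,
                         hdd_spinup_w=25.0,  # assumed worst-case draw per HDD at spin-up
                         ssd_w=5.0,          # assumed draw per SATA SSD
                         board_w=15.0):      # assumed board, RAM, and fan overhead
    """Sum a pessimistic peak draw from per-component estimates."""
    return cpu_w + hdd_count * hdd_spinup_w + ssd_count * ssd_w + board_w


PSU_RATING_W = 350  # from the post

peak = estimated_peak_watts(cpu_w=10, hdd_count=2, ssd_count=1)
headroom = 1 - peak / PSU_RATING_W
print(f"estimated peak: {peak:.0f} W, headroom: {headroom:.0%}")
```

Even with pessimistic placeholder numbers, the estimate stays far below the PSU rating, which fits the view that raw capacity is not the issue. It says nothing about the quality or age of the supply.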
 

Krautmaster

Explorer
Joined
Apr 10, 2017
Messages
81
Try disabling C-states in the BIOS and check whether the system is stable then. Hard to say what the issue could be. What about TrueNAS SCALE? I found Debian to be more reliable on these systems.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So a few things: what version of TrueNAS are you using?
Did you do anything to your system (hardware or software) shortly before the system started rebooting itself? This may be hard to identify if it's been a while. When it comes to hardware, did you open the case, move the computer, anything at all? Even moving it can cause a shift in the hardware and cause issues.

Since this system had been running well for a period of time (you stated a few years), I would not change the BIOS settings; you should identify the root cause instead.

Suggestions:
1) Shut down the system and disconnect/reconnect every card and electrical connection. While you are at it, blow out all the dust that may have accumulated.
2) Run CPU Stress Test and RAM Burn-In tests. I'd run Memtest86+ for a few days, and a CPU test for at least 4 hours, possibly 8 hours, to heat saturate the board/socket. If these pass then you can feel good that the CPU, RAM, and motherboard in general are in good condition. I say in general because you didn't completely test the motherboard and I don't know of any test that can do that.
3) If you replaced the power supply and the problems continue, then the power supply isn't the problem.
4) Is your computer on a UPS, with a data cable connected and configured to shut down the computer upon power failure? This would help you identify if you are having power issues. A system that powers down properly means the computer is fine and doing what it should during a power failure. If you have the BIOS set up to Power On after power failure, that too can lead to confusion. Set it to Remain Off at least while you troubleshoot the problem.

That is about all I can tell you right now until you do some of that stuff listed above.

But I will say this: the motherboard could be bad. The power-on-reset capacitor could have failed and be periodically shorting out. Most computer capacitors are much better these days, but this is just one example of a motherboard component causing reboot issues. It could be any of a number of components, to be honest, and identifying them requires a schematic diagram and the knowledge to troubleshoot electronics.

So think back to just before you started seeing these problems, see if you recall making any changes. This is the most likely suspect in my experience.

Good Luck.
 