Unscheduled System Reboot

IronSheepdog

Dabbler
Joined
May 27, 2020
Messages
25
I am running TrueNAS-12.0-U8, but this has been going on for a few months now (about six or more). I am still a newbie at this and it's driving me insane. My TrueNAS server has unscheduled system reboots. They're random, but they seem to happen anywhere from two to six times a week. I've only had my system running for about a year and a half (if that), and the hardware is all new. I'm on a UPS, which is also new. I ran SMART tests in TrueNAS on all of the drives with no errors. I read somewhere that the drive cables could be the issue, so I replaced them, but it's still happening. I've seen numerous threads on this issue, but none of them have helped.

Can someone please help this dumba** out? I'll show whatever logs you want (as long as I know how to get them).
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399

IronSheepdog

It's highly likely your 250W power supply isn't enough for your build, and is cutting power to the board when certain rails get overloaded. See https://www.truenas.com/community/threads/truenas-randomly-shutting-down.98702/, and follow the thread to the power supply sizing calculators @jgreco links to.

Your CPU alone uses at least 80W TDP. Your 4x Red Pro drives use 7.2W each. That's at least 109W.
I may replace it to see if that's the issue. I'm not doubting your experience or knowledge at all. But I'm curious as to why this would start affecting my build after a year and not right away.
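A quick way to sanity-check the power arithmetic quoted above is to tally the figures against the PSU rating. This is a minimal sketch that uses only the two numbers given (80W CPU TDP, 7.2W per Red Pro drive). Motherboard, RAM, fans and the much higher drive spin-up surge are deliberately left out, so the result is a floor, not a full power budget; the sizing threads linked above cover the rest.

```python
# Steady-state tally using only the figures quoted in the thread.
# Motherboard, RAM, fans and drive spin-up surge are NOT included,
# so the real load is higher than this.
CPU_TDP_W = 80.0        # CPU thermal design power, as quoted
DRIVE_W = 7.2           # per-drive figure quoted for a WD Red Pro
DRIVE_COUNT = 4
PSU_RATING_W = 250.0    # the PSU bundled with the case


def known_load_watts() -> float:
    """Sum of the components whose draw was actually quoted."""
    return CPU_TDP_W + DRIVE_COUNT * DRIVE_W


load = known_load_watts()
print(f"known load: {load:.1f} W "
      f"({load / PSU_RATING_W:.0%} of the {PSU_RATING_W:.0f} W rating)")
```

Everything not in the tally (plus derating for heat and age) eats into whatever headroom this leaves.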
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
But I'm curious as to why this would start affecting my build after a year and not right away.
Have you cleaned all the dust out of your box recently?
 

IronSheepdog

Have you cleaned all the dust out of your box recently?
I have it in a pretty much dust-free room, but I check it once every few months. It's clean.
 
Joined
Oct 22, 2019
Messages
3,641
The design is a bit odd. The PSU fan intake is almost kissing the drive bay, and it's pulling air from inside the case itself, so it doesn't draw in the coolest air from outside the case. In fact, it's immediately sucking in the already-heated air from the drives themselves. Perhaps accelerated wear on the PSU due to an inefficient cooling design?

EDIT: In addition to the above concern, this case apparently includes a low-end PSU (to save on costs).

11-123-173-09.jpg
 

Redcoat

OK, so it's winter up here in the North East US - heating season - server in a warmer environment?

I went through these situations with my FreeNAS Mini which was a small case not dissimilar to yours with a lot of stuff in it - keeping everything in it cool was a problem. The standard PS was a 250W Bronze, running 4 WD Reds and an Atom CPU - I added a couple of extra case fans and a CPU fan also (original was passive), and I had to replace a failed power supply. I had a few unexplained crashes and always felt I was "on the edge" for both power and cooling.

Good luck working through this.
 

IronSheepdog

OK, so it's winter up here in the North East US - heating season - server in a warmer environment?

I went through these situations with my FreeNAS Mini which was a small case not dissimilar to yours with a lot of stuff in it - keeping everything in it cool was a problem. The standard PS was a 250W Bronze, running 4 WD Reds and an Atom CPU - I added a couple of extra case fans and a CPU fan also (original was passive), and I had to replace a failed power supply. I had a few unexplained crashes and always felt I was "on the edge" for both power and cooling.

Good luck working through this.
The room I have this server in is actually not heated. So as of right now, it's about 55 degrees F. I doubt temperature is an issue. Either way, is there any way to check that?
 

Samuel Tai

Look at the CPU and drive temp graphs under Reporting.
 

Redcoat

Either way, is there any way to check that?
Does your SEL (the IPMI System Event Log) not give you such data? CPU temperature, for instance? Drive temperatures should also be available from SMART monitoring and your TrueNAS system alerts.
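If you'd rather read drive temperatures straight from SMART data, `smartctl -A /dev/ada0` prints an attribute table, and many drives report temperature as attribute 194 (`Temperature_Celsius`). The sketch below pulls the raw value out of that table. It assumes the usual ten-column smartctl layout and attribute ID 194; some drives use a different attribute (190 is common) or decorate the raw value, and `/dev/ada0` is only an example device name.

```python
import re
from typing import Optional


def drive_temp_celsius(smartctl_output: str, attr_id: int = 194) -> Optional[int]:
    """Return the raw temperature from a `smartctl -A` attribute table.

    Assumes the standard layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attr_id):
            # RAW_VALUE may be "33" or "33 (Min/Max 20/45)"; keep the leading number.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None  # attribute not present in this output


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   117   100   000    Old_age   Always       -       33 (Min/Max 20/45)
"""
print(drive_temp_celsius(sample))
```

Feeding it the real output of `smartctl -A` for each drive, right after a stress run, gives a quick cross-check of the Reporting graphs.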
 
Joined
Oct 22, 2019
Messages
3,641
So as of right now, it's about 55 degrees F.
Since the system is running, there's hopefully enough ambient heat to prevent condensation. However, if the dew point is in the high 40s or even low 50s, that's some really high relative humidity.
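To put a number on "really high relative humidity", the Magnus approximation relates air temperature and dew point to relative humidity. A minimal sketch, assuming the commonly used coefficients a = 17.62 and b = 243.12 °C (fine for ordinary room conditions); 48°F and 51°F are stand-ins for "high 40s" and "low 50s" against a 55°F room.

```python
import math

# Magnus approximation coefficients (commonly used values, over water)
A = 17.62
B = 243.12  # degrees Celsius


def f_to_c(deg_f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def relative_humidity(temp_c: float, dew_point_c: float) -> float:
    """Percent RH from air temperature and dew point, both in Celsius."""
    gamma_dew = A * dew_point_c / (B + dew_point_c)
    gamma_air = A * temp_c / (B + temp_c)
    return 100.0 * math.exp(gamma_dew - gamma_air)


room_c = f_to_c(55)
for dew_f in (48, 51):
    rh = relative_humidity(room_c, f_to_c(dew_f))
    print(f"room 55 F, dew point {dew_f} F -> RH about {rh:.0f}%")
```

That works out to roughly 77% and 86% RH, which is why a dehumidifier in the room matters.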

I doubt temperature is an issue.
The "wear" might have accrued over time, across multiple seasons (see my comment about the PSU and case design).

It's a shot in the dark, so no one can say for sure what the culprit is.


One thing you can try, if you have the time, is to disconnect all drives (for safety reasons), boot into a live Linux ISO, and run mprime or stress on the maximum settings to push CPU power draw and heat as high as they will go. Then see whether the system stays powered on overnight (or for as long as you're willing to go without TrueNAS in the meantime).

Technically, you could leave the drives plugged in and spinning for a more accurate test during this. Up to you.
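During an overnight test like this, it helps to know *when* the box went down, not just that it did. One low-tech approach (a common trick, not something suggested in this thread) is a heartbeat file: a small script appends a timestamp every few seconds, and after an unexpected reboot the last line tells you roughly when power was lost. A minimal sketch, assuming Python 3 is available on the live ISO. The path is a hypothetical example and should be on persistent storage such as a USB stick, since a live ISO's own filesystem usually lives in RAM and disappears on reboot.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def write_heartbeats(path: str, interval_s: float = 10.0,
                     count: Optional[int] = None) -> int:
    """Append a UTC timestamp to `path` every `interval_s` seconds.

    Runs forever when `count` is None. Each line is flushed and fsync'd
    so it has the best chance of surviving a sudden power loss.
    """
    written = 0
    while count is None or written < count:
        with open(path, "a", encoding="utf-8") as f:
            f.write(datetime.now(timezone.utc).isoformat() + "\n")
            f.flush()
            os.fsync(f.fileno())
        written += 1
        if count is None or written < count:
            time.sleep(interval_s)
    return written


def last_heartbeat(path: str) -> Optional[str]:
    """Return the newest timestamp, i.e. roughly when the system stopped."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

Start `write_heartbeats("/path/on/usb/heartbeat.log")` in a separate terminal before launching mprime; after a crash, `last_heartbeat(...)` on the same path gives the time of the final write.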
 

IronSheepdog

Look at the CPU and drive temp graphs under Reporting.
Everything looks normal, even going back to the last system reboot. I also forgot to mention that I have a dehumidifier running in the room.

Are there any logs I could check in TrueNAS that might clue me in on something?
 

Samuel Tai

Did you do any burn-in testing before putting your system into production, or did you just assemble everything and go live, trusting to luck?

Did you check the input power from the wall with a multimeter? Is the voltage between the hot and ground, and neutral and ground the same, or does it fluctuate?
 

IronSheepdog

Did you do any burn-in testing before putting your system into production, or did you just assemble everything and go live, trusting to luck?

Did you check the input power from the wall with a multimeter? Is the voltage between the hot and ground, and neutral and ground the same, or does it fluctuate?
I don't know about testing, but I didn't actually start using the system for about two weeks after assembly. I just let it run. Either way, it was running with everything I have on it for about a year before this started happening.

There is no voltage between neutral and ground. They both are returns on power. Only hot is supposed to be "hot". Voltage is a constant 112 to 119 volts in my house no matter where I am.
 

Redcoat


Samuel Tai

In particular, you might get lucky and see something in /var/log/console.log. However, this is most likely an environmental cause, like a ground loop, static discharge, power supply sagging, or capacitors drying out.
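To skim /var/log/console.log for anything unusual before a reboot, a simple keyword filter is a reasonable first pass. A minimal sketch; the keyword list is an assumption, not an exhaustive catalogue of FreeBSD messages. `panic` (kernel panics) and `MCA:` (machine-check hardware errors) are typical things to look for. Keep in mind that an abrupt power cut usually leaves no trace in the log at all: the entries simply stop, and that absence is itself a clue pointing toward power rather than software.

```python
import re
from typing import Iterable, List

# Illustrative keywords only (an assumption, not an exhaustive list).
SUSPICIOUS = re.compile(r"panic|MCA:|watchdog|thermal|uncorrect|timeout",
                        re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the keywords above."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]


def scan_log(path: str = "/var/log/console.log") -> List[str]:
    """Read a log file, replacing undecodable bytes, and filter it."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return suspicious_lines(log)
```

Run as root on the NAS, `for hit in scan_log(): print(hit)` lists any matches; an empty result right up to the moment of a reboot is consistent with a sudden loss of power.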
 

Samuel Tai

The design is a bit odd. The PSU fan intake is almost kissing the drive bay, and it's pulling air from inside the case itself, so it doesn't draw in the coolest air from outside the case. In fact, it's immediately drawing in already-heated air from the drives themselves. Perhaps accelerated wear on the PSU due to an inefficient cooling design?

EDIT: In addition to the above concern, this case apparently includes a low-end PSU (to save on costs).

11-123-173-09.jpg

The whole airflow design of this case is very peculiar. I attached the manual for the case. The black shroud under the drive cage takes the air that the case intake fan pulls through the drive cage, turns it 90 degrees, and blows it over the motherboard. The power supply vents into a small plenum just above it, which expects the flow to immediately turn 90 degrees to vent out the back of the case. I don't see how this works aerodynamically; there are bound to be tons of airflow dead zones within this case.
 

Attachments

  • QIG_SR30169_V1.1.pdf

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Others have already mentioned this, and I think you should seriously perform the following tests to make sure your system is stable; they may also help point you in the right direction. If you have a failure during these tests, you need to look into proper cooling, which is not about the temperature of the incoming air but rather the airflow and its ability to remove the heat created by the components. You have a lot of heat-producing components in a small space and a power supply that might be on the small side. But test it before replacing anything. You want to know whether your system is stable and, if not, identify and fix the problem.

1) Run a CPU stress test like Prime95 for a minimum of 4 hours (some people run it for 24 hours or longer). You are trying to draw maximum power and heat-saturate your system. If the computer has no issues here, odds are your cooling and power supply are fine. Also note and report the CPU temperature during this test. Make sure you leave your hard drives connected to power; if you are worried about your data, you can disconnect the data cables before powering the system on for the test. And test with your case closed up.

2) Run a memory test like MemTest86 for several days; I like to go for 5 to 7 days myself. One pass of the test is not proof that it's stable. You really need to let it run for a long duration and hope there are no failures.

3) Do you have a UPS? If not, then any power issues could be affecting the system. Some systems are more susceptible to minor power variations, so just because your TV doesn't cut out or your lights don't flicker doesn't mean the power didn't cause this. That is why a good UPS is highly recommended.

4) If you have a failure, repeat the test and make sure you can reproduce the failure. Report exactly what the failure looks like. To help you out, we need details.

Good Luck
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But I'm curious as to why this would start affecting my build after a year and not right away.

If you go and look at the Proper Power Supply Sizing Guidance thread, you'll notice I spend a crapton of time talking about derating and related issues.


One of the things that can happen is that stressed components degrade over time. For example, if the PSU is running near its max capacity, it may be running warm and could be slowly cooking its electrolytic capacitors. They will tolerate such abuse for a while, hundreds of hours in many cases, but eventually they start to lose capacitance, and with it, their ability to smooth out transient demands.

Along the same lines, as others have noted, after a year, sometimes components start to get dirty, building up dust. This also increases heat and stress on auxiliary components such as fans. A stalled or struggling fan can suck up a lot more watts than a brand new fan, and my opinion of the fans commonly sold to PC builders and gaming hobbyists is not particularly good.

Others have posted other highly topical comments.
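The point about slowly cooking electrolytic capacitors is often quantified with a rule of thumb: expected life roughly doubles for every 10 °C a capacitor runs below its rated temperature and halves for every 10 °C above it. A minimal sketch of that rule; the 2,000 h at 105 °C rating and the operating temperatures are illustrative assumptions, not figures for this PSU, and real datasheets add ripple-current and voltage terms that are ignored here.

```python
def capacitor_life_hours(rated_life_h: float, rated_temp_c: float,
                         operating_temp_c: float) -> float:
    """Estimate electrolytic capacitor life with the '10-degree rule'.

    Life doubles for every 10 degC below the rated temperature and halves
    for every 10 degC above it. Ripple current and voltage stress are ignored.
    """
    return rated_life_h * 2 ** ((rated_temp_c - operating_temp_c) / 10.0)


HOURS_PER_YEAR = 24 * 365

# Hypothetical 2,000 h @ 105 degC part at two internal PSU temperatures.
for temp_c in (65, 85):
    hours = capacitor_life_hours(2000, 105, temp_c)
    print(f"{temp_c} degC: ~{hours:,.0f} h "
          f"(~{hours / HOURS_PER_YEAR:.1f} years of 24/7 use)")
```

Under these assumed numbers, a part that would last several years in a cool PSU reaches its rated end of life in under a year of 24/7 operation when run 20 °C hotter, which is one way a build can be fine at first and start misbehaving later.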
 

IronSheepdog

I think the first thing I'm going to do is replace the entire case. Given what everyone here is saying about the cheap PSU and the poor airflow design of the Chenbro case, and the fact that I would like to add more drives, I think it's time to ditch it. I bought it thinking it would be a neat little case that wouldn't take up much space, but perhaps that is its own downfall. I'm going to buy a larger case with a larger, better-quality power supply. It'll probably take a while, but I will post my results. Thanks, everyone!
 