CPU overheating for 2 days, no alert created

yusisushi

Dabbler
Joined
Nov 11, 2022
Messages
14
Two days ago my TrueNAS system became very slow, and it was not immediately apparent why until I noticed the CPU was at 90°C.
The cause was a defective CPU fan.
[Attached image: 1690621727519.png]


I did not get an alert email. I didn't even see an alert in the GUI notifications(!)

I've been trying to find any settings related to CPU temperature and alerts. As far as I know, any alert at 'WARNING' level or higher is supposed to send me an email. This does happen, for example, when I reboot the NAS ('warning: unscheduled reboot').

I'm worried that my CPU could have died if I didn't notice what was going on.

Thank you in advance

Platform: Generic
Version: TrueNAS-13.0-U4
CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
Memory: 4x 16GB DDR4
Storage: WD RED+
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm worried that my CPU could have died if I didn't notice what was going on.
Your CPU has a built-in throttle to keep it from getting "too hot". But I fully understand your concern.

Why wouldn't your motherboard alert you like everyone else's would? Mine beeps if the CPU fan drops below a certain speed, and beeps if the CPU gets too hot. These are BIOS settings.

I believe you need a motherboard that supports IPMI; when it signals an alarm condition, TrueNAS would report it. But I could be wrong, as I have never been in that situation. I also have not seen a CPU temperature alarm limit setting in the GUI (Alert Settings). Sorry I could not give you a better answer; someone else might chime in with more.
 

yusisushi

Dabbler
Joined
Nov 11, 2022
Messages
14
Hi, thanks for the reply. I do indeed think it throttled and stopped itself from actually overheating. But it still makes sense to be alerted about something like this.

It is possible my motherboard would beep, but there is no speaker attached currently. And I can't say for sure, because the motherboard is a non-brand without any manual to be found online, for reasons.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Get into the BIOS and take a look at the settings. See if there is anything that would alert you when the CPU fan speed is too low, or that lets you set the CPU throttling temperature to a lower value if you desire. Also, most motherboards have a 4-pin header called Speaker or SPKR. You just need to connect a speaker to pins 1 and 4 (the outer two pins). When you boot the computer you might hear a few beeps; this is normal on most boards and tells you that POST has passed. If the beeps continue, then you have an alarm condition. You can test this by stopping the CPU fan from rotating to see if the alarm is generated.

If you find the 4 pin header, you can purchase a speaker similar to this one practically anywhere.

If you need help identifying your motherboard and finding a user manual, post some photos of the motherboard, with close-ups of any markings on the board, like manufacturer and model. With all the members on this forum, we should be able to help. Also, provide photos of the jumpers and headers that are clear to see and read. I've been dealing with motherboards since 1977. I built my first computer from a kit; it was exciting and intimidating at the same time. I knew electronics, and that was the start of learning computers.
 

yusisushi

Dabbler
Joined
Nov 11, 2022
Messages
14
If you need help with your motherboard identification and finding a user manual, post some photos of the motherboard

This is the motherboard I have. If you could provide me with a manual, that would be splendid! I suspect it is a Chinese non-brand. I was already very lucky to find a post somewhere that described which jumpers to switch to be able to boot from an NVMe drive.

The reason I went with this board is that it's Micro-ATX, has 4 DDR4 DIMM slots, and supports the 2011 socket. That combination is not easy to find (!), and all of these were requirements given the parts I had.

I found the speaker connector and my plan is now to indeed install one.
[Attached image: 1690794876551.png]


However, I would still like the NAS itself to create an alert as well. I have seen other posts where people talk about an overtemperature warning, so if my CPU was not technically overheating, maybe I could lower the threshold for an alert somewhere?
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
There may be lm_sensors loaded, or the sensors command available. But usually that needs to be configured for your motherboard. Off-brand Chinese motherboards typically don't have manuals, and you have to go through the BIOS documenting everything yourself, though you may find it is mostly a copy of a known-brand BIOS. Unless the sensors package can detect and configure your system monitoring, getting an alert from the system seems overly optimistic. Also, configuring monitoring on an appliance OS like TN needs to be done in such a way as to integrate into the system, not break it, and not get blown away on reboot or upgrade.
 

yusisushi

Dabbler
Joined
Nov 11, 2022
Messages
14
OK, but I can read the temperatures fine from within the GUI or from the CLI.
See the screenshot and the command output below:

Code:
...@truenas:~ $ sysctl dev.cpu | grep temperature
dev.cpu.23.temperature: 44.0C
dev.cpu.21.temperature: 48.0C
dev.cpu.19.temperature: 49.0C
dev.cpu.17.temperature: 47.0C
dev.cpu.15.temperature: 43.0C
dev.cpu.13.temperature: 47.0C
dev.cpu.11.temperature: 45.0C
dev.cpu.9.temperature: 47.0C
dev.cpu.7.temperature: 46.0C
dev.cpu.5.temperature: 46.0C
dev.cpu.3.temperature: 48.0C
dev.cpu.1.temperature: 46.0C
dev.cpu.22.temperature: 44.0C
dev.cpu.20.temperature: 48.0C
dev.cpu.18.temperature: 48.0C
dev.cpu.16.temperature: 45.0C
dev.cpu.14.temperature: 44.0C
dev.cpu.12.temperature: 47.0C
dev.cpu.10.temperature: 45.0C
dev.cpu.8.temperature: 47.0C
dev.cpu.6.temperature: 46.0C
dev.cpu.4.temperature: 46.0C
dev.cpu.2.temperature: 48.0C
dev.cpu.0.temperature: 46.0C


Is it possible at all to also configure an alert in TrueNAS at, e.g., 80°C?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Is it possible at all to also configure an alert in TrueNAS at, e.g., 80°C?
Absolutely, but not in TrueNAS directly. You can create a simple script that polls your system every 5, 10, or 30 minutes (your choice), inspects the CPU temperature, and, if it exceeds the limit you set, generates a Critical Warning email and sends it to your email address. It's an easy thing to do: if you have the will to create it, you will have the will to learn how. There are scripts out there that you could examine to get an idea, but I think the easiest way is a simple Python script run as a cron job at the interval you desire.
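
As a rough illustration of that idea, here is a minimal sketch in Python. It assumes the `sysctl dev.cpu` output format shown earlier in this thread (lines like `dev.cpu.23.temperature: 44.0C`). The names `THRESHOLD_C`, `parse_temperatures`, `hot_cores` and `main` are made up for this example, and the 80°C limit is simply the value asked about above. The script only prints a message and returns a non-zero exit code; turning that into an email is left to however your cron job or mail setup handles job output, which is an assumption you would need to verify on your own system.

```python
import re
import subprocess
import sys

# Assumed alert threshold in degrees Celsius (the value asked about in the thread).
THRESHOLD_C = 80.0

# Matches lines such as "dev.cpu.23.temperature: 44.0C".
TEMP_LINE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$")


def parse_temperatures(sysctl_output):
    """Return {core_number: temperature_in_C} parsed from `sysctl dev.cpu` output."""
    temps = {}
    for line in sysctl_output.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def hot_cores(temps, threshold=THRESHOLD_C):
    """Return (core, temperature) pairs at or above the threshold, sorted by core."""
    return sorted((core, t) for core, t in temps.items() if t >= threshold)


def main():
    # Same command as used in the thread; its output format is assumed unchanged.
    result = subprocess.run(
        ["sysctl", "dev.cpu"], capture_output=True, text=True, check=True
    )
    temps = parse_temperatures(result.stdout)
    if not temps:
        # Treat missing readings as a problem too, so a broken sensor is not silent.
        print("CRITICAL: no CPU temperature readings found", file=sys.stderr)
        return 2
    hot = hot_cores(temps)
    if hot:
        details = ", ".join(f"core {core}: {t:.1f}C" for core, t in hot)
        print(f"CRITICAL: CPU temperature at or above {THRESHOLD_C:.0f}C ({details})")
        return 1
    return 0

# When saving this as a script for cron, finish the file with:
#     raise SystemExit(main())
```

The parsing is kept separate from the `sysctl` call so it can be tried against pasted output first. For instance, `hot_cores(parse_temperatures("dev.cpu.1.temperature: 90.0C"))` gives `[(1, 90.0)]`. Because the script stays quiet when everything is fine, a cron job that mails its output would only send something when a core is too hot or no readings are found.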


Necessity is the mother of all inventions.
 