After 11.0 update, CPU temperatures are too high

Status
Not open for further replies.

sweeze

Dabbler
Joined: Sep 23, 2013 | Messages: 24
After I upgraded to FreeNAS 11, I saw the console spamming coretemp messages saying the CPU temperature was too high and that a shutdown was recommended.

I halted the system, drove home, booted it with a monitor attached, and went into my BIOS (Gigabyte Z68X-UD3H-B3). I lowered the clock multiplier from 34 to 30, made sure the fans were being detected, and let it boot back up.
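For reference, the Sandy Bridge core clock is the multiplier times the base clock (BCLK), which is 100 MHz at stock. Dropping the multiplier from 34 to 30 therefore takes the chip from 3.4 GHz to 3.0 GHz. A quick sketch of that arithmetic, assuming a stock 100 MHz BCLK (the helper name is just for illustration):

```python
# Assumption: stock Sandy Bridge base clock; an overclocked BCLK would change the result.
BCLK_MHZ = 100

def core_clock_ghz(multiplier: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core clock in GHz = multiplier x base clock (MHz) / 1000."""
    return multiplier * bclk_mhz / 1000

before = core_clock_ghz(34)  # multiplier before the BIOS change
after = core_clock_ghz(30)   # multiplier after the BIOS change
reduction_pct = (before - after) / before * 100

print(f"{before:.1f} GHz -> {after:.1f} GHz ({reduction_pct:.1f}% lower)")
```

This ignores Turbo Boost, which can still push individual cores above the base multiplier if it is enabled.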

I still see high temperatures in sysctl via dev.cpu, although the ACPI thermal zone temperatures look fine.

I'm already back to about 93C and I'm not sure how to proceed at this point. Any ideas?
 

BigDave

FreeNAS Enthusiast
Joined: Oct 6, 2013 | Messages: 2,479
You meant 93F, right? Not 93 Celsius (that's almost 200 degrees F).
First off, I would go back to your previous boot environment and check whether the problem resolves itself. If it does, it's going to be either a BSD compatibility issue with the mobo chipset or a BIOS update. From time to time BSD drops support for hardware, and your board/chip/BIOS may be affected.
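To put the two readings side by side: Fahrenheit = Celsius x 9/5 + 32, so 93C is about 199.4F, while 93F would be only about 33.9C. A small sketch of the conversion:

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

print(f"93C = {c_to_f(93):.1f}F")  # 93C = 199.4F
print(f"93F = {f_to_c(93):.1f}C")  # 93F = 33.9C
```

The sysctl values in this thread carry a trailing "C", so they are already in Celsius.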
 

sweeze

Dabbler
I mean 93C, assuming sysctl reports it correctly?

I just changed one thing that I can't imagine was the root cause, but who knows: I rebuilt the Realtek driver binary against the FreeBSD 11 kernel sources in a VM. If I don't use the Realtek-provided driver, the network stack freezes under some utilization-related condition and requires a reboot, so after upgrading to FreeNAS 11 I knew I'd need to rebuild it.

After I built it, copied it over, and loaded it successfully, the problem resolved. By then each core's throttle_log had either been set to 1 or stayed at 0.

I suppose it's possible that unsuccessfully attempting to load the old module put the kernel into some state where it was absurdly busy, and that this resolved itself once a working module was loaded?

Or could be completely unrelated.

I'll include the relevant/interesting values from sysctl in-line:

Code:
hw.acpi.thermal.tz1.temperature: 29.9C
hw.acpi.thermal.tz0.temperature: 27.9C
dev.coretemp.7.%parent: cpu7
dev.coretemp.7.%pnpinfo:
dev.coretemp.7.%location:
dev.coretemp.7.%driver: coretemp
dev.coretemp.7.%desc: CPU On-Die Thermal Sensors
dev.coretemp.6.%parent: cpu6
dev.coretemp.6.%pnpinfo:
dev.coretemp.6.%location:
dev.coretemp.6.%driver: coretemp
dev.coretemp.6.%desc: CPU On-Die Thermal Sensors
dev.coretemp.5.%parent: cpu5
dev.coretemp.5.%pnpinfo:
dev.coretemp.5.%location:
dev.coretemp.5.%driver: coretemp
dev.coretemp.5.%desc: CPU On-Die Thermal Sensors
dev.coretemp.4.%parent: cpu4
dev.coretemp.4.%pnpinfo:
dev.coretemp.4.%location:
dev.coretemp.4.%driver: coretemp
dev.coretemp.4.%desc: CPU On-Die Thermal Sensors
dev.coretemp.3.%parent: cpu3
dev.coretemp.3.%pnpinfo:
dev.coretemp.3.%location:
dev.coretemp.3.%driver: coretemp
dev.coretemp.3.%desc: CPU On-Die Thermal Sensors
dev.coretemp.2.%parent: cpu2
dev.coretemp.2.%pnpinfo:
dev.coretemp.2.%location:
dev.coretemp.2.%driver: coretemp
dev.coretemp.2.%desc: CPU On-Die Thermal Sensors
dev.coretemp.1.%parent: cpu1
dev.coretemp.1.%pnpinfo:
dev.coretemp.1.%location:
dev.coretemp.1.%driver: coretemp
dev.coretemp.1.%desc: CPU On-Die Thermal Sensors
dev.coretemp.0.%parent: cpu0
dev.coretemp.0.%pnpinfo:
dev.coretemp.0.%location:
dev.coretemp.0.%driver: coretemp
dev.coretemp.0.%desc: CPU On-Die Thermal Sensors
dev.coretemp.%parent:
dev.cpu.7.temperature: 62.0C
dev.cpu.7.coretemp.throttle_log: 0
dev.cpu.7.coretemp.tjmax: 98.0C
dev.cpu.7.coretemp.resolution: 1
dev.cpu.7.coretemp.delta: 36
dev.cpu.6.temperature: 62.0C
dev.cpu.6.coretemp.throttle_log: 1
dev.cpu.6.coretemp.tjmax: 98.0C
dev.cpu.6.coretemp.resolution: 1
dev.cpu.6.coretemp.delta: 36
dev.cpu.5.temperature: 58.0C
dev.cpu.5.coretemp.throttle_log: 0
dev.cpu.5.coretemp.tjmax: 98.0C
dev.cpu.5.coretemp.resolution: 1
dev.cpu.5.coretemp.delta: 40
dev.cpu.4.temperature: 58.0C
dev.cpu.4.coretemp.throttle_log: 1
dev.cpu.4.coretemp.tjmax: 98.0C
dev.cpu.4.coretemp.resolution: 1
dev.cpu.4.coretemp.delta: 40
dev.cpu.3.temperature: 59.0C
dev.cpu.3.coretemp.throttle_log: 1
dev.cpu.3.coretemp.tjmax: 98.0C
dev.cpu.3.coretemp.resolution: 1
dev.cpu.3.coretemp.delta: 39
dev.cpu.2.temperature: 59.0C
dev.cpu.2.coretemp.throttle_log: 1
dev.cpu.2.coretemp.tjmax: 98.0C
dev.cpu.2.coretemp.resolution: 1
dev.cpu.2.coretemp.delta: 39
dev.cpu.1.temperature: 56.0C
dev.cpu.1.coretemp.throttle_log: 0
dev.cpu.1.coretemp.tjmax: 98.0C
dev.cpu.1.coretemp.resolution: 1
dev.cpu.1.coretemp.delta: 42
dev.cpu.0.temperature: 56.0C
dev.cpu.0.coretemp.throttle_log: 1
dev.cpu.0.coretemp.tjmax: 98.0C
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.delta: 42
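In the output above, each dev.cpu.N.temperature equals tjmax minus delta (for example 98 - 36 = 62 on cpu7), since coretemp reports delta as how far the core sits below TjMax. The sketch below parses a few of those lines, checks that relationship, and lists the cores whose throttle_log is set. The parser and its names are illustrative, not part of FreeBSD, and it only handles the `name: value` lines shown here.

```python
import re

# A few lines copied from the sysctl output above (cpu7, cpu6, cpu0).
SAMPLE = """\
dev.cpu.7.temperature: 62.0C
dev.cpu.7.coretemp.throttle_log: 0
dev.cpu.7.coretemp.tjmax: 98.0C
dev.cpu.7.coretemp.delta: 36
dev.cpu.6.temperature: 62.0C
dev.cpu.6.coretemp.throttle_log: 1
dev.cpu.6.coretemp.tjmax: 98.0C
dev.cpu.6.coretemp.delta: 36
dev.cpu.0.temperature: 56.0C
dev.cpu.0.coretemp.throttle_log: 1
dev.cpu.0.coretemp.tjmax: 98.0C
dev.cpu.0.coretemp.delta: 42
"""

# Matches e.g. "dev.cpu.7.coretemp.tjmax: 98.0C" and "dev.cpu.7.temperature: 62.0C".
LINE_RE = re.compile(r"^dev\.cpu\.(\d+)\.(?:coretemp\.)?(\w+):\s*([\d.]+)C?\s*$")

def parse_cpu_sysctl(text: str) -> dict[int, dict[str, float]]:
    """Group numeric dev.cpu.N.* values by core number, dropping the trailing 'C'."""
    cores: dict[int, dict[str, float]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            core, key, value = int(m.group(1)), m.group(2), float(m.group(3))
            cores.setdefault(core, {})[key] = value
    return cores

def check_cores(cores: dict[int, dict[str, float]]) -> tuple[list[int], list[int]]:
    """Return (cores where temperature != tjmax - delta, cores with throttle_log set)."""
    mismatched = [n for n, v in sorted(cores.items())
                  if abs(v["temperature"] - (v["tjmax"] - v["delta"])) > 0.5]
    throttled = [n for n, v in sorted(cores.items()) if v.get("throttle_log", 0) != 0]
    return mismatched, throttled

cores = parse_cpu_sysctl(SAMPLE)
mismatched, throttled = check_cores(cores)
print("temperature != tjmax - delta on:", mismatched)  # []
print("throttle_log set on cores:", throttled)         # [0, 6]
```

On the box itself, the same text could come from running `sysctl dev.cpu`, e.g. `subprocess.run(["sysctl", "dev.cpu"], capture_output=True, text=True).stdout`.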


It's holding in the 50s and 60s C now, and as long as that continues I have no problem. I would still like to know the root cause, though, so that it doesn't melt itself while I'm out of town.
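For keeping an eye on it while away, one simple check is how much headroom each core has below TjMax. A minimal sketch, assuming per-core temperatures have already been read (for example from `sysctl dev.cpu`); the 15C margin and the function name are arbitrary illustrative choices, not FreeNAS defaults:

```python
TJMAX_C = 98.0         # tjmax reported by coretemp in this thread
ALERT_MARGIN_C = 15.0  # hypothetical threshold: warn when within 15C of TjMax

def cores_near_tjmax(temps_c: dict[int, float],
                     tjmax_c: float = TJMAX_C,
                     margin_c: float = ALERT_MARGIN_C) -> list[int]:
    """Return the cores whose temperature is within margin_c of tjmax_c."""
    return sorted(core for core, t in temps_c.items() if tjmax_c - t <= margin_c)

# Illustrative: one core at the ~93C mentioned earlier in the thread.
hot = {0: 93.0}
# The per-core readings from the sysctl output posted above.
normal = {0: 56.0, 1: 56.0, 2: 59.0, 3: 59.0, 4: 58.0, 5: 58.0, 6: 62.0, 7: 62.0}

print(cores_near_tjmax(hot))     # [0]
print(cores_near_tjmax(normal))  # []
```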
 

BigDave

FreeNAS Enthusiast
Well, this is a horse of a completely different color!
My guess is that the temperature reading was not accurate (not even close); I would think a CPU would last only minutes at 200 degrees F. :)
 

sweeze

Dabbler
The tjmax value is 98C, so that's the hottest it will get before the CPU does something drastic. I apparently hit that temperature at least once on some of the cores, since their throttle_log values are non-zero.

The CPU in that system is a Sandy Bridge 2600K, if that's relevant. What weirds me out is that it has a closed-loop water cooler, which is why I checked the BIOS to make sure the RPMs were not 0.

I don't know why sysctl/ACPI would report the wrong value; I'm inclined to believe it really was that hot.
 

Ericloewe

Server Wrangler
Moderator
Joined: Feb 15, 2014 | Messages: 20,175
Definitely reseat that cooler.
 

sweeze

Dabbler
Will do. For those playing at home: ever since I loaded the Realtek kernel module rebuilt for the FreeBSD/FreeNAS 11 kernel, it has not gone above 60C. I kept an eye on it while I was traveling this weekend, and it's been totally copacetic.
 