High CPU temp immediately

Status
Not open for further replies.

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42
I have a SuperMicro X9 motherboard that I have used for about two years. A while back I started to get high CPU temp warnings, 89 C, and a really angry beep from the server. Sometimes it hangs, probably because of the high temperature.

The funny thing is, the temperature spikes immediately after boot. If I go into BIOS I get 89 C right away and it stays like that. But if I let FreeNAS boot, the CPU settles at around 60 C once the boot process is complete. If I put some load on the server it goes up again, as expected, but the really high temp so soon after boot seems weird, right? I mean, how much load does the CPU have at that point?

The CPU is an E5-2660, but I have also tried another E5 CPU (I can't remember the exact model, but it was the cheapest, lowest-TDP one in the series). Same thing there.

I'm not sure what to do about it. Could there be a faulty sensor? A CPU swap did not change the behaviour; would a motherboard replacement be worth a try?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Check your CPU heat sink and be sure it's mounted correctly. Even 60 C at idle is very high.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Sir, I don't know your level of experience, so forgive me if you have already considered this...

But this is *PRECISELY* the behavior one expects when the thermal compound is not present, or is misapplied, or the heatsink is not properly and fully affixed to the die case.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
I would first follow the advice given by the two highly qualified members above. The reason this advice takes first place on the list of troubleshooting steps is that failures of in-die sensors and on-board thermistors are rare.

The funny thing is, the temperature spikes immediately after boot. If I go into BIOS I get 89 C right away and it stays like that. But if I let FreeNAS boot, the CPU settles at around 60 C once the boot process is complete. If I put some load on the server it goes up again, as expected, but the really high temp so soon after boot seems weird, right? I mean, how much load does the CPU have at that point?
IIRC, when booting, all CPUs start up at full throttle. If the power-saving features are enabled, the CPU is stepped down after a period of time (how long varies with settings/configuration) and therefore cools down at idle speeds. So this is indeed perfectly normal.

The CPU is an E5-2660, but I have also tried another E5 CPU (I can't remember the exact model, but it was the cheapest, lowest-TDP one in the series). Same thing there.
Assuming you applied the TIM (Thermal Interface Material) correctly with the second CPU, your description "same thing there" is a little vague. Would you care to elaborate?

I'm not sure what to do about it. Could there be a faulty sensor? A CPU swap did not change the behavior; would a motherboard replacement be worth a try?
Before changing boards, I would try to see whether a temporarily placed fan could elicit some lowering of the temperature. Take the case cover off and blow some cold air in there with a fan. If the temp will not budge from the 60 C idle level, then possibly the motherboard is faulty. Do you know if your board has a socket temp sensor?
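A minimal sketch of logging the CPU temperature during that fan test, so the "before" and "after" numbers can be compared rather than eyeballed. It assumes ipmitool can reach the BMC over the network (as in the `ipmitool -H ... sensor list` command later in this thread) and that the BMC password is exported as IPMI_PASSWORD, which ipmitool's `-E` option reads. The function names and the 5 C "it responded" threshold are hypothetical choices for illustration, not something suggested above.

```python
import subprocess
import time
from statistics import mean
from typing import Callable, List


def cpu_temp_from_sensor_list(text: str, sensor: str = "CPU Temp") -> float:
    """Return the reading of `sensor` from `ipmitool sensor list` output."""
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) > 1 and fields[0] == sensor:
            return float(fields[1])  # raises ValueError if the reading is "na"
    raise ValueError(f"sensor {sensor!r} not found")


def read_sensor_list(host: str, user: str) -> str:
    """Query the BMC; -E makes ipmitool take the password from IPMI_PASSWORD."""
    result = subprocess.run(
        ["ipmitool", "-H", host, "-U", user, "-E", "sensor", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def sample_cpu_temp(read: Callable[[], str], samples: int, interval_s: float) -> List[float]:
    """Take `samples` CPU temperature readings, `interval_s` seconds apart."""
    temps = []
    for i in range(samples):
        temps.append(cpu_temp_from_sensor_list(read()))
        if i < samples - 1:
            time.sleep(interval_s)
    return temps


def responds_to_cooling(before: List[float], after: List[float],
                        min_drop_c: float = 5.0) -> bool:
    """True if the average temperature fell by at least `min_drop_c` degrees C.

    The 5 C default is an arbitrary, hypothetical threshold.
    """
    return mean(before) - mean(after) >= min_drop_c
```

For example, run `sample_cpu_temp(lambda: read_sensor_list("192.168.10.2", "admin"), 6, 10)` with the cover on, repeat it with the extra fan blowing into the open case, and pass both lists to `responds_to_cooling`. Taking the reader as a callable keeps the parsing testable without a live BMC.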
 

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42
Check your CPU heat sink and be sure it's mounted correctly. Even 60 C at idle is very high.

I should of course have been a bit clearer on this: I checked the CPU cooler when I changed the CPU. I wiped off the old thermal compound and applied new. All fans in the server work fine. That said, the cooler does not feel that hot to me when I touch it.

Assuming you applied the TIM (Thermal Interface Material) correctly with the second CPU, your description "same thing there" is a little vague. Would you care to elaborate?

I meant the same thing happened with the replacement CPU: the temperature spiked immediately and the heat alarm went off. That should rule out a faulty CPU. But as you say, it is probably normal behaviour for CPUs to run at full speed at power-on.

Before changing boards, I would try to see whether a temporarily placed fan could elicit some lowering of the temperature. Take the case cover off and blow some cold air in there with a fan. If the temp will not budge from the 60 C idle level, then possibly the motherboard is faulty. Do you know if your board has a socket temp sensor?

I've been chasing a new cooler, but it's hard to find one. When I bought the motherboard I didn't know that there are TWO VERSIONS of socket 2011, square and narrow:
[Image: S2011-Arten.jpg, showing the square and narrow socket 2011 variants]

There are not many coolers for the narrow version, and that's what I have. I'm starting to suspect my CPU cooler is somehow faulty, but replacing it would be a lot easier if I had the square version.

Here are the temp readings (the chassis fan is connected to a fan controller, so it doesn't show up here):

$ ipmitool -H 192.168.10.2 -U admin sensor list
CPU Temp | 89.000 | degrees C | cr | 0.000 | 0.000 | 0.000 | 85.000 | 88.000 | 90.000
System Temp | 40.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 80.000 | 85.000 | 90.000
Peripheral Temp | 38.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 80.000 | 85.000 | 90.000
PCH Temp | 58.000 | degrees C | ok | -11.000 | -8.000 | -5.000 | 90.000 | 95.000 | 100.000
P1-DIMMA1 TEMP | 47.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMA2 TEMP | 45.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMB1 TEMP | 46.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMB2 TEMP | 43.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMC1 TEMP | 41.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMC2 TEMP | 41.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMD1 TEMP | 41.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
P1-DIMMD2 TEMP | 41.000 | degrees C | ok | 1.000 | 2.000 | 4.000 | 80.000 | 85.000 | 90.000
FAN 1 | na | | na | na | na | na | na | na | na
FAN 2 | na | | na | na | na | na | na | na | na
FAN 3 | 975.000 | RPM | ok | 300.000 | 450.000 | 600.000 | 18975.000 | 19050.000 | 19125.000
FAN 4 | na | | na | na | na | na | na | na | na
FAN A | na | | na | na | na | na | na | na | na
Vcore | 0.888 | Volts | ok | 0.480 | 0.512 | 0.544 | 1.488 | 1.520 | 1.552
3.3VCC | 3.424 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
12V | 12.296 | Volts | ok | 10.494 | 10.600 | 10.706 | 13.091 | 13.197 | 13.303
VDIMM | 1.504 | Volts | ok | 1.152 | 1.216 | 1.280 | 1.760 | 1.776 | 1.792
5VCC | 5.152 | Volts | ok | 4.096 | 4.320 | 4.576 | 5.344 | 5.600 | 5.632
CPU VTT | 1.064 | Volts | ok | 0.872 | 0.896 | 0.920 | 1.344 | 1.368 | 1.392
VBAT | 3.488 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
VSB | 3.584 | Volts | ok | 3.008 | 3.072 | 3.136 | 3.856 | 3.920 | 3.984
AVCC | 3.424 | Volts | ok | 2.816 | 2.880 | 2.944 | 3.584 | 3.648 | 3.712
Chassis Intru | 0x0 | discrete | 0x0000| na | na | na | na | na | na
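The alarm in this listing comes from the threshold columns. ipmitool's `sensor list` prints, per sensor: name, reading, unit, status, then the lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical and upper non-recoverable thresholds. A minimal sketch that parses this output and checks each reading against its upper thresholds only (the relevant side for an over-temperature alarm); the names are hypothetical, and treating "reading at or above a threshold" as crossing it is an assumption, since the BMC's own status column is the authoritative verdict.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    name: str
    value: float
    unit: str
    upper_non_critical: Optional[float]
    upper_critical: Optional[float]
    upper_non_recoverable: Optional[float]


def _number(field: str) -> Optional[float]:
    """Parse a numeric field; "na" and hex values such as "0x0" become None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_sensor_list(text: str) -> List[Reading]:
    """Parse lines of: name | value | unit | status | lnr | lcr | lnc | unc | ucr | unr."""
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 10:
            continue
        value = _number(fields[1])
        if value is None:
            continue  # absent fans ("na") and discrete sensors such as chassis intrusion
        readings.append(Reading(fields[0], value, fields[2],
                                _number(fields[7]), _number(fields[8]), _number(fields[9])))
    return readings


def upper_status(r: Reading) -> str:
    """Return the most severe upper threshold the reading has reached (assumed >= semantics)."""
    checks = [
        (r.upper_non_recoverable, "non-recoverable"),
        (r.upper_critical, "critical"),
        (r.upper_non_critical, "non-critical"),
    ]
    for threshold, label in checks:
        if threshold is not None and r.value >= threshold:
            return label
    return "ok"
```

Fed the listing above, this flags only CPU Temp: 89 C is past the 85 C non-critical and 88 C critical thresholds but below the 90 C non-recoverable one, which matches the "cr" status the BMC reports.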
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I haven't tried Noctua, but I fear that although it will fit the socket, the heat pipes will take up space, making the first RAM slot on each side of the CPU unusable. Which motherboard do you have? I have http://www.supermicro.com/products/motherboard/Xeon/C600/X9SRL-F.cfm and had a hard time finding a cooler that allowed all memory slots to be used.

With normal-sized memory (as opposed to crazy gamer RAM with tall heat sinks), there is no problem. The U9DX i4 I linked suits the X10SRi-F and most other SM mobos too.
 

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42
With normal-sized memory (as opposed to crazy gamer RAM with tall heat sinks), there is no problem. The U9DX i4 I linked suits the X10SRi-F and most other SM mobos too.

OK, buying a U9DXi4 would have been cheaper, too late now though. I have heat sinks on the memory modules, but not gaming-style, just HP memory with a small heat sink on it. My CPU cooler now BARELY clears the memory: there's less than a millimeter of clearance to the memory module. It's ridiculously tight.
 

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42
OK, problem found. The CPU cooler is BENT! If I put it on a flat surface it rolls a bit from side to side.

New mobo and new cooler installed:

$ ipmitool -H 192.168.10.2 sensor list
CPU Temp | 35.000 | degrees C | ok | 0.000 | 0.000 | 0.000 | 85.000 | 88.000 | 90.000


The SRi-F, though, was VERY picky about the memory modules. The slightest difference in CAS latency and it would not POST. One of my modules was 11-12-E2 and the rest were 11-11-E2. And it would not accept three modules of EXACTLY the same type, although the quick specs say an odd number of modules should work. So I'm down to 32GB from 64 :(

Thanks for the support and ideas!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
OK, problem found. The CPU cooler is BENT! If I put it on a flat surface it rolls a bit from side to side.
Yup, that would do it. Odd that the surface would be bowed that much. With my aftermarket coolers I've learned to check the flatness of the base, and believe it or not, it shouldn't be perfectly flat: it should be slightly convex, but not so much that you could notice it on a flat table surface. Really, you need to compare it to your CPU and try to match the two surfaces, or almost match them: there should be just slightly more contact in the center area, and I do mean slightly. A razor blade works well to check flatness.
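The flatness check can be made a little more systematic. A minimal sketch, assuming you lay the razor blade (or another straightedge) across the base and measure the gap at a few evenly spaced points, for example with feeler gauges; the measuring method, the function name and the tolerance are illustrative assumptions, not part of the advice above. The sketch only classifies the shape; how much convexity is acceptable is still the judgement call described above.

```python
from typing import List


def base_profile(gaps_mm: List[float], tolerance_mm: float) -> str:
    """Classify a cooler base from straightedge gap readings.

    gaps_mm: gap between the straightedge and the base at evenly spaced points
    across the base (hypothetical feeler-gauge measurements). Use an odd number
    of points so the middle one sits at the center.
    """
    if len(gaps_mm) < 3:
        raise ValueError("need at least both edges and the center")
    if max(gaps_mm) - min(gaps_mm) <= tolerance_mm:
        return "flat"
    edges = (gaps_mm[0] + gaps_mm[-1]) / 2
    center = gaps_mm[len(gaps_mm) // 2]
    if edges > center:
        return "convex"     # straightedge rests on a high center, gaps open at the edges
    if center > edges:
        return "concave"    # straightedge bridges the edges, gap opens in the middle
    return "irregular"      # neither a simple bulge nor a simple dish
```

A base that rocks on a table, like the bent cooler in this thread, would show up as "convex" with edge gaps large enough to see.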
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Sir,

I just want the record to show, you could have saved quite a bit of money and frustration, if you had just believed me right at the outset that what you had was only properly explained by misapplied thermal compound or a loose cooler.
:)
 

eccevery

Dabbler
Joined
Oct 24, 2014
Messages
42
Sir,

I just want the record to show, you could have saved quite a bit of money and frustration, if you had just believed me right at the outset that what you had was only properly explained by misapplied thermal compound or a loose cooler.
:)

Not as much as you might think.
- I did refit the cooler when I changed the CPU. I applied new thermal paste and wiped off the old. It did not feel strange to me when I mounted it.
- The Noctua cooler mentioned is pretty much the only model that fits. Few are available where I live, and it's fairly expensive, about $65.
- The mb was about $145, and it will be a lot easier to get a new cooler in the future if needed.
- Selling the old mb should make up for the price difference, or at least make the loss very small.

The disappointment really is the memory issue.
 