X9SCM-F / E3-1230 v1: false CPU overheat alarm?

vv111y (Dabbler, joined Jun 17, 2015, 24 messages)
- All parts are good
- I updated the BIOS to v2.2, but it didn't go smoothly: "ami_2.bat" wouldn't run. It now reports v2.2 and it's working. I don't know whether the 'ME' was updated (I'm a noob here and don't know about the ME)
- Redid the cooler and thermal compound (stock Intel cooler)
- It happens with no/low load
- The box had been sitting unused in storage for a long time, fwiw

I'm out of ideas. I've contacted Supermicro, fwiw. All I can think of is a new CPU and/or a new motherboard.
Or is it something to do with the 'ME'?

Thanks
 
Darren (joined Oct 2, 2014, 925 messages)
Are you sure the heatsink is secured properly? Typically you install the RAM, CPU, and CPU cooler while the motherboard is outside the case/chassis; this lets you make sure everything is fully seated and secure. Did you clean the thermal paste off the CPU and heatsink when you redid it?
 

vv111y
I did clean off the paste. Fwiw, it felt somewhat hard and gummy compared to fresh paste.

If you know the 1230 v1 stock cooler: it is all plastic, with plastic 'pins' that flare out on the other side of the motherboard when you push down on the plastic 'poles'. I made sure I felt all 4 click and that they all went down to full depth. That's as good as it gets with this cooler; I can't tighten it any more.

I just tried a load experiment. It was fine for about 20 minutes at idle, then I tried this stress test: https://forums.freenas.org/index.ph...al-performance-during-hdd-stress-tests.28184/

Within seconds the alarm went off. The heatsink is cool to the touch, and it always has been. Maybe it is a legitimate alarm. My other box has the exact same hardware and has had no issues for years, so why the problems with this one?
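One way to tell whether the alarm is legitimate is to read the CPU's own temperature sensors while the stress test runs, rather than judging by how the heatsink feels. On FreeNAS/FreeBSD, once the coretemp(4) driver is loaded, `sysctl dev.cpu` includes per-core lines such as `dev.cpu.0.temperature: 45.0C`. The Python sketch below assumes that output format and simply parses it; the 80 C warning limit and the sample readings are illustrative assumptions, not Intel or Supermicro figures.

```python
import re
from typing import Dict, List

# Assumed FreeBSD coretemp(4) line format, e.g. "dev.cpu.0.temperature: 45.0C"
TEMP_LINE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$")

# Illustrative threshold only; check the real limit for your CPU and board.
WARN_AT_C = 80.0


def core_temps(sysctl_output: str) -> Dict[int, float]:
    """Map CPU index -> temperature in degrees C, ignoring unrelated sysctl lines."""
    temps: Dict[int, float] = {}
    for line in sysctl_output.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def hot_cores(temps: Dict[int, float], limit_c: float = WARN_AT_C) -> List[int]:
    """Return the CPU indices at or above the limit, lowest index first."""
    return sorted(cpu for cpu, t in temps.items() if t >= limit_c)


if __name__ == "__main__":
    # Hypothetical sample captured during a stress run (not real data from this box).
    sample = """dev.cpu.0.temperature: 92.0C
dev.cpu.0.freq: 3200
dev.cpu.1.temperature: 90.0C
dev.cpu.2.temperature: 88.0C
dev.cpu.3.temperature: 79.0C"""
    temps = core_temps(sample)
    print(temps)
    print("cores at or above limit:", hot_cores(temps))
```

In practice you would feed it `sysctl dev.cpu` output captured every few seconds during the test; readings that climb fast while the heatsink stays cool point at the CPU-to-heatsink contact rather than at a bogus alarm.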
 
Darren
When you reinstalled the CPU cooler, were you able to pull up on it without it coming loose? What are your full system specs (PSU, case, etc.)?
 

vv111y
X9SCM-F
E3-1230V1
2X8 C5W133EB4GH DDR3-1333 Super Talent CL9 ECC HY
Seasonic 650W
IBM M1015, flashed to IT mode
5 x 2 TB drives
Cheap case (but it's wide open while troubleshooting, and everything is cool)

I had a hard time getting the cooler out. I checked all the poles, though, and they all flared like they should. As best I can tell it seated fine. You may be right to suspect the cooler, though.
 

rogerh (Guru, joined Apr 18, 2014, 1,111 messages)
If the cooler feels cool but the CPU is overheating, that is pretty much diagnostic of the cooler not being in good thermal contact with the CPU. I would get a new cooler, clean every trace of thermal paste off the CPU with a compatible solvent, and start again. But first assemble the cooler as a trial without paste and confirm it is held firmly in contact with the CPU, in case something has been bent or stretched, or the CPU is not quite level. If that's OK, redo the assembly with the thinnest layer of paste that won't leave gaps.
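The reasoning above can be made concrete with a simple series thermal-resistance model: heat flows from the CPU through the paste into the heatsink and then into the air, so T_heatsink = T_ambient + P·R_sink and T_cpu = T_heatsink + P·R_contact. A poor contact only raises R_contact, so the CPU gets hot while the heatsink stays near ambient, which matches "heatsink cool to touch, alarm within seconds". The power and resistance values in this Python sketch are made-up, illustrative assumptions, not measurements of the E3-1230 or its stock cooler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThermalPath:
    """Two thermal resistances in series, in degrees C per watt (illustrative values)."""
    r_contact: float  # CPU lid -> heatsink base, through the paste layer
    r_sink: float     # heatsink base -> surrounding air


def steady_state_temps(power_w: float, ambient_c: float, path: ThermalPath) -> Tuple[float, float]:
    """Return (cpu_c, heatsink_c) for steady heat flow CPU -> paste -> heatsink -> air."""
    heatsink_c = ambient_c + power_w * path.r_sink  # temperature rise across the heatsink
    cpu_c = heatsink_c + power_w * path.r_contact   # extra rise across the contact layer
    return cpu_c, heatsink_c


if __name__ == "__main__":
    # Hypothetical numbers: 40 W during a stress test, 25 C room air.
    cases = {
        "good contact": ThermalPath(r_contact=0.1, r_sink=0.3),
        "poor contact": ThermalPath(r_contact=1.5, r_sink=0.3),  # cooler not fully seated
    }
    for label, path in cases.items():
        cpu, sink = steady_state_temps(40.0, 25.0, path)
        print(f"{label}: CPU {cpu:.0f} C, heatsink {sink:.0f} C")
    # prints:
    # good contact: CPU 41 C, heatsink 37 C
    # poor contact: CPU 97 C, heatsink 37 C
```

A real CPU would throttle or trip its alarm well before settling at such a temperature, so the model only shows where the temperature drop happens, not what the sensor would actually report.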
 

vv111y
Darren wrote: "are you sure the heatsink is secured properly? Typically when installing parts you install the RAM, CPU, and CPU cooler while the motherboard is outside the case/chassis this allows you to make sure it is fully seated/secure."

And that's what I'm going to do from now on.

rogerh wrote: "If the cooler feels cool but the CPU is overheating then this is pretty well diagnostic of the cooler not being in good thermal contact with the CPU. I would get a new cooler, clean off every trace of thermal paste on the CPU with a compatible solvent and start again. But assemble the cooler as a trial without paste and confirm it is firmly held in contact with the CPU, in case something has got bent or stretched, or the CPU is not quite level, or something. It that's ok, redo the assembly with the thinnest layer of paste that won't have gaps."

Yes.

I had to look under a magnifying glass to see that it was ever so slightly uneven, even with the connectors/pins clicked into place.
I tried the quickie CPU test again and it ran for hours without issue.

Either the motherboard is a little warped or the cooler is slightly off. If I had pushed as hard as I did without my hand on the other side, I would have broken the motherboard. Because the cooler is plastic and gives, I needed that extra force to get the pins in just 1 mm deeper. It took close visual inspection just to see that difference, so this is not a job to rush (which I tend to do).

Lessons:
- Metal screws with tension springs are the right way to go. Intel guys: don't use plastic push-pins.
- Go slow and inspect.
- As Darren says, populate the motherboard first and then put it in place.

That took hours to find.
Thanks guys, your help is greatly appreciated.
 

vv111y
Now the pool is degraded: 7 checksum errors on the new drive. I think it may be related to the CPU problem, but I don't want to take a chance, so I'll replace the drive and test it later.

Also, all the physical moving and constant rebooting didn't help...
 

vv111y
Actually, does anyone have experience with this? How serious are 7 checksum errors?
Could I clear the errors and see how it goes, or is that too risky? Replication is set up but has just started, so it will run for quite a while.
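For context on where those numbers come from: `zpool status` prints READ, WRITE and CKSUM error counters for every device in the pool, and `zpool clear <pool>` resets them, which is what "clearing the errors" would mean here. The Python sketch below pulls out the devices whose counters are non-zero. It assumes the usual `NAME STATE READ WRITE CKSUM` table layout, and the pool and device names in the sample are made up.

```python
from typing import List, NamedTuple

HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


class VdevErrors(NamedTuple):
    name: str
    state: str
    read: str
    write: str
    cksum: str


def vdevs_with_errors(status_text: str) -> List[VdevErrors]:
    """Return rows of the `zpool status` config table with any non-zero error counter."""
    rows: List[VdevErrors] = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True  # the device table starts after this header row
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the device table
        if len(fields) < 5:
            continue  # e.g. the "spares" label or AVAIL spares carry no counters
        row = VdevErrors(*fields[:5])
        # Counters are kept as text: zpool abbreviates large values (e.g. "1.2K"),
        # so anything other than a literal "0" is treated as an error count.
        if any(count != "0" for count in (row.read, row.write, row.cksum)):
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical output; pool and device names are made up for illustration.
    sample = """
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            ada0    ONLINE       0     0     0
            ada1    DEGRADED     0     0     7

errors: No known data errors
"""
    for row in vdevs_with_errors(sample):
        print(f"{row.name}: read={row.read} write={row.write} cksum={row.cksum}")
```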
 
Darren
Did you do a burn-in on the new drive? Glad you got the overheating issue sorted out.
 

rogerh
If they were all on the drive you removed and they don't recur, they are probably nothing to worry about as far as the rest of the machine is concerned. As for testing the drive that had the errors, at the very least it needs a long SMART test, and then check the results of smartctl -a on it. Some people do a prolonged burn-in on drives using badblocks; the above tests, with or without badblocks on the whole drive, should tell you whether the checksum errors are due to a failing drive or just a transient problem.
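A sketch of reading the result of that check: start the test with `smartctl -t long /dev/ada1`, wait for it to finish, then look at the attribute table printed by `smartctl -a /dev/ada1`. The Python below assumes smartctl's standard attribute-table layout (RAW_VALUE in the tenth column) and flags a few attributes that are commonly watched on failing disks. The device name, the sample values and the choice of attributes are assumptions for illustration, and a clean result is not a guarantee that the drive is healthy.

```python
import re
from typing import Dict

# Attributes often watched on failing disks (an illustrative choice, not an official list).
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def watched_raw_values(smartctl_output: str) -> Dict[str, int]:
    """Return RAW_VALUE for each watched attribute found in smartctl's attribute table."""
    values: Dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if name in WATCHED:
            # Only the leading number in column ten is used; extra text such as
            # "(Min/Max 20/40)" after it lands in later columns and is ignored.
            match = re.match(r"\d+", fields[9])
            if match:
                values[name] = int(match.group())
    return values


def suspicious(values: Dict[str, int]) -> Dict[str, int]:
    """Keep only the watched attributes whose raw count is above zero."""
    return {name: raw for name, raw in values.items() if raw > 0}


if __name__ == "__main__":
    # Hypothetical attribute table; the values are made up for illustration.
    sample = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   064   050   000    Old_age   Always       -       36 (Min/Max 20/40)
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
"""
    print(suspicious(watched_raw_values(sample)))
```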
 

vv111y
When it rains, it pours. I thought I was okay, but when I checked a few hours later it was throwing over 100 read/write errors.
I swapped the drive and it had finished rebuilding by 1 pm today. Replication finished too.

So it turns out the 7 checksum errors were a prelude to failure.
I will run SMART tests on both boxes and keep a regular schedule of tests and scrubs as per Cyberjock's guide.

Fwiw, the drives that failed were good Samsung SpinPoint drives that had sat in storage for a couple of years. Once they were put to use, 3 out of 4 died quickly. My other box has the exact same drives from the same order, and they have been running without problems all this time. It seems sitting unused was not healthy for them.

Thanks again Darren & Roger, the help is appreciated. I should be good to go from here (hopefully :))
 

rogerh
After those unfortunate experiences, I hope all goes well now!
What I take from this is that it may be important to put new drives into service in the order they were purchased, rather than keeping the first one bought as a spare (which I was going to do with my series of gradually purchased larger drives), so that is a valuable piece of information.
 