Supermicro X10SDV-TLN4F - CPU Fan Issue

D-Tijori

Dabbler
Joined
Apr 19, 2017
Messages
40
Hello all,

Been a while since I posted. Hope everyone is doing well. Guess no news is good news.


Well, until now.

System:
FreeNAS 11.2-U6 (fresh install on an SSD)
Supermicro X10SDV-TLN4F (with fan/heatsink)
2 * Crucial 16GB ECC RDIMMs
4*4 WD Red drives
Chassis: SilverStone DS380B


Backstory:

After the update to 11.2-U6, I accidentally turned off the power to the machine one day. I restarted it and everything was working well.

Issues noticed after above incident:
1. Fan1 (rear chassis fan - spinning at 1300 RPM) & Fan 2 (CPU heatsink fan - spinning, and making most of the noise, at 6800 RPM) were dialed up all the way and have stayed that way since.
1_Fans.JPG

2. Also, did not realize the CMOS battery was almost dead (only had 0.065 volts per IPMI sensors).
3. LED 8 starts blinking red as soon as the system boots and/or boots into FreeNAS. This does not happen when the system is started after a while (cold system). Looked up LED 8 in the manual:

Q: What does PWR Fail/power failure mean here?
2_LED8.JPG



Steps Taken:
1. Changed the battery thinking that was the root cause - Did not solve the issue, Fan1 & Fan2 still enjoying brisk speed along with LED8 blinking
2. Did multiple complete shut-downs and start-ups - Did not solve the issue, Fan1 & Fan2 still enjoying brisk speed along with LED8 blinking
3. Did a fresh install of FreeNAS 11.2-U6 on an SSD (was on two USB keys earlier) and used an ipmitool script to assign proper thresholds (same as the earlier install) - Did not solve the issue, Fan1 & Fan2 still enjoying brisk speed along with LED8 blinking
4. Swapped out RAM sticks one by one in all the 4 slots.

Thoughts:
1. Is there an issue with the motherboard (sincerely hope not)?
2. Is the CPU fan failing and hence LED8 is blinky? Does not seem to be the case since it is happily being noisy at constant 6800 RPM.
3. RAM sticks do not seem to be the issue at all since they work in all 4 slots together and individually.

Further Steps in process:
Ordered thermal paste/ Noctua 60mm CPU fan to replace the existing fan.

=================

Anything else someone can think about at this point? Any and all inputs are welcomed.

Thanks very much in advance.


--
D-Tijori
 

D-Tijori

*Bump*
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
Thermal sensor? If the sensor failed, the board would run the fans at max and register an overheating condition.
 

D-Tijori

Did a BMC reset, and now the fans are cycling between slow and full speed. This is after applying new thermal paste.

Hopefully, this means the thermal sensor is not the culprit here. But whatever thresholds I apply, it just isn't working.

Apart from CPU fans, there are 3 Noctua PWM fans in the case. All with 300 rpm - 1200 rpm range.

Firmware Revision: 3.84
Bios: 2.0c
Redfish version: 1.01


Thoughts?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211

D-Tijori

Indeed. And it is getting irritating enough that the ostrich act just won't cut it anymore. Apologies for the late reply.

Backstory & Updated Issue:
A week ago I found the right thresholds (see below) for all fans and the system stabilized. However, since about this time yesterday (7 am PST), all the fans have been cycling again. No change in workload or temperatures (in fact, ambient temps have been down by 3-4 °C over the last 3 days).

Fans:
HDD Fans: 2 * Noctua NF-P12 PWM (RPM Range: 200 - 1200) - FAN3 & FAN 4
Case Fan: 1 * Noctua NF-S12A PWM (RPM Range: 200 - 1300) - FAN 1
CPU FAN (See previous post for pic): FAN 2

1570717380677.png


Steps tried:
- Followed exactly the same procedure as last time with the same thresholds. No change throughout the day. Fans still cycling.
  Environment: about 22-23 °C, room window open
- Overnight (last night), all the fans settled on their own.
  Environment: 18-20 °C, window closed

Latest State (7 am PST):
- Fans have started cycling again, without any load.
  Environment: 15-16 °C, window open

Potential Root Cause / Conclusions:
- Have seen some spikes in electricity (lights dimming, mostly). Could this be triggering something on the motherboard?
  Case for a UPS?
- Is the act of opening/closing the window tripping the thermal sensors?
  But shouldn't lower temps, by logic, alleviate any thermal issues? Of course, I do not have nearly the same hardware/software knowledge as others here.
- However, another line of thought: perhaps lower temps spool down the fans, and then IPMI thinks they have stalled and spins them back up?
  Strong case to maintain some sort of stable temps in the room (i.e. keep the window closed)?
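
The last hypothesis can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the actual BMC firmware logic: it assumes the BMC treats any reading below the lower critical threshold as a stalled fan, responds by running the fan at full speed, and drops back once the reading recovers. The function name and the RPM numbers are hypothetical.

```python
def simulate_bmc(idle_rpm, full_rpm, lower_critical, ticks=6):
    """Toy model of the suspected loop; returns a list of (duty, rpm) per tick.

    Assumption: a reading below lower_critical makes the BMC force full speed
    on the next tick; any other reading lets it fall back to idle.
    """
    duty = "idle"
    history = []
    for _ in range(ticks):
        rpm = idle_rpm if duty == "idle" else full_rpm
        history.append((duty, rpm))
        duty = "full" if rpm < lower_critical else "idle"
    return history


# Threshold above the idle speed: the fan flips between idle and full forever.
print(simulate_bmc(idle_rpm=300, full_rpm=1200, lower_critical=400, ticks=4))

# Threshold below the idle speed: the fan stays at idle.
print(simulate_bmc(idle_rpm=300, full_rpm=1200, lower_critical=200, ticks=4))
```

If the real board behaves anything like this, the cycling would depend on how close the idle RPM sits to the lower thresholds rather than on room temperature directly.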
=======================================================

@Glorious1: Thanks for the reply. Thoughts? This motherboard only has one zone, I think. Might need to try your script.


Looking for all possible insights/rationale. More to come, I suspect.



Thanks very much.
 

Glorious1

That board has a lot of variations apparently. If yours is PCB 2.00, whatever that is, you have two fan zones. Fan Header 4 on the board should be designated FAN A in IPMI if that is the case, according to the manual.

It looks like your upper thresholds for FANS 3+4 are reversed with those for FAN1, based on the specs you posted. Also, you should start the upper threshold a little above the maximum rated speed. See @Ericloewe's guide for setting thresholds if you haven't already.

The LED 8 blinking is troubling. Have you tried switching fan modes in IPMI (Full, Standard, etc)? Maybe that would bring it back. Although if resetting the BMC didn't, I don't know.

Also, I would try shutting down, then unplugging and replugging all the fans.

Did you get the new CPU fan?

Again, if you run spincheck.sh, you'll be able to see what is happening in terms of duty cycle and RPMs as they change. It will also show which fans are cycling.
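
The threshold advice above can be sketched as a small calculation. This is only an illustration, not the procedure from @Ericloewe's guide: the margins (one, two and three steps outside the rated range) and the 100 RPM step are assumptions, and the function name is hypothetical. The sketch only builds `ipmitool sensor thresh` command strings; it does not run anything against the BMC.

```python
def fan_threshold_commands(sensor, rated_min, rated_max, step=100):
    """Build ipmitool threshold commands for one fan sensor (strings only).

    Assumed margins, for illustration only:
      lower: non-recoverable / critical / non-critical sit 3, 2 and 1 steps
             below the rated minimum (never below zero)
      upper: non-critical / critical / non-recoverable sit 1, 2 and 3 steps
             above the rated maximum
    """
    lnr, lcr, lnc = (max(rated_min - k * step, 0) for k in (3, 2, 1))
    unc, ucr, unr = (rated_max + k * step for k in (1, 2, 3))
    return [
        f"ipmitool sensor thresh {sensor} lower {lnr} {lcr} {lnc}",
        f"ipmitool sensor thresh {sensor} upper {unc} {ucr} {unr}",
    ]


# Example values only: replace with the rated range of the fan on each header.
for sensor, rated_min, rated_max in [("FAN1", 200, 1200), ("FAN3", 200, 1300)]:
    for command in fan_threshold_commands(sensor, rated_min, rated_max):
        print(command)
```

Before applying anything like this, compare the upper values against the RPMs the BMC actually reports at full speed, and widen the margin if a reading lands on a threshold.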
 

D-Tijori

Glorious1 said: "That board has a lot of variations apparently. If yours is PCB 2.00, whatever that is, you have two fan zones. Fan Header 4 on the board should be designated FAN A in IPMI if that is the case, according to the manual."

It says on the motherboard that it's PCB 2.00. In IPMI it shows FAN4, not FANA.

Glorious1 said: "It looks like your upper thresholds for FANS 3+4 are reversed with those for FAN1, based on the specs you posted. Also, you should start the upper threshold a little above the maximum rated speed. See @Ericloewe's guide for setting thresholds if you haven't already."

Yes indeed. Apologies. Correct specs:

HDD Fans: 2 * Noctua NF-P12 PWM (RPM Range: 200 - 1300) - FAN3 & FAN 4
Case Fan: 1 * Noctua NF-S12A PWM (RPM Range: 200 - 1200) - FAN 1

Glorious1 said: "The LED 8 blinking is troubling. Have you tried switching fan modes in IPMI (Full, Standard, etc)? Maybe that would bring it back. Although if resetting the BMC didn't, I don't know."

Have done this multiple times. During repasting of the CPU, I pretty much had to disassemble the whole system. Have reset the BMC multiple times with the following commands:

1. ipmitool bmc reset warm
2. ipmitool raw 0x3c 0x40

Glorious1 said: "Also, I would try shutting down, then unplugging and replugging all the fans."

Pre/post threshold update? Pre is done.

Glorious1 said: "Did you get the new CPU fan?"

I did. But the fans stabilized and lasted a week till today, so I did not bother installing it. I do not see anything wrong with the CPU fan, really.

Glorious1 said: "Again, if you run spincheck.sh, you'll be able to see what is happening in terms of duty cycle and RPMs as they change. It will also show which fans are cycling."

Will be doing this next if reversing the thresholds does not work. In fact, I did try it once last week, but there was some operand error, so it did not run correctly.
 

Glorious1

Well, hopefully something you did fixed it.
If you paste that spincheck error for me I can look into it. Something different about your board probably - it works for mine.

EDIT: Actually I found that last time I submitted an update I accidentally used an earlier version of spincheck.sh. Here is the latest one. You need to delete the ".txt" and then might need to do a chmod +x spincheck.sh to make it executable. No idea if this will fix the error though.
 

Attachments

  • spincheck.sh.txt

D-Tijori

Here we are. Do explain the fan duty cycle bit a bit.

Code:
How many whole minutes do you want between spin checks?

NOTE ABOUT DUTY CYCLE (Fan%0 and Fan%1):
Some boards apparently report incorrect duty cycle, and can
report duty cycle for zone 1 when that zone does not exist.

Key to drive status symbols:  * spinning;  _ standby;  ? unknown                              Version 2018-01-01

Thursday, Oct 10
          ada0 ada1 ada2 ada3 Tmax Tmean  ERRc CPU  FAN1  FAN2  FAN3  FAN4  FANA Fan%0 Fan%1 MODE   
23:23:48  *28  *30  *29  *30  ^30  29.25 -4.32  35  1300  6800  1200  1200   ---   100   100 Optimal
23:24:49  *28  *30  *29  *30  ^30  29.25 -4.32  32   500  2900   400   500   ---    20    20 Optimal
23:25:50  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:26:51  *28  *30  *29  *30  ^30  29.25 -4.32  32  1300  6800  1200  1200   ---   100   100 Optimal
23:27:52  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:28:53  *28  *30  *29  *30  ^30  29.25 -4.32  32  1300  6800  1200  1200   ---   100   100 Optimal
23:29:54  *28  *30  *29  *30  ^30  29.25 -4.32  34  1300  3000  1100  1200   ---    20    20 Optimal
23:30:55  *28  *30  *29  *30  ^30  29.25 -4.32  34  1300  6800  1200  1200   ---   100   100 Optimal
23:31:55  *28  *30  *29  *30  ^30  29.25 -4.32  33  1200  4400  1200  1100   ---    20    20 Optimal
23:32:56  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:33:57  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:34:58  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:35:59  *28  *30  *29  *30  ^30  29.25 -4.32  32  1300  6800  1200  1200   ---   100   100 Optimal
23:36:59  *28  *30  *29  *30  ^30  29.25 -4.32  35  1300  6800  1200  1200   ---   100   100 Optimal
23:38:00  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:39:01  *28  *30  *29  *30  ^30  29.25 -4.32  34  1300  6800  1200  1200   ---   100   100 Optimal
23:40:02  *28  *30  *29  *30  ^30  29.25 -4.32  32  1300  6800  1200  1200   ---   100   100 Optimal
23:41:03  *28  *30  *29  *30  ^30  29.25 -4.32  34  1300  6800  1200  1200   ---   100   100 Optimal
23:42:04  *28  *30  *29  *30  ^30  29.25 -4.32  33  1300  6800  1200  1200   ---   100   100 Optimal
23:43:04  *28  *30  *29  *30  ^30  29.25 -4.32  34   300  4100   700  1200   ---    20    20 Optimal
23:44:05  *28  *30  *29  *30  ^30  29.25 -4.32  34  1300  6800  1200  1200   ---   100   100 Optimal


And, joy-of-joy, fans are normalized. For now:

1_Temps&Fans.JPG

2_Temps&Fans.JPG


Now, what did your script REALLY do here? Did you just invade my privacy and take over my server?

Just kidding. It's Friday. :)

Conclusions:
On a serious note: unless you see anything in the log, I do not see how they normalized again overnight without any change in load and without any config changes.


Thoughts?
 

D-Tijori

Normalized > Cycling (Time stamp: 8:01:39). Same load. Ambient temp. down.

Code:
How many whole minutes do you want between spin checks?

NOTE ABOUT DUTY CYCLE (Fan%0 and Fan%1):
Some boards apparently report incorrect duty cycle, and can
report duty cycle for zone 1 when that zone does not exist.

Key to drive status symbols:  * spinning;  _ standby;  ? unknown                              Version 2018-01-01

Friday, Oct 11
          ada0 ada1 ada2 ada3 Tmax Tmean  ERRc CPU  FAN1  FAN2  FAN3  FAN4  FANA Fan%0 Fan%1 MODE  
07:20:03  *32  *34  *33  *33  ^34  33.00 -0.57  35   400  3500   300   300   ---    28    28 Optimal
07:21:04  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:22:05  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:23:06  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:24:07  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:25:08  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:26:09  *32  *34  *33  *33  ^34  33.00 -0.57  34   400  3500   300   300   ---    28    28 Optimal
07:27:10  *32  *34  *33  *33  ^34  33.00 -0.57  35   400  3500   300   300   ---    28    28 Optimal
07:28:10  *32  *33  *33  *33  ^33  32.75 -0.82  35   400  3500   300   300   ---    28    28 Optimal
07:29:11  *32  *33  *33  *33  ^33  32.75 -0.82  35   400  3500   300   300   ---    28    28 Optimal
07:30:12  *32  *33  *33  *33  ^33  32.75 -0.82  35   400  3500   300   300   ---    28    28 Optimal
07:31:13  *32  *33  *33  *33  ^33  32.75 -0.82  36   400  3500   300   300   ---    28    28 Optimal
07:32:14  *32  *33  *33  *32  ^33  32.50 -1.07  36   400  3500   300   300   ---    28    28 Optimal
07:33:14  *32  *33  *33  *32  ^33  32.50 -1.07  35   400  3500   300   300   ---    28    28 Optimal
07:34:15  *32  *33  *33  *32  ^33  32.50 -1.07  35   400  3500   300   300   ---    28    28 Optimal
07:35:16  *32  *33  *33  *32  ^33  32.50 -1.07  35   400  3500   300   300   ---    28    28 Optimal
07:36:17  *32  *33  *33  *32  ^33  32.50 -1.07  35   400  3500   300   300   ---    28    28 Optimal
07:37:18  *31  *33  *33  *32  ^33  32.25 -1.32  35   400  3500   300   300   ---    28    28 Optimal
07:38:19  *31  *33  *33  *32  ^33  32.25 -1.32  35   400  3500   300   300   ---    28    28 Optimal
07:39:20  *31  *33  *33  *32  ^33  32.25 -1.32  35   400  3500   300   300   ---    28    28 Optimal
07:40:20  *31  *33  *33  *32  ^33  32.25 -1.32  35   400  3500   300   300   ---    28    28 Optimal
07:41:21  *31  *33  *33  *32  ^33  32.25 -1.32  35   400  3500   300   300   ---    28    28 Optimal
07:42:22  *31  *33  *33  *32  ^33  32.25 -1.32  34   400  3500   300   300   ---    28    28 Optimal
07:43:23  *31  *33  *32  *32  ^33  32.00 -1.57  34   400  3500   300   300   ---    28    28 Optimal
07:44:24  *31  *33  *32  *32  ^33  32.00 -1.57  34   400  3500   300   300   ---    28    28 Optimal
07:45:25  *31  *33  *32  *32  ^33  32.00 -1.57  34   400  3500   300   300   ---    28    28 Optimal
07:46:26  *31  *33  *32  *32  ^33  32.00 -1.57  35   400  3500   300   300   ---    28    28 Optimal
07:47:27  *31  *33  *32  *32  ^33  32.00 -1.57  35   400  3500   300   300   ---    28    28 Optimal
07:48:28  *31  *33  *32  *32  ^33  32.00 -1.57  34   400  3500   300   300   ---    28    28 Optimal
07:49:29  *31  *33  *32  *32  ^33  32.00 -1.57  35   400  3500   300   300   ---    28    28 Optimal
07:50:30  *31  *33  *32  *32  ^33  32.00 -1.57  35   400  3500   300   300   ---    28    28 Optimal
07:51:31  *31  *33  *32  *32  ^33  32.00 -1.57  35   400  3500   300   300   ---    28    28 Optimal
07:52:32  *31  *33  *32  *31  ^33  31.75 -1.82  35   400  3400   300   300   ---    26    26 Optimal
07:53:33  *31  *33  *32  *31  ^33  31.75 -1.82  35   400  3300   300   300   ---    26    26 Optimal
07:54:33  *31  *32  *32  *31  ^32  31.50 -2.07  35   400  3400   300   300   ---    26    26 Optimal
07:55:34  *31  *32  *32  *31  ^32  31.50 -2.07  34   400  3300   300   300   ---    26    26 Optimal
07:56:35  *31  *32  *32  *31  ^32  31.50 -2.07  34   400  3400   300   300   ---    26    26 Optimal
07:57:36  *31  *32  *32  *31  ^32  31.50 -2.07  35   400  3300   300   300   ---    26    26 Optimal
07:58:37  *31  *32  *32  *31  ^32  31.50 -2.07  35   400  3400   300   300   ---    26    26 Optimal
07:59:38  *31  *32  *32  *31  ^32  31.50 -2.07  35   400  3300   300   300   ---    26    26 Optimal
08:00:39  *30  *32  *32  *31  ^32  31.25 -2.32  34   400  3400   300   ---   ---   100   100 Optimal
08:01:39  *30  *32  *31  *31  ^32  31.00 -2.57  28  1300  6800  1200  1200   ---   100   100 Optimal
08:02:40  *30  *32  *31  *31  ^32  31.00 -2.57  26  1300  6800  1200  1200   ---   100   100 Optimal
08:03:41  *29  *31  *30  *30  ^31  30.00 -3.57  25  1300  6800  1200  1200   ---   100   100 Optimal
08:04:42  *29  *31  *30  *30  ^31  30.00 -3.57  24  1300  6800  1200  1200   ---   100   100 Optimal
08:05:43  *29  *30  *29  *30  ^30  29.50 -4.07  25  1300  6800  1200  1200   ---   100   100 Optimal
08:06:44  *28  *30  *29  *29  ^30  29.00 -4.57  25  1300  6800  1200  1200   ---   100   100 Optimal
08:07:45  *28  *30  *28  *29  ^30  28.75 -4.82  24  1300  6800  1200  1200   ---   100   100 Optimal
08:08:46  *27  *29  *28  *29  ^29  28.25 -5.32  24  1300  6800  1200  1200   ---   100   100 Optimal
08:09:47  *27  *29  *28  *28  ^29  28.00 -5.57  24  1300  6800  1200  1200   ---   100   100 Optimal
08:10:48  *27  *29  *27  *28  ^29  27.75 -5.82  24  1300  6800  1200  1200   ---   100   100 Optimal
08:11:49  *26  *28  *27  *28  ^28  27.25 -6.32  26  1300  6800  1200  1200   ---   100   100 Optimal
08:12:50  *26  *28  *27  *28  ^28  27.25 -6.32  27  1300  6800  1200  1200   ---   100   100 Optimal
08:13:50  *26  *28  *27  *27  ^28  27.00 -6.57  27  1300  6800  1200  1200   ---   100   100 Optimal
08:14:51  *26  *28  *27  *27  ^28  27.00 -6.57  26  1300  6800  1200  1200   ---   100   100 Optimal
08:15:52  *26  *28  *27  *27  ^28  27.00 -6.57  26  1300  6800  1200  1200   ---   100   100 Optimal
08:16:53  *26  *27  *26  *27  ^27  26.50 -7.07  27  1300  6800  1200  1200   ---   100   100 Optimal
08:17:54  *26  *27  *26  *27  ^27  26.50 -7.07  27  1300  6800  1200  1200   ---   100   100 Optimal
08:18:55  *25  *27  *26  *27  ^27  26.25 -7.32  27  1300  6800  1200  1200   ---   100   100 Optimal
08:19:56  *25  *27  *26  *27  ^27  26.25 -7.32  28  1300  6800  1200  1200   ---   100   100 Optimal
08:20:57  *25  *27  *26  *27  ^27  26.25 -7.32  27  1300  6800  1200  1200   ---   100   100 Optimal
08:21:58  *25  *27  *26  *27  ^27  26.25 -7.32  27  1300  6800  1200  1200   ---   100   100 Optimal
08:22:59  *25  *27  *26  *26  ^27  26.00 -7.57  28  1300  6800  1200  1200   ---   100   100 Optimal
08:23:59  *25  *27  *26  *26  ^27  26.00 -7.57  30  1300  6800  1200  1200   ---   100   100 Optimal
08:25:00  *25  *27  *26  *26  ^27  26.00 -7.57  28  1300  6800  1200  1200   ---   100   100 Optimal
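
A log like the one above can also be scanned programmatically for the moment the duty cycle jumps to 100%. This is a minimal sketch, assuming the column layout shown in the header (time, four drive fields, Tmax, Tmean, ERRc, CPU, five fan columns, Fan%0, Fan%1, MODE); the function names are hypothetical and are not part of spincheck.sh. The three sample rows are copied from the log above.

```python
def parse_spincheck_row(line, n_drives=4):
    """Split one spincheck data row into a dict (layout assumed from the header)."""
    parts = line.split()
    # Field 0 is the time, then one field per drive, then Tmax, Tmean, ERRc, CPU.
    fans_start = 1 + n_drives + 4
    names = ["FAN1", "FAN2", "FAN3", "FAN4", "FANA"]
    fans = {}
    for name, raw in zip(names, parts[fans_start:fans_start + 5]):
        fans[name] = int(raw) if raw.isdigit() else None  # '---' means no reading
    return {
        "time": parts[0],
        "cpu": int(parts[fans_start - 1]),
        "fans": fans,
        "duty": int(parts[fans_start + 5]),  # Fan%0 column
    }


def duty_jumps(rows, full=100):
    """Yield (previous_row, row) pairs where Fan%0 rises to full from a lower value."""
    for prev, cur in zip(rows, rows[1:]):
        if cur["duty"] >= full > prev["duty"]:
            yield prev, cur


sample = """\
07:59:38  *31  *32  *32  *31  ^32  31.50 -2.07  35   400  3300   300   300   ---    26    26 Optimal
08:00:39  *30  *32  *32  *31  ^32  31.25 -2.32  34   400  3400   300   ---   ---   100   100 Optimal
08:01:39  *30  *32  *31  *31  ^32  31.00 -2.57  28  1300  6800  1200  1200   ---   100   100 Optimal
"""

rows = [parse_spincheck_row(line) for line in sample.splitlines()]
for prev, cur in duty_jumps(rows):
    # Fans that had a reading in the previous row but none in this one.
    lost = [n for n, rpm in cur["fans"].items()
            if rpm is None and prev["fans"][n] is not None]
    print(cur["time"], "duty", prev["duty"], "->", cur["duty"], "no reading:", lost)
```

On these rows the jump is flagged at 08:00:39, where FAN4 shows no reading after sitting at 300 RPM. That is consistent with, though not proof of, a fan dropping below a lower threshold just before the fans go to full speed.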
 