Fan Scripts for Supermicro Boards Using PID Logic

Fan Scripts for Supermicro Boards Using PID Logic 2020-08-20, previous one was missing a file

lmannyr

Contributor
Joined
Oct 11, 2015
Messages
198
Here is a run with "How Duty" set to 0. Seems to be working fine. FYI... FAN2 and FAN3 took a dump recently. 3rd one in the last 3 weeks. I guess they all die at about the same age. The replacement fans will arrive today.

Code:
****** SETTINGS ******
CPU zone 1; Peripheral zone 0
CPU fans min/max duty cycle: 15/100
PER fans min/max duty cycle: 15/100
CPU fans - measured RPMs at 30% and 100% duty cycle: 500/1500
PER fans - measured RPMs at 30% and 100% duty cycle: 900/2500
Drive temperature setpoint (C): 35
Kp=4, Kd=40
Drive check interval (main cycle; minutes): 5
CPU check interval (seconds): 3
CPU reference temperature (C): 40
CPU scalar: 6
Assuming fan duty as set
Getting CPU temperatures via sysctl

Key to drive status symbols:  * spinning;  _ standby;  ? unknown                              Version 2020-08-20

Wednesday, Oct 28                                                                            CPU         New_Fan%  New_RPM_____________________
          da0  da1  da2  da3  da4  da5  da6  da7  ada0 ada1 Tmax Tmean   ERRc      P      D TEMP MODE    CPU PER   FANA  FAN1  FAN2  FAN3  FAN4
08:43:19  *32  *32  *33  *35  *37  *37  *35  *38  *39  *36  ^39  35.40   0.40   1.60   3.20   32 Full     50  55    800  1600   ---   ---  1800
08:48:32  *32  *31  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.00   0.00   0.00  -3.20   34 Full     15  52    200  1500   ---   ---  1700   
08:53:47  *32  *31  *32  *35  *37  *36  *34  *38  *39  *35  ^39  34.90  -0.10  -0.40  -0.80   33 Full     15  51    200  1500   ---   ---  1700
08:59:01  *32  *31  *32  *34  *37  *36  *34  *38  *38  *35  ^38  34.70  -0.30  -1.20  -1.60   34 Full     15  48    200  1400   ---   ---  1600
09:04:16  *32  *31  *32  *34  *37  *36  *34  *38  *38  *35  ^38  34.70  -0.30  -1.20   0.00   34 Full     15  47    200  1400   ---   ---  1500
09:09:29  *32  *31  *32  *34  *37  *36  *34  *38  *38  *35  ^38  34.70  -0.30  -1.20   0.00   33 Full     15  46    200  1400   ---   ---  1500
09:14:45  *32  *31  *32  *34  *37  *36  *34  *38  *38  *35  ^38  34.70  -0.30  -1.20   0.00   34 Full     15  45    200  1300   ---   ---  1500
09:20:00  *32  *31  *32  *35  *37  *36  *34  *38  *38  *35  ^38  34.80  -0.20  -0.80   0.80   34 Full     15  45    200  1300   ---   ---  1500
09:25:14  *32  *31  *32  *35  *37  *36  *34  *38  *38  *35  ^38  34.80  -0.20  -0.80   0.00   34 Full     15  44    200  1300   ---   ---  1500
09:30:27  *32  *31  *32  *35  *37  *36  *34  *38  *38  *35  ^38  34.80  -0.20  -0.80   0.00   34 Full     15  43    200  1300   ---   ---  1400
09:35:42  *32  *31  *32  *35  *37  *36  *35  *38  *38  *35  ^38  34.90  -0.10  -0.40   0.80   33 Full     15  43    200  1300   ---   ---  1400
09:40:56  *32  *31  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.00   0.00   0.00   0.80   33 Full     15  44    200  1300   ---   ---  1500
09:46:10  *32  *31  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.00   0.00   0.00   0.00   35 Full     15  44    200  1300   ---   ---  1500
09:51:23  *32  *31  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.00   0.00   0.00   0.00   37 Full     15  44    200  1300   ---   ---  1500
09:56:37  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.80   34 Full     15  45    200  1300   ---   ---  1500
10:01:53  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   35 Full     15  45    200  1300   ---   ---  1500
10:07:07  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   34 Full     15  45    200  1300   ---   ---  1500
10:12:21  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.80   34 Full     15  47    200  1400   ---   ---  1500
10:17:35  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   35 Full     15  48    200  1400   ---   ---  1600
10:22:49  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   34 Full     15  49    200  1400   ---   ---  1600
10:28:03  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   39 Full     15  50    200  1500   ---   ---  1600
10:33:18  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   34 Full     15  51    200  1500   ---   ---  1700
10:38:32  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   34 Full     15  52    200  1500   ---   ---  1700
10:43:46  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40  -0.80   36 Full     15  52    200  1500   ---   ---  1700
10:49:00  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   34 Full     15  52    200  1500   ---   ---  1700
10:54:14  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   34 Full     15  52    200  1500   ---   ---  1700
10:59:28  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   35 Full     15  52    200  1500   ---   ---  1700
11:04:43  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   35 Full     15  52    200  1500   ---   ---  1700
11:09:58  *32  *32  *32  *35  *37  *36  *35  *38  *39  *35  ^39  35.10   0.10   0.40   0.00   35 Full     15  52    200  1500   ---   ---  1700
11:15:13  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.80   35 Full     15  54    200  1600   ---   ---  1700
11:20:27  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   35 Full     15  55    200  1600   ---   ---  1800
11:25:42  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   34 Full     15  56    200  1600   ---   ---  1800
11:30:57  *32  *32  *32  *35  *37  *36  *35  *38  *39  *36  ^39  35.20   0.20   0.80   0.00   35 Full     15  57    200  1600   ---   ---  1800
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hello!

A few years down the line I've finally gotten around to putting your scripts to work. They run, but I get two errors:
Code:
/spincheck.sh: line 133: let: Tsum += : syntax error: operand expected (error token is "+= ")

Code:
./spinpid2.sh: line 185: let: Tsum += : syntax error: operand expected (error token is "+= ")


Any ideas on how to sort these out?

EDIT:
I don't understand code well - here's my effort:
My system is using SSDs. That might be a clue.

Furthermore:
Code:
cat /var/tempfile
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RC3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in SLEEP mode, exit(2)
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Yes, that's a good clue. The scripts are really designed for spinning drives. That said, if smartctl returned 2, it should show as in standby ("_") and the arithmetic not get done. I don't know why it is still trying to do the math.

In spincheck.sh, please insert this after line 110, which has BIT1=$(($RETURN & 2)):
Code:
   # DIAGNOSTIC variables - uncomment for troubleshooting:
   printf "\n\n DEVID=%s \n RETURN=%s, BIT0=%s, BIT1=%s " "${DEVID:---}" "${RETURN:---}" "${BIT0:---}" "${BIT1:---}"

Then, after what will now be line 134, which has only fi:
Code:
   # DIAGNOSTIC variables - uncomment for troubleshooting:
   printf "\n TEMP=%s, Tsum=%s " "${TEMP:---}" "${Tsum:---}"

Then run for one cycle and let's see what it prints.

In any case, it looks like it is not getting a temperature, and that's why the math fails. Since your disk says it is sleeping, it's not giving any temperature.
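If the failing `let` is indeed being fed an empty TEMP, one defensive pattern is to add a reading to the sum only when it looks like a whole number. This is a minimal sketch under that assumption; `add_temp`, `Tcount` and the sample readings are hypothetical and not taken from spincheck.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of spincheck.sh): add a reading to a running
# sum only when it is a non-negative integer, so an empty or garbled TEMP
# cannot trigger "syntax error: operand expected".
Tsum=0
Tcount=0

add_temp() {
    local temp=$1
    if [[ $temp =~ ^[0-9]+$ ]]; then
        Tsum=$(( Tsum + 10#$temp ))   # 10# forces base 10 for values like "069"
        Tcount=$(( Tcount + 1 ))
        return 0
    fi
    return 1   # nothing added; the caller can mark the drive as unknown
}

add_temp 32
add_temp "" || echo "skipped empty reading"   # e.g. a sleeping drive
add_temp 35
echo "sum=$Tsum count=$Tcount"
```

With a guard like this, a drive that reports no temperature is simply left out of the mean instead of aborting the arithmetic.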
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hello,
Sorry for the delay.

I managed to work my way through the code and adapt it to my application (all SSDs). See below:

There were a few hurdles to overcome.
First, getting /var/tempfile populated with my drive data.
In this chunk of code in spinpid2.sh I removed the "SSD" entry, a string that appears in the SMART data from the Samsung 850EVO drives I'm using. This change allowed the script to populate the tempfile.
Code:
# spinpid2.sh, line 375
DEVLIST="$(echo "$DEVLIST1"|sed '/KINGSTON/d;/ADATA/d;/SanDisk/d;/OCZ/d;/LSI/d;/EXP/d;/INTEL/d;/TDKMedia/d;/VMware/d;/Enclosure/d;/Card/d;/Flash/d')"

However, it still would not gather a proper temperature reading.
The SMART output on these SSDs labels the temperature attribute differently than on my spinners, so the name the script greps for never matched.
I simply replaced the string after grep with the attribute name my drives report.
Code:
# spinpid2.sh, line 178
TEMP=$( cat /var/tempfile | grep "Airflow_Temperature_Cel" | awk '{print $10}')

However, the script then returned a bogus temperature value from the 4th awk field (0069-something in my case), which sent it on a rampage, complaining that the values were too far off (cool implementation!)
This was fixed by 'bypassing' it and using the same line again:
Code:
# spinpid2.sh, line 183
TEMP=$( cat /var/tempfile | grep "Airflow_Temperature_Cel" | awk '{print $10}')

At this point the temperatures populated correctly.
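For context, in the `smartctl -A` attribute table the 4th column is the normalized VALUE and the 10th is RAW_VALUE, which would explain why field 4 produced a number that was not a temperature. The sketch below accepts either attribute name and always reads column 10. `smart_temp` is a hypothetical helper and the sample rows are made up for illustration; they are not real drive output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -A` style output on stdin and print the
# raw value (10th column) of the first temperature attribute found.
smart_temp() {
    awk '$2 == "Temperature_Celsius" || $2 == "Airflow_Temperature_Cel" {
             print $10; exit
         }'
}

# Illustrative sample rows (invented for this example).
sample='  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21000
190 Airflow_Temperature_Cel 0x0032   069   052   000    Old_age   Always       -       31'

printf '%s\n' "$sample" | smart_temp   # prints 31
```

On a live system the same function could be fed with something like `smartctl -A /dev/da0 | smart_temp`, where the device name is only an example.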

But the fans were soon throwing 'Mismatch between duty cycle and RPM' errors.
Somehow the fan would report max speed at duty cycle 94 and not go any higher, or perhaps it already reaches max RPM at duty cycle 94.
This would lead to repeated BMC resets.

I tried several things. I'm not sure exactly which change unlocked the last bit of stability.

I tried changing these values to change the 'sensitivity' of the check. The default values are 95 and 25.
Since I did not fully understand what the check really accomplishes (is it "lower than" or "higher than"? I'd really appreciate a "read out loud" of these lines to improve my understanding!), I tried variations:
Setting the values to 90 and 20 did not alleviate the problem.
Since I had noticed something about duty cycle 94, I tried the opposite direction, 98 and 28.
That is how it stands today.
Code:
# spinpid2.sh, line 280
if [[ (DUTY_CPU -ge 98 && ${!RPM_CPU} -lt RPM_CPU_MAX) || (DUTY_CPU -lt 28 && ${!RPM_CPU} -gt RPM_CPU_30) ]] ; then
# spinpid2.sh, line 284
if [[ (DUTY_PER -ge 98 && ${!RPM_PER} -lt RPM_PER_MAX) || (DUTY_PER -lt 28 && ${!RPM_PER} -gt RPM_PER_30) ]] ; then


At the same time, I changed the fan speed settings in the spinpid2.config file.
In particular, limiting the maximum duty cycle to 90 seems to have done the trick.
Code:
# spinpid2.config, lines 46 and 47
DUTY_PER_MIN=40
DUTY_PER_MAX=90


After these settings the script is functional, and looking reaaally good at this point.

Cheers!
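Read aloud, the two `if` tests quoted in this post say: "flag a mismatch if the duty cycle is at or above the upper threshold but the measured RPM is below the configured maximum, or if the duty cycle is below the lower threshold but the measured RPM is above the configured RPM at 30% duty." `${!RPM_CPU}` is bash indirect expansion: RPM_CPU holds the name of another variable, and that variable's value is what gets compared. The sketch below restates the test as a standalone function; the function name, argument order and sample RPMs are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the same test as a standalone function (hypothetical name).
# Usage: duty_rpm_mismatch DUTY RPM RPM_AT_30 RPM_MAX [HIGH] [LOW]
duty_rpm_mismatch() {
    local duty=$1 rpm=$2 rpm_30=$3 rpm_max=$4
    local high=${5:-95} low=${6:-25}   # defaults quoted in the post
    # Case 1: commanded near full speed, yet the fan is slower than RPM_MAX.
    # Case 2: commanded a low duty, yet the fan is faster than its 30% RPM.
    if (( (duty >= high && rpm < rpm_max) || (duty < low && rpm > rpm_30) )); then
        return 0   # mismatch
    fi
    return 1       # consistent
}

# A fan that tops out at 2400 RPM while the config says 2500:
if duty_rpm_mismatch 100 2400 900 2500; then echo "mismatch"; else echo "ok"; fi
# With the duty capped at 90, case 1 can no longer be true:
if duty_rpm_mismatch 90 2400 900 2500; then echo "mismatch"; else echo "ok"; fi
```

Seen this way, capping DUTY_PER_MAX at 90 keeps the duty below the upper threshold, which may be why that change stopped the errors. Lowering the configured maximum RPM to what the fans actually reach would be another option.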
 

glauco

Guru
Joined
Jan 30, 2017
Messages
526
The above post by @Dice was what I needed to tailor the script to my new all-SSD setup (two mirrored 8TB Samsung QVO 870 SSDs).
In addition to his instructions I halved the default Kp and Kd values. Before that, fans would at times spin unnecessarily fast.
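To see what halving the gains does, the P and D columns in the log earlier in the thread can be reproduced with P = Kp × ERRc and D = Kd × (ERRc − previous ERRc) / interval in minutes. That formula is inferred from the logged numbers, not taken from the script source, and `pd_terms` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper, formula inferred from the logged P and D columns.
# Usage: pd_terms KP KD ERR_NOW ERR_PREV INTERVAL_MIN  -> prints "P D"
pd_terms() {
    awk -v kp="$1" -v kd="$2" -v e="$3" -v ep="$4" -v t="$5" 'BEGIN {
        p = kp * e              # proportional term
        d = kd * (e - ep) / t   # derivative term: change in error per minute
        printf "%.2f %.2f\n", p, d
    }'
}

pd_terms 4 40 -0.10 0.00 5   # Kp=4, Kd=40 as in the log's 08:53:47 row: -0.40 -0.80
pd_terms 2 20 -0.10 0.00 5   # same error with both gains halved:     -0.20 -0.40
```

In that log the new peripheral duty appears to be the old duty plus P plus D, rounded, so halving both gains halves the size of each correction step.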
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I recently discovered that for whatever reason, on my machine the sysctl temperature readout (sysctl dev.cpu | grep temperature) is getting stuck and does not respond to CPU temperature as shown correctly by sudo ipmitool sensor | grep CPU. I first noticed that cpu temperature never changed in the logs. The TrueNAS webGUI dashboard showed the same thing; I assume it also uses sysctl since it shows the temp of all cores. Then I tested with a CPU load. sysctl (and the webGUI) showed no change but ipmitool sensor did.

The scripts (and scripts written by others) by default use sysctl if it is available. I suggest anyone using any script that relies on sysctl check its reliability as above. Let me know if you see that fault.
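A quick way to spot the fault is to take both readings under load and compare them. The sketch below only does the comparison; `temps_disagree` and its 5-degree default tolerance are hypothetical, and the commented commands assume the sysctl OID `dev.cpu.0.temperature` and a sensor named "CPU Temp", which may differ per board.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) when two temperature readings differ
# by more than a tolerance in degrees C. A trailing unit such as "C" is stripped.
temps_disagree() {
    local a=${1%%[!0-9.]*} b=${2%%[!0-9.]*} tol=${3:-5}
    awk -v a="$a" -v b="$b" -v tol="$tol" 'BEGIN {
        d = a - b; if (d < 0) d = -d
        exit ((d > tol) ? 0 : 1)
    }'
}

# On the server, under CPU load (names are assumptions, see above):
#   s=$(sysctl -n dev.cpu.0.temperature)
#   i=$(ipmitool sensor get "CPU Temp" | awk '/Sensor Reading/ {print $4}')
#   temps_disagree "$s" "$i" && echo "sysctl and IPMI disagree"

if temps_disagree "34.0C" "52"; then echo "readings disagree"; fi
```

Core and IPMI sensors can legitimately differ by a few degrees, so a large, persistent gap under load is the signal to look for rather than any difference at all.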

I had to change the script to use ipmitool instead. Both methods are in the script in the function CPU_check_adjust; you just have to comment/uncomment the appropriate code. Just copy out the line
Code:
CPU_TEMP=$($IPMITOOL sensor get "CPU Temp" | awk '/Sensor Reading/ {print $4}')

and paste it in front of the if statement that begins
Code:
if [[ $CPU_TEMP_SYSCTL == 1 ]]; then 

then comment out the whole if statement (through the second fi).
 
Joined
Dec 2, 2015
Messages
730
Interesting. I assume you tested sysctl at the command line as well. So far, knock wood, sysctl continues to reliably report CPU temperature on my servers (Supermicro X10SRH-cF with E5 1650-v4 CPU and Supermicro X10SL7-F with G3258 CPU), both running TrueNAS-12.0-U2.1.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Yes, I did test on command line. It's been a while now, but if I recall correctly, rebooting made it work again. It may be unique to my machine for whatever reason.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Does anyone know a way to get this working on Intel boards?
ipmitool is the key to using the scripts in their native form.

If you can find the commands to control fans with ipmitool, you can substitute them in the scripts and use it pretty much as-is.

If you can't find the ipmitool commands for your IPMI interface/board (as in my case with an ASUS board, where to my great frustration they didn't put the feature into the IPMI), then you could consider using a Corsair Commander Pro and the script I adapted... see here: https://www.truenas.com/community/t...k-in-a-jail-to-control-fans.71873/post-619377

Happy to help with some guidance there if needed.
 

spacecabbie

Explorer
Joined
Aug 20, 2017
Messages
99
Still happily running this script, although I recently ran into a small issue: I am using the server more and more, and it seems the CPU fan can't keep the CPU cool enough. Is there a way to insert an if/then that basically does: if cpu_temp >= 60% turn all fans to 100%?
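The decision part of such an override can be sketched as below, treating the "60%" in the question as 60 °C. `override_duty` is a hypothetical helper, not something in the script; its result would still have to be handed to whatever routine the script already uses to set the duty cycle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the duty cycle to apply. At or above the
# threshold temperature it prints 100; otherwise it passes the normal duty through.
# Usage: override_duty CPU_TEMP NORMAL_DUTY [THRESHOLD]
override_duty() {
    local cpu_temp=$1 normal_duty=$2 threshold=${3:-60}   # 60 C is an assumption
    if (( cpu_temp >= threshold )); then
        echo 100
    else
        echo "$normal_duty"
    fi
}

override_duty 72 45   # prints 100
override_duty 48 45   # prints 45
```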
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there a way to insert an if/then that basically does: if cpu_temp >= 60% turn all fans to 100%?
I would refer you to the version of the script worked on by @Kevin Horton based on the variant developed by @Stux, which already has this option built in and can be configured in the settings at the top of the file:

and more directly:
 
Joined
Jan 27, 2020
Messages
577
Is there a way to exclude FAN4 from the CPU duty cycle while leaving the FANA cycle untouched?

My Supermicro board has FAN1-4 and FANA headers, but I want to control the active cooling on my HBA (a tiny Noctua fan) independently of the CPU and PER cycles. The tiny fan needs to spin at a much higher RPM to be effective, while the CPU and HDD fans idle most of the time, as do the 120mm fans that cool everything. I assume mixing fans with very different RPM ranges does not work well with this script.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Is this updated for TrueNAS Scale yet? Pretty sure camcontrol won't work. But what else?
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I don't really know what TrueNAS Scale is. Someone else will have to modify the script for it if that is necessary.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
I don't know the roadmap for TrueNAS Core, but I guess they will shut it down sooner or later? Or will they run together for the foreseeable future? TrueNAS Scale is based on Debian. It's the future. I hope ;D
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I think I have a working version of mine for SCALE... not done a lot of testing with it though... I'll post a link after I check it.

EDIT:
Nope... I had a version working on Debian at one point, but seems not for SCALE in particular.

I think almost everything is OK, except that building the HD list on startup needs to use a Linux alternative to camcontrol devlist (perhaps something like sfdisk -l).
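As a rough sketch of that idea, the device names can be pulled from the "Disk /dev/...:" header lines that `sfdisk -l` prints for each disk. `disks_from_sfdisk` is a hypothetical helper, the sample text is invented for illustration, and the header format is assumed rather than taken from the adapted script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the device paths found in `sfdisk -l` style output.
# Assumes each disk is introduced by a line of the form "Disk /dev/NAME: ...".
disks_from_sfdisk() {
    awk '/^Disk \/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Illustrative sample (invented, not real output).
sample='Disk /dev/sda: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: Example Model
Units: sectors of 1 * 512 = 512 bytes

Disk /dev/sdb: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors'

printf '%s\n' "$sample" | disks_from_sfdisk
```

On a real system the list may also contain devices that should not be monitored, so a filter similar to the script's existing sed exclusions would probably still be needed.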

The disk temps can use smartctl but I think CPU temps will have to be altered to use sensors as well.

EDIT again:
OK, so I worked on it a bit... here's a link to the testing version for SCALE... you still need to think about setting all the relevant settings and putting the .ini file in place to suit your setup.

Theory says the SCALE version will actually be cross-platform and will also work on CORE... not much testing has been done in that regard yet.

 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I don't know the roadmap for TrueNAS Core, but I guess they will shut it down sooner or later?
There is no planned sunset/obsolescence for CORE... it will continue to be available as far as the published roadmaps indicate.

SCALE will exist in parallel and will share middleware with CORE, but is not a replacement for it.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is this updated for TrueNAS Scale yet? Pretty sure camcontrol won't work. But what else?
OK, so I've had a more serious look at it now and was able to replace camcontrol with sfdisk -l and was able to use only the sensors from lmsensors that describe themselves as "Core...", so should only be using core temps from CPUs unless some device also has Core in its properties somewhere.
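The "Core..." filtering described here can be sketched as follows: keep only lines of `sensors` output that start with "Core N:" and take the hottest one. `max_core_temp` is a hypothetical helper, not the code in the adapted script, and the sample text is invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lm-sensors `sensors` output on stdin and print the
# highest per-core temperature, ignoring package and other sensors.
max_core_temp() {
    awk '/^Core [0-9]+:/ {
             t = $3; gsub(/[^0-9.]/, "", t)   # "+41.0°C" -> "41.0"
             if (!found || t + 0 > max) max = t + 0
             found = 1
         }
         END { if (found) printf "%.1f\n", max }'
}

# Illustrative sample (invented, not real output).
sample='coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +43.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +41.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +43.0°C  (high = +80.0°C, crit = +100.0°C)'

printf '%s\n' "$sample" | max_core_temp   # prints 43.0
```

On a live SCALE system this would be used as `sensors | max_core_temp`. It prints nothing when no "Core" lines are present, so the caller can fall back to another source.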

You can find a version that I have as yet been unable to test in full (but I have done unit testing on the only bits that I changed) on the link I already provided a couple of posts up.

Theory says that same version (with the right settings provided) can work on either CORE or SCALE. Just make sure you go through and set all the parts you need including selecting the $script_mode for your own fan control option (it supports OpenCorsairLink = ocl, Supermicro X10+ others = supermicro and Asrock X470D4U2-2T = asrock)
 

Rnadyrshin

Cadet
Joined
Feb 21, 2022
Messages
1
Hello Glorious1
Thank you for this script, it works well for me while the disks are spinning!
But when the disks are not spinning, I see errors in the logs of the script:
13:10:00 *34 *34 *32 *31 *32 *34 *34 *34 *34 *35 ?0 ?0 ^35 33.40 -0.17 -0.68 0.80 50 Drives 60/60 Full 1300 60
13:15:15 *34 *33 *31 *31 *31 *34 *33 *33 *33 *34 ?0 ?0 ^34 32.70 -0.87 -3.48 -5.60 44 Drives 60/51 Full 1200 51
13:20:32 *34 _0 *31 *30 *31 *34 *33 *33 *33 *34 ?0 ?0 ^34 32.55 -1.02 -4.08 -1.20 40 Drives 51/46 Full 1000 5
13:25:46 _0 _0 _0 _0 _0 _0 _0 _0 _0 _0 ?0 ?0 ^-- --- -1.02 --- --- 42 Drives 5/5 Full 200 bc: stdin:1: syntax error: < unexpected
5 bc: stdin:1: syntax error: < unexpected
bc: stdin:1: syntax error: < unexpected
bc: stdin:1: syntax error: < unexpected
bc: stdin:1: syntax error: < unexpected

I use a spindown_timer.sh script to stop disks spinning:
https://www.truenas.com/community/resources/hdd-spindown-timer.122/

Can you help fix this?
 