PID fan controller Perl script

qubus

Hi Kevin

Thank you very much for your script. I just took the latest from GitHub yesterday but had to do some major debugging to get it to work.
In the end I found out that the default values do not get set.

These defaults are never copied into the working variables:
Code:
$default_hd_ave_target = 38;         # PID control loop will target this average temperature for the warmest N disks
$default_Kp = 16/3;                  # PID control loop proportional gain
$default_Ki = 0;                     # PID control loop integral gain
$default_Kd = 24;                    # PID control loop derivative gain
$default_hd_num_peak = 2;            # Number of warmest HDs to use when calculating average temp


At the beginning I had a div/0 error, caused by this line (592):
Code:
my $ave_temp = $temp_sum / $hd_num_peak;


Removing the "default_" prefix took care of it.
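A minimal sketch of the kind of guard that avoids this, assuming Perl 5.10+ for the `//` operator: apply a default only when the working value is undefined, and refuse to average over zero drives. The `%default` hash and the `with_default`/`warmest_average` names are hypothetical, not the script's actual code; the default values are the ones quoted above, and the temperatures are made-up example inputs.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical defaults, copied from the $default_* values quoted above.
my %default = (hd_ave_target => 38, hd_num_peak => 2);

# Use the working value if it is defined, otherwise fall back to the default.
sub with_default {
    my ($name, $value) = @_;
    return $value // $default{$name};
}

# Average temperature of the N warmest drives, guarded against division by zero.
sub warmest_average {
    my ($n, @temps) = @_;
    $n = with_default(hd_num_peak => $n);
    die "hd_num_peak must be at least 1\n" unless $n >= 1;
    die "no drive temperatures\n"          unless @temps;
    $n = @temps if $n > @temps;            # fewer drives than requested
    my @warmest = (sort { $b <=> $a } @temps)[0 .. $n - 1];
    return sum(@warmest) / $n;
}

printf "%.1f\n", warmest_average(undef, 34, 38, 40, 36);   # falls back to 2 drives: 39.0
```

Note that an explicit 0 is "defined", so it is rejected by the check rather than silently replaced by the default.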
 
Kevin Horton
Oh, thanks for reporting this. I fixed the error, tested it, and committed it.
 

thepixelgeek

@Kevin Horton Does that affect the npeak branch? Should I update my script?
 
Kevin Horton
The npeak branch has been merged into the master branch, so it will disappear if you update. I also need to make some changes to the names of the config files, and to the instructions.

Today's change only affects the way the script finds the various settings. It does not change how the script controls the fans to achieve a desired HD temp.

I'm a big believer in "If it isn't broke, don't try to fix it". If you are happy with the way your script is performing, don't update it.
 

thepixelgeek

What's the best/easiest way to update the script?
 
Kevin Horton
If you used git to get it, use git to pull down any desired updates. If you downloaded it manually from GitHub, manually download any updates.

At the moment, I haven't made any changes that are worth updating, if your current setup is working as desired.
 

KevDog

OK Kevin, sorry to reopen an older thread, but I'm having issues. The main one is that I added two new fans to the case: https://www.amazon.com/gp/product/B00KFCRATC/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1. Unfortunately these are heavy-duty fans, and they sound like it: they are extremely LOUD! The docs say they can run from 750 RPM to 3000 RPM.

My setup is a Supermicro X11CF Mobo with standard Intel CPU cooling fan on header FANA.
Case cooling fans are hooked to headers FAN1-FAN4.
On each FAN(1-4) header I have a 2-way splitter. Focusing for the moment on the FAN2 header: one lead of the splitter goes to the 4-pin PWM Noctua fan and the other to an Enermax fan.

Thresholds were set using ipmitool and are as follows:
Code:
FAN1             | 1200.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2600.000  | 2800.000  | 3000.000
FAN2             | 2800.000   | RPM        | cr    | 800.000   | 900.000   | 1000.000  | 2600.000  | 2800.000  | 3000.000
FAN3             | 1500.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2500.000  | 2200.000  | 2000.000
FAN4             | 1500.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2500.000  | 2200.000  | 2000.000
FANA             | 2000.000   | RPM        | ok    | 300.000   | 700.000   | 900.000   | 2600.000  | 2800.000  | 3000.000


You can see the FAN2 header is running at 2800 RPM which is extremely loud.

Using your script from github, I have the following fan settings:

Code:
## FAN SPEEDS
## You need to determine the actual max fan speeds that are achieved by the fans
## Connected to the cpu_fan_header and the hd_fan_header.
## These values are used to verify high/low fan speeds and trigger a BMC reset if necessary.
$cpu_max_fan_speed    = 3300;
$hd_max_fan_speed     = 1800;


## CPU FAN DUTY LEVELS
## These levels are used to control the CPU fans
$fan_duty_low          =  30;

## HD FAN DUTY LEVELS
## These levels are used to control the HD fans
$hd_fan_duty_high      = 100;    # percentage on, ie 100% is full speed.
$hd_fan_duty_med_high  =  80;
$hd_fan_duty_med_low   =  50;
$hd_fan_duty_low       =  30;    # some 120mm fans stall below 30.

## FAN ZONES
# Your CPU/case fans should probably be connected to the main fan sockets, which are in fan zone zero
# Your HD fans should be connected to FANA which is in Zone 1
# You could switch the CPU/HD fans around, as long as you change the zones and fan header configurations.
#
# 0 = FAN1..5
# 1 = FANA..FANC
$cpu_fan_zone = 1;
$hd_fan_zone  = 0;


## FAN HEADERS
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.
## cpu_fan_header should be in the cpu_fan_zone
## hd_fan_header should be in the hd_fan_zone
$cpu_fan_header = "FANA";                 # used for printing to standard output for debugging
$hd_fan_header  = "FAN2";                 # used for printing to standard output for debugging
@hd_fan_list = ("FAN1", "FAN3", "FAN4");  # used for logging to file


################


I set the CPU zone to zone 1 (FANA) and the rest to zone 0.
hd_fan_header = "FAN2", which is the header with the 2800 RPM Noctua fan.
I set the maximum HD fan speed to 1800 RPM to quiet the fans down, since 2800 RPM is terribly loud.

Running your script, I get the following:
Code:
2020-02-12 12:47:52: CPU Temp: 31.0
2020-02-12 12:47:52: CPU Fan: low
2020-02-12 12:47:53: fan_speed = 2000
2020-02-12 12:47:53: CPU Fan speed: 2000 RPM
2020-02-12 12:47:53: fan_speed = 2800
2020-02-12 12:47:53: HD Fan speed: 2800 RPM
2020-02-12 12:47:53: HD fan speed should be low, but 2800 > 1440.
2020-02-12 12:47:53: bmc_fail_count:  2, bmc_fail_threshold: 1
2020-02-12 12:47:53: Fan speeds are still not where they should be after 2 attempts, will reboot BMC.
2020-02-12 12:47:53: fanmode: full = 1


In essence, it can't bring the FAN2 header down, so it reboots the BMC.

After the reboot, the process unfortunately repeats.

ipmitool sdr readings:

Code:
FAN1             | 1200 RPM          | ok
FAN2             | 2800 RPM          | cr
FAN3             | 1500 RPM          | ok
FAN4             | 1500 RPM          | ok
FANA             | 2000 RPM          | ok


The script tries to bring FAN2 down to 1440 RPM but can't. It tries a few more times, then reboots the BMC.
Rinse and repeat.
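The numbers in the log are consistent with a check in which a fan commanded to "low" may not exceed 80% of the configured maximum (0.8 × 1800 = 1440). The sketch below models that inference only; the 0.8 factor and the `low_speed_limit`/`stuck_high` names are assumptions read off the log, not taken from the script's source.

```perl
use strict;
use warnings;

# Assumed rule: a fan that should be "low" must not exceed this fraction of
# hd_max_fan_speed. 0.8 reproduces the 1440 RPM limit in the log above.
my $LOW_FRACTION = 0.8;

sub low_speed_limit {
    my ($max_rpm) = @_;
    return $max_rpm * $LOW_FRACTION;
}

# True when a fan that should be low is still spinning above the limit.
sub stuck_high {
    my ($rpm, $max_rpm) = @_;
    return $rpm > low_speed_limit($max_rpm) ? 1 : 0;
}

for my $max (1800, 3000) {
    printf "max %d: limit %d, FAN2 at 2800 flagged: %d\n",
        $max, low_speed_limit($max), stuck_high(2800, $max);
}
```

Under this assumption, changing hd_max_fan_speed only moves the limit (1440 or 2400 RPM); a fan that stays at 2800 RPM is flagged either way, so the resets continue until the fan itself actually slows down.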

Just as a test (since FAN2 is technically in Zone 0), I tried to set the Zone 0 fans to 25% speed:
Code:
ipmitool raw 0x30 0x70 0x66 0x01 0x00 0x16


The fans slow down for maybe one second before spinning back up to the speeds listed above. It's as if the BMC won't let me set the fan speeds without intervening. The same thing happens with your script: the fans slow down for about a second and then spin back up. The changes don't stick.
 

sretalla

I think the notes in the script may give some clues on what to do next (see the lines about the lower and upper thresholds needing to be compatible with the fans used):
Code:
# If the fans should be high, and they are stuck low, or vice-versa, the BMC will be rebooted, thus it is critical to set the
# cpu/hd_max_fan_speed variables correctly.

###############################################################################################

# The IPMI fan lower and upper fan speed thresholds must be adjusted to be compatible with the fans used.  Do not rely
# completely on manufacturer specs to determine the slowest and fastest possible fan speeds, as some fans have been found
# to run at speeds that differ somewhat from the official specs.  See:
# https://forums.freenas.org/index.php?resources/how-to-change-ipmi-sensor-thresholds-using-ipmitool.35/

# The following ipmitool commands can be run when connected to the FreeNAS server via ssh.  They are useful to set a desired fan duty cycle before
# checking the fan speeds.

# Set duty cycle in Zone 0 to 100%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 100
# Set duty cycle in Zone 0 to  50%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 50
# Set duty cycle in Zone 0 to  20%: ipmitool raw 0x30 0x70 0x66 0x01 0x00 20

# Set duty cycle in Zone 1 to 100%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 100
# Set duty cycle in Zone 1 to  50%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 50
# Set duty cycle in Zone 1 to  20%: ipmitool raw 0x30 0x70 0x66 0x01 0x01 20

# Check duty cycle in Zone 0:                   ipmitool raw 0x30 0x70 0x66 0x00 0x00
# result is hex, with 64 being 100% duty cycle.  32 is 50% duty cycle.  14 is 20% duty cycle.

#  Check duty cycle in Zone 1:                  ipmitool raw 0x30 0x70 0x66 0x00 0x01
# result is hex, with 64 being 100% duty cycle.  32 is 50% duty cycle.  14 is 20% duty cycle.

# Check fan speeds using: ipmitool sdr
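The raw commands above can be wrapped in a couple of small helpers. This is a sketch that only builds the argument list and decodes the reply; the `set_duty_args`/`parse_duty_reply` names are hypothetical, and actually running the command (for example with `system('ipmitool', @args)`) is left to the caller.

```perl
use strict;
use warnings;

# Arguments for "ipmitool raw 0x30 0x70 0x66 0x01 <zone> <duty>", mirroring the
# set commands in the comments above (zone 0 = FAN1.., zone 1 = FANA..).
sub set_duty_args {
    my ($zone, $duty) = @_;
    die "zone must be 0 or 1\n"   unless $zone == 0 || $zone == 1;
    die "duty must be 0 to 100\n" unless $duty >= 0 && $duty <= 100;
    return ('raw', '0x30', '0x70', '0x66', '0x01', sprintf('0x%02x', $zone), int $duty);
}

# "ipmitool raw 0x30 0x70 0x66 0x00 <zone>" replies with a hex byte, e.g. " 64".
sub parse_duty_reply {
    my ($reply) = @_;
    my ($byte) = $reply =~ /([0-9a-fA-F]{1,2})/
        or die "unexpected reply: '$reply'\n";
    return hex $byte;
}

print join(' ', 'ipmitool', set_duty_args(0, 50)), "\n";  # ipmitool raw 0x30 0x70 0x66 0x01 0x00 50
print parse_duty_reply(" 64\n"), "\n";                      # 100
```

As the comments say, the reply is hex (`64` is 100%). The set commands in those comments pass the duty cycle as a plain decimal number, which suggests ipmitool reads bare numbers as decimal; if so, the `0x16` used in the earlier 25% test is actually 22%, and 25% would be `25` (or `0x19`).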


I think you may need to look again at how the fans are connected: all Zone 1 headers need to be in the same group (letters) and all Zone 0 headers in the same group (numbers), as the script's comment says:
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.

Also, I don't think lowering the max speed setting to quiet the fans down is the right approach. Set the max to what it really is, then set your temperature targets so that the script never decides to use 100%.
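One way to check the header/zone point is to validate the configuration against the zone layout given in the script's comments (FAN1..5 in zone 0, FANA..FANC in zone 1). The `zone_of`/`zone_problems` functions below are a hypothetical sketch, not part of the script.

```perl
use strict;
use warnings;

# Zone layout taken from the script's comments: 0 = FAN1..5, 1 = FANA..FANC.
sub zone_of {
    my ($header) = @_;
    return 0 if $header =~ /^FAN[1-5]$/;
    return 1 if $header =~ /^FAN[A-C]$/;
    return;    # unknown header
}

# Report any configured header whose zone does not match its configured zone.
sub zone_problems {
    my (%cfg) = @_;
    my @checks = ([ $cfg{cpu_fan_header}, $cfg{cpu_fan_zone} ]);
    push @checks, [ $_, $cfg{hd_fan_zone} ] for $cfg{hd_fan_header}, @{ $cfg{hd_fan_list} };

    my @problems;
    for my $check (@checks) {
        my ($header, $want) = @$check;
        my $zone = zone_of($header);
        if (!defined $zone) {
            push @problems, "$header: not a known fan header";
        } elsif ($zone != $want) {
            push @problems, "$header is in zone $zone, but zone $want is configured";
        }
    }
    return @problems;
}

# The settings posted above.
my @problems = zone_problems(
    cpu_fan_zone   => 1,      hd_fan_zone   => 0,
    cpu_fan_header => 'FANA', hd_fan_header => 'FAN2',
    hd_fan_list    => [ 'FAN1', 'FAN3', 'FAN4' ],
);
print @problems ? join("\n", @problems) . "\n" : "zones look consistent\n";
```

With the settings KevDog posted, this reports no problems, which suggests the zone wiring itself is not what is keeping FAN2 at full speed.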
 

KevDog

Hey, I'm really sorry to bother you with this. I played with this script about a year ago and everything seemed to work correctly.

It's just these new fans are driving me bonkers.

All case fans are in Zone 0 (headers FAN1 to FAN4).
Zone 1 is header FANA, which is connected to the CPU fan.

If I manually bring the Zone 0 duty cycle down to 20% with:
ipmitool raw 0x30 0x70 0x66 0x01 0x00 20

The fans spin down but immediately spin back up (within one second they are back up). I don't remember this happening last year: when I set the duty cycle, the fans would obey it for a while.

I'm not running the script at the moment; I only entered the command above, so otherwise the BMC has full control. I've rebooted the BMC many times, but it's the same thing: the fans spin back up immediately (after a full reboot). The only time the fans spin down to a reasonable speed is while the BMC is rebooting and initializing. The script isn't going to work if the BMC (or some other process) immediately overrides the setting, since the script uses the same ipmitool raw commands to adjust the duty cycle.

It also doesn't matter what I set as the upper thresholds for FAN2: FAN2 always runs at 2800 RPM (except for about one second after changing the duty cycle, or during a BMC reboot). As far as I can tell, the thresholds only determine when the BMC should be rebooted.

FAN1 | 1200 RPM | ok
FAN2 | 2800 RPM | nr
FAN3 | 1400 RPM | ok
FAN4 | 1500 RPM | ok
FANA | 2000 RPM | ok
 
Kevin Horton
I've been on the road for the last two days, and have three more days to go, with limited internet access. Please excuse the delayed response.

With fan splitters, it is never clear which fan's rpm is being reported. You need to set the upper and lower fan speed thresholds such that none of the fans on that header could ever trip them. Ignoring the other fan on the FAN2 header, the specs say your new fan could go as slow as 750 rpm, but the lower nonrecoverable threshold is set to 800, and the lower critical threshold is 900 rpm. If that fan ever goes slower than 900 rpm, this will trigger a BMC reset. So, lower the thresholds to something well below any possible rpm. I have those fans, and I set my thresholds to:

Code:
FANB             | 700.000    | RPM        | ok    | 100.000   | 100.000   | 100.000   | 3100.000  | 3200.000  | 3400.000  


I.e., all lower thresholds are at 100 rpm. This would reset the BMC if a fan ever stalls and stops, but any other likely slow speed is acceptable.
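The rule above can be written as a check: every fan that might be reported on a header must have its rated minimum above the lower thresholds and its rated maximum below the upper ones. A minimal sketch; it compares against the non-critical thresholds (the most conservative choice), the threshold order follows the `ipmitool sensor` columns (LNR, LCR, LNC, UNC, UCR, UNR), `threshold_problems` is a hypothetical name, and the 750 to 3000 rpm range is the spec quoted earlier for the new fan.

```perl
use strict;
use warnings;

# Thresholds in ipmitool sensor column order: LNR, LCR, LNC, UNC, UCR, UNR.
# Each fan on a header (splitters included) must stay strictly inside LNC..UNC.
sub threshold_problems {
    my ($thresholds, @fans) = @_;
    my (undef, undef, $lnc, $unc) = @$thresholds;
    my @problems;
    for my $fan (@fans) {
        push @problems, "$fan->{name}: rated min $fan->{min} rpm is not above $lnc"
            if $fan->{min} <= $lnc;
        push @problems, "$fan->{name}: rated max $fan->{max} rpm is not below $unc"
            if $fan->{max} >= $unc;
    }
    return @problems;
}

my $new_fan = { name => 'new fan', min => 750, max => 3000 };   # spec range quoted in the thread

# Original FAN2 thresholds: both limits are violated.
print "$_\n" for threshold_problems([ 800, 900, 1000, 2600, 2800, 3000 ], $new_fan);
# The 100 / 3100 thresholds above: no problems.
print scalar(threshold_problems([ 100, 100, 100, 3100, 3200, 3400 ], $new_fan)), " problems\n";
```

With a splitter, the same check would be run with every fan on the header in `@fans`, since either fan's speed might be the one reported.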
 

KevDog

OK, I've done what you said, but I'm still at 2800 RPM, and I can't change the duty cycle for more than about one second. You're at 700 RPM; I wish that were the case here.

Code:
FAN1             | 1200.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2600.000  | 2800.000  | 3000.000
FAN2             | 2800.000   | RPM        | ok    | 100.000   | 100.000   | 100.000   | 3100.000  | 3200.000  | 3400.000
FAN3             | 1400.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2500.000  | 2200.000  | 2000.000
FAN4             | 1500.000   | RPM        | ok    | 300.000   | 400.000   | 500.000   | 2500.000  | 2200.000  | 2000.000
FANA             | 2000.000   | RPM        | ok    | 300.000   | 700.000   | 900.000   | 2600.000  | 2800.000  | 3000.000
 
Kevin Horton
I doubt this is the root cause, but FAN3 and FAN4 have their upper thresholds in the wrong order (2500, 2200, 2000). They should be revised to increase, not decrease.

What fan models are connected to the splitters on each fan header? What are the min and max speed specs for these fans?
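Ordering mistakes like the FAN3/FAN4 one are easy to catch mechanically. A minimal sketch, assuming the pipe-separated column layout of `ipmitool sensor` shown in this thread (name, reading, unit, status, then LNR, LCR, LNC, UNC, UCR, UNR); the function names are hypothetical, and `s///r` needs Perl 5.14+.

```perl
use strict;
use warnings;

# Split one line of "ipmitool sensor" output into its pipe-separated columns.
sub parse_sensor_line {
    my ($line) = @_;
    my @col = map { s/^\s+|\s+$//gr } split /\|/, $line;
    return { name => $col[0], reading => $col[1], status => $col[3],
             thresholds => [ @col[4 .. 9] ] };   # LNR LCR LNC UNC UCR UNR
}

# Thresholds should never decrease from LNR up to UNR ("na" columns are skipped).
sub thresholds_ordered {
    my ($t) = @_;
    my @nums = grep { defined $_ && $_ ne 'na' } @$t;
    for my $i (1 .. $#nums) {
        return 0 if $nums[$i] < $nums[$i - 1];
    }
    return 1;
}

for my $line (
    'FAN1 | 1200.000 | RPM | ok | 300.000 | 400.000 | 500.000 | 2600.000 | 2800.000 | 3000.000',
    'FAN3 | 1400.000 | RPM | ok | 300.000 | 400.000 | 500.000 | 2500.000 | 2200.000 | 2000.000',
) {
    my $s = parse_sensor_line($line);
    printf "%s: %s\n", $s->{name}, thresholds_ordered($s->{thresholds}) ? 'ordered' : 'out of order';
}
```

Equal neighbouring values (such as 100/100/100) are accepted, matching the thresholds posted above.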
 

Xyrgh

Is anyone getting issues with the script with 11.3-U1?

My FreeNAS system was running fine with this script in 11.2 as a post init task. Now I'm getting a traceback error:

Code:
* Failed to check for alert IPMISEL:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/alert.py", line 660, in __run_source
alerts = (await alert_source.check()) or []
File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/../alert/source/ipmi_sel.py", line 130, in check
(await run(["ipmitool", "-c", "sel", "elist"], encoding="utf8")).stdout)
File "/usr/local/lib/python3.7/site-packages/middlewared/utils/__init__.py", line 109, in run
cp.check_returncode()
File "/usr/local/lib/python3.7/subprocess.py", line 444, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '('ipmitool', '-c', 'sel', 'elist')' returned non-zero exit status 1.


This is on a Supermicro X11SSL-F board. Immediately killing the script stops the errors.

Anyone have any ideas?
 

linus12

Hi Kevin, I hope I'm not duplicating anything here, but I think the fix you made for the default values wasn't complete. If you don't have a config file, it works, and if you have one with every value defined, it works.

But if the config file becomes corrupt, or an admin fumble-fingers it and deletes a value (or worse, changes the name of a variable in the config file), then the value becomes null (undefined) again, causing the problem that was previously reported.

I created an issue on GitHub, but thought I'd document it here too, just to be sure you saw it ;)
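A minimal sketch of a per-key fix: start from the defaults and overlay only the settings the config file actually defines, so a deleted or renamed entry falls back to its default instead of staying undefined. The script itself uses separate scalar variables; the hash layout and the `merge_config` name here are assumptions for illustration, with default values taken from the top of the thread.

```perl
use strict;
use warnings;

# Hypothetical defaults, using the values quoted at the top of the thread.
my %defaults = (
    hd_ave_target => 38,
    Kp            => 16 / 3,
    Ki            => 0,
    Kd            => 24,
    hd_num_peak   => 2,
);

# Start from the defaults and overlay only the keys the config file defines.
# Keys the script doesn't know about are returned so they can be warned about.
sub merge_config {
    my ($config) = @_;
    my %merged = %defaults;
    my @unknown;
    for my $key (sort keys %$config) {
        if (!exists $defaults{$key}) {
            push @unknown, $key;            # probably a misspelled setting
        } elsif (defined $config->{$key}) {
            $merged{$key} = $config->{$key};
        }
    }
    return (\%merged, \@unknown);
}

# A config where Kp was edited, hd_num_peak misspelled and the rest deleted.
my ($settings, $unknown) = merge_config({ Kp => 8, hd_num_peek => 3 });
print "hd_num_peak = $settings->{hd_num_peak}\n";     # 2 (default kept)
warn "unknown config key: $_\n" for @$unknown;        # hd_num_peek
```

Reporting unknown keys covers the "renamed variable" case: the typo is flagged instead of silently ignored.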
 
Kevin Horton
Thanks for reporting this. I'll look into the bug. I did get an email from GitHub when you created the issue, so that works at least. My work schedule is somewhat disrupted due to the COVID-19 situation, so it may take a few days before I have much free time to dig into this.
 

linus12

I understand completely! Sort of in the same boat myself!
 

Gilgameshxg

Sorry if you explained this before, but I wanted to ask what all the other scripts on GitHub are for. I believe I have the PID_fan_control.pl script working fine, but I don't understand the switch, config_test and #d_config files. It seems like the PID script calls for them, and that you would pick one of the numbered ones and just rename it to what the script is calling for, but I just don't understand what they're for or which one I would need. Thanks for the script, it's awesome so far.
 

treefrob

Kevin,
thanks very much for providing this script.
I am in the process of cleaning it up some:
  • declaring global and local variables explicitly
  • reformatting comments and some code to get it to fit on a screen less than 219 characters wide
  • getting rid of all the calls to external programs such as [e]grep, awk, sed and using pure Perl
  • simplifying where possible (for example removing all the newlines from the calls to dprint() and adding one in the dprint() function)
  • in general, getting it to pass a basic syntax check with "use strict;" and "perl -w"
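As an illustration of the "pure Perl instead of grep/awk" item, here is a minimal sketch that parses `ipmitool sdr` fan lines in the format posted earlier in the thread. The `parse_sdr_fans` name and the returned structure are hypothetical, not taken from the script; `s///r` needs Perl 5.14+.

```perl
use strict;
use warnings;

# Parse "ipmitool sdr" output into { FAN1 => { rpm => 1200, status => 'ok' }, ... }
# using only Perl, instead of piping through grep/awk. Non-RPM lines are skipped.
sub parse_sdr_fans {
    my ($text) = @_;
    my %fans;
    for my $line (split /\n/, $text) {
        my ($name, $reading, $status) = map { s/^\s+|\s+$//gr } split /\|/, $line;
        next unless defined $status && $reading =~ /^(\d+)\s+RPM$/;
        $fans{$name} = { rpm => $1, status => $status };
    }
    return \%fans;
}

# The sdr output posted earlier in the thread.
my $sdr = <<'END';
FAN1             | 1200 RPM          | ok
FAN2             | 2800 RPM          | cr
FAN3             | 1500 RPM          | ok
FAN4             | 1500 RPM          | ok
FANA             | 2000 RPM          | ok
END

my $fans = parse_sdr_fans($sdr);
printf "%s: %d RPM (%s)\n", $_, $fans->{$_}{rpm}, $fans->{$_}{status} for sort keys %$fans;
```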

I found some minor bugs and repaired them. I'm not sure how to fix one in particular. On line 897 of the code I checked out (a week or so ago), in calculate_hd_fan_duty_cycle_PID(), there is a reference to the variable "$hd_temp", which is unknown:
Code:
    894     else
    895     {
    896         $hd_duty = 100;
    897         dprint( 0, "Drive temperature ($hd_temp) invalid. going to 100%\n");
    898     }


It's just a dprint(), but perhaps you can tell me what this should be.

Also, is this call on line 986 to sprintf() used simply to convert a float to an integer?
Code:
my $ave_speed = sprintf("%i", $speed_sum / $fan_count);
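For reference, in Perl `%i` is a synonym for `%d`, and formatting a non-integer with it drops the fractional part, truncating toward zero exactly like `int()`. A small demonstration using the FAN1, FAN3 and FAN4 readings quoted earlier in the thread; if rounding were wanted instead, `int($x + 0.5)` works for positive values.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# HD fan readings for FAN1, FAN3 and FAN4 quoted earlier in the thread.
my @rpm = (1200, 1400, 1500);
my $ave = sum(@rpm) / @rpm;              # 4100 / 3 = 1366.67

my $via_sprintf = sprintf('%i', $ave);   # 1366: fraction dropped
my $via_int     = int($ave);             # 1366: same truncation toward zero
my $rounded     = int($ave + 0.5);       # 1367: rounds (valid for positive values)

print "$via_sprintf $via_int $rounded\n";
```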


I am planning to use the script to control the fans of an ASRock Rack X470D4U2-2T, which requires some changes to the logic. In particular, the ASRock MB does not know anything about fan modes ("full", "optimal", etc.). The duty cycle can be set for each fan between 1% and 100% (in practice 30% is the lowest setting at which the fans still reliably spin), so I guess this is comparable to only having the Supermicro "optimal" mode. Also, it simply has FAN1 through FAN6, and the CPU fan is connected to FAN1, i.e., there are no explicit zones.

Some questions:
  1. Are you interested in getting the cleaned up version of your script before I start making the logic changes? A *lot* has changed.
  2. I can try to add the ASRock logic such that one or the other behavior can be chosen, but only if you have a strong interest in integrating my changes into your repo. Otherwise I will just strip out the stuff that is Supermicro specific. Please let me know. Doing the former would probably mean having a MB-specific set_fan_mode() function, and the ASRock version would be a no-op.
Rob
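A sketch of how the board-specific behaviour in point 2 might be structured: a per-board capability table, a duty-cycle clamp, and a `set_fan_mode()` that does nothing on boards without fan modes. Everything here is hypothetical (the table, `board_cfg`, `clamp_duty`, and the `$apply` callback that would actually issue the board's command); only the 30% floor comes from this thread.

```perl
use strict;
use warnings;

# Hypothetical per-board capabilities. The 30% floor is from this thread
# (ASRock fans stop spinning reliably below it; some 120mm fans stall below 30).
my %board = (
    supermicro => { min_duty => 30, has_fan_modes => 1 },
    asrock     => { min_duty => 30, has_fan_modes => 0 },
);

sub board_cfg {
    my ($name) = @_;
    return $board{$name} // die "unknown board '$name'\n";
}

# Keep a requested duty cycle within what the board and fans can use.
sub clamp_duty {
    my ($name, $duty) = @_;
    my $min = board_cfg($name)->{min_duty};
    return $duty < $min ? $min : $duty > 100 ? 100 : $duty;
}

# Board-specific fan-mode setter: a no-op on boards without fan modes.
# $apply is a caller-supplied callback that actually sets the mode.
sub set_fan_mode {
    my ($name, $mode, $apply) = @_;
    return 0 unless board_cfg($name)->{has_fan_modes};
    $apply->($mode);
    return 1;
}

print clamp_duty(asrock => 10), "\n";    # 30
```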
 

sretalla

Are you interested in getting the cleaned up version of your script before I start making the logic changes? A *lot* has changed.
At very least, I am. I have already hacked the script to handle the Open Corsair Link range of devices, but my perl skills are practically 0, so I'm sure it's shamefully poorly done.
I can try to add the ASRock logic such that one or the other behavior can be chosen, but only if you have a strong interest in integrating my changes into your repo. Otherwise I will just strip out the stuff that is Supermicro specific. Please let me know. Doing the former would probably mean having a MB-specific set_fan_mode() function, and the ASRock version would be a no-op.
I would love to see that bit, as I think I would adapt the command but use the logic to handle the Corsair Commander Pro's 6 fans.
I would also like to see the ability to use the Aquaero (https://github.com/Aquaeronix/aerocli). I guess that is a bit more complex, though, as it indicates that every fan setting must include all fans, otherwise any device not mentioned is stopped.

You can see the hybrid_fan_controller2.pl in my fork of the repo.

I would also love to reuse the output from each run of the command to send the fan speeds and sensor values to an InfluxDB for graphing. I have a separate script for that at the moment, which seems to cause bad readings when too many queries run in quick succession, and in any case it is wasteful when the same readings are already taken every 90 seconds in the main loop. It would be great to just have the option to specify an InfluxDB server with a few parameters and have the script push to it if a flag is set. See influx.pl in my fork for the code to do that (again, super-inelegant and not efficient, I'm sure).
It is then perfect for graphing with Grafana.
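A minimal sketch of the formatting half of that idea: turning readings the main loop already has into an InfluxDB line-protocol record (`measurement,tag=value field=value timestamp`, with an `i` suffix for integer fields). The `influx_line` name, the `fan_speed` measurement and the `host` tag are hypothetical, and no escaping is done, so names are assumed to be simple identifiers like `FAN1`.

```perl
use strict;
use warnings;

# Build one InfluxDB line-protocol record:
#   measurement,tag=value,... field=value,... timestamp
# Integer fields get the "i" suffix. No escaping of spaces/commas is done.
sub influx_line {
    my ($measurement, $tags, $fields, $timestamp_ns) = @_;
    my $tag_str   = join ',', map { $_ . '=' . $tags->{$_} }         sort keys %$tags;
    my $field_str = join ',', map { $_ . '=' . $fields->{$_} . 'i' } sort keys %$fields;
    return "$measurement,$tag_str $field_str $timestamp_ns";
}

# Readings the main loop already has in hand (values from this thread).
my %rpm = (FAN1 => 1200, FAN2 => 2800, FANA => 2000);
print influx_line('fan_speed', { host => 'freenas' }, \%rpm, time() . '000000000'), "\n";
```

For an InfluxDB 1.x server, such lines go in the body of an HTTP POST to the `/write?db=<database>` endpoint; `HTTP::Tiny`, which ships with Perl 5.14 and later, can send that without extra modules.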
 