PID fan controller Perl script

Soloam

Contributor
Joined
Feb 14, 2014
Messages
196
How do you guys start the script? On boot? Does anyone have a custom script that checks whether the script is running and starts it if not?

Thank You
 

Soloam

Contributor
Joined
Feb 14, 2014
Messages
196
@Kevin Horton I think something is wrong with the script! As I understand it, if a CPU temp goes over "$high_cpu_temp" the fans go "high", and they should then go to "med" only once the temperature has dropped below "$med_cpu_temp". But what I see is my fans going from "high" to "med" as soon as the temperature goes below "$high_cpu_temp", which makes the fan speed very spiky, going full and low every few seconds:




DEBUG_LOG 2020-06-19 18:45:34: core_temp = 48.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 47.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 50.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 50.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 56.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 56.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:34: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:34: CPU Temp: 56.0
DEBUG_LOG 2020-06-19 18:45:34: CPU Temp: 56.0 >= 55, CPU Fan going high.
DEBUG_LOG 2020-06-19 18:45:34: CPU Fan: high
DEBUG_LOG 2020-06-19 18:45:34: CPU Fan changing... (high)
DEBUG_LOG 2020-06-19 18:45:34: Setting Zone 0 duty cycle to 100%
DEBUG_LOG 2020-06-19 18:45:35: core_temps:
53.0
53.0
54.0
55.0
53.0
53.0
51.0
55.0
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 55.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 51.0 C
DEBUG_LOG 2020-06-19 18:45:35: core_temp = 55.0 C
DEBUG_LOG 2020-06-19 18:45:35: CPU Temp: 55.0
DEBUG_LOG 2020-06-19 18:45:35: CPU Fan: high
DEBUG_LOG 2020-06-19 18:45:36: core_temps:
52.0
52.0
52.0
52.0
57.0
57.0
54.0
54.0
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 57.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 57.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:36: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:36: CPU Temp: 57.0
DEBUG_LOG 2020-06-19 18:45:36: CPU Fan: high
DEBUG_LOG 2020-06-19 18:45:39: core_temps:
52.0
52.0
53.0
53.0
60.0
59.0
55.0
55.0
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 60.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 59.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 55.0 C
DEBUG_LOG 2020-06-19 18:45:39: core_temp = 55.0 C
DEBUG_LOG 2020-06-19 18:45:39: CPU Temp: 60.0
DEBUG_LOG 2020-06-19 18:45:39: CPU Fan: high
DEBUG_LOG 2020-06-19 18:45:40: core_temps:
52.0
53.0
51.0
54.0
54.0
54.0
54.0
54.0
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 52.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 53.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 51.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:40: core_temp = 54.0 C
DEBUG_LOG 2020-06-19 18:45:40: CPU Temp: 54.0
DEBUG_LOG 2020-06-19 18:45:40: CPU Temp: 54.0 >= 45, CPU Fan going med.
DEBUG_LOG 2020-06-19 18:45:40: CPU Fan: med
DEBUG_LOG 2020-06-19 18:45:40: CPU Fan changing... (med)
DEBUG_LOG 2020-06-19 18:45:40: Setting Zone 0 duty cycle to 60%
DEBUG_LOG 2020-06-19 18:45:41: core_temps:

I think it should only go to med when the temperature is down to 45.
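The behaviour asked for here is usually called hysteresis. Below is a minimal sketch of it, not the script's actual logic: the thresholds 55 and 45 are read off the log lines above, while `$low_cpu_temp` (35) and the function name `next_cpu_fan_level` are assumptions made up for the illustration.

```perl
use strict;
use warnings;

# 55 and 45 appear in the log above; 35 is an assumed value for illustration.
my $high_cpu_temp = 55;    # go "high" at or above this
my $med_cpu_temp  = 45;    # leave "high" only after falling below this
my $low_cpu_temp  = 35;    # leave "med" only after falling below this (assumed)

# Hypothetical helper: pick the next fan level from the current level and the
# CPU temperature, stepping down only when the next lower threshold is crossed.
sub next_cpu_fan_level {
    my ($current, $temp) = @_;

    # Always step up as soon as the upper threshold is reached.
    return 'high' if $temp >= $high_cpu_temp;

    # Once "high", stay there until the temperature is below the med threshold.
    if ($current eq 'high') {
        return $temp < $med_cpu_temp ? 'med' : 'high';
    }

    return 'med' if $temp >= $med_cpu_temp;

    # Once "med", stay there until the temperature is below the low threshold.
    if ($current eq 'med') {
        return $temp < $low_cpu_temp ? 'low' : 'med';
    }

    return 'low';
}

# Replay the CPU temperatures from the log above, then let the CPU cool down.
my $level = 'low';
for my $temp (56, 55, 57, 60, 54, 44, 30) {
    $level = next_cpu_fan_level($level, $temp);
    print "CPU Temp: $temp -> CPU Fan: $level\n";
}
```

With this logic the 54 C reading from the log would keep the fan on "high" instead of dropping it to "med".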
 

Gilgameshxg

Dabbler
Joined
Nov 26, 2018
Messages
37
Does anyone know if there has been a script like this one created for consumer boards without IPMI? I just built another FreeNAS box with a 3900x and a Rog Strix x570-f motherboard and was wondering if there was another thread with a script for this. Thanks in advance!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The PID script (or a variant of it) is available for using with a Corsair Commander Pro (and maybe some of their other fan controllers now that OpenCorsairLink is more-or-less complete).


You have very slim chances of getting to the BIOS controlled fans without IPMI (or even with it in the case of my ASUS board).

A separate fan controller is pretty much your only path. I have only been able to find 2 fan controller systems that have non-Windows control options available:

The Corsair product lines that are supported in OpenCorsairLink (Commander Pro, Asetek, CoolIT... it seems the H100 may have been added, but the comments say read only... https://github.com/audiohacked/OpenCorsairLink ... I note that product is no longer supported by the creator, but I have forked it just in case and will at least be able to compile a version if required)

and Aquaero (5 or 6), which has an open source repository that seems to do something (under Linux), but it seems not to have been touched in some years, nor completed to the point of being able to compile on FreeBSD with gmake. (https://github.com/Aquaeronix/aerocli)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I fixed a few bugs in the script already. It seems (still) to work for me. I should mention that I added dependencies on two Perl modules, Proc::Daemon and IPC::Run. If people find this particularly onerous, I can probably get rid of the dependencies.
I forked Kevin's repo as well and committed all my changes to the new repo:


There is definitely more work to be done. An incomplete list:
  • get rid of all the chaff (functions no longer used)
  • re-vamp the config file format:
    • use standard config-file format ("key = value")
    • possibly use different "profiles" with one marked active as a substitute for zillions of different .ini files
  • variable-name cleanup -- use standard Perl naming convention
  • use hashes for passing groups of parameters back and forth instead of individual values

-rob
I don't know why, but I can never get cpan to install anything without hours of messing around chasing upstream dependencies, so if you can do it without the dependencies, that would be excellent...

I'm working on a collective variant which should support OpenCorsairLink, ASRock and Supermicro fan control modes and will log to InfluxDB in addition to the logfile.
 

yonkoc

Explorer
Joined
Oct 26, 2011
Messages
52
Hello,

apologies, did not want to go Frankenstein on the thread, just to share my experience. For those like me that happened to move from SATA to SAS disks and, in the process, discover that all disks were gone from the logs along with their temps, here's how to update the script to bring them back.

My perl fu is very poor and coding fu, overall, is poor :smile: but here's what I did:
1st problem: the regex in sub get_hd_list was not detecting the SAS drives. The output was completely blank.
Solution, change the regex from egrep '^[a]*da[0-9]+\$' to egrep '^[a]*da[0-9]'

2nd problem: SATA drives output a whole table worth of info and you drop the temperature line from it in a space-split array, SAS drives report the temp but the output is very basic: Current Drive Temperature: XX C
Solution, change the lines in sub hd_get_temp and sub hd_get_temps from my $command = "/usr/local/sbin/smartctl -A $disk_dev | grep Temperature_Celsius"; to my $command = `/usr/local/sbin/smartctl -A $disk_dev | grep "Drive Temperature"`;

3rd problem: quoting, backticks etc. due to the new command above
Solution, in same subs, change this line from my $output = `$command`; to my $output = $command; because the backticks in the new line above already run the command, so $command now holds its output.

4th problem: there's no 10th item to show, there are only 5 items in the space-divided array "Current Drive Temperature: XX C". Here, the 4th item is the one we're interested in.
Solution, in same subs, change this line from my $temp = "$vals[9]" to my $temp = "$vals[3]"

Now everything is back to normal and my fans are purring at 20% duty :smile:.
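The four changes above could also be folded into one parser that accepts either output shape, so the same script works for SATA and SAS drives. This is only a sketch: `parse_drive_temp` is a hypothetical helper, not a function from the script, the SATA sample line follows the `Temperature_Celsius` format, and the value 34 in the SAS sample is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the temperature from one line of
# `smartctl -A` output, for either the SATA attribute table or the SAS form.
sub parse_drive_temp {
    my ($line) = @_;
    my @vals = split ' ', $line;    # split on whitespace, ignoring leading blanks

    # SATA: "194 Temperature_Celsius 0x0002 180 180 000 Old_age Always - 36 ..."
    # The raw value is the 10th whitespace-separated field.
    return $vals[9] if $line =~ /Temperature_Celsius/ && @vals >= 10;

    # SAS: "Current Drive Temperature:     34 C" -> the 4th field.
    return $vals[3] if $line =~ /Current Drive Temperature/ && @vals >= 4;

    return;    # no temperature found on this line
}

for my $line (
    '194 Temperature_Celsius 0x0002 180 180 000 Old_age Always - 36 (Min/Max 18/47)',
    'Current Drive Temperature:     34 C',
) {
    my $temp = parse_drive_temp($line);
    print defined $temp ? "temp = $temp C\n" : "no temperature\n";
}
```

To feed it both kinds of line, the `grep` in the command would have to match both `Temperature_Celsius` and `Drive Temperature`.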

I just don't understand why the Fan Mode in the log is showing as Full. Can anyone enlighten me, please?
Code:
PID Fan Controller Log  ---  Target 4 Disk HD Temperature = 38.00 deg C  ---  PID Control Gains: Kp =  5.333, Ki =  0.000, Kd =  48.0
                                                                  Max   Ave  Temp   Fan   Fan  Fan %   CPU    P      I      D      Fan
2021-08-08 da0  da1  da2  da3  da4  da5  da6  da7  da8  da9 da10 Temp  Temp   Err  Mode   RPM Old/New Temp  Corr   Corr   Corr    Duty
22:01:27   34   34   33   34   36   33   34   34   32   37   34  ^37  35.25 -2.75  Full   900  20/20   39 -22.00   0.00   16.00   20.00%
22:02:57   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   40 -26.00  -0.00  -16.00   20.00%
22:04:28   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   37 -26.00  -0.00    0.00   20.00%
22:05:57   33   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   40 -26.00  -0.00    0.00   20.00%
22:07:28   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   38 -26.00  -0.00    0.00   20.00%
22:08:58   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   38 -26.00  -0.00    0.00   20.00%
22:10:27   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   42 -26.00  -0.00    0.00   20.00%
22:11:57   34   34   33   34   36   33   34   34   32   37   34  ^37  35.25 -2.75  Full   900  20/20   40 -22.00  -0.00   16.00   20.00%
22:13:27   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   41 -26.00  -0.00  -16.00   20.00%
22:14:57   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   40 -26.00  -0.00    0.00   20.00%
22:16:27   34   34   33   34   36   33   34   34   32   37   34  ^37  35.25 -2.75  Full   900  20/20   39 -22.00  -0.00   16.00   20.00%
22:17:57   34   34   33   34   34   33   34   34   32   37   34  ^37  34.75 -3.25  Full   900  20/20   39 -26.00  -0.00  -16.00   20.00%


Update 8/13: I made extra modifications to the script, condensing the $hd_fan_list to only FANA (and updated variable where used) as that is the only group I have in my chassis. RPMs now correctly show only FANA speeds. Also adjusted TrueNAS SMART critical temp setting from 40C to 45C (got SAS drives now), temp to maintain is 39C, script now averages temps from 6 drives instead of the 4, and reduced fan speed to 10% min. It now occasionally ventures past 10-11% while idle, rarely above 20%, almost never above 25%. Noise now passes WAF. :smile: Hope this helps someone else like me on their first "Perl-y date".
Code:
PID Fan Controller Log  ---  Target 6 Disk HD Temperature = 39.00 deg C  ---  PID Control Gains: Kp =  5.333, Ki =  0.000, Kd =  48.0
                                                                  Max   Ave  Temp   Fan   Fan  Fan %   CPU    P      I      D      Fan
2021-08-13 da0  da1  da2  da3  da4  da5  da6  da7  da8  da9 da10 Temp  Temp   Err  Mode   RPM Old/New Temp  Corr   Corr   Corr    Duty
20:00:09   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   40  -6.67  -0.00    0.00   10.00%
20:01:39   38   38   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   43  -6.67  -0.00    0.00   10.00%
20:03:10   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   43  -6.67  -0.00    0.00   10.00%
20:04:39   38   38   36   38   39   37   38   38   34   39   37  ^39  38.33 -0.67  Full  2200  10/10   42  -5.33  -0.00    5.33   10.00%
20:06:09   38   38   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   44  -6.67  -0.00   -5.33   10.00%
20:07:39   38   38   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   40  -6.67  -0.00    0.00   10.00%
20:09:09   38   38   36   38   39   37   38   38   34   39   37  ^39  38.33 -0.67  Full  2200  10/10   43  -5.33  -0.00    5.33   10.00%
20:10:39   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   42  -6.67  -0.00   -5.33   10.00%
20:12:09   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   42  -6.67  -0.00    0.00   10.00%
20:13:39   38   38   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   43  -6.67  -0.00    0.00   10.00%
20:15:09   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   41  -6.67  -0.00    0.00   10.00%
20:16:39   38   37   36   38   39   37   38   38   34   38   37  ^39  38.17 -0.83  Full  2200  10/10   41  -6.67  -0.00    0.00   10.00%
 

fosaq

Dabbler
Joined
Jun 24, 2020
Messages
17
I have been using the script with some modifications on my TEST system for some time now, and I want to give some feedback and ask some questions.

Hardware: X11SCL-F with four 60x25 mm Noctuas on the FANA header (with Y-cables) and four HDDs.

The script is set up to only control the HDD fans (FANA Header).

The first thing I did was to remove the line that sets the fan mode to 'full' at the beginning.
Code:
# set_fan_mode("full");

As I am controlling only the HD fans, this was leading to the CPU zone running at 100%, but I want the BMC to control the CPU zone.
Or maybe I didn't wait long enough for the CPU zone to spin down?
The fan mode is set to 'optimal', so the BMC should not try to change the HD fans (same for 'heavyio').

Another change I made was to set the HD fan duty levels to...:
Code:
$hd_fan_duty_high      = 100;    # percentage on, ie 100% is full speed.
$hd_fan_duty_med_high  = 74;
$hd_fan_duty_med_low   = 48;
$hd_fan_duty_low       = 22;    # some 120mm fans stall below 30.
This works sweet with my fans.

And I changed all the newlines from "\n" to "\r\n" so the logs are also readable on a Windows machine.

Most of the time the system is idle, but the script also worked during badblocks tests and scrubs.

I've read the thread linked in the script and most of the posts on the thread from Stux.
But I still have some questions. $debug is set to 4 and I want to understand the log:

Code:
PID Fan Controller Log  ---  Target HD Temperature = 36.50 deg C  ---  PID Control Gains: Kp =  5.333, Ki =  0.000, Kd = 120.0
                               Max   Ave  Temp   Fan   Fan  Fan %   CPU    P      I      D      Fan
2022-03-19ada0 ada1 ada2 ada3 Temp  Temp   Err  Mode   RPM Old/New Temp  Corr   Corr   Corr    Duty
18:00:21   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   37 -40.00   0.00    0.00   22.00%
18:03:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   35 -40.00  -0.00    0.00   22.00%
18:06:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00   0.00    0.00   22.00%
18:09:20   33   34   36   34  ^36  34.25 -2.25   Opt   200  22/22   35 -36.00  -0.00   10.00   22.00%
18:12:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00  -0.00  -10.00   22.00%
18:15:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   37 -40.00  -0.00    0.00   22.00%
18:18:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00  -0.00    0.00   22.00%
18:21:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   37 -40.00  -0.00    0.00   22.00%
18:24:20   33   34   36   34  ^36  34.25 -2.25   Opt   200  22/22   36 -36.00  -0.00   10.00   22.00%
18:27:20   33   34   36   34  ^36  34.25 -2.25   Opt   200  22/22   36 -36.00  -0.00    0.00   22.00%
18:30:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00  -0.00  -10.00   22.00%
18:33:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00  -0.00    0.00   22.00%
18:36:20   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   36 -40.00  -0.00    0.00   22.00%
18:39:20   33   34   36   34  ^36  34.25 -2.25   Opt   200  22/22   38 -36.00  -0.00   10.00   22.00%
18:42:21   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   37 -40.00   0.00  -10.00   22.00%
18:45:21   33   33   36   34  ^36  34.00 -2.50   Opt   200  22/22   35 -40.00  -0.00    0.00   22.00%


What exactly are the columns "Temp Err", "P", "I" and "D" telling me?
Yes, "P", "I" and "D" stand for proportional, integral and derivative, but what is the consequence for the fan control?
There are also correction values in the script ("Kp", "Ki", "Kd") and I just don't know if or when I need to change these values.

Last thing: the column "Fan RPM" shows the wrong value. The fans are actually spinning at 500-600 RPM but are shown as 200.
 
Joined
Dec 2, 2015
Messages
730
"Temp Err" is the difference between the target HD temperature and the average HD temperature. The last line has "-2.50", which means the average HD temperature is 2.5 degrees colder than the target of 36.5.

"P Corr", "I Corr" & "D Corr" show the amount of correction in percent fan speed that the script is applying. The last line shows a "P Corr" of -40, which means that the script would reduce the fan speed by 40% from the previous value due to the temperature error. But you have set the minimum fan duty cycle to 22%, so the script will not reduce the duty cycle to less than that.

"P Corr", "I Corr" & "D Corr" values in the log are useful if you are not happy with the performance and want to change the values of Kp, Ki or Kd. Look at the magnitude of "P Corr", "I Corr" & "D Corr" in different circumstances to see whether you should adjust Kp, Ki or Kd. Kp comes into play when the script is correcting for a temperature error. Kd comes into play when the average HD temp changes, as it determines the step change in fan speed that is applied to damp out the rate of temperature change. E.g., if the HD temp increases, Kd determines how much to increase the fan speed on this pass through the loop.

For the issue with logged fan speed, confirm that the fan headers are correctly specified in the "## FAN HEADERS" portion of the script.
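The correction columns can be reproduced by hand, which may help when deciding whether to change a gain. The sketch below is an assumption inferred from the logged numbers, not copied from the script: with rows three minutes apart, Kp = 5.333 and a "Temp Err" of -2.50 give a "P Corr" of -40.00, which suggests the terms are scaled by the polling interval in minutes. The name `pid_step`, its argument names and the integral handling are hypothetical (Ki is 0 in every log shown, so the integral term cannot be checked).

```perl
use strict;
use warnings;

# Hypothetical sketch of one pass through the control loop. The scaling by
# the polling interval (in minutes) is inferred from the log, not from the
# script source.
sub pid_step {
    my (%arg) = @_;
    my $dt_min = $arg{interval_s} / 60;

    my $err = $arg{avg_temp} - $arg{target_temp};              # "Temp Err"
    my $p   = $arg{Kp} * $err * $dt_min;                       # "P Corr"
    my $i   = $arg{i_sum} + $arg{Ki} * $err * $dt_min;         # "I Corr" (guess)
    my $d   = $arg{Kd} * ($err - $arg{prev_err}) / $dt_min;    # "D Corr"

    # New duty = old duty plus the corrections, clamped to the allowed range.
    my $duty = $arg{old_duty} + $p + $i + $d;
    $duty = $arg{min_duty} if $duty < $arg{min_duty};
    $duty = 100            if $duty > 100;

    return { err => $err, p => $p, i => $i, d => $d, duty => $duty };
}

# Inputs taken from the 18:09:20 row of the log (average 34.25, previous
# error -2.50, duty 22, three-minute interval).
my $row = pid_step(
    Kp => 5.333, Ki => 0, Kd => 120,
    interval_s  => 180,
    target_temp => 36.5, avg_temp => 34.25,
    prev_err    => -2.5, i_sum    => 0,
    old_duty    => 22,   min_duty => 22,
);
printf "Temp Err %.2f  P Corr %.2f  D Corr %.2f  Duty %.2f%%\n",
    @{$row}{qw(err p d duty)};
```

For that row the sketch gives a P correction of about -36 and a D correction of 10, and the duty stays clamped at the 22% minimum, in line with the logged values.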
 

fosaq

Dabbler
Joined
Jun 24, 2020
Messages
17
Thank you very much!

So, "Temp Err" isn't an "error", and as long as this value is not above 0 there is no need for the script to raise the fan speeds, right?

I changed the value of the array @hd_fan_list to just ("FANA") instead of ("FANA", "FANB", "FANC").
I thought these were the headers the script can deal with and that it would detect when there is no FANB and/or FANC.
Now the fan speed is shown correctly.

The min. fan duty cycle of 22% is because the lowest speed these fans can spin at is 0x14 with the ipmi raw command, and that corresponds to 21.875%.

Additionally, I changed the heading to "Temp Diff" instead of "Temp Err", which I find more self-explanatory, and added a line break before the "PID Control Gains". I find this more readable:
Code:
PID Fan Controller Log  ---  Target HD Temperature = 36.50 deg C  ---
PID Control Gains: Kp =  5.333, Ki =  0.000, Kd = 120.0
                               Max   Ave  Temp   Fan   Fan  Fan %   CPU    P      I      D      Fan
2022-03-20ada0 ada1 ada2 ada3 Temp  Temp  Diff  Mode   RPM Old/New Temp  Corr   Corr   Corr    Duty
15:26:04   33   33   36   34  ^36  34.00 -2.50   Opt   500 100/60   46 -40.00  -0.00    0.00   60.00%
15:29:04   33   33   36   34  ^36  34.00 -2.50   Opt  1900  60/22   35 -40.00   0.00    0.00   22.00%
15:32:04   33   32   36   34  ^36  33.75 -2.75   Opt   600  22/22   35 -44.00  -0.00  -10.00   22.00%
15:35:04   33   33   36   34  ^36  34.00 -2.50   Opt   600  22/22   37 -40.00  -0.00   10.00   22.00%
15:38:04   33   33   36   34  ^36  34.00 -2.50   Opt   600  22/22   34 -40.00  -0.00    0.00   22.00%
 

fosaq

Dabbler
Joined
Jun 24, 2020
Messages
17
Is there something in the script that stops logging at some point? I rebooted the system this morning and after ~10 hours of uptime the log is no longer being filled with new lines. Even the almost original script, with just the fan speeds set and the line which sets the fan mode to full deactivated, stops after some time.
In both versions debug is set to 4.
I have noticed this behavior ever since I started using the script (~3 weeks ago): every time, the logging stops after some time (~10-30 hours).
The script is still running as I can see in 'htop'.
 

Gaspetaahl

Explorer
Joined
Sep 13, 2018
Messages
76
I tested the script and it worked initially. The fans started low and after I stressed the CPU the fans were set to high. But after I stopped stressing the CPU the fans didn't reset to low. Also I noticed the CPU temps are displayed wrong in the log file.
I saw this in stdout after I stopped stressing the CPU:
`Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x70 rsp=0xcc): Invalid data field in request`
I'm not sure which ipmitool command it is that fails. Unfortunately it isn't logged.
This is my PID_fan_control.log with the wrong CPU temp numbers.
```
PID Fan Controller Log --- Target 4 Disk HD Temperature = 38.00 deg C --- PID Control Gains: Kp = 2.667, Ki = 0.000, Kd = 30.0
Max Ave Temp Fan Fan Fan % CPU P I D Fan
2022-04-15ada0 ada1 ada2 ada3 ada4 da0 Temp Temp Err Mode RPM Old/New Temp Corr Corr Corr Duty
15:37:28 36 38 37 35 37 ^38 37.00 -1.00 Full 1250 36/56 37 -4.00 -0.00 0.00 56.00%
15:39:00 35 38 37 35 37 ^38 36.75 -1.25 Full 975 56/46 53 -5.00 -0.00 -5.00 46.00%
15:40:30 35 37 37 35 37 ^37 36.50 -1.50 Full 725 46/35 61 -6.00 -0.00 -5.00 35.00%
15:42:00 35 37 37 35 37 ^37 36.50 -1.50 Full 650 35/29 59 -6.00 -0.00 0.00 29.00%
15:43:28 35 37 37 35 37 ^37 36.50 -1.50 Full 1275 29/23 42 -6.00 -0.00 0.00 23.00%
15:44:58 35 37 36 34 36 ^37 36.00 -2.00 Full 1775 23/16 37 -8.00 -0.00 -10.00 16.00%
15:46:28 35 36 36 34 36 ^36 35.75 -2.25 Full 1775 16/16 37 -9.00 -0.00 -5.00 16.00%
15:47:58 35 36 35 34 35 ^36 35.25 -2.75 Full 1775 16/16 37 -11.00 -0.00 -10.00 16.00%
15:49:28 34 35 35 33 35 ^35 34.75 -3.25 Full 1775 16/16 35 -13.00 -0.00 -10.00 16.00%
15:50:58 34 35 35 33 34 ^35 34.50 -3.50 Full 1775 16/16 37 -14.00 -0.00 -5.00 16.00%
15:52:28 34 35 34 33 34 ^35 34.25 -3.75 Full 1775 16/16 35 -15.00 -0.00 -5.00 16.00%
15:53:58 33 34 34 32 34 ^34 33.75 -4.25 Full 1775 16/16 34 -17.00 -0.00 -10.00 16.00%
15:55:29 33 34 34 32 34 ^34 33.75 -4.25 Full 1775 16/16 34 -17.00 -0.00 0.00 16.00%
15:56:58 33 34 34 32 33 ^34 33.50 -4.50 Full 1775 16/16 35 -18.00 -0.00 -5.00 16.00%
15:58:28 33 33 33 32 33 ^33 33.00 -5.00 Full 750 16/16 36 -20.00 -0.00 -10.00 16.00%
15:59:59 33 34 33 32 33 ^34 33.25 -4.75 Full 725 16/16 36 -19.00 -0.00 5.00 16.00%
```
In the DEBUG_PID_fan_control.log the cpu temp numbers are alright
```
32.0
32.0
32.0
33.0
32.0
32.0
35.0
35.0
2022-04-15 16:19:52: core_temp = 32.0 C
2022-04-15 16:19:52: core_temp = 32.0 C
2022-04-15 16:19:52: core_temp = 32.0 C
2022-04-15 16:19:52: core_temp = 33.0 C
2022-04-15 16:19:52: core_temp = 32.0 C
2022-04-15 16:19:52: core_temp = 32.0 C
2022-04-15 16:19:52: core_temp = 35.0 C
2022-04-15 16:19:52: core_temp = 35.0 C
2022-04-15 16:19:52: CPU Temp: 35.0
2022-04-15 16:19:52: CPU Fan: low
2022-04-15 16:19:53: core_temps:
31.0
31.0
34.0
34.0
32.0
32.0
35.0
35.0
2022-04-15 16:19:53: core_temp = 31.0 C
2022-04-15 16:19:53: core_temp = 31.0 C
2022-04-15 16:19:53: core_temp = 34.0 C
2022-04-15 16:19:53: core_temp = 34.0 C
2022-04-15 16:19:53: core_temp = 32.0 C
2022-04-15 16:19:53: core_temp = 32.0 C
2022-04-15 16:19:53: core_temp = 35.0 C
2022-04-15 16:19:53: core_temp = 35.0 C
2022-04-15 16:19:53: CPU Temp: 35.0
2022-04-15 16:19:53: CPU Fan: low
```

Can someone help me out here?
 
Joined
Dec 2, 2015
Messages
730
Unfortunately the extracts from the log and the debug log cover different times, so it is not possible to compare them. The log extract shows CPU temps from 34 to 37 for most of the time, with a short duration spike up to 61. Reviewing the code, the CPU temps in the log have the same source as the CPU temps in the debug log, so it is not obvious why they should differ.

I think the bigger issue is to figure out why the RAW command failed, as that is likely why the fan control stopped working. Which motherboard does your system have?
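One way to find out which command fails would be to log every external command before it runs. The wrapper below is a sketch only: `run_logged` and the logger callback are hypothetical names, not part of the script, and the demonstration uses `echo` rather than a real ipmitool call.

```perl
use strict;
use warnings;

# Hypothetical wrapper: log every external command before running it, and log
# the exit status and output when it fails. $logger is any code reference
# that accepts one message string.
sub run_logged {
    my ($logger, @cmd) = @_;
    my $cmdline = join ' ', @cmd;

    $logger->("running: $cmdline");
    my $output = `$cmdline 2>&1`;    # capture stdout and stderr together
    my $status = $? >> 8;            # exit code of the command

    $logger->("FAILED (exit $status): $cmdline: $output") if $status != 0;
    return ($status, $output);
}

# Demonstration with a harmless command; in the fan script the arguments
# would be the ipmitool invocation instead.
my $log = sub { print STDERR "DEBUG_LOG: $_[0]\n" };
my ($status, $output) = run_logged($log, 'echo', 'hello');
print "exit status $status, output: $output";
```

With a wrapper like this, the debug log would show the exact raw command that came back with "Invalid data field in request".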
 

Gaspetaahl

Explorer
Joined
Sep 13, 2018
Messages
76
I have an X10SLL-F
 
Joined
Dec 2, 2015
Messages
730
Hmm. That board should work. One more puzzle, which might shed some light on the problem: the log shows headers for six HDs (ada0..ada4 and da0), but I only see five HD temperatures in each line. How many HDs does your system have?
 
Joined
Dec 2, 2015
Messages
730
And, which FreeNAS or TrueNAS version are you running? What is the make and model of the HDs?
 

Gaspetaahl

Explorer
Joined
Sep 13, 2018
Messages
76
Sorry for the late response. ada0 - 4 are my harddrives. da0 is my boot flash drive.
I run TrueNAS 12.0-U8
The drives are WD Red drives from a few years ago
1 WDC WD80EFAX-68LHPN0
and 4 WDC WD80EZAZ-11TDBA0

I also attached the full debug log
 

Attachments

  • Debug_PID_fan_control.txt
    1.1 MB · Views: 246
Joined
Dec 2, 2015
Messages
730
Let's do some troubleshooting to attempt to identify why the script is not working correctly on your system.

Connect to your server via ssh, run the following commands, and report back with the output of each command:

camcontrol devlist
for n in 0 1 2 3 4; do /usr/local/sbin/smartctl -A /dev/ada$n | grep Temperature_Celsius; done
ipmitool raw 0x30 0x70 0x66 0x00 0x00
ipmitool raw 0x30 0x70 0x66 0x00 0x01
 

Gaspetaahl

Explorer
Joined
Sep 13, 2018
Messages
76

<WDC WD80EFAX-68LHPN0 83.H0A83> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD80EZAZ-11TDBA0 83.H0A83> at scbus2 target 0 lun 0 (pass1,ada1)
<WDC WD80EZAZ-11TDBA0 83.H0A83> at scbus3 target 0 lun 0 (pass2,ada2)
<WDC WD80EZAZ-11TDBA0 83.H0A83> at scbus4 target 0 lun 0 (pass3,ada3)
<WDC WD80EZAZ-11TDBA0 83.H0A83> at scbus5 target 0 lun 0 (pass4,ada4)
<AHCI SGPIO Enclosure 2.00 0001> at scbus6 target 0 lun 0 (ses0,pass5)
<SanDisk Extreme 1.00> at scbus8 target 0 lun 0 (da0,pass6)



# for n in 0 1 2 3 4; do /usr/local/sbin/smartctl -A /dev/ada$n | grep Temperature_Celsius; done
194 Temperature_Celsius 0x0002 180 180 000 Old_age Always - 36 (Min/Max 18/47)
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 39 (Min/Max 17/47)
194 Temperature_Celsius 0x0002 175 175 000 Old_age Always - 37 (Min/Max 18/46)
194 Temperature_Celsius 0x0002 185 185 000 Old_age Always - 35 (Min/Max 17/46)
194 Temperature_Celsius 0x0002 171 171 000 Old_age Always - 38 (Min/Max 17/47)


# ipmitool raw 0x30 0x70 0x66 0x00 0x00
24


# ipmitool raw 0x30 0x70 0x66 0x00 0x01
16
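ipmitool prints raw reply bytes in hexadecimal, so the two replies above are not 24% and 16%. Here is a small sketch of the conversion, assuming the reply is a single byte holding the duty cycle in percent; `duty_from_raw_reply` is a hypothetical helper, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the single hex byte printed by
# `ipmitool raw 0x30 0x70 0x66 0x00 <zone>` into a decimal duty cycle.
sub duty_from_raw_reply {
    my ($reply) = @_;
    my ($byte) = $reply =~ /^\s*([0-9a-fA-F]{1,2})\s*$/
        or return;    # not a single hex byte
    return hex $byte;
}

# The two replies from the post above, keyed by zone number.
my %replies = (0 => ' 24', 1 => ' 16');
for my $zone (sort keys %replies) {
    printf "Zone %d duty cycle: %d%%\n", $zone, duty_from_raw_reply($replies{$zone});
}
```

Read this way, zone 0 reports a duty cycle of 36% and zone 1 reports 22%.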
 