Script to control fan speed in response to hard drive temperatures

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
X9SCL+-F

Is this the BMC info you mean?

[root@freenas] ~# ipmitool bmc info
Device ID : 32
Device Revision : 1
Firmware Revision : 3.38
IPMI Version : 2.0
Manufacturer ID : 10876
Manufacturer Name : Supermicro
Product ID : 1572 (0x0624)
Product Name : Unknown (0x624)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
 

PigLover

Dabbler
Joined
May 29, 2016
Messages
11
Your BMC rev seems relatively current, so it's not likely due to old IPMI firmware.

The SuperMicro FAQ posted earlier only says it applies to X10 motherboards and is silent on X9, so you may be out of luck. See here: http://www.supermicro.com/support/faqs/faq.cfm?faq=20882

I've tried it with a number of different X9, X10 and X11 boards. But the X9s and X10s were all E5 boards and the X11s were all Xeon-D. Perhaps it's no love for the E3 boards?

Sorry, don't know the answer.

Could be a good follow-up for you on the SM support board.
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
Bummer, thanks for looking into it. I'll ping Supermicro support and see what they have to say.
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
I took a quick look at another Supermicro server I just bought for use as a pfSense box. It has an A1SRi-2558F board in it. The A1SRi has no BIOS options for setting fan speed. Fan speed is configured via the IPMI admin pages.

On the X9SCL+-F, fan speed is set via the BIOS only. I believe Supermicro might not have 'plumbed' in the link between IPMI and the fan controller on this particular board, hence the BIOS control.

Annoying, since I just purchased a bunch of PWM fans and wiring harnesses to make this Supermicro quieter. Guess I'll have to pick up a different board if I really have my heart set on being able to make this box silent.
 
Joined
Dec 2, 2015
Messages
730
PID Fan Control Script with Detailed Logs
Since being inspired by the ideas and facts presented in this thread, especially the tricks that Kevin discovered to control fans, I’ve spent way too much time working on this. My motivation is that the PWM fan control logic in my motherboard is lousy. Much of the time, setting the mode to Standard is inadequate and HeavyIO is overkill. Either way, the temperatures fluctuate a lot, so I don’t see switching between them in a script as an ideal solution.

Instead, adjusting fan duty cycle to maintain a steady temperature seems to be the way to go. There are a couple of problems though:
BMC won’t give up control. It’s hard to develop control logic that works when the BMC independently decides to change duty cycle. This may happen when you set a duty cycle near the upper/lower end, >80% or <30%. Limiting your own settings between those figures doesn’t help much, since the logic also doesn’t work right if you hit a control limit. I’ve taken two approaches to dealing with this:
When setting the duty cycle ≥ 50%, first set the mode to HeavyIO, and when setting <50%, first set mode to Standard. This has nothing to do with regulating temperature, it is just less likely the BMC will feel the need to make adjustments in those ranges.
Control the temperature so well that fans rarely, if ever, will get near the extremes of duty cycle.
Temperature readings are way too coarse for fine control. Using the temperature of the hottest drive (Tmax) is a good strategy in principle, but you only have 1 C resolution. Let’s say your setpoint (SP) is 36 C. You can successfully keep Tmax in the range 35-37, but you will never do better than that, and it will be very hard to keep it from oscillating. The solution I’ve found is using the mean temperature of all drives (Tmean). Even with slight temperature change, chances are that one of the drives will cross the degree threshold and its temperature reading will change. Using Tmean (with at least 1 decimal place) gives you much more and earlier information about temperature trends in your drives.
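To make the Tmax versus Tmean point concrete, here is a minimal sketch (not part of the scripts in this post) that computes both from a list of whole-degree drive readings. The function name `temp_stats` is made up for illustration; the sample readings are the seven drive temperatures from the first two lines of the log shown later in this post.

```bash
# temp_stats: print "Tmax Tmean" for a list of integer drive temperatures.
# Hypothetical helper for illustration only.
temp_stats() {
    if [ "$#" -eq 0 ]; then
        echo "temp_stats: need at least one temperature" >&2
        return 1
    fi
    printf '%s\n' "$@" | awk '
        NR == 1 || $1 > max { max = $1 }   # track the hottest drive
        { sum += $1 }                      # accumulate for the mean
        END { printf "%d %.2f\n", max, sum / NR }'
}

# One drive (da3) goes from 35 to 36: Tmax does not move, Tmean does.
temp_stats 35 35 36 35 31 32 32   # prints: 36 33.71
temp_stats 35 35 36 36 31 32 32   # prints: 36 33.86
```

With seven drives, a one-degree change in any single drive moves Tmean by 1/7, or about 0.14 C, which is the smallest non-zero error step that shows up in the log.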
First, a few basics in case there are any fan newbies. There are two fan settings you can control: mode and duty cycle. You can read those plus RPM.
Mode. There are 3-5 modes, depending on the board. Mine has Standard, HeavyIO, and Full. Full is full speed all the time. Standard and HeavyIO have some degree of speed control based on board temperatures. HeavyIO is a higher speed at a given temperature. The details don’t seem to be well documented.
Duty cycle and RPM. Duty cycle is a percentage of full power applied to the fans. It is correlated with RPM, actual fan speed, but you can’t set RPM directly. And when you read RPM it is rounded to the nearest 100.
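As a small sketch of how these readings get decoded: the raw IPMI commands used by the scripts later in this post return a hex byte, which has to be converted to decimal before it is useful. The helper names `hex_to_dec` and `mode_name` are invented for this example, and the mode numbers are the ones from the scripts' `case` statement, so they may well differ on other boards.

```bash
# hex_to_dec: convert one hex byte as printed by `ipmitool raw` (e.g. " 32") to decimal.
hex_to_dec() {
    local hex
    hex=$(printf '%s' "$1" | tr -d '[:space:]')   # strip the leading blank
    printf '%d\n' "0x$hex"
}

# mode_name: map a numeric fan mode to its label (values taken from the scripts below).
mode_name() {
    case "$1" in
        0) echo "Standard" ;;
        1) echo "Full" ;;
        2) echo "Optimal" ;;
        4) echo "HeavyIO" ;;
        *) echo "Unknown" ;;
    esac
}

# On a live system the inputs would come from the BMC, for example:
#   hex_to_dec "$(ipmitool raw 0x30 0x70 0x66 0 0)"          # duty cycle
#   mode_name "$(hex_to_dec "$(ipmitool raw 0x30 0x45 0)")"  # fan mode
# Fixed sample values are used here so the snippet runs anywhere.
hex_to_dec " 32"                   # prints: 50
mode_name "$(hex_to_dec " 04")"    # prints: HeavyIO
```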
I read whatever I could understand on PID control. Turns out, as BiduleOhm said, it doesn’t have to be as complicated as they make it out to be. Here’s how it works. Based on temperature error from your setpoint, you calculate three corrections (P, I, and D) to adjust the duty cycle. Each has a tuning constant: Kp, Ki, and Kd. Note that T (time between control cycles) is sometimes included in the calculation of I and D. It makes it a bit more complicated, but I think then your tuning constants don’t break if you change T. However, I’ve never changed T to see what happens.

First you calculate the current error as ERRc = Tmean – SP. Most sources say to calculate error the other way, SP-Tmean, but it makes more sense to have error be positive when temps are too high and you need to increase fan speed.
P is for proportional. This is a correction proportional to the current error. Just multiply by the tuning constant. So the formula for P is simply Kp * ERRc.
I is for integral. This is a correction for cumulative error. So every cycle, you add the current error (ERRc) * T to cumulative error (ERR), and multiply by a tuning constant. So ERR = ERRc*T + ERR, then I = Ki * ERR. My understanding is this helps correct offset; where the temperature stays a bit below or above SP.
D is for derivative. That is change in current error with respect to time, or basically the slope of the error line. In practice it’s even simpler: you just subtract the previous error (ERRp) from ERRc, divide by T, and multiply by another tuning constant. So D = Kd * ((ERRc – ERRp) / T). This really does two things. When you start a scrub or the sun hits your NAS, you get a large positive error. The bigger the increase in error, the bigger D is. In this case, D and P are additive, aggressively increasing duty cycle. But then when the temps are cooling fast, coming back down to SP, P is still positive and D is negative, so D counters P and puts the brakes on the fans, reducing overshoot and subsequent oscillation.
Most of what I’ve read says, in the great majority of cases, D is not needed, and you can just use PI. So I worked with PI for a long time, trying all kinds of tuning, and always had oscillation. By then I was using Tmean instead of Tmax, and I put the derivative term in and it was MUCH more stable. On the other hand, I have not seen any sign of offset, and I does more harm than good, so I ended up setting Ki to 0.
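The three terms above can be sketched as a single update step. This is only an illustration of the formulas as written in this post (with ERRc = Tmean – SP), not the script itself. The function name `pid_step` and its argument order are assumptions, and awk does the floating-point math, so the decimals can differ slightly from the bc-based log.

```bash
# pid_step: one PID update using the sign convention ERRc = Tmean - SP.
# Arguments: Tmean SP ERRp ERRsum T Kp Ki Kd
# Prints:    ERRc, P, I, D, the new cumulative error, and the rounded correction.
# Hypothetical helper for illustration only.
pid_step() {
    awk -v tmean="$1" -v sp="$2" -v errp="$3" -v errsum="$4" \
        -v t="$5" -v kp="$6" -v ki="$7" -v kd="$8" 'BEGIN {
        errc   = tmean - sp              # current error
        errsum = errsum + errc * t       # cumulative error (ERR)
        p = kp * errc                    # proportional term
        i = ki * errsum                  # integral term
        d = kd * (errc - errp) / t       # derivative term
        printf "ERRc=%.2f P=%.2f I=%.2f D=%.2f ERR=%.2f correction=%.0f\n",
               errc, p, i, d, errsum, p + i + d
    }'
}

# Warming up: error grew from 0.25 to 0.50, so P and D both push the fans up.
pid_step 34.07 33.57 0.25 0 5 4 0 40
# Back at the setpoint after being 0.14 high: P is zero, D eases the fans back.
pid_step 33.57 33.57 0.14 0 5 4 0 40
```

The first call gives a correction of +4 and the second a correction of -1, both of which would be added to the current duty cycle.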

Here are a few graphs showing some preliminary trials. First using Tmax as the process variable, SP is 36. Kp is too high.
View attachment 11763

Tried adding the I term. Many experiments I won't bore you with. This one using Tmean, still bad oscillation. This shows cumulative error too. No improvement.
View attachment 11764

And the script and tuning I ended up with, showing Tmax, Tmean, and duty cycle. As a test, this starts with a large error. See how Tmean comes down to SP without overshooting. Then there are minor corrections through the day, and you can see the fans increase as the sun comes in in the afternoon.
View attachment 11765


Here is what the log looks like. The stuff on the right (starting with ERRc) can be turned off; I just use it for diagnosis and tuning. This log begins with starting the script. It gets equilibrated about 30 minutes in.
Code:
Saturday, May 14
          da0     da1     da2     da3     ada0    ada1    ada2    Tmax Tmean  RPM MODE     Fan% Curr/New
15:01:03  Spin 35 Spin 35 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.71  800 Standard 49/51  ERRc= 0.14; P=  0.58; I= 0.00; D=  1.15
15:06:05  Spin 35 Spin 35 Spin 36 Spin 36 Spin 31 Spin 32 Spin 32 ^36  33.86  800 HeavyIO  51/53  ERRc= 0.29; P=  1.15; I= 0.00; D=  1.14
15:11:06  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  800 HeavyIO  53/58  ERRc= 0.57; P=  2.29; I= 0.00; D=  2.28
15:16:07  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  900 HeavyIO  58/60  ERRc= 0.57; P=  2.29; I= 0.00; D=  0.00
15:21:08  Spin 35 Spin 35 Spin 36 Spin 36 Spin 32 Spin 33 Spin 32 ^36  34.14  900 HeavyIO  60/62  ERRc= 0.57; P=  2.29; I= 0.00; D=  0.00
15:26:09  Spin 35 Spin 35 Spin 36 Spin 35 Spin 32 Spin 33 Spin 32 ^36  34.00  900 HeavyIO  62/63  ERRc= 0.43; P=  1.72; I= 0.00; D= -1.13
15:31:10  Spin 35 Spin 34 Spin 36 Spin 35 Spin 32 Spin 32 Spin 32 ^36  33.71 1000 HeavyIO  63/61  ERRc= 0.14; P=  0.58; I= 0.00; D= -2.28
15:36:12  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  61/60  ERRc= 0.00; P=  0.00; I= 0.00; D= -1.14
15:41:13  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:46:14  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:51:15  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00
15:56:16  Spin 35 Spin 34 Spin 36 Spin 35 Spin 31 Spin 32 Spin 32 ^36  33.57  900 HeavyIO  60/60  ERRc= 0.00; P=  0.00; I= 0.00; D=  0.00

This is MUCH better than the control built into the boards. Tmean normally stays within 0.3 C of SP unless there is a disturbance, then within 0.5 C. It is damn near perfect. I guess SuperMicro doesn’t do something like this because (a) they’re more interested in protecting the board than the drives, and the board is less sensitive to temperature variation, (b) accessing drive temperatures depends on the OS, and (c) it requires tuning.

Tuning
PID tuning advice on the internet generally does not work well for controlling drive temperatures in my experience.
First run the script spincheck.sh (logs detailed data only, no control) and get familiar with your temperature and fan variations without any intervention.
Now in the settings of spinpid.sh, choose a setpoint that is an actual observed Tmean, given the number of drives you have. It should be the Tmean associated with the Tmax that you want.
Set Ki=0 and leave it there. You probably will never need it.
Start with Kp low. Use a value that results in a rounded correction=1 when error is the lowest value you observe other than 0 (i.e., when ERRc is minimal, Kp ~= 1 / ERRc). However, if you have few drives and thus coarser temperature monitoring, you may need a larger Kp. I would not go below 4.
Set Kd at about Kp*10
Get Tmean within ~0.3 degree of SP before starting script. At this stage you don’t want to test a large error, you want an equilibrated system.
Start script and run for a few hours or so. If Tmean oscillates (best to graph it), you probably need to reduce Kd. If no oscillation but response is too slow, raise Kd.
Stop script and get Tmean at least 1 C off SP. Restart. If there is overshoot and it goes through some cycles, you may need to reduce Kd.
If you have problems, examine P and D in the log and see which is messing you up. Most likely Kd needs tuning. You can try raising Kp, though too high and changes become too aggressive and you get overshoot and oscillation. You can even try using Ki. If you use Ki, make it small, ~ 0.1 or less.
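One way to turn the starting-point advice above into numbers is sketched below. It reads "a rounded correction of 1" as Kp * ERRc >= 0.5, which is my interpretation rather than something stated in the steps, and it applies the floor of 4 and the Kd = Kp*10 rule from the list. The function name `starting_gains` is hypothetical.

```bash
# starting_gains: suggest initial Kp and Kd from the smallest non-zero error observed.
# Assumption: "rounded correction = 1" means Kp * ERRc >= 0.5.
starting_gains() {
    awk -v err="$1" 'BEGIN {
        if (err <= 0) { print "error must be > 0" > "/dev/stderr"; exit 1 }
        x = 0.5 / err
        kp = int(x); if (kp < x) kp++    # round up to the next whole number
        if (kp < 4) kp = 4               # the advice above: do not go below 4
        printf "Kp=%d Kd=%d\n", kp, kp * 10
    }'
}

# Seven drives give a smallest error step of about 0.14.
starting_gains 0.14   # prints: Kp=4 Kd=40
```

These are only starting values; Ki stays at 0, and the oscillation and overshoot checks in the steps above decide where Kd ends up.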
Scripts

There are two bash scripts. spincheck.sh logs data only; it does not control anything. spinpid.sh logs and controls. Both scripts log:
disk status (spinning or standby)
disk temperature (Celsius) if spinning
max and mean disk temperature
fan rpm and mode
current fan duty cycle (plus new one for spinpid.sh)
optional diagnostic variables
The scripts include disks on motherboard as well as on an HBA. They get a list of devices from camcontrol devlist. I edit that to delete my SanDisk flash drives from the list; you may have to change that for your system. Suggestions in the script.
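If you need to adapt the device list, it may help to see the parsing in isolation. The sketch below does the same job as the scripts' awk pipeline using shell parameter expansion. The function name `disk_name` is hypothetical, and the sample lines are made up to resemble `camcontrol devlist` output, so check what your own system prints. Like the scripts, it assumes the disk name is the second entry inside the parentheses.

```bash
# disk_name: keep what sits between the last ',' and the following ')'.
disk_name() {
    local name=${1##*,}            # drop everything up to the last comma
    printf '%s\n' "${name%%)*}"    # drop the closing parenthesis and anything after
}

# Made-up sample lines for illustration; real output will differ.
sample='<WDC WD40EFRX 1.0>  at scbus0 target 0 lun 0 (pass0,da0)
<SanDisk Cruzer 1.27>  at scbus8 target 0 lun 0 (pass7,da4)
<WDC WD40EFRX 1.0>  at scbus1 target 0 lun 0 (pass1,ada0)'

# Drop the flash drive the same way the scripts do, then list what is left.
printf '%s\n' "$sample" | sed '/SanDisk/d' | while IFS= read -r line; do
    disk_name "$line"
done
```

This prints da0 and ada0; the SanDisk line is filtered out before parsing.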

The mode code is primarily based on my board. If you have different modes you may need to make some minor tweaks. I have no idea if any of this is applicable to non-Supermicro boards.

As usual, you are responsible for anything you do on your system. This works for me, but for all I know it could make your box catch fire or cause a zombie apocalypse. I suggest you monitor closely at first.

After testing, if you decide to use it, you can run it as a post-init script (Tasks in the GUI) so it starts automatically after booting. In this case, to avoid ‘windup’ (a large error when starting with possibly cold drives), I suggest you add a ‘sleep 1200’ before the main loop. Then it will wait 20 minutes for the drives to warm up before doing anything.

spincheck.sh
Code:
#!/usr/local/bin/bash
# spincheck.sh version 2016-05-13. Run as superuser. See notes at end.

# Creates logfile and sends all stdout and stderr to the log, overwriting any previous contents.
# If you want to append to an existing log, add '-a' to the tee command.
LOG=spincheck.log
exec > >(tee -i "$LOG") 2>&1

SP=33.57    #  Setpoint mean temperature, for information only

function get_disk_name {
# The awk statement works by taking the $LINE as input,
# setting ',' as a _F_ield separator and taking the second field it separates
# (ie after the separator), passing that to another awk that uses
# ')' as a separator, and taking the first field (ie before the separator).
# In other words, everything between ',' and ')' is kept.
    DEVID=$(echo $LINE | awk -F ',' '{print $2}' | awk -F ')' '{print$1}')
}

function print_header {
    i=0  # counting number of drives
    # Header is printed when script starts and each new day
    DATE=$(date +"%A, %b %d")
    echo $DATE
    echo -n "          "
    while read LINE ; do
        get_disk_name
        let "i += 1"
        printf "%-8s" $DEVID
     done <<< "$DEVLIST"        # while statement works on DEVLIST
    printf "%4s %5s %4s %-8s %s \n" "Tmax" "Tmean" "RPM" "MODE" "Fan%"
}

function data {
    Tmean=$(echo "scale=2; $Tsum / $i" | bc)
    ERRc=$(echo "scale=2; $Tmean - $SP" | bc)
    # Read duty cycle and mode, convert to decimal
    DUTY=$(ipmitool raw 0x30 0x70 0x66 0 0) # in hex
    DUTY=$(printf "0x%s" $DUTY) # adds 0x in front
    DUTY=$(($DUTY))             # converts to decimal
    MODE=$(ipmitool raw 0x30 0x45 0)
    MODE=$(printf "0x%s" $MODE)
    MODE=$(($MODE))
    # Text for mode
    case $MODE in
        0) MODEt="Standard" ;;
        4) MODEt="HeavyIO" ;;
        2) MODEt="Optimal" ;;
        1) MODEt="Full" ;;
    esac
    # Get reported fan speed in RPM, assume all the same
    # Takes the line with FAN1, then extracts the run of
    # 2 to 5 consecutive digits (the RPM value)
    RPM=$(ipmitool sdr | grep "FAN1" | grep -Eo '[0-9]{2,5}')
    # print current Tmax, speed, mode, and duty
    printf "^%-3d %5.2f %4d %-8s %2d" $Tmax $Tmean $RPM $MODEt $DUTY
    printf "   ERRc= %5.2f\n" $ERRc
}

echo "How many minutes do you want between spin checks?"
read T
SEC=$(bc <<< "$T*60")            # bc is a calculator

# Get list of drives; remove SanDisk
DEVLIST1=$(camcontrol devlist)
DEVLIST="$(echo "$DEVLIST1"|sed '/SanDisk/d' )"

print_header

# Main loop
while [ 1 ] ; do
    # Print header every quarter day.  Expression removes any
    # leading 0 so it is not seen as octal
    HM=$(date +%k%M); HM=`expr $HM + 0`
    R=$(( HM % 600 ))  # remainder after dividing by 6 hours
    if (( $R < $T )); then print_header; fi
    # print time on each line
    TIME=$(date "+%H:%M:%S"); echo -n "$TIME  "
    Tmax=0; Tsum=0  # initialize drive temps for new loop through drives
    while read LINE ; do
        get_disk_name
        TEMP=$(smartctl -a -n standby "/dev/$DEVID" | grep "Temp" | grep -o "..$")
        smartctl -n standby "/dev/$DEVID" > /dev/null    # only the exit status is needed
        RETURN=$?        # need to preserve because $? changes with each if comparison
        if [[ $RETURN == "0" ]] ; then
            STATE="Spin"
        elif [[ $RETURN == "2" ]] ; then
            STATE="STANDBY"
        else
            STATE="UNKNOWN"
        fi
        printf "%-8s" $STATE" "$TEMP
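        # Update temperatures each drive (a drive in standby reports no temperature)
        Tsum=$(( Tsum + ${TEMP:-0} ))
        # -gt compares numerically; '>' inside [[ ]] compares strings
        if [[ $TEMP -gt $Tmax ]]; then Tmax=$TEMP; fi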
    done <<< "$DEVLIST"
    data  # manage data
    sleep $(($T*60)) # seconds between runs
done

# Logs:
#    - disk status (spinning or standby)
#   - disk temperature (Celsius) if spinning
#   - max and mean disk temperature
#   - fan rpm and mode
#   - current fan duty cycle
#   - optional diagnostic variables
# Includes disks on motherboard and on HBA.

# Uses joeschmuck's smartctl method (returns 0 if spinning, 2 in standby)
# https://forums.freenas.org/index.php?threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/#post-28451
# Other method (camcontrol cmd -a) doesn't work with HBA

# See "DEVLIST=". Currently set to drop SanDisk flash drives from the camcontrol device list.
# You may need some other strategy, e.g., find something in the camcontrol devlist
# output that is unique to the drives you want to test, for instance only WDC drives:
# if [[ $LINE != *"WDC"* ]] . . .

spinpid.sh
Code:
#!/usr/local/bin/bash
# spinpid.sh version 2016-05-13. Run as superuser. See notes at end.

# Settings:
SP=33.57    #  Setpoint mean temperature
T=5            #  Time interval in minutes
Kp=4        #  Proportional tunable constant
Ki=0        #  Integral tunable constant
Kd=40        #  Derivative tunable constant
LOG=spinpid.log

# Creates logfile and sends all stdout and stderr to the log, overwriting any previous contents.
# If you want to append to an existing log, add '-a' to the tee command.
exec > >(tee -i "$LOG") 2>&1

function get_disk_name {
# The awk statement works by taking $LINE as input,
# setting ',' as a _F_ield separator and taking the second field it separates
# (ie after the separator), passing that to another awk that uses
# ')' as a separator, and taking the first field (ie before the separator).
# In other words, everything between ',' and ')' is kept.
    DEVID=$(echo $LINE | awk -F ',' '{print $2}' | awk -F ')' '{print$1}')
}

# Header is printed when script starts and each quarter day
function print_header {
    i=0  # counting number of drives
    DATE=$(date +"%A, %b %d")
    echo $DATE
    echo -n "          "
    while read LINE ; do
        get_disk_name
        let "i += 1"
        printf "%-8s" $DEVID
     done <<< "$DEVLIST"        # while statement works on DEVLIST
    printf "%4s %5s %4s %-8s %s \n" "Tmax" "Tmean" "RPM" "MODE" "Fan% Curr/New"
}

function data {
    Tmean=$(echo "scale=3; $Tsum / $i" | bc)
    ERRp=$ERRc
    ERRc=$(echo "scale=2; $Tmean - $SP" | bc)
    ERR=$(echo "scale=2; $ERRc * $T + ${ERR:-0}" | bc)  # cumulative error: add to previous ERR, not to I
    P=$(echo "scale=2; $Kp * $ERRc" | bc)
    I=$(echo "scale=2; $Ki * $ERR" | bc)
    D=$(echo "scale=2; $Kd * ($ERRc - $ERRp) / $T" | bc)
    PID=$(echo "scale=2; $P + $I + $D" | bc)  # add 3 corrections
    PID=$(printf %0.f $PID)  # round to a signed decimal integer
    # Read duty cycle and mode, convert to decimal
    DUTY=$(ipmitool raw 0x30 0x70 0x66 0 0) # in hex
    DUTY=$(printf "0x%s" $DUTY) # adds 0x in front
    DUTY=$(($DUTY))             # converts to decimal
    MODE=$(ipmitool raw 0x30 0x45 0)
    MODE=$(printf "0x%s" $MODE)
    MODE=$(($MODE))
    # Text for mode
    case $MODE in
        0) MODEt="Standard" ;;
        4) MODEt="HeavyIO" ;;
        2) MODEt="Optimal" ;;
        1) MODEt="Full" ;;
    esac
    # Get reported fan speed in RPM.
    # Takes the line with FAN1, then extracts the run of
    # 2 to 5 consecutive digits (the RPM value).
    RPM=$(ipmitool sdr | grep "FAN1" | grep -Eo '[0-9]{2,5}')
    # print current Tmax, speed, mode, and duty
    printf "^%-3d %5.2f %4d %-8s %2d/" $Tmax $Tmean $RPM $MODEt $DUTY
}

function adjust_fans {
    # Calculates correction and sets duty cycle, also optional diagnostic printout
    # Reset BMC if fans seem stuck: cool and >80% OR warm and <30%
#     if [[ $Tmax<$(($SP - 1)) && $DUTY>0x50 ]] || [[ $Tmax>$(($SP + 5)) && $DUTY<0x1E ]]; then
#         ipmitool bmc reset warm; fi
    # Add DUTY + correction
    DUTY=$( printf "0x%x" $(($DUTY+$PID)) )
    # Attempt to avoid BMC changing DUTY.  If new DUTY<50,
    # set mode to Standard; if >=50, HeavyIO.
    # But this inserts extra newline, removed with echo -n
    if (( $DUTY >= 50 )); then MODE2=4
        else MODE2=0; fi
    if (( $MODE != $MODE2 )); then
        echo -n $(ipmitool raw 0x30 0x45 1 $MODE2); sleep 1
    fi
    # Don't allow duty cycle beyond min/max (25/90%)
    if [[ $DUTY -gt 90 ]]; then DUTY=90; fi
    if [[ $DUTY -lt 25 ]]; then DUTY=25; fi
    printf "%d" $DUTY  # print new duty cycle as decimal integer
    # Diagnostic output for troubleshooting; comment out the next line to turn it off:
    printf "        ERRc=%5.2f; P=%6.2f; I=%5.2f; D=%6.2f" $ERRc $P $I $D
    # Set new duty cycle. This inserts a newline ending the log line
    ipmitool raw 0x30 0x70 0x66 1 0 $DUTY
}

I=0; ERR=0; ERRc=0  # Initialize integral term, cumulative error, and current error to 0
# Get list of drives; remove SanDisk
DEVLIST1=$(camcontrol devlist)
DEVLIST="$(echo "$DEVLIST1"|sed '/SanDisk/d' )"

print_header

# Main loop
while [ 1 ] ; do
    # Print header every quarter day.  Expression removes any
    # leading 0 so it is not seen as octal
    HM=$(date +%k%M); HM=`expr $HM + 0`
    R=$(( HM % 600 ))  # remainder after dividing by 6 hours
    if (( $R < $T )); then print_header; fi
    # print time on each line
    TIME=$(date "+%H:%M:%S"); echo -n "$TIME  "
    Tmax=0; Tsum=0  # initialize drive temps for new loop through drives
    while read LINE ; do
        get_disk_name
        TEMP=$(smartctl -a -n standby "/dev/$DEVID" | grep "Temp" | grep -o "..$")
        smartctl -n standby "/dev/$DEVID" > /dev/null    # only the exit status is needed
        RETURN=$?        # need to preserve because $? changes with each if comparison
        if [[ $RETURN == "0" ]] ; then
            STATE="Spin"
        elif [[ $RETURN == "2" ]] ; then
            STATE="STANDBY"
        else
            STATE="UNKNOWN"
        fi
        printf "%-8s" $STATE" "$TEMP
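        # Update temperatures each drive (a drive in standby reports no temperature)
        Tsum=$(( Tsum + ${TEMP:-0} ))
        # -gt compares numerically; '>' inside [[ ]] compares strings
        if [[ $TEMP -gt $Tmax ]]; then Tmax=$TEMP; fi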
    done <<< "$DEVLIST"
    data  # manage data
    adjust_fans
    sleep $(($T*60)) # seconds between runs
done

# Adjusts fan with PID control to maintain mean drive temp near set point
# Logs:
#    - disk status (spinning or standby)
#   - disk temperature (Celsius) if spinning
#   - max and mean disk temperature
#   - fan rpm and mode
#   - current and new fan duty cycle
#   - optional diagnostic variables
# Includes disks on motherboard and on HBA.

#  Relation between percent duty cycle, hex value of that number,
#  and RPMs for my fans.  RPM will vary among fans, is not
#  precisely related to duty cycle, and does not matter to the script.
#
#  Percent    Hex        RPM
#  10          A        300
#  20         14        400
#  30         1E        500
#  40         28        600/700
#  50         32        800
#  60         3C        900
#  70         46        1000/1100
#  80         50        1100/1200
#  90         5A        1200/1300
# 100         64        1300

# Tuning suggestions
# PID tuning advice on the internet generally does not work well in this application.
# First run the script spincheck.sh and get familiar with your temperature and fan variations without any intervention.
# Choose a setpoint that is an actual observed Tmean, given the number of drives you have.  It should be the Tmean associated with the Tmax that you want.
# Set Ki=0 and leave it there.  You probably will never need it.
# Start with Kp low.  Use a value that results in a rounded correction=1 when error is the lowest value you observe other than 0  (i.e., when ERRc is minimal, Kp ~= 1 / ERRc)
# Set Kd at about Kp*10
# Get Tmean within ~0.3 degree of SP before starting script.
# Start script and run for a few hours or so.  If Tmean oscillates (best to graph it), you probably need to reduce Kd.  If no oscillation but response is too slow, raise Kd.
# Stop script and get Tmean at least 1 C off SP.  Restart.  If there is overshoot and it goes through some cycles, you may need to reduce Kd.
# If you have problems, examine P and D in the log and see which is messing you up.  If all else fails you can try Ki. If you use Ki, make it small, ~ 0.1 or less.

# Uses joeschmuck's smartctl method (returns 0 if spinning, 2 in standby)
# https://forums.freenas.org/index.php?threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/#post-28451
# Other method (camcontrol cmd -a) doesn't work with HBA

# See "DEVLIST=". Currently set to drop SanDisk flash drives from the camcontrol device list.
# You may need some other strategy, e.g., find something in the camcontrol devlist
# output that is unique to the drives you want to test, for instance only WDC drives:
# if [[ $LINE != *"WDC"* ]] . . .
I've been playing around with this PID fan control script for a few hours, and it works extremely well.

I did run into one problem that I will report in case someone else is caught by it. On my system (Supermicro X10SL7-F), the fan duty cycle reported to the script was usually wrong. It would report the correct value for a short period after the duty cycle had been updated by the script, then it would report 6%. The fan speed always went to the value commanded by the script, but the reported duty cycle was wrong. The erroneous reported duty cycle was used by the script each time around the loop, and caused the calculated new duty cycle to be much too low.

I solved this problem by specifying the initial duty cycle in the Main Loop, and commenting out the code that read the duty cycle reported by the motherboard in "function data". This allows the script to simply use the previously commanded duty cycle as the starting point for the calculations each time around the loop.
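The idea can be sketched roughly like this. It is not my exact modification, just a minimal illustration: the script keeps its own record of the last commanded duty cycle and applies each correction to that, instead of trusting the value read back from the BMC. The function name `next_duty` and the sample corrections are invented for the example, and the 25/90 limits are the ones the script clamps to.

```bash
# next_duty: apply a signed correction to the last commanded duty cycle and
# clamp the result. Arguments: current correction [min] [max]
next_duty() {
    local min=${3:-25} max=${4:-90}
    local new=$(( $1 + $2 ))
    [ "$new" -gt "$max" ] && new=$max
    [ "$new" -lt "$min" ] && new=$min
    echo "$new"
}

DUTY=60    # initial duty cycle, set once before the main loop
for correction in 4 -2 40 -80; do
    DUTY=$(next_duty "$DUTY" "$correction")
    echo "commanded duty: $DUTY"
    # A real script would now send DUTY to the BMC rather than reading it back.
done
```

The loop prints 64, 62, 90 and 25, so the starting point for each cycle is always the value the script last commanded.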
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Thanks Kevin. I wonder why the board is reporting the wrong duty cycle. Then again, I wonder if the board changed the duty cycle after the script assigned it.

After PigLover reported that setting the mode to Full prevents the board from changing the duty cycle, I changed the script to just stay on Full. My room temperature now fluctuates from about 17 in the morning to mid 20s in late afternoon, so the script adjusts duty cycle from about 30 to 90%, and the board doesn't change it. I'll check out your changes when I get a chance and update the script in that lengthy post.
 
Joined
Dec 2, 2015
Messages
730
I checked fan speed manually several times over a couple of minutes after I noted that the reported duty cycle had changed. The fan speed remained at a value consistent with the duty cycle set by the script. I.e., the reported duty cycle changed, but the reported fan speed remained constant.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
It was very hot here last week, but in the mountains it cools off a lot at night so we open up the house. So the room with the NAS went from maybe 28 C in the afternoon to 17 C in the early morning. Weather got a bit cooler over these three days as the monsoons kicked in. Here's a graph to show how the PID script handled it.
Diurnal fan control.png


This shows the script starting with the mean drive temp heading up to about 35.3. The setpoint is 33.57. Fan duty cycle quickly went up to 95%, the highest I let it go, and stayed there till after 9 pm. (I guess I will need another fan if I'm going to do SMART tests on a hot day!)

In the cool morning, fans were down around 35%. No matter the ambient temperature, drive temps generally remained between 33.3 and 34.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Really impressive!
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Some awesome looking scripts here.

So, I'm using an X10SRi-F, which has FAN-A and five numbered fan headers.

I'm using a 24-bay chassis, with 3x 120mm fans in a fan wall in the centre for pulling air through the drive block, which is ideal for PID controlling the drive block ;)

As I understand it, the FAN-A header is the "peripheral" zone, and the rest are the CPU zone. I have 2 CPU fans and 2 exhaust fans, which I guess will be controlled by the system temperatures.

BUT if the FAN-A header is the sole peripheral zone, how should I connect up the 3 120mm fans? Just use a few fan splitters?

Alternatively, is it possible to use some of the unused "system" fan headers? Can the ipmi control functions be used that way?
 
Joined
Dec 2, 2015
Messages
730
I've got a different board (X10SL7-F), but on my board I can control the numbered fans using IPMI. You probably need to experiment.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Do you just leave the CPU fan to be controlled by the BMC?
 
Joined
Dec 2, 2015
Messages
730
Yes, that is what I did, as it worked well during my testing.
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
Hi Kevin,

Do you mean you can control each individual fan? Or just the region that it is in?

After reading this thread and doing some other research, I'm still a little confused about how you are letting the BMC control the CPU fan whilst doing fine-grained control over your chassis fans. I have an X11SSM-F board with FANS 1-4 + FANA. If I understand correctly, FANS 1-4 are controlled by the CPU temp and FANA by the system temp, so if I adjust the mode for the CPU region, is that going to spin up all my fans including the CPU fan?

Sorry if this is answered elsewhere, I couldn't find a response that I understood :)

Regards,

CJ
 
Joined
Dec 2, 2015
Messages
730
CJ - All I can tell you is what works on my board, with my BMC version. But you need to know how it works on your system. The only way to know for sure how your system works is to do some testing yourself. Check the speed of all the fans. Run a command to set a new speed, and check the speeds again to see which ones have responded. If you completely lose control of the fan speeds, run the command to reset the BMC (or reboot the system).
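A minimal sketch of that test procedure (read the fan speeds, change something, read them again, see which fans responded). It assumes `ipmitool sdr type Fan` prints pipe-separated rows ending in a reading such as `1000 RPM`; the helper names `read_fan_rpms`, `fans_that_changed` and `query_fans` are made up for illustration, and the 200 RPM tolerance is an arbitrary guess.

```python
import re
import subprocess


def read_fan_rpms(sdr_text):
    """Parse `ipmitool sdr type Fan` style output into {fan_name: rpm}.

    Assumed row format: "FAN1 | 41h | ok | 29.1 | 1000 RPM".
    Rows without an RPM reading (e.g. "No Reading") are skipped.
    """
    rpms = {}
    for line in sdr_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue
        match = re.match(r"(\d+)\s*RPM", fields[4])
        if match:
            rpms[fields[0]] = int(match.group(1))
    return rpms


def fans_that_changed(before, after, min_delta=200):
    """Return the names of fans whose speed moved by at least min_delta RPM."""
    return sorted(
        name
        for name in before
        if name in after and abs(after[name] - before[name]) >= min_delta
    )


def query_fans():
    """Read the current fan speeds from the BMC (needs ipmitool and root)."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Fan"],
        capture_output=True, text=True, check=True,
    )
    return read_fan_rpms(result.stdout)
```

Typical use would be `before = query_fans()`, then run the command under test, wait long enough for the fans to settle, and call `fans_that_changed(before, query_fans())`. If control is lost, `ipmitool bmc reset cold` resets the BMC.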
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I've done a little testing now, with an X10SRi-F.

I replaced the fans with Noctuas to try to quiet it down.

The first problem was that the BMC thought the fans had stalled when it spun them down, so it would crank them to full and then slow them again, which led to cycling. Changing the fan thresholds resolved this.
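A rough sketch of how lower thresholds for a slow-spinning fan could be chosen. The post does not say which values were used, so the numbers here are assumptions: readings are assumed to come in 100 RPM steps, and the three lower thresholds are placed one, two and three steps below the fan's slowest expected speed. `lower_thresholds` and `thresh_command` are hypothetical helper names; `ipmitool sensor thresh <sensor> lower <lnr> <lcr> <lnc>` is the standard ipmitool form for setting all three lower thresholds at once.

```python
def lower_thresholds(min_rpm, step=100):
    """Return (lnr, lcr, lnc) thresholds below a fan's minimum RPM.

    lnc = lower non-critical, lcr = lower critical,
    lnr = lower non-recoverable. The spacing is an assumption.
    """
    lnc = (min_rpm // step) * step - step
    lcr = lnc - step
    lnr = lcr - step
    if lnr < 0:
        raise ValueError("fan minimum RPM is too low for three thresholds")
    return lnr, lcr, lnc


def thresh_command(sensor, min_rpm):
    """Build the ipmitool argument list that sets the lower thresholds."""
    lnr, lcr, lnc = lower_thresholds(min_rpm)
    return [
        "ipmitool", "sensor", "thresh", sensor, "lower",
        str(lnr), str(lcr), str(lnc),
    ]
```

For a fan that idles around 500 RPM, `thresh_command("FANA", 500)` builds a command with thresholds of 200, 300 and 400 RPM. Check the fan's real minimum speed first, since a threshold above it brings the cycling back.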

The next thing is that if I don't have the HD fans (all three of them) connected to FANA, then they won't run high when HeavyIO is enabled.

With Optimal, all fans are controlled by CPU temp. With HeavyIO, the A fan is set high. With Full, all fans are set high.

I'll be experimenting with the scripts in this thread later, but at least I've resolved the cycling and have now confirmed that by default on Optimal the CPU is being cooled correctly.

Thus it should be good enough to just control FANA (hooked up to three fans) with the PID script.

The safe fallback can then be HeavyIO.
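For readers unfamiliar with the idea, here is a minimal sketch of a PID loop that turns a drive temperature into a fan duty cycle for a single header such as FANA. It is not the PID script from this thread; the class name, the gains, the 50% bias and the 20-100% output limits are all illustrative assumptions, and there is no anti-windup on the integral term.

```python
class FanPID:
    """Very small PID controller: drive temperature in, duty cycle (%) out."""

    def __init__(self, kp, ki, kd, setpoint, bias=50.0,
                 out_min=20.0, out_max=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint      # target drive temperature in deg C
        self.bias = bias              # duty cycle when the error is zero
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured_temp, dt):
        """Return the new duty cycle after dt seconds have elapsed."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = measured_temp - self.setpoint   # positive when too hot
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0                    # no history on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        duty = (self.bias + self.kp * error
                + self.ki * self.integral + self.kd * derivative)
        # Clamp so the fans never stop and never exceed 100%.
        return max(self.out_min, min(self.out_max, duty))
```

The caller would poll the drive temperatures at a fixed interval, pass the hottest one to `update`, and write the result to the FANA zone. If the temperature read fails, switching the BMC back to HeavyIO is the fallback described above.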
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
CJ - All I can tell you is what works on my board, with my BMC version. But you need to know how it works on your system. The only way to know for sure how your system works is to do some testing yourself. Check the speed of all the fans. Run a command to set a new speed, and check the speeds again to see which ones have responded. If you completely lose control of the fan speeds, run the command to reset the BMC (or reboot the system).

Thanks for the response, Kevin. I've had a play on my system, and region 0 controls fans 1-4 and region 1 controls the FANA slot, which matches what @PigLover mentioned in a previous thread. Corroboration came when I set Heavy IO mode and only FANA ramped up.

I guess my question then is, if you have your CPU fan plugged in to FAN1, and you are using @Glorious1's PID script, how are you excluding the CPU fan from being affected by the duty cycle changes?

Regards,

CJ
 
Joined
Dec 2, 2015
Messages
730
Thanks for the response, Kevin. I've had a play on my system, and region 0 controls fans 1-4 and region 1 controls the FANA slot, which matches what @PigLover mentioned in a previous thread. Corroboration came when I set Heavy IO mode and only FANA ramped up.

I guess my question then is, if you have your CPU fan plugged in to FAN1, and you are using @Glorious1's PID script, how are you excluding the CPU fan from being affected by the duty cycle changes?

Regards,

CJ
I had a problem with my first motherboard, and had to RMA it. I damaged one of the mounting posts of the stock CPU cooler when removing it to RMA the board, so I had to buy a new cooler. I screwed up and the one I bought is not a PWM fan. It runs at a constant speed, no matter what the BMC is asking for. Testing shows that it is adequate for all loads, so I left it alone. I never got the system working with the original fan, so I don't know how it would have responded to the PID script.
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
the one I bought is not a PWM fan. It runs at a constant speed, no matter what the BMC is asking for.

Ah, that would explain it. In that case, I'll have a play with the script and with ipmitool and see if those raw commands can be applied to just the peripheral region. I have a Fractal Design Node 804 case, and I've used a Y cable to connect the two PWM fans on the HDD side to just the FANA port. I'm hoping I can set the duty cycle on the FANA slot without setting the mode to Full (which would ramp up my CPU fan unnecessarily). I'll post my results here if I have some success.

Regards,

CJ
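A sketch of what the zone-specific raw commands might look like. The byte values are the ones commonly reported for Supermicro X10/X11 boards and are not confirmed in this part of the thread, so treat them as assumptions to verify on your own board and BMC firmware: `0x30 0x45 0x01 <mode>` to set the fan mode, and `0x30 0x70 0x66 0x01 <zone> <duty>` to set a zone's duty cycle as a percentage. The dictionaries and function names are made up for illustration.

```python
# Assumed values, as commonly reported for X10/X11 boards.
FAN_MODES = {"standard": 0x00, "full": 0x01, "optimal": 0x02, "heavy_io": 0x04}
ZONES = {"cpu": 0x00, "peripheral": 0x01}   # FAN1-4 vs FANA on CJ's board


def set_mode_command(mode):
    """Build the ipmitool argument list that selects a BMC fan mode."""
    return ["ipmitool", "raw", "0x30", "0x45", "0x01",
            f"0x{FAN_MODES[mode]:02x}"]


def set_duty_command(zone, percent):
    """Build the ipmitool argument list that sets one zone's duty cycle."""
    if not 0 <= percent <= 100:
        raise ValueError("duty cycle must be between 0 and 100 percent")
    return ["ipmitool", "raw", "0x30", "0x70", "0x66", "0x01",
            f"0x{ZONES[zone]:02x}", f"0x{percent:02x}"]
```

Passing the result of `set_duty_command("peripheral", 50)` to `subprocess.run` would ask for 50% on the FANA zone only. Whether the BMC leaves that setting alone in Optimal or HeavyIO mode, or overrides it after a while, is exactly the thing that needs testing.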
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Ah, that would explain it. In that case, I'll have a play with the script and with ipmitool and see if those raw commands can be applied to just the peripheral region. I have a Fractal Design Node 804 case, and I've used a Y cable to connect the two PWM fans on the HDD side to just the FANA port. I'm hoping I can set the duty cycle on the FANA slot without setting the mode to Full (which would ramp up my CPU fan unnecessarily). I'll post my results here if I have some success.

Regards,

CJ

Hi CJ, this is exactly what I'm trying to do, and I just received the last two fans for my chassis.

Looking forward to any information you can share :)
 