Script: Hybrid CPU & HD Fan Zone Controller

Mark Francis

Cadet
Joined
Nov 17, 2016
Messages
9
I'm running FreeNAS-9.3-STABLE-201605170422

Thank you for this script. It's reduced the noise from my fans massively.

Even though my fans seem to be cooling things correctly, this output concerns me:

Code:
2016-11-18 15:21:49: CPU Temp: 26.0 <= 35, CPU Fan going low.
2016-11-18 15:21:49: Maximum HD Temperature: 25
2016-11-18 15:21:49: Drives are cool enough, going to 30%
2016-11-18 15:22:00: CPU Fan speed: No reading
2016-11-18 15:22:12: CPU Fan speed: No reading
2016-11-18 15:22:24: CPU Fan speed: No reading
2016-11-18 15:22:35: CPU Fan speed: No reading
2016-11-18 15:22:47: CPU Fan speed: No reading
2016-11-18 15:22:59: CPU Fan speed: No reading
2016-11-18 15:23:10: CPU Fan speed: No reading
2016-11-18 15:23:22: CPU Fan speed: No reading
2016-11-18 15:23:34: CPU Fan speed: No reading
2016-11-18 15:23:45: CPU Fan speed: No reading
2016-11-18 15:23:57: CPU Fan speed: No reading
2016-11-18 15:24:09: CPU Fan speed: No reading
2016-11-18 15:24:09: Fan speeds are unreadable after 120 seconds, rebooting BMC
2016-11-18 15:24:09: Resetting BMC
2016-11-18 15:24:58: Maximum HD Temperature: 25
ipmitool: ipmi_sdr_get_record() failed
Get SDR 02e5 command failed: BMC initialization in progress
Get SDR 02e5 command failed: BMC initialization in progress
Get SDR 02e5 command failed: BMC initialization in progress
Get SDR 02e5 command failed: BMC initialization in progress
Get SDR 02e5 command failed: BMC initialization in progress
2016-11-18 15:25:09: CPU Fan speed: No reading
Error obtaining SDR info: BMC initialization in progress
Unable to open SDR for reading
2016-11-18 15:25:21: CPU Fan speed: No reading
2016-11-18 15:25:32: CPU Fan speed: No reading
2016-11-18 15:25:44: CPU Fan speed: No reading
2016-11-18 15:25:56: CPU Fan speed: No reading
2016-11-18 15:26:08: CPU Fan speed: No reading


And at debug level 2:

Code:
[root@freenas] ~# ./fan_control.pl
2016-11-18 13:59:41: Setting fan mode to 1 (full)
2016-11-18 13:59:46: CPU Temp: 31.0
2016-11-18 13:59:46: CPU Temp: 31.0 <= 35, CPU Fan going low.
2016-11-18 13:59:46: CPU Fan: low
2016-11-18 13:59:46: CPU Fan changing... (low)
2016-11-18 13:59:46: Setting Zone 0 duty cycle to 30%
2016-11-18 13:59:46: /dev/ada0: 28
2016-11-18 13:59:46: Maximum HD Temperature: 28
2016-11-18 13:59:46: Drives are cool enough, going to 30%
2016-11-18 13:59:46: Setting Zone 1 duty cycle to 30%
2016-11-18 13:59:47: CPU Temp: 30.0
2016-11-18 13:59:47: CPU Fan: low
2016-11-18 13:59:48: CPU Temp: 31.0
2016-11-18 13:59:48: CPU Fan: low
2016-11-18 13:59:49: CPU Temp: 31.0
2016-11-18 13:59:49: CPU Fan: low
2016-11-18 13:59:50: CPU Temp: 30.0
2016-11-18 13:59:50: CPU Fan: low
2016-11-18 13:59:51: CPU Temp: 30.0
2016-11-18 13:59:51: CPU Fan: low
2016-11-18 13:59:52: CPU Temp: 30.0
2016-11-18 13:59:52: CPU Fan: low
2016-11-18 13:59:53: CPU Temp: 30.0
2016-11-18 13:59:53: CPU Fan: low
2016-11-18 13:59:54: CPU Temp: 30.0
2016-11-18 13:59:54: CPU Fan: low
2016-11-18 13:59:55: CPU Temp: 30.0
2016-11-18 13:59:55: CPU Fan: low
2016-11-18 13:59:56: CPU Temp: 30.0
2016-11-18 13:59:56: CPU Fan: low
2016-11-18 13:59:57: CPU Temp: 29.0
2016-11-18 13:59:57: CPU Fan: low
2016-11-18 13:59:57: CPU Fan speed: No reading
2016-11-18 13:59:57: CPU Fan speed unavailable
2016-11-18 13:59:58: HD Fan speed: 400 RPM
2016-11-18 13:59:59: CPU Temp: 31.0
2016-11-18 13:59:59: CPU Fan: low
2016-11-18 14:00:00: CPU Temp: 31.0
2016-11-18 14:00:00: CPU Fan: low
2016-11-18 14:00:01: CPU Temp: 35.0
2016-11-18 14:00:01: CPU Fan: low
2016-11-18 14:00:02: CPU Temp: 37.0
2016-11-18 14:00:02: CPU Fan: low
2016-11-18 14:00:03: CPU Temp: 37.0
2016-11-18 14:00:03: CPU Fan: low
2016-11-18 14:00:04: CPU Temp: 37.0
2016-11-18 14:00:04: CPU Fan: low
2016-11-18 14:00:05: CPU Temp: 39.0
2016-11-18 14:00:05: CPU Fan: low
2016-11-18 14:00:06: CPU Temp: 37.0
2016-11-18 14:00:06: CPU Fan: low
2016-11-18 14:00:07: CPU Temp: 37.0
2016-11-18 14:00:07: CPU Fan: low
2016-11-18 14:00:08: CPU Temp: 38.0
2016-11-18 14:00:08: CPU Fan: low
2016-11-18 14:00:09: CPU Temp: 36.0
2016-11-18 14:00:09: CPU Fan: low


Should I be concerned?

I don't know Perl much (read: at all), but the message looks like it comes from line 797. The value seems to be assigned to a variable on line 790, from an array built from $output, which holds the result of running "/usr/local/sbin/smartctl -A $disk_dev | grep Temperature_Celsius". When I run that command manually, I get:

Code:
[root@freenas] ~# /usr/local/sbin/smartctl -A /dev/ada0 | grep Temperature_Celsius
194 Temperature_Celsius	 0x0022   118   114   000	Old_age   Always	   -	   29


So my assumption is that line 790 should be changed from:

Code:
my $fan_speed = "$vals[2]";

to:

Code:
my $fan_speed = "$vals[8]";


Am I on the right track?
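
Perl's split(" ", $output) skips leading whitespace, splits on runs of whitespace, and numbers the fields from 0, while awk numbers them from 1. The sketch below uses awk only as a stand-in for the Perl (show_fields and sata_temp are illustrative names, not part of the script) to list every field of the smartctl line above. The raw temperature is the 10th field, $vals[9] in Perl or $10 in awk; $vals[8] is the "-" in the WHEN_FAILED column. Also note that the repeated log lines above report the CPU fan speed, and the variable on line 790 is named $fan_speed, so that code is probably parsing fan-speed output rather than smartctl output.

```shell
#!/bin/sh
# Print each whitespace-separated field of a line with its zero-based index
# (as Perl's split(" ", ...) numbers it) and its one-based awk field number.
show_fields() {
    echo "$1" | awk '{ for (n = 1; n <= NF; n++) printf "vals[%d]  $%d  %s\n", n - 1, n, $n }'
}

# Extract the raw temperature from a SATA Temperature_Celsius attribute line:
# the 10th field, i.e. $vals[9] in Perl or $10 in awk.
sata_temp() {
    echo "$1" | awk '/Temperature_Celsius/ { print $10 }'
}

line='194 Temperature_Celsius     0x0022   118   114   000    Old_age   Always       -       29'
show_fields "$line"
sata_temp "$line"
```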
 
Joined
Dec 2, 2015
Messages
730
So, I've been working on something.

Code:
# This script is designed to control both the CPU and HD fans in a Supermicro X10 based system according to both
# the CPU and HD temperatures in order to minimize noise while providing sufficient cooling to deal with scrubs
# and CPU torture tests.

# It relies on you having two fan zones.


It's a script that controls both the CPU and HD zones on a dual-zone Supermicro IPMI board.

It was inspired by Kevin's script and research.

Let me know what you think :)
Thanks for this dual-zone script. I've been evaluating it while testing my new system with an X10SRH-cF in a Norco RPC-4224 chassis. All in all, it works extremely well. The CPU control loop performed excellently during CPU stress testing, reacting very quickly when the CPU temps started climbing, and limiting the CPU temperatures to 77-78 °C even with all cores running at 100% for 30 minutes. The HD fan control also works well.

At the moment, I'm experimenting with a modified version of this script that replaces the HD fan control with a PID control loop. It took a bit of script brain surgery to insert the new PID control loop, but I finally got it working a couple of hours ago. I'll test it for a few days, and post it for review and comment if it proves suitable.

My modified script also adds a feature to ramp up the chassis fans if the HD fan duty cycle is high, as that will help increase airflow through the HD zone. For those unfamiliar with the RPC-4224, the HD fans exhaust into the motherboard area, and that air must be exhausted by the chassis fans. Increasing chassis fan speed helps lower the pressure in the motherboard area, increasing the airflow through the HD fans.
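
The chassis-fan ramp described above could look roughly like the sketch below. It is not Kevin's code: the function name chassis_duty and the values HD_DUTY_THRESHOLD and CHASSIS_BOOST_DUTY are hypothetical, chosen only to illustrate the rule "if the HD zone is running hard, don't let the CPU/chassis zone run slower than a floor".

```shell
#!/bin/sh
# Hypothetical policy: the chassis (CPU-zone) duty normally follows the CPU
# logic, but when the HD fans run hard the chassis fans are raised to a floor
# so they can exhaust the extra air pushed into the motherboard area.
# Both values below are illustrative assumptions, not tuned numbers.
HD_DUTY_THRESHOLD=70   # HD-zone duty (%) at which the boost kicks in
CHASSIS_BOOST_DUTY=60  # minimum chassis duty (%) while boosted

# chassis_duty CPU_DUTY HD_DUTY -> prints the duty cycle (%) for the CPU zone
chassis_duty() {
    cpu_duty=$1
    hd_duty=$2
    if [ "$hd_duty" -ge "$HD_DUTY_THRESHOLD" ] && [ "$cpu_duty" -lt "$CHASSIS_BOOST_DUTY" ]; then
        echo "$CHASSIS_BOOST_DUTY"
    else
        echo "$cpu_duty"
    fi
}
```

Taking the maximum of the CPU-driven duty and the floor (rather than adding to it) keeps the CPU loop in charge whenever it already wants more airflow.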
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Looking forward to seeing it :)

I always wanted to do that, but it works well enough in my situation that it wasn't worth spending any more time on :)

As it is, in a quiet office the other day I could hear the hard drives scrubbing, but not the fans.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
The script works great. Unfortunately, I can't seem to get it to autostart as a post-init command. I did essentially the same thing you did, @Stux, except that I added a folder under root called scripts.

Code:
#!/bin/bash
echo "Starting Hybrid Fan Controller..."
/root/scripts/hybrid_fan_controller.pl &>> /root/scripts/fan_control.log &

The start script itself works as advertised when run in tmux, but
Code:
/root/scripts/start_fancontroller.sh


will not autostart the controller when entered as a post-init command, or as a script in a mount point on the pool.

Both scripts have 755 permissions.

Does anyone have any idea what I am doing wrong?

It might be worth noting that the other post-init tasks seem to work.
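
For comparison, here is a hedged sketch of a start script written for plain POSIX sh. It is not a confirmed fix for the problem above; it only removes two things that can behave differently in an init environment than in an interactive shell: it uses `>> file 2>&1` instead of bash's `&>>`, and it detaches the controller with nohup and redirects stdin from /dev/null. The function name start_controller is hypothetical; the paths in the usage comment are the ones from the post.

```shell
#!/bin/sh
# start_controller CONTROLLER LOGFILE
#   Launches CONTROLLER in the background, immune to hangups, appending both
#   stdout and stderr to LOGFILE. Prints the background PID on success;
#   returns 1 if CONTROLLER is not executable.
start_controller() {
    ctl="$1"
    logf="$2"
    if [ ! -x "$ctl" ]; then
        echo "controller not executable: $ctl" >&2
        return 1
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S'): starting $ctl" >> "$logf"
    # '>> file 2>&1' is the POSIX spelling of bash's '&>> file'.
    nohup "$ctl" >> "$logf" 2>&1 < /dev/null &
    echo "$!"
}

# Example, using the paths from the post above:
#   start_controller /root/scripts/hybrid_fan_controller.pl /root/scripts/fan_control.log
```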
 

Attachment: postinit.jpg

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
No Idea :-/

What version of FreeNAS?
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Currently FreeNAS-9.10.2-U1 (86c7ef5), but I could not get it to work with FreeNAS 9.10.1 either.
 

moon

Dabbler
Joined
Jul 17, 2014
Messages
32
It's behaving as if the duty cycle commands aren't working.

I'm also using an X10SL7 and experienced the same behavior: the fan mode was set to Full and never changed to the calculated duty cycle (as a consequence, the program kept resetting the BMC).
The "ipmitool raw" command was not working.
After updating the IPMI firmware everything works as expected.
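
One way to check whether the BMC accepts these raw commands at all is to set a mode and read it back. The sketch below decodes the reply of `ipmitool raw 0x30 0x45 0` using the mode numbers from the spinpid.sh script posted later in this thread (0 Standard, 1 Full, 2 Optimal, 4 HeavyIO); the function name fan_mode_name is illustrative.

```shell
#!/bin/sh
# Decode the hex byte returned by the Supermicro "get fan mode" raw command,
# e.g. " 01". Empty or non-hex input (ipmitool printed nothing to stdout,
# typically because the command failed) yields "No reply".
fan_mode_name() {
    hex=$(printf '%s' "$1" | tr -d ' \n')
    case $hex in
        ''|*[!0-9a-fA-F]*) echo "No reply"; return ;;
    esac
    case $((0x$hex)) in
        0) echo "Standard" ;;
        1) echo "Full" ;;
        2) echo "Optimal" ;;
        4) echo "HeavyIO" ;;
        *) echo "Unknown ($hex)" ;;
    esac
}
```

For example, after `ipmitool raw 0x30 0x45 1 1` (set Full, as spinpid.sh does at startup), `fan_mode_name "$(/usr/local/bin/ipmitool raw 0x30 0x45 0)"` should print Full on a BMC that honours raw commands.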

Noise is an important requirement for my box, and I had been dreaming of a controller like this for a while.
I'm now testing it and it seems it works very well.

Thank you @Stux and @Kevin Horton
 

sau

Cadet
Joined
Apr 19, 2015
Messages
3
Hi,

Thanks for the script! I was just thinking of doing something similar and found this. I'm not sure if anyone is interested, but I made some minor changes to it to accommodate running in a VM and using SAS drives rather than SATA. Happy to post a diff if anyone else is interested.

Thanks!

Steve
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'm interested in seeing the changes.

Also, for those with Xeon Ds, I saw a post which says the secondary zone is on FAN4 not FANA. Go figure.
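
If a script needs to read the HD-zone fan on both kinds of board, the header name could be chosen from the board model. This is a hypothetical helper (hd_zone_fan is not part of the script) that only encodes the report above, which is not verified here; at least one X10SDV user later in this thread found FAN3 and FAN4 not controllable at all.

```shell
#!/bin/sh
# hd_zone_fan BOARD_NAME -> prints the fan header assumed to represent the
# secondary (HD) zone: FAN4 on Xeon D (X10SDV*) boards, FANA otherwise.
hd_zone_fan() {
    case "$1" in
        X10SDV*) echo "FAN4" ;;
        *)       echo "FANA" ;;
    esac
}
```

If dmidecode is installed, the board name can be obtained with `dmidecode -s baseboard-product-name` and passed straight in.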
 

sau

Cadet
Joined
Apr 19, 2015
Messages
3
Hi Stux,

Below are my changes:

I think the biggest differences were the device names being da[0-9]+ (excluding the VMware virtual disks), the different smartctl output, and having to use the network for IPMI access. Also, because FreeNAS is virtualized, the CPU temperature is not passed through via sysctl: it only shows a value of 0, and the kernel warns about the invalid CPU temperature during boot.

Code:

@@ -136,7 +136,7 @@
## IPMITOOL PATH
## The script needs to know where ipmitool is
-$ipmitool = "/usr/local/bin/ipmitool";
+$ipmitool = "/usr/local/bin/ipmitool -I lanplus -H <IP address> -U <admin userid> -f /root/.ipmi ";
## HD POLLING INTERVAL
## The controller will only poll the harddrives periodically. Since hard drives change temperature slowly
@@ -265,7 +265,7 @@
sub get_hd_list
{
-  my $disk_list = `camcontrol devlist | sed 's:.*(::;s:).*::;s:,pass[0-9]*::;s:pass[0-9]*,::' | egrep '^[a]*da[0-9]+\$' | tr '\012' ' '`;
+  my $disk_list = `camcontrol devlist | grep -v VMware | sed 's:.*(::;s:).*::;s:,pass[0-9]*::;s:pass[0-9]*,::' | egrep '^da[0-9]+|^[a]*da[0-9]+\$' | tr '\012' ' '`;
  dprint(3,"$disk_list\n");
  my @vals = split(" ", $disk_list);
@@ -285,7 +285,7 @@
  foreach my $item (@hd_list)
  {
  my $disk_dev = "/dev/$item";
-  my $command = "/usr/local/sbin/smartctl -A $disk_dev | grep Temperature_Celsius";
+  my $command = "/usr/local/sbin/smartctl -A $disk_dev | grep 'Current Drive Temperature'";

  dprint( 3, "$command\n" );
@@ -296,7 +296,7 @@
  my @vals = split(" ", $output);
  # grab 10th item from the output, which is the hard drive temperature (on Seagate NAS HDs)
-  my $temp = "$vals[9]";
+  my $temp = "$vals[3]";
  chomp $temp;
  if( $temp )
@@ -478,8 +478,8 @@
{
  my ($old_cpu_fan_level) = @_;
-#  my $cpu_temp = get_cpu_temp_ipmi();  # no longer used, because sysctl is better, and more compatible.
-  my $cpu_temp = get_cpu_temp_sysctl();
+  my $cpu_temp = get_cpu_temp_ipmi();  # used here because sysctl reports 0 inside the VM
+#  my $cpu_temp = get_cpu_temp_sysctl();
  my $cpu_fan_level = decide_cpu_fan_level( $cpu_temp, $old_cpu_fan_level );



For reference, below is my smartctl -A output. It is definitely different between SAS and SATA: I just migrated my data from 3 SATA drives to these SAS ones, and for whatever reason, the format is different.

Code:
[root@nas] ~# smartctl -A /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:  34 C
Drive Trip Temperature:  85 C

Manufactured in week 46 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  20
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2297
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 49237897668198400
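
The two layouts can be handled by one parser instead of editing the grep pattern and field index per drive type. A minimal sketch, assuming the SATA attribute line and the SAS "Current Drive Temperature:" line look like the samples in this thread; drive_temp is a hypothetical name and reads `smartctl -A` output on stdin. On the SAS line the temperature is awk's $4, the same field as $vals[3] in the diff above.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the drive temperature in C.
#   SATA: "194 Temperature_Celsius ... -  29"  -> 10th field
#   SAS:  "Current Drive Temperature:  34 C"   -> 4th field
drive_temp() {
    awk '
        $2 == "Temperature_Celsius"   { print $10; exit }
        /^Current Drive Temperature:/ { print $4;  exit }
    '
}
```

Usage: `/usr/local/sbin/smartctl -A /dev/da1 | drive_temp`.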
 
Joined
Dec 2, 2015
Messages
730
I've created a new forum thread with a modified version of @Stux's script. My version replaces the fan control logic with a PID control loop.
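
The PID loop itself is in the other thread. For readers unfamiliar with the idea, this is a minimal textbook discrete PID step, computed with awk because POSIX sh arithmetic is integer-only; it is an illustration, not necessarily the exact update used in that script. pid_step is a hypothetical name, and the gains in the usage line are only example values.

```shell
#!/bin/sh
# pid_step SETPOINT MEASURED PREV_ERR INTEGRAL KP KI KD DT
# Prints: CORRECTION ERR NEW_INTEGRAL
#   CORRECTION is the rounded duty-cycle change (%) to add to the current duty;
#   ERR and NEW_INTEGRAL are fed back in as PREV_ERR and INTEGRAL next time.
pid_step() {
    awk -v sp="$1" -v pv="$2" -v perr="$3" -v integ="$4" \
        -v kp="$5" -v ki="$6" -v kd="$7" -v dt="$8" 'BEGIN {
        err = pv - sp               # positive when the drives are too warm
        integ += err * dt           # accumulated error over time
        deriv = (err - perr) / dt   # rate of change of the error
        out = kp * err + ki * integ + kd * deriv
        printf "%.0f %s %s\n", out, err, integ
    }'
}
```

With a 36 °C setpoint, a mean of 37 °C, Kp=4, Ki=0, Kd=40 and a 3-minute interval, `pid_step 36 37 0 0 4 0 40 3` prints `17 1 3`: the derivative term dominates the first step after a jump in temperature.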
 

golfleep

Dabbler
Joined
Nov 3, 2016
Messages
21
Oh, if you wanted to run my fan controller script in your virtualized FreeNAS, you can get it to talk to the BMC over the 'network' even if it's not running on the metal ;)
@Stux

Thanks for your fan controller script. I saw your post above in another thread, and I could use a little help figuring out how to get the script (running in a FreeNAS VM) to communicate with the BMC over the network. What modifications to your script are needed to define where ipmitool is? Thanks in advance.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
See post #70.

You need to change the ipmitool path to include the user, password, and IP address, and you need to use the get_cpu_temp_ipmi() method.
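
Putting that together, a sketch of the remote ipmitool prefix using the flags from the diff earlier in the thread: -I lanplus selects IPMI v2.0 over the LAN, -H is the BMC address, -U the user name, and -f a file holding the password so it stays off the command line. The function name ipmitool_remote, the address 192.0.2.10 and the user ADMIN are placeholders.

```shell
#!/bin/sh
# ipmitool_remote BMC_ADDRESS USER PASSWORD_FILE -> prints the command prefix
ipmitool_remote() {
    echo "/usr/local/bin/ipmitool -I lanplus -H $1 -U $2 -f $3"
}

# One-time setup of the password file, readable by root only:
#   printf '%s' 'your-password' > /root/.ipmi && chmod 600 /root/.ipmi
```

Usage: `IPMITOOL=$(ipmitool_remote 192.0.2.10 ADMIN /root/.ipmi)`, then `$IPMITOOL sdr`, left unquoted so the shell splits the prefix into separate arguments.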
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Good Afternoon,

After initially testing with spincheck, I am now running the script, but it does not seem to be working properly.

When I run the script, the 2 fans connected through the FAN4 header run at full speed, even though my system is cooler than the targets I set in the script.


Name             Status   Reading
CPU Temp         Normal   45 degrees C
System Temp      Normal   30 degrees C
Peripheral Temp  Normal   30 degrees C
MB_10G Temp      Normal   34 degrees C
DIMMA1 Temp      Normal   32 degrees C
DIMMA2 Temp      N/A      Not Present!
DIMMB1 Temp      N/A      Not Present!
DIMMB2 Temp      N/A      Not Present!

FAN1             Normal   600 RPM
FAN2             Normal   900 RPM
FAN3             Normal   800 RPM
FAN4             Normal   2700 RPM

Not sure where I got it wrong.

My script is below. Any help would be much appreciated. I would like to keep the disks below 36 °C and the CPU below 65 °C.
Code:
#!/usr/local/bin/bash
# spinpid.sh version 2017-01-01. Run as superuser. See notes at end.

##############################################
#
#  Settings
#
##############################################

# Drive Settings:
SP=36		#  Setpoint mean temperature
#  Time interval for checking drives in minutes.  This will only
#  be honored accurately when it is an even multiple of CPU_T.
T=3			
Kp=4			#  Proportional tunable constant
Ki=0			#  Integral tunable constant
Kd=40		   #  Derivative tunable constant
PID=0

# CPU Settings:
CPU_T=60		#  Time interval for checking CPU in seconds
#  Reference temperature for scaling CPU_DUTY (NOT a setpoint).
#  Moving it up or down shifts the range of fan speeds
#  up or down the CPU temperature scale.
CPU_REF=68	 
#  Scalar for scaling CPU_DUTY.  Large number means
#  large changes in fan speed with change in CPU temp
#  (tighter control).  Try 20.
CPU_SCALE=20
# Fan minimum duty cycle (%) (to avoid stalling)
FAN_MIN=25

LOG=/mnt/HomeStorage/NAS/Stuff/freeNASScripts/spinpid.log

##############################################
# function get_disk_name
# Get disk name from current LINE of DEVLIST
##############################################
# The awk statement works by taking $LINE as input,
# setting '(' as a _F_ield separator and taking the second field it separates
# (ie after the separator), passing that to another awk that uses
# ',' as a separator, and taking the first field (ie before the separator).
# In other words, everything between '(' and ',' is kept.

# camcontrol output for disks on HBA seems to reverse every version,
# so need 2 options to get ada/da disk name.
function get_disk_name {
   if [[ $LINE == *",d"* ]] ; then	 # for (pass#,da#) (HBA disks sometimes)
	  DEVID=$(echo $LINE | awk -F ',' '{print $2}' | awk -F ')' '{print$1}')
   else								# for (ada#,pass#) (motherboard disks)
	  DEVID=$(echo $LINE | awk -F '(' '{print $2}' | awk -F ',' '{print$1}')
   fi
}

############################################################
# function print_header
# Called when script starts and each quarter day
############################################################
function print_header {
   DATE=$(date +"%A, %b %d")
   let "SPACES = DEVCOUNT * 5 + 70"  # 5 spaces per drive
   printf "\n%-*s %-8s %s \n" $SPACES "$DATE" "Fan %" "Interim CPU"
   echo -n "		  "
   while read LINE ; do
	  get_disk_name
	  printf "%-5s" $DEVID
   done <<< "$DEVLIST"			 # while statement works on DEVLIST
   printf "%4s %5s %5s %6s %5s %6s %3s %s %4s %-7s %s %s" "Tmax" "Tmean" "ERRc" "P" "I" "D" "CPU" "Driver" "RPM" "MODE" "Curr/New" "Adjustments"
}

#################################################
# function drive_data: Read, process, print data
#################################################
function drive_data {
   Tmean=$(echo "scale=3; $Tsum / $i" | bc)
   ERRp=$ERRc
   ERRc=$(echo "scale=2; $Tmean - $SP" | bc)
   ERR=$(echo "scale=2; $ERRc * $T + $I" | bc)
   P=$(echo "scale=2; $Kp * $ERRc" | bc)
   I=$(echo "scale=2; $Ki * $ERR" | bc)
   D=$(echo "scale=2; $Kd * ($ERRc - $ERRp) / $T" | bc)
   PID=$(echo "scale=2; $P + $I + $D" | bc)  # add 3 corrections
   PID=$(printf %0.f $PID)  # round
   # Read duty cycle, convert to decimal.
   # May need to disable these 3 lines as some boards apparently return
   # incorrect data. In that case just assume $DUTY hasn't changed.
   DUTY_CURR=$($IPMITOOL raw 0x30 0x70 0x66 0 0) # in hex
   DUTY_CURR=$(printf "0x%s" $DUTY_CURR)  # add 0x in front
   DUTY_CURR=`echo $(($DUTY_CURR))`					  # convert to decimal
   # Read fan mode, convert to decimal.
   MODE=$($IPMITOOL raw 0x30 0x45 0) # in hex
   MODE=$(printf "0x%s" $MODE)  # add 0x in front
   MODE=`echo $(($MODE))`					  # convert to decimal
   # Text for mode
   case $MODE in
	  0) MODEt="Standard" ;;
	  4) MODEt="HeavyIO" ;;
	  2) MODEt="Optimal" ;;
	  1) MODEt="Full" ;;
   esac
   # Get reported fan speed in RPM.
   # Takes the line with FAN1, then 2nd through the 5th
   # digit if there are that many.
   RPM=$($IPMITOOL sdr | grep "FAN1" | grep -Eo '[0-9]{2,5}')
   # print current Tmax and Tmean; the main loop prints the remaining columns
   printf "^%-3d %5.2f" $Tmax $Tmean
}

##############################################
# function CPU_check_adjust
# Get CPU temp. If above 65, calculate a new
# DUTY_CPU.
# If it is greater than the duty due to the
# drives, and different from current duty,
# apply it to the fans.
##############################################
function CPU_check_adjust {
   # Get temp of CPU 0, strip down to whole degrees C
   CPU_TEMP=$(sysctl -a dev.cpu.0.temperature | awk -F ' ' '{print $2}' | awk -F '.' '{print$1}')
   if [[ $CPU_TEMP -gt 65 ]]; then
	  DUTY_CPU=$( echo "scale=2; $CPU_TEMP - (($CPU_TEMP - $CPU_REF)/10 * -1 * $CPU_SCALE + $CPU_SCALE)" | bc )
	  DUTY_CPU=$( printf %0.f $DUTY_CPU )  # round
   else DUTY_CPU=20;
   fi
   if [[ $DUTY_CPU -gt $DUTY_DRIVE && $DUTY_CPU -ne $DUTY_NEW ]]; then
	  adjust_fans $DUTY_CPU
	  if [[ FIRST_TIME -eq 0 ]]; then printf "%d " $DUTY_CPU; fi
   fi
   FIRST_TIME=0
}

##############################################
# function DRIVES_check_adjust
# Print time on new log line.
# Go through each drive, getting and printing
# status and temp.  Calculate sum and max
# temp, then call function drive_data.
# Apply max of $PID and CPU_CORR to the fans.
##############################################
function DRIVES_check_adjust {
   echo  # start new line
   # print time on each line
   TIME=$(date "+%H:%M:%S"); echo -n "$TIME  "
   Tmax=0; Tsum=0  # initialize drive temps for new loop through drives
   i=0  # count number of spinning drives
   while read LINE ; do
	  get_disk_name
	TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | awk '{print $10}')
#	TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | pcregrep -o1 '([0-9]*)( \(.*\))?$')
	 /usr/local/sbin/smartctl -n standby "/dev/$DEVID" > /var/tempfile
	  RETURN=$?			   # need to preserve because $? changes with each 'if'
	  if [[ $RETURN == "0" ]] ; then
		 STATE="*"  # spinning
	  elif [[ $RETURN == "2" ]] ; then
		 STATE="_"  # standby
	  else
		 STATE="?"  # state unknown
	  fi
	  printf "%s%-2d  " "$STATE" $TEMP
	  # Update temperatures each drive; spinners only
	  if [ "$STATE" == "*" ] ; then
		 let "Tsum += $TEMP"
		 if [[ $TEMP -gt $Tmax ]]; then Tmax=$TEMP; fi  # numeric compare; '>' would compare as strings
		 let "i += 1"
	  fi
   done <<< "$DEVLIST"
   drive_data  # manage data
   let "DUTY_DRIVE = $DUTY_CURR + $PID"
  
   if [[ $DUTY_DRIVE -gt $DUTY_CPU ]]; then
	  DRIVER="Drives"
	  MAX=$DUTY_DRIVE
   else
	  DRIVER="CPU"
	  MAX=$DUTY_CPU
   fi
   adjust_fans $MAX  # passing higher duty to the function adjust_fans
}

##############################################
# function adjust_fans
# Add correction to current duty,
# set duty, print diagnostic data
##############################################
function adjust_fans {
   # Reset BMC if fans seem stuck: cool and >80% OR warm and <30%
   # if [[ $Tmean<$(($SP - 1)) && $DUTY>0x50 ]] || [[ $Tmean>$(($SP + 5)) && $DUTY<0x1E ]]; then
   #	$IPMITOOL bmc reset warm; fi
   # $1 is the new duty
   # passed to this function when called
   DUTY_NEW=$1
   # Don't allow duty cycle outside FAN_MIN..95%
   if [[ $DUTY_NEW -gt 95 ]]; then DUTY_NEW=95; fi
   if [[ $DUTY_NEW -lt $FAN_MIN ]]; then DUTY_NEW=$FAN_MIN; fi
   # Change if different from current duty
   if [[ $DUTY_NEW -ne $DUTY ]]; then
	  DUTYhex=$( printf "0x%x" $DUTY_NEW )  #  hexify
	  # Set new duty cycle. "echo -n ``" prevents newline generated in log
	  echo -n `$IPMITOOL raw 0x30 0x70 0x66 1 0 $DUTYhex`
   fi
}

#####################################################
# All this happens only at the beginning
# Initializing values, list of drives, print header
#####################################################
CPU_LOOPS=$( echo "$T * 60 / $CPU_T" | bc )  # Number of whole CPU loops per drive loop
IPMITOOL=/usr/local/bin/ipmitool
I=0; ERRc=0  # Initialize errors to 0

# Creates logfile and sends all stdout and stderr to the log, as well as to the console.
# If you want to append to existing log, add '-a' to the tee command.
exec > >(tee -i $LOG) 2>&1

# Get list of drives
DEVLIST1=$(/sbin/camcontrol devlist)
# Remove lines with flash drives or SSD; edit as needed
# You could use another strategy, e.g., find something in the camcontrol devlist
# output that is unique to the drives you want, for instance only WDC drives:
# if [[ $LINE != *"WDC"* ]] . . .
DEVLIST="$(echo "$DEVLIST1"|sed '/KINGSTON/d;/ADATA/d;/SanDisk/d;/INTEL/d;/TDKMedia/d')"
DEVCOUNT=$(echo "$DEVLIST" | wc -l)
# Set mode to 'Full' to avoid BMC changing duty cycle
# Need to wait a tick or it doesn't get 2nd command
# "echo -n ``" to avoid annoying newline generated in log
echo -n `$IPMITOOL raw 0x30 0x45 1 1`; sleep 1
# Then start with 50% duty cycle and let algorithm adjust from there
DUTY_NEW=50
DUTY_DRIVE=50
DUTYhex=$( printf "0x%x" $DUTY_NEW )
echo -n `$IPMITOOL raw 0x30 0x70 0x66 1 0 $DUTYhex`
sleep 3  # let fans respond

printf "\nDrive states:  * spinning;  _ standby;  ? unknown\n"
print_header
FIRST_TIME=1
CPU_check_adjust

###########################################
# Main loop through drives every T minutes
# and CPU every CPU_T seconds
###########################################
while [ 1 ] ; do
   # Print header every quarter day.  Expression removes any
   # leading 0 so it is not seen as octal
   HM=$(date +%k%M); HM=`expr $HM + 0`
   R=$(( HM % 600 ))  # remainder after dividing by 6 hours
   if (( $R < $T )); then
	  print_header;
   fi
   DRIVES_check_adjust
   printf "%6.2f %6.2f %5.2f %6.2f %3d %-6s %4d %-7s %2d/%-6d" $ERRc $P $I $D $CPU_TEMP $DRIVER $RPM $MODEt $DUTY_CURR $DUTY_NEW

   i=0
   while [ $i -lt $CPU_LOOPS ]; do
	  sleep $CPU_T
	  CPU_check_adjust
	  let i=i+1
   done
done

# Adjusts fans based on drive or CPU temperatures, whichever
# needs more cooling. Max temp among drives is maintained at a setpoint
# using a PID algorithm.  CPU temp regulation uses just core 0
# (they all stay within a few degrees of each other).  CPU temp
# need not and cannot be maintained at a setpoint, so PID is not
# used; instead fan duty cycle is simply increased with temp.

# Drives are checked and fans adjusted on a set interval, such as 6 minutes.
# Logging is done at that point.  CPU temps can spike much faster,
# so are checked at a shorter interval, such as 30 seconds.  Those
# adjustments are not logged.

# Logs:
#   - disk status (spinning or standby)
#   - disk temperature (Celsius) if spinning
#   - max and mean disk temperature
#   - CPU 0 temperature
#   - fan rpm and mode
#   - current and new fan duty cycle
#   - PID variables
#   - adjustments to fan duty cycle due to interim CPU loops

# Includes disks on motherboard and on HBA.

#  Relation between percent duty cycle, hex value of that number,
#  and RPMs for my fans.  RPM will vary among fans, is not
#  precisely related to duty cycle, and does not matter to the script.
#  It is merely reported.
#
#  Percent	Hex		RPM
#  10		  A		300
#  20		 14		400
#  30		 1E		500
#  40		 28		600/700
#  50		 32		800
#  60		 3C		900
#  70		 46		1000/1100
#  80		 50		1100/1200
#  90		 5A		1200/1300
# 100		 64		1300

# Some boards apparently report incorrect duty cycle.
# If that is happening, disable lines 86-88 (the three DUTY_CURR lines) in function drive_data.
# Then the script will assume the duty cycle is the
# same as it was last set.

# Tuning suggestions
# PID tuning advice on the internet generally does not work well in this application.
# First run the script spincheck.sh and get familiar with your temperature and fan variations without any intervention.
# Choose a setpoint that is an actual observed Tmean, given the number of drives you have.  It should be the Tmean associated with the Tmax that you want. 
# Set Ki=0 and leave it there.  You probably will never need it.
# Start with Kp low.  Use a value that results in a rounded correction=1 when error is the lowest value you observe other than 0  (i.e., when ERRc is minimal, Kp ~= 1 / ERRc)
# Set Kd at about Kp*10
# Get Tmean within ~0.3 degree of SP before starting script.
# Start script and run for a few hours or so.  If Tmean oscillates (best to graph it), you probably need to reduce Kd.  If no oscillation but response is too slow, raise Kd.
# Stop script and get Tmean at least 1 C off SP.  Restart.  If there is overshoot and it goes through some cycles, you may need to reduce Kd.
# If you have problems, examine P and D in the log and see which is messing you up.  If all else fails you can try Ki. If you use Ki, make it small, ~ 0.1 or less.

# Uses joeschmuck's smartctl method for drive status (returns 0 if spinning, 2 in standby)
# https://forums.freenas.org/index.php?threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/#post-28451
# Other method (camcontrol cmd -a) doesn't work with HBA

# Removed from drive_data.  Though it was working
# it doesn't seem right to hexify PID ?????
   # PID=$( printf "0x%x" $PID )  # fully hexify with '0x' in front
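
The CPU branch of CPU_check_adjust is easier to reason about with a few numbers plugged in. The sketch below re-evaluates the same expression with awk instead of bc (cpu_duty is an illustrative name; CPU_REF=68 and CPU_SCALE=20 are the values from the settings above). With these settings the expression reduces to 3 * CPU_TEMP - 156, so 66 °C gives 42% and 70 °C gives 54%, while 65 °C and below fall back to the fixed 20%, which adjust_fans raises to FAN_MIN whenever it is applied.

```shell
#!/bin/sh
# Same settings as spinpid.sh above.
CPU_REF=68
CPU_SCALE=20

# cpu_duty CPU_TEMP -> prints DUTY_CPU exactly as CPU_check_adjust computes it
# (before adjust_fans clamps it to FAN_MIN..95).
cpu_duty() {
    if [ "$1" -gt 65 ]; then
        awk -v t="$1" -v ref="$CPU_REF" -v s="$CPU_SCALE" 'BEGIN {
            printf "%.0f\n", t - ((t - ref) / 10 * -1 * s + s)
        }'
    else
        echo 20
    fi
}
```

Note the jump from 20% at 65 °C to 42% at 66 °C: the formula does not start from the fallback value, so the threshold and CPU_REF interact.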

 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
This thread is about a different script.
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Apologies, Stux. However, I might have solved the problem by looking at your thread. It looks like on my board (X10SDV-2C-TLN2F) only FAN1 and FAN2 can be controlled by the script. FAN3 and FAN4 were not controllable and ran at full speed regardless.

Solved by connecting all fans to FAN1 and FAN2.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I think I saw something about FAN4 (instead of FANA) being the secondary zone on the X10SDV boards.
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Yes, I saw that information and connected my fans to fan1 and fan2 and voila. The script from glorious1 works like a charm.

 

Bill Cowger

Cadet
Joined
Aug 28, 2015
Messages
8
Can anybody tell me how to run this script? I am still learning the command line and cannot figure out how to get it to run. Any help would be appreciated.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I do have instructions in the first post. Have you seen those?
 