Script to control fan speed in response to hard drive temperatures

Incogito

Dabbler
Joined
Jan 4, 2017
Messages
12
Hello,

Thanks for all the goodness and hard work in this thread.
Just a small contribution:
  • I am running both Seagate ST4000VN008 and Western Digital WD40EFRX drives.
  • The Seagate drives report the "Temperature_Celsius" raw value in a different format (e.g. "31 (0 21 0 0 0)").
  • This is easily handled by changing:
    Code:
    /usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | grep -o "..$"
  • To
    Code:
    /usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | pcregrep -o1 '([0-9]*)( \(.*\))?$'
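For anyone without pcregrep installed, here is a rough alternative sketch. It assumes the raw value is the tenth whitespace-separated column of the attribute line (which holds for the usual smartctl attribute table, where the attribute name contains no spaces), so awk can pick out the leading number in both formats. The name parse_temp and the shortened sample lines are made up for illustration.

```shell
# Hypothetical helper (not part of the original scripts): read smartctl
# attribute lines on stdin and print the first token of the RAW_VALUE column.
parse_temp() {
  awk '/Temperature_Celsius/ { print $10; exit }'
}

# Shortened sample lines in the two formats mentioned above.
wd_line='194 Temperature_Celsius 0x0022 116 104 000 Old_age Always - 34'
sg_line='194 Temperature_Celsius 0x0022 031 043 000 Old_age Always - 31 (0 21 0 0 0)'

printf '%s\n' "$wd_line" | parse_temp   # prints 34
printf '%s\n' "$sg_line" | parse_temp   # prints 31
```

In the script itself this would replace everything after the smartctl call, i.e. the smartctl output would be piped into parse_temp instead of the two grep commands.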
Alex
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
I would just like to say a big thank you to glorious1, stux and Kevin. My system is similar to glorious1's, as I only have one zone. I really appreciate the time you have put into these scripts.
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
I am not sure what I am doing wrong, but I am getting this error:
Code:


da1  da2  da3  da4  da5  da6  ada0 da7  da8  Tmax Tmean  ERRc CPU FAN1 FAN2 FAN3 FAN4 FANA Fan%0 Fan%1 MODE  

17:42:02  *37  *33  *33  *35  *41  *39  *0   ./spincheck.sh: line 111: let: Tsum += : syntax error: operand expected (error token is "+= ")

?0   ?0   ^41  31.14 -2.43  62  600 1400 1000 1400  ---    42    42 Standardq


The line in the code is
Code:
if [ "$STATE" == "*" ] ; then
         let "Tsum += $TEMP"
         if [[ $TEMP > $Tmax ]]; then Tmax=$TEMP; fi;
         let "i += 1"
      fi
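The error token "+= " suggests that $TEMP was empty for that drive, so let was handed "Tsum += " with nothing to add. Below is a minimal sketch of one way to guard the sum, assuming you would rather skip a drive with no numeric reading than abort the line. The function is_number and the sample readings are made up for illustration. The sketch also uses -gt for the maximum, because > inside [[ ]] compares strings rather than numbers.

```shell
# Hypothetical guard (not in the original script): succeed only if $1 is
# a non-empty string consisting solely of digits.
is_number() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

Tsum=0; Tmax=0; i=0
# Sample readings: two valid temperatures, one empty, one garbled.
for TEMP in 37 "" "0)" 33; do
  if is_number "$TEMP"; then
    Tsum=$((Tsum + TEMP))
    if [ "$TEMP" -gt "$Tmax" ]; then Tmax=$TEMP; fi
    i=$((i + 1))
  fi
done
echo "Tsum=$Tsum Tmax=$Tmax count=$i"   # prints Tsum=70 Tmax=37 count=2
```

Skipping bad readings only hides the symptom; a drive that never reports a temperature is better removed from the device list altogether.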
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Something is fishy there. The 7th drive has a temperature that looks like 0, and for all I know could have a space after it. I'm not sure because the pasted output is out of alignment, but is that drive ada0? It also seems odd that it is listed out of order.

Please show the output of
smartctl -a -n standby "/dev/ada0" | grep "Temperature_Celsius"
(if you don't see a reasonable temperature on the output, please do the command without the '|' and everything following)
and
camcontrol devlist

You need sudo in front of those unless you are logged in as root.
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Here is the output of the given commands. The drive in question is an M.2 SSD from Intel. Is the problem that the device reports no "Temperature_Celsius" attribute?
Code:

[root@freenas] ~# cd /mnt/HomeStorage/NAS/Stuff/freeNASScripts

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/ada0" | grep "Temperature_Celsius"

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/ada0"

smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===

Device Model:	 INTEL SSDSCKKW480H6

Serial Number:	CVLY619500L2480F

LU WWN Device Id: 5 5cd2e4 14cbf9fe5

Firmware Version: LSF036C

User Capacity:	480,103,981,056 bytes [480 GB]

Sector Size:	  512 bytes logical/physical

Rotation Rate:	Solid State Device

Form Factor:	  M.2

Device is:		Not in smartctl database [for details use: -P showall]

ATA Version is:   ACS-3 (minor revision not indicated)

SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:	Wed Mar  1 14:20:34 2017 AEDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

Power mode is:	ACTIVE or IDLE


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x03) Offline data collection activity

is in progress.

Auto Offline Data Collection: Disabled.

Self-test execution status:	  ( 242) Self-test routine in progress...

20% of test remaining.

Total time to complete Offline 

data collection: (	0) seconds.

Offline data collection

capabilities:  (0x53) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

No Offline surface scan supported.

Self-test supported.

No Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:			(0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:		(0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine 

recommended polling time:  (   2) minutes.

Extended self-test routine

recommended polling time:  (  30) minutes.

SCT capabilities:  	   (0x0039) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.


SMART Attributes Data Structure revision number: 1

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE

  5 Reallocated_Sector_Ct   0x0032   100   100   000	Old_age   Always	   -	   0

  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   50

 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   51

170 Unknown_Attribute	   0x0033   100   100   010	Pre-fail  Always	   -	   0

171 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0

172 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0

174 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   41

183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0

184 End-to-End_Error		0x0033   100   100   090	Pre-fail  Always	   -	   0

187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0

190 Airflow_Temperature_Cel 0x0032   048   067   000	Old_age   Always	   -	   48 (Min/Max 25/67)

192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   41

199 UDMA_CRC_Error_Count	0x0032   100   100   000	Old_age   Always	   -	   0

225 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   76116

226 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   0

227 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   0

228 Power-off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   0

232 Available_Reservd_Space 0x0033   100   100   010	Pre-fail  Always	   -	   0

233 Media_Wearout_Indicator 0x0032   100   100   000	Old_age   Always	   -	   0

241 Total_LBAs_Written	  0x0032   100   100   000	Old_age   Always	   -	   76116

242 Total_LBAs_Read		 0x0032   100   100   000	Old_age   Always	   -	   91287

249 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   1378

252 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   4


SMART Error Log Version: 1

No Errors Logged


SMART Self-test log structure revision number 1

Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline	   Completed without error	   00%		50		 -

# 2  Short offline	   Completed without error	   00%		50		 -

# 3  Extended offline	Completed without error	   00%		50		 -

# 4  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 5  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 6  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 7  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 8  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 9  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#10  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#11  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#12  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#13  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#14  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#15  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#16  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#17  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#18  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#19  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#20  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#21  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -


SMART Selective self-test log data structure revision number 1

 SPAN		 MIN_LBA		 MAX_LBA  CURRENT_TEST_STATUS

	1  70403103932424  70403103932424  Not_testing

	2  70403103932424  70403103932424  Not_testing

	3  70403103932424  70403103932424  Not_testing

	4  70403103932424  70403103932424  Not_testing

	5  70403103932424  70403103932424  Not_testing

Selective self-test flags (0x4008):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.


[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# camcontrol devlist

<ATA KINGSTON SV300S3 BBF0>		at scbus0 target 0 lun 0 (pass0,da0)

<ATA WDC WD30EFRX-68E 0A82>		at scbus0 target 1 lun 0 (pass1,da1)

<ATA ST3000VN000-1HJ1 SC60>		at scbus0 target 2 lun 0 (pass2,da2)

<ATA TOSHIBA MK5065GS 1M>		  at scbus0 target 3 lun 0 (pass3,da3)

<ATA WDC WD3200BPVT-2 1A01>		at scbus0 target 4 lun 0 (pass4,da4)

<ATA Hitachi HDS72303 AA10>		at scbus0 target 5 lun 0 (pass5,da5)

<ATA ST3000VN000-1H41 SC44>		at scbus0 target 6 lun 0 (pass6,da6)

<INTEL SSDSCKKW480H6 LSF036C>	  at scbus1 target 0 lun 0 (pass7,ada0)

<TDKMedia Gold Flash PMAP>		 at scbus8 target 0 lun 0 (pass8,da7)

<TDKMedia Gold Flash PMAP>		 at scbus9 target 0 lun 0 (pass9,da8)

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# 
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Right. You don't want to include a solid state drive, nor the flash drives that follow it. Add text that is unique to those devices to the sed filter in this line:
DEVLIST="$(echo "$DEVLIST1"|sed '/KINGSTON/d;/ADATA/d;/SanDisk/d;/OCZ/d;/LSI/d')"
Just add your unique text to the list, without the quotes, like: ";/INTEL/d;/TDKMedia/d"
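As a quick check before editing the script, something like the following shows which lines survive the filter. The sample DEVLIST1 is a shortened copy of the camcontrol output posted above, and filter_devlist is only an illustrative wrapper around the same sed expression with the two extra patterns appended.

```shell
# Shortened sample of `camcontrol devlist` output, for illustration only.
DEVLIST1='<ATA KINGSTON SV300S3 BBF0>  at scbus0 target 0 lun 0 (pass0,da0)
<ATA WDC WD30EFRX-68E 0A82>  at scbus0 target 1 lun 0 (pass1,da1)
<INTEL SSDSCKKW480H6 LSF036C>  at scbus1 target 0 lun 0 (pass7,ada0)
<TDKMedia Gold Flash PMAP>  at scbus8 target 0 lun 0 (pass8,da7)'

# Hypothetical wrapper: delete every line matching one of the patterns.
filter_devlist() {
  sed '/KINGSTON/d;/ADATA/d;/SanDisk/d;/OCZ/d;/LSI/d;/INTEL/d;/TDKMedia/d'
}

DEVLIST="$(printf '%s\n' "$DEVLIST1" | filter_devlist)"
printf '%s\n' "$DEVLIST"   # only the WDC line (da1) should remain
```

Each pattern is matched anywhere in the line, so pick text that cannot appear in the lines of the drives you want to keep.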
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Thank you, Glorious1. That fixed the issue, and spincheck.sh now runs successfully. However, while testing before my two Noctua 3000 rpm HDD fans arrive, I ran into an error with spinpid.sh.

Code:


[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# ./spinpid.sh


Drive states:  * spinning;  _ standby;  ? unknown


Wednesday, Mar 01																					Fan %	Interim CPU 

		  da1  da2  da3  da4  da5  da6  Tmax Tmean  ERRc	  P	 I	  D CPU Driver  RPM MODE	Curr/New Adjustments

20:30:39  *37  ./spinpid.sh: line 158: printf: 0): invalid number

*0   ./spinpid.sh: line 161: let: Tsum += 0): syntax error in expression (error token is ")")

./spinpid.sh: line 158: printf: 1): invalid number

*1   ./spinpid.sh: line 161: let: Tsum += 1): syntax error in expression (error token is ")")

*34  ./spinpid.sh: line 158: printf: 8): invalid number

*8   ./spinpid.sh: line 161: let: Tsum += 8): syntax error in expression (error token is ")")

./spinpid.sh: line 158: printf: 0): invalid number

*0   ./spinpid.sh: line 161: let: Tsum += 0): syntax error in expression (error token is ")")

./spinpid.sh: line 106: printf: 8): invalid number

^8   13.33-20.24 -80.95  0.00 -134.91  60 CPU	 600 Full	50/25	  

^C
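A possible explanation for tokens such as "0)" and "8)", offered as a sketch rather than a confirmed diagnosis: grep -o "..$" keeps the last two characters of the attribute line, which is the temperature only when nothing follows the raw value. The sample lines below are shortened, hypothetical versions of the two formats described earlier in the thread, and last_two is an illustrative name.

```shell
# Hypothetical wrapper around the extraction used in the script:
# print the last two characters of each input line.
last_two() {
  grep -o '..$'
}

plain_line='194 Temperature_Celsius 0x0022 116 104 000 Old_age Always - 34'
paren_line='194 Temperature_Celsius 0x0022 031 043 000 Old_age Always - 31 (0 10 0 0 0)'

printf '%s\n' "$plain_line" | last_two   # prints 34
printf '%s\n' "$paren_line" | last_two   # prints 0)
```

A value like "0)" is not a number, which would account for both the printf "invalid number" message and the let syntax error.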



I have modified the DEVLIST line in the same way as in spincheck.sh; the full script is below.

Code:
#!/usr/local/bin/bash
# spinpid.sh version 2017-01-01. Run as superuser. See notes at end.

##############################################
#
#  Settings
#
##############################################

# Drive Settings:
SP=33.57		#  Setpoint mean temperature
#  Time interval for checking drives in minutes.  This will only
#  be honored accurately when it is an even multiple of CPU_T.
T=6			
Kp=4			#  Proportional tunable constant
Ki=0			#  Integral tunable constant
Kd=40		   #  Derivative tunable constant
PID=0

# CPU Settings:
CPU_T=60		#  Time interval for checking CPU in seconds
#  Reference temperature for scaling CPU_DUTY (NOT a setpoint).
#  Moving it up or down shifts the range of fan speeds
#  up or down the CPU temperature scale.
CPU_REF=68	  
#  Scalar for scaling CPU_DUTY.  Large number means
#  large changes in fan speed with change in CPU temp
#  (tighter control).  Try 20.
CPU_SCALE=20
# Fan minimum duty cycle (%) (to avoid stalling)
FAN_MIN=25

LOG=/mnt/HomeStorage/NAS/Stuff/freeNASScripts/spinpid.log

##############################################
# function get_disk_name
# Get disk name from current LINE of 
##############################################
# The awk statement works by taking $LINE as input,
# setting '(' as a _F_ield separator and taking the second field it separates
# (ie after the separator), passing that to another awk that uses
# ',' as a separator, and taking the first field (ie before the separator).
# In other words, everything between '(' and ',' is kept.

# camcontrol output for disks on HBA seems to reverse every version,
# so need 2 options to get ada/da disk name.
function get_disk_name {
   if [[ $LINE == *",d"* ]] ; then	 # for (pass#,da#) (HBA disks sometimes)
	  DEVID=$(echo $LINE | awk -F ',' '{print $2}' | awk -F ')' '{print$1}')
   else								# for (ada#,pass#) (motherboard disks)
	  DEVID=$(echo $LINE | awk -F '(' '{print $2}' | awk -F ',' '{print$1}')
   fi
}

############################################################
# function print_header
# Called when script starts and each quarter day
############################################################
function print_header {
   DATE=$(date +"%A, %b %d")
   let "SPACES = DEVCOUNT * 5 + 70"  # 5 spaces per drive
   printf "\n%-*s %-8s %s \n" $SPACES "$DATE" "Fan %" "Interim CPU"
   echo -n "		  "
   while read LINE ; do
	  get_disk_name
	  printf "%-5s" $DEVID
   done <<< "$DEVLIST"			 # while statement works on DEVLIST
   printf "%4s %5s %5s %6s %5s %6s %3s %s %4s %-7s %s %s" "Tmax" "Tmean" "ERRc" "P" "I" "D" "CPU" "Driver" "RPM" "MODE" "Curr/New" "Adjustments"
}

#################################################
# function drive_data: Read, process, print data
#################################################
function drive_data {
   Tmean=$(echo "scale=3; $Tsum / $i" | bc)
   ERRp=$ERRc
   ERRc=$(echo "scale=2; $Tmean - $SP" | bc)
   ERR=$(echo "scale=2; $ERRc * $T + $I" | bc)
   P=$(echo "scale=2; $Kp * $ERRc" | bc)
   I=$(echo "scale=2; $Ki * $ERR" | bc)
   D=$(echo "scale=2; $Kd * ($ERRc - $ERRp) / $T" | bc)
   PID=$(echo "scale=2; $P + $I + $D" | bc)  # add 3 corrections
   PID=$(printf %0.f $PID)  # round
   # Read duty cycle, convert to decimal.
   # May need to disable these 3 lines as some boards apparently return
   # incorrect data. In that case just assume $DUTY hasn't changed.
   DUTY_CURR=$($IPMITOOL raw 0x30 0x70 0x66 0 0) # in hex
   DUTY_CURR=$(printf "0x%s" $DUTY_CURR)				 # add 0x in front
   DUTY_CURR=`echo $(($DUTY_CURR))`					  # convert to decimal
   # Read fan mode, convert to decimal.
   MODE=$($IPMITOOL raw 0x30 0x45 0) # in hex
   MODE=$(printf "0x%s" $MODE)				 # add 0x in front
   MODE=`echo $(($MODE))`					  # convert to decimal
   # Text for mode
   case $MODE in
	  0) MODEt="Standard" ;;
	  4) MODEt="HeavyIO" ;;
	  2) MODEt="Optimal" ;;
	  1) MODEt="Full" ;;
   esac
   # Get reported fan speed in RPM.
   # Takes the line with FAN1, then 2nd through the 5th
   # digit if there are that many.
   RPM=$($IPMITOOL sdr | grep "FAN1" | grep -Eo '[0-9]{2,5}')
   # print current Tmax and Tmean
   printf "^%-3d %5.2f" $Tmax $Tmean 
}

##############################################
# function CPU_check_adjust
# Get CPU temp. If above 59, calculate a new
# DUTY_CPU.
# If it is greater than the duty due to the
# drives, and different from current duty,
# apply it to the fans.
##############################################
function CPU_check_adjust {
   # Get temp of CPU 0, strip down to whole degrees C
   CPU_TEMP=$(sysctl -a dev.cpu.0.temperature | awk -F ' ' '{print $2}' | awk -F '.' '{print$1}')
   if [[ $CPU_TEMP -gt 59 ]]; then
	  DUTY_CPU=$( echo "scale=2; $CPU_TEMP - (($CPU_TEMP - $CPU_REF)/10 * -1 * $CPU_SCALE + $CPU_SCALE)" | bc )
	  DUTY_CPU=$( printf %0.f $DUTY_CPU )  # round
   else DUTY_CPU=20;
   fi 
   if [[ $DUTY_CPU -gt $DUTY_DRIVE && $DUTY_CPU -ne $DUTY_NEW ]]; then
	  adjust_fans $DUTY_CPU
	  if [[ FIRST_TIME -eq 0 ]]; then printf "%d " $DUTY_CPU; fi
   fi
   FIRST_TIME=0
}

##############################################
# function DRIVES_check_adjust
# Print time on new log line. 
# Go through each drive, getting and printing 
# status and temp.  Calculate sum and max
# temp, then call function drive_data.
# Apply max of $PID and CPU_CORR to the fans.
##############################################
function DRIVES_check_adjust {
   echo  # start new line
   # print time on each line
   TIME=$(date "+%H:%M:%S"); echo -n "$TIME  "
   Tmax=0; Tsum=0  # initialize drive temps for new loop through drives
   i=0  # count number of spinning drives
   while read LINE ; do
	  get_disk_name
	  TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | grep -o "..$")
	  /usr/local/sbin/smartctl -n standby "/dev/$DEVID" > /var/tempfile
	  RETURN=$?			   # need to preserve because $? changes with each 'if'
	  if [[ $RETURN == "0" ]] ; then
		 STATE="*"  # spinning
	  elif [[ $RETURN == "2" ]] ; then
		 STATE="_"  # standby
	  else
		 STATE="?"  # state unknown
	  fi
	  printf "%s%-2d  " "$STATE" $TEMP
	  # Update temperatures each drive; spinners only
	  if [ "$STATE" == "*" ] ; then
		 let "Tsum += $TEMP"
		 if [[ $TEMP > $Tmax ]]; then Tmax=$TEMP; fi;
		 let "i += 1"
	  fi
   done <<< "$DEVLIST"
   drive_data  # manage data
   let "DUTY_DRIVE = $DUTY_CURR + $PID"
   
   if [[ $DUTY_DRIVE -gt $DUTY_CPU ]]; then 
	  DRIVER="Drives"
	  MAX=$DUTY_DRIVE
   else 
	  DRIVER="CPU"
	  MAX=$DUTY_CPU
   fi
   adjust_fans $MAX  # passing higher duty to the function adjust_fans
}

##############################################
# function adjust_fans 
# Add correction to current duty, 
# set duty, print diagnostic data
##############################################
function adjust_fans {
   # Reset BMC if fans seem stuck: cool and >80% OR warm and <30%
   # if [[ $Tmean<$(($SP - 1)) && $DUTY>0x50 ]] || [[ $Tmean>$(($SP + 5)) && $DUTY<0x1E ]]; then
   #	$IPMITOOL bmc reset warm; fi
   # $1 is the new duty
   # passed to this function when called
   DUTY_NEW=$1
   # Don't allow duty cycle outside the range FAN_MIN to 95%
   if [[ $DUTY_NEW -gt 95 ]]; then DUTY_NEW=95; fi
   if [[ $DUTY_NEW -lt $FAN_MIN ]]; then DUTY_NEW=$FAN_MIN; fi
   # Change if different from current duty
   if [[ $DUTY_NEW -ne $DUTY ]]; then
	  DUTYhex=$( printf "0x%x" $DUTY_NEW )  #  hexify
	  # Set new duty cycle. "echo -n ``" prevents newline generated in log
	  echo -n `$IPMITOOL raw 0x30 0x70 0x66 1 0 $DUTYhex`
   fi
}

#####################################################
# All this happens only at the beginning
# Initializing values, list of drives, print header
#####################################################
CPU_LOOPS=$( echo "$T * 60 / $CPU_T" | bc )  # Number of whole CPU loops per drive loop
IPMITOOL=/usr/local/bin/ipmitool
I=0; ERRc=0  # Initialize errors to 0

# Creates logfile and sends all stdout and stderr to the log, as well as to the console.
# If you want to append to existing log, add '-a' to the tee command.
exec > >(tee -i $LOG) 2>&1

# Get list of drives
DEVLIST1=$(/sbin/camcontrol devlist)
# Remove lines with flash drives or SSD; edit as needed
# You could use another strategy, e.g., find something in the camcontrol devlist
# output that is unique to the drives you want, for instance only WDC drives:
# if [[ $LINE != *"WDC"* ]] . . .
DEVLIST="$(echo "$DEVLIST1"|sed '/SanDisk/d;/INTEL/d;/TDKMedia/d')"
DEVCOUNT=$(echo "$DEVLIST" | wc -l)
# Set mode to 'Full' to avoid BMC changing duty cycle
# Need to wait a tick or it doesn't get 2nd command
# "echo -n ``" to avoid annoying newline generated in log
echo -n `$IPMITOOL raw 0x30 0x45 1 1`; sleep 1
# Then start with 50% duty cycle and let algorithm adjust from there
DUTY_NEW=50
DUTY_DRIVE=50
DUTYhex=$( printf "0x%x" $DUTY_NEW )
echo -n `$IPMITOOL raw 0x30 0x70 0x66 1 0 $DUTYhex`
sleep 3  # let fans respond

printf "\nDrive states:  * spinning;  _ standby;  ? unknown\n"
print_header
FIRST_TIME=1
CPU_check_adjust

###########################################
# Main loop through drives every T minutes
# and CPU every CPU_T seconds
###########################################
while [ 1 ] ; do
   # Print header every quarter day.  Expression removes any
   # leading 0 so it is not seen as octal
   HM=$(date +%k%M); HM=`expr $HM + 0`
   R=$(( HM % 600 ))  # remainder after dividing by 6 hours
   if (( $R < $T )); then
	  print_header; 
   fi
   DRIVES_check_adjust
   printf "%6.2f %6.2f %5.2f %6.2f %3d %-6s %4d %-7s %2d/%-6d" $ERRc $P $I $D $CPU_TEMP $DRIVER $RPM $MODEt $DUTY_CURR $DUTY_NEW

   i=0
   while [ $i -lt $CPU_LOOPS ]; do
	  sleep $CPU_T
	  CPU_check_adjust
	  let i=i+1
   done
done

# Adjusts fans based on drive or CPU temperatures, whichever 
# needs more cooling. Max temp among drives is maintained at a setpoint
# using a PID algorithm.  CPU temp regulation uses just core 0 
# (they all stay within a few degrees of each other).  CPU temp
# need not and cannot be maintained at a setpoint, so PID is not
# used; instead fan duty cycle is simply increased with temp.

# Drives are checked and fans adjusted on a set interval, such as 6 minutes.
# Logging is done at that point.  CPU temps can spike much faster,
# so are checked at a shorter interval, such as 30 seconds.  Those 
# adjustments are not logged.

# Logs:
#   - disk status (spinning or standby)
#   - disk temperature (Celsius) if spinning
#   - max and mean disk temperature
#   - CPU 0 temperature
#   - fan rpm and mode
#   - current and new fan duty cycle
#   - PID variables
#   - adjustments to fan duty cycle due to interim CPU loops

# Includes disks on motherboard and on HBA. 

#  Relation between percent duty cycle, hex value of that number,
#  and RPMs for my fans.  RPM will vary among fans, is not
#  precisely related to duty cycle, and does not matter to the script.
#  It is merely reported.
#
#  Percent	Hex		RPM
#  10		  A		300
#  20		 14		400
#  30		 1E		500
#  40		 28		600/700
#  50		 32		800
#  60		 3C		900
#  70		 46		1000/1100
#  80		 50		1100/1200
#  90		 5A		1200/1300
# 100		 64		1300

# Some boards apparently report incorrect duty cycle.
# If that is happening, disable lines 86-88 in function drive_data.
# Then the script will assume the duty cycle is the
# same as it was last set.

# Tuning suggestions
# PID tuning advice on the internet generally does not work well in this application.
# First run the script spincheck.sh and get familiar with your temperature and fan variations without any intervention.
# Choose a setpoint that is an actual observed Tmean, given the number of drives you have.  It should be the Tmean associated with the Tmax that you want.  
# Set Ki=0 and leave it there.  You probably will never need it.
# Start with Kp low.  Use a value that results in a rounded correction=1 when error is the lowest value you observe other than 0  (i.e., when ERRc is minimal, Kp ~= 1 / ERRc)
# Set Kd at about Kp*10
# Get Tmean within ~0.3 degree of SP before starting script.
# Start script and run for a few hours or so.  If Tmean oscillates (best to graph it), you probably need to reduce Kd.  If no oscillation but response is too slow, raise Kd.
# Stop script and get Tmean at least 1 C off SP.  Restart.  If there is overshoot and it goes through some cycles, you may need to reduce Kd.
# If you have problems, examine PK and PD in the log and see which is messing you up.  If all else fails you can try Ki. If you use Ki, make it small, ~ 0.1 or less.

# Uses joeschmuck's smartctl method for drive status (returns 0 if spinning, 2 in standby)
# https://forums.freenas.org/index.php?threads/how-to-find-out-if-a-drive-is-spinning-down-properly.2068/#post-28451
# Other method (camcontrol cmd -a) doesn't work with HBA

# Removed from drive_data.  Though it was working
# it doesn't seem right to hexify PID ?????
   # PID=$( printf "0x%x" $PID )  # fully hexify with '0x' in front




Not sure where I am going wrong again.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Did you modify spinpid.sh before or after getting the errors you showed? It's not clear because you show me the errors, then you say you modified it.

You're showing me the whole script for some reason. Did you modify anything other than the line that removes non-spinning devices from DEVLIST?

You removed /KINGSTON/d; from the command, but you have a KINGSTON device. Is that a flash drive or a spinning hard drive? If the former, you need to put the KINGSTON deletion back in.

You have a real mixed bunch of drives according to the camcontrol devlist output. Some I haven't worked with before, and they may output temperatures differently. Please give me the output of
smartctl -a -n standby "/dev/da#" | grep "Temperature_Celsius" for each drive, changing da# to the respective drive name each time.

If you don't get a line that includes temperature for any drive, do the command again, without the pipe '|' and what follows. Look for the line that shows temperature, and show me that line. For each drive.
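To save typing the command once per drive, a loop along these lines could print the command for each device. This is only a sketch: disk_name is a hypothetical stand-in for the script's get_disk_name, the two sample lines are shortened camcontrol output, and the loop echoes the commands rather than running them so you can review them first.

```shell
# Hypothetical helper: take one line of `camcontrol devlist` output and
# print the disk name, i.e. the entry in the parentheses that is not pass#.
disk_name() {
  printf '%s\n' "$1" \
    | sed -n 's/.*(\([^)]*\)).*/\1/p' \
    | tr ',' '\n' \
    | grep -v '^pass' \
    | head -n 1
}

# Shortened sample device list, for illustration only.
DEVLIST='<ATA WDC WD30EFRX-68E 0A82> at scbus0 target 1 lun 0 (pass1,da1)
<ATA ST3000VN000-1HJ1 SC60> at scbus0 target 2 lun 0 (pass2,da2)'

# Print (not run) the smartctl command for every listed drive.
printf '%s\n' "$DEVLIST" | while IFS= read -r LINE; do
  DEVID=$(disk_name "$LINE")
  echo "smartctl -a -n standby \"/dev/$DEVID\" | grep \"Temperature_Celsius\""
done
```

The helper handles both orderings, (pass#,da#) and (ada#,pass#), because it simply drops the pass entry.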
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Code:

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da0" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0022   030   041   000	Old_age   Always	   -	   30 (Min/Max 9/41)

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da1" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0022   116   104   000	Old_age   Always	   -	   34

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da2" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0022   031   043   000	Old_age   Always	   -	   31 (0 10 0 0 0)

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da4" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0022   110   096   000	Old_age   Always	   -	   33

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da5" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0002   162   162   000	Old_age   Always	   -	   37 (Min/Max 13/58)

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da6" | grep "Temperature_Celsius"

194 Temperature_Celsius	 0x0022   035   051   000	Old_age   Always	   -	   35 (0 13 0 0 0)

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da7" | grep "Temperature_Celsius"

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da8" | grep "Temperature_Celsius"

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/ada0" | grep "Temperature_Celsius"




[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# camcontrol devlist

<ATA KINGSTON SV300S3 BBF0>		at scbus0 target 0 lun 0 (pass0,da0)

<ATA WDC WD30EFRX-68E 0A82>		at scbus0 target 1 lun 0 (pass1,da1)

<ATA ST3000VN000-1HJ1 SC60>		at scbus0 target 2 lun 0 (pass2,da2)

<ATA TOSHIBA MK5065GS 1M>		  at scbus0 target 3 lun 0 (pass3,da3)

<ATA WDC WD3200BPVT-2 1A01>		at scbus0 target 4 lun 0 (pass4,da4)

<ATA Hitachi HDS72303 AA10>		at scbus0 target 5 lun 0 (pass5,da5)

<ATA ST3000VN000-1H41 SC44>		at scbus0 target 6 lun 0 (pass6,da6)

<INTEL SSDSCKKW480H6 LSF036C>	  at scbus1 target 0 lun 0 (pass7,ada0)

<TDKMedia Gold Flash PMAP>		 at scbus8 target 0 lun 0 (pass8,da7)

<TDKMedia Gold Flash PMAP>		 at scbus9 target 0 lun 0 (pass9,da8)



Code:

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/ada0" | grep "Temperature_Celsius"

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/ada0" 

smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===

Device Model:	 INTEL SSDSCKKW480H6

Serial Number:	CVLY619500L2480F

LU WWN Device Id: 5 5cd2e4 14cbf9fe5

Firmware Version: LSF036C

User Capacity:	480,103,981,056 bytes [480 GB]

Sector Size:	  512 bytes logical/physical

Rotation Rate:	Solid State Device

Form Factor:	  M.2

Device is:		Not in smartctl database [for details use: -P showall]

ATA Version is:   ACS-3 (minor revision not indicated)

SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:	Thu Mar  2 12:13:36 2017 AEDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

Power mode is:	ACTIVE or IDLE


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x02) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Disabled.

Self-test execution status:	  (   0) The previous self-test routine completed

without error or no self-test has ever 

been run.

Total time to complete Offline 

data collection: (	0) seconds.

Offline data collection

capabilities:  (0x53) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

No Offline surface scan supported.

Self-test supported.

No Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:			(0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:		(0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine 

recommended polling time:  (   2) minutes.

Extended self-test routine

recommended polling time:  (  30) minutes.

SCT capabilities:  	   (0x0039) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.


SMART Attributes Data Structure revision number: 1

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE

  5 Reallocated_Sector_Ct   0x0032   100   100   000	Old_age   Always	   -	   0

  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   50

 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   51

170 Unknown_Attribute	   0x0033   100   100   010	Pre-fail  Always	   -	   0

171 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0

172 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0

174 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   41

183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0

184 End-to-End_Error		0x0033   100   100   090	Pre-fail  Always	   -	   0

187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0

190 Airflow_Temperature_Cel 0x0032   041   067   000	Old_age   Always	   -	   41 (Min/Max 25/67)

192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   41

199 UDMA_CRC_Error_Count	0x0032   100   100   000	Old_age   Always	   -	   0

225 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   76116

226 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   0

227 Unknown_SSD_Attribute   0x0032   100   100   000	Old_age   Always	   -	   0

228 Power-off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   0

232 Available_Reservd_Space 0x0033   100   100   010	Pre-fail  Always	   -	   0

233 Media_Wearout_Indicator 0x0032   100   100   000	Old_age   Always	   -	   0

241 Total_LBAs_Written	  0x0032   100   100   000	Old_age   Always	   -	   76116

242 Total_LBAs_Read		 0x0032   100   100   000	Old_age   Always	   -	   91287

249 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   1378

252 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   4


SMART Error Log Version: 1

No Errors Logged


SMART Self-test log structure revision number 1

Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline	Completed without error	   00%		50		 -

# 2  Short offline	   Completed without error	   00%		50		 -

# 3  Short offline	   Completed without error	   00%		50		 -

# 4  Extended offline	Completed without error	   00%		50		 -

# 5  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 6  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 7  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 8  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

# 9  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#10  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#11  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#12  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#13  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#14  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#15  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#16  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#17  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#18  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#19  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#20  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -

#21  Vendor (0xff)	   Self-test routine in progress 150%	 65535		 -


SMART Selective self-test log data structure revision number 1

 SPAN		 MIN_LBA		 MAX_LBA  CURRENT_TEST_STATUS

	1  70403103932424  70403103932424  Not_testing

	2  70403103932424  70403103932424  Not_testing

	3  70403103932424  70403103932424  Not_testing

	4  70403103932424  70403103932424  Not_testing

	5  70403103932424  70403103932424  Not_testing

Selective self-test flags (0x4008):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.




[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da7" 

smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


/dev/da7: Unknown USB bridge [0x0718:0x048a (0x100)]

Please specify device type with the -d option.


Use smartctl -h to get a usage summary


[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# smartctl -a -n standby "/dev/da8"

smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


/dev/da8: Unknown USB bridge [0x0718:0x048a (0x100)]

Please specify device type with the -d option.


Use smartctl -h to get a usage summary




So, from the list, the disks without a temperature are da7 and da8, which are the TDKMedia flash drives, and ada0, which is the Intel M.2 SSD.

The Kingston is a SATA SSD. I have added Kingston to the list, but I am still getting the same error. The error occurred both before and after updating the DEVLIST line.
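For what it's worth, an empty TEMP (for example when the grep for "Temperature_Celsius" finds no line for a drive) would produce exactly this kind of "operand expected" error from let. Below is a minimal sketch of a guard, not the script's actual code. The helper name is_numeric and the sample readings are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when its argument is a non-empty run of digits.
is_numeric() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Made-up readings; "" stands for a drive that reported no Temperature_Celsius line.
Tsum=0
i=0
for TEMP in 35 "" 41; do
    if is_numeric "$TEMP"; then
        Tsum=$((Tsum + TEMP))   # only usable readings reach the arithmetic
        i=$((i + 1))
    fi
done
echo "Tsum=$Tsum count=$i"
```

With a guard like this, a drive without a usable reading is skipped instead of aborting the sum.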
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
The problem is that the non-Western Digital drives (Seagate, Hitachi, Toshiba) print additional text after the temperature. For WD, which most people here seem to use, the temperature is the last two characters of the line. So designing a grep statement that picks out the two temperature digits for all the drives will be challenging. Let me cogitate a bit. It looks like we could match the hyphen, followed by white space, followed by the two digits. I'll work on that.
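A rough sketch of that hyphen-then-whitespace-then-digits idea, using two grep passes. The sample lines are made up for illustration (one WD style, one Seagate style), and extract_temp is a hypothetical helper name, not something from the script.

```shell
#!/bin/sh
# Sample attribute lines made up for illustration (WD style and Seagate style).
wd_line='194 Temperature_Celsius 0x0022 119 107 000 Old_age Always - 36'
seagate_line='194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 21 0 0 0)'

# Hypothetical helper: match "-", whitespace, digits, then keep only the digits.
extract_temp() {
    printf '%s\n' "$1" | grep -oE -e '-[[:space:]]+[0-9]+' | grep -oE '[0-9]+'
}

wd_temp=$(extract_temp "$wd_line")
seagate_temp=$(extract_temp "$seagate_line")
echo "$wd_temp $seagate_temp"
```

The -e is there because the pattern itself starts with a hyphen and would otherwise be read as an option.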
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Actually, the problem may not be the temperature. spincheck.sh works fine:
Code:


[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# ./spincheck.sh

How many whole minutes do you want between spin checks?

1


IMPORTANT NOTE ABOUT DUTY CYCLE (Fan%0 and Fan%1): Some boards apparently report incorrect duty cycle,

and can report duty cycle for zone 1 when that zone does not exist.


				Key to drive state symbols:  * spinning;  _ standby;  ? unknown

Thursday, Mar 02 

		  da1  da2  da3  da4  da5  da6  Tmax Tmean  ERRc CPU FAN1 FAN2 FAN3 FAN4 FANA Fan%0 Fan%1 MODE   

12:46:37  *35  *32  *32  *34  *38  *36  ^38  34.50  0.93  62  600 1400 1000 1400  ---	42	42 Standard
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
OK, I took the line suggested by Incogito and changed the TEMP line to:

Code:


TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | pcregrep -o1 '([0-9]*)( \(.*\))?$')


The result is
Code:



[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# ./spinpid.sh


Drive states:  * spinning;  _ standby;  ? unknown


Thursday, Mar 02																					 Fan %	Interim CPU 

		  da1  da2  da3  da4  da5  da6  Tmax Tmean  ERRc	  P	 I	  D CPU Driver  RPM MODE	Curr/New Adjustments

12:57:08  *36  *32  *33  *34  *38  *36  ^38  34.83  1.26   5.05  0.00   8.42  60 Drives  600 Full	50/63	^C

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# 
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Right, I forgot about incogito's post. Meanwhile I came up with this alternative that should also work.
TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | awk -F '-[ \t]+' '{print $2}' | awk '{print $1}')


What I don't get is why the change is needed in spinpid.sh but not in spincheck.sh. Maybe you already made that change in spincheck.sh?
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
Yes, sorry I had actually made the change in spincheck before posting in the forum which I completely forgot about.

 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I found a simpler way to do it that should work for all the drive brands. @boynep, I would appreciate it if you could test it, since you have at least one of everything :smile:
It simply takes the tenth whitespace-delimited field from the line, which may or may not be the last one.
TEMP=$(/usr/local/sbin/smartctl -a -n standby "/dev/$DEVID" | grep "Temperature_Celsius" | awk '{print $10}')
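To illustrate why the tenth field works for both formats, here is a small sketch run against two sample lines. The lines are made up for illustration and were not captured from real drives.

```shell
#!/bin/sh
# Illustrative attribute lines, not captured from real drives.
wd_line='194 Temperature_Celsius 0x0022 119 107 000 Old_age Always - 36'
seagate_line='194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 21 0 0 0)'

# Field 10 is the first token of RAW_VALUE, whatever text follows it.
wd_temp=$(printf '%s\n' "$wd_line" | awk '{print $10}')
seagate_temp=$(printf '%s\n' "$seagate_line" | awk '{print $10}')
echo "WD: $wd_temp  Seagate: $seagate_temp"
```

awk splits on runs of spaces and tabs by default, so the column alignment in the real smartctl output should not matter.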
 

boynep

Dabbler
Joined
Jan 9, 2012
Messages
29
@Glorious1 This works perfectly.

Code:

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# ./spinchecktest.sh

How many whole minutes do you want between spin checks?

1


IMPORTANT NOTE ABOUT DUTY CYCLE (Fan%0 and Fan%1): Some boards apparently report incorrect duty cycle,

and can report duty cycle for zone 1 when that zone does not exist.


				Key to drive state symbols:  * spinning;  _ standby;  ? unknown

Thursday, Mar 02 

		  da1  da2  da3  da4  da5  da6  Tmax Tmean  ERRc CPU FAN1 FAN2 FAN3 FAN4 FANA Fan%0 Fan%1 MODE   

23:19:20  *32  *29  *30  *31  *36  *34  ^36  32.00 -1.57  60  600 1200 1000 1300  ---	36	36 Standard

23:20:24  *32  *29  *30  *31  *36  *34  ^36  32.00 -1.57  60  600 1200 1000 1300  ---	36	36 Standard^C

[root@freenas] /mnt/HomeStorage/NAS/Stuff/freeNASScripts# 
 

RobiWahn

Cadet
Joined
Apr 18, 2016
Messages
7
Hi all!
I haven't gotten around to testing the script yet, but I think I can help with adding the SuperMicro SATA DOMs to the exclusion list.
The lines from "camcontrol devlist" look like the following on my system:

Code:
<SATA SSD S9FM02.1>				at scbus1 target 0 lun 0 (pass8,ada0)
<SATA SSD S9FM02.1>				at scbus2 target 0 lun 0 (pass9,ada1)


So perhaps it would be good to add a simple ;/SSD/d to the default list, and who knows if that simple addition could catch a few other drives that identify themselves as SSDs without having to know/include every SSD manufacturer :)
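As a rough sketch of what that would do, assuming the exclusion list is a sed script applied to the "camcontrol devlist" output (the surrounding pipeline here is my guess, not the script's actual line). The first two sample lines are hypothetical; the SATA DOM lines follow the format quoted above.

```shell
#!/bin/sh
# Illustrative `camcontrol devlist` output; the first two lines are hypothetical.
devlist='<ATA WDC WD40EFRX-68W 0A82>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA ST4000VN008-2DR1 SC60>        at scbus0 target 1 lun 0 (pass1,da1)
<SATA SSD S9FM02.1>                at scbus1 target 0 lun 0 (pass8,ada0)
<SATA SSD S9FM02.1>                at scbus2 target 0 lun 0 (pass9,ada1)'

# Drop every line containing "SSD", then keep only the device name after the comma.
DEVLIST=$(printf '%s\n' "$devlist" | sed -e '/SSD/d' -e 's/.*,\([a-z]*[0-9]*\)).*/\1/')
echo $DEVLIST
```

Note that /SSD/d is case-sensitive and only catches drives whose identification string literally contains "SSD".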

Greetings!
Robert
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Great, thanks, I'll add that for the next version.

I should have mentioned that there is a discussion thread associated with the resource posting there (kind of a Discussion tab). It's a new feature.
 