How to monitor system (CPU, HDD, mobo, GPU) temperatures on FreeNAS 8?

Jacopx (Patron; joined Feb 19, 2016; 367 messages)
No clue what you're doing wrong. It works for me. /shrug

I have solved the problem! How can I calculate the average temperature of my 8-core CPU? Can someone help me?
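A minimal sketch for the average, assuming the CPU lines look like the "dev.cpu.0.temperature: 34.0C" lines the report already collects. avg_cpu_temp is a hypothetical helper name, not something FreeNAS ships; it strips the trailing C and averages every per-CPU temperature line it is given.

```shell
#!/bin/sh
# Hypothetical helper: average the per-CPU temperatures that FreeBSD exposes
# as lines like "dev.cpu.0.temperature: 34.0C".
# Reads those lines on stdin and prints the mean with one decimal place;
# lines that are not CPU temperatures are ignored.
avg_cpu_temp() {
    awk -F': ' '/^dev\.cpu\.[0-9]+\.temperature:/ {
        t = $2
        sub(/C$/, "", t)      # drop the trailing "C" unit
        sum += t
        n++
    }
    END {
        if (n > 0) printf "%.1f\n", sum / n
    }'
}

# On the NAS itself:
#   sysctl dev.cpu | avg_cpu_temp
```

If Hyper-Threading makes every core appear twice, the average does not change, because each core is counted the same number of times.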
 

Jacopx (Patron; joined Feb 19, 2016; 367 messages)
I have changed my hardware, and now my mail report shows 16 CPU temperatures. But I have only 8 cores, not 16 (the count corresponds to the number of threads), and the temperatures are identical in groups of two! Any help?
[Attached image]
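The doubled readings are most likely Hyper-Threading: FreeBSD creates one dev.cpu.N node per logical CPU, while the coretemp sensor is per physical core, so both threads of a core report the same value. The sketch below keeps one reading per core, assuming sibling threads are numbered consecutively (cpu0/cpu1 on core 0, cpu2/cpu3 on core 1, and so on). per_core_temps is a hypothetical helper, and the numbering assumption is worth checking against sysctl kern.sched.topology_spec on your own machine.

```shell
#!/bin/sh
# Hypothetical helper: collapse per-thread readings to one reading per core.
# ASSUMPTION: SMT sibling threads have consecutive numbers, so with
# 2 threads per core, cpu0 and cpu1 share core 0, cpu2 and cpu3 share core 1.
# Argument: threads per core (default 2). Input: sysctl lines on stdin.
per_core_temps() {
    threads_per_core=${1:-2}
    awk -F': ' -v tpc="$threads_per_core" '
        /^dev\.cpu\.[0-9]+\.temperature:/ {
            split($1, parts, ".")     # dev, cpu, N, temperature
            cpu = parts[3] + 0
            if (cpu % tpc == 0)       # keep the first thread of each core
                printf "core %d: %s\n", cpu / tpc, $2
        }'
}

# On the NAS:
#   sysctl dev.cpu | per_core_temps 2
```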
 

xxxGODxxx (Dabbler; joined Sep 8, 2015; 20 messages)
What can I change in the script if I keep getting "Syntax error: word unexpected (expecting ")")"? I just copied the script and changed the email address to mine. The script works when I paste it into the shell, but not as a cron job.

This is what I receive by email when I run the script as a cron job:

Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" ( echo "To: ***@gmail.com " echo "Subject: System Temperatures INFO" echo " " ) > /var/cover adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; } >> /var/cover echo echo System Temperatures - `date` >> /var/cover cat /etc/version >> /var/cover uptime | awk '{ print "\nSystem Load:",$8,$9,$10,"\n" }' >> /var/cover echo "CPU Temperature:" >> /var/cover sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/cover echo echo "Drive Activity Status" >> /var/cover for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ); do echo -n $i:; adastat $i; done; echo ; echo >> /var/cover echo "HDD Temperature:" >> /var/cover for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ) do echo $i `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$1

and a follow-up message:

Syntax error: word unexpected (expecting ")")
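Judging by the cron email, the whole script reached cron as a single line, which suggests it was pasted straight into the cron job's command field and its newlines were lost. Without newlines, separate commands run together and the shell parses them differently. The sketch below reproduces that with the script's own email-header block; the function names are hypothetical and me@example.com is a placeholder address.

```shell
#!/bin/sh
# The header block as written in the script: two separate echo commands.
header_multiline() {
    (
      echo "To: me@example.com"
      echo "Subject: System Temperatures INFO"
    )
}

# The same block after the newlines are stripped: it becomes ONE echo,
# and the second "echo" is just another argument to print.
header_flattened() {
    ( echo "To: me@example.com" echo "Subject: System Temperatures INFO" )
}
```

Saving the script to a file and making the cron command run that file (for example /usr/local/bin/bash /path/to/script.sh) keeps the newlines intact.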
 

Fox (Explorer; joined Mar 22, 2014; 66 messages)

drwoodcomb (Explorer; joined Sep 15, 2016; 74 messages)
Hi everyone,

I have been looking for a way to check my hard drive temperatures, and I see you have been working on one. Would one of you mind telling me (or pointing me to somewhere that shows) how to use this script? I don't think I can just open the Shell and copy and paste it, but I don't know enough about FreeNAS to figure out how to run it.
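One way to do it, sketched under assumptions: paste the script into a file from the Shell, copy it onto a pool dataset rather than the boot device, make it executable, and point a cron job at it. install_temp_script and the /mnt/tank/scripts path are hypothetical; bash is assumed to live at /usr/local/bin/bash, the path used in craniu3000bis's script.

```shell
#!/bin/sh
# Hypothetical helper: install a pasted script and print the cron command.
# ASSUMPTIONS: dest_dir is a dataset on your pool (e.g. /mnt/tank/scripts,
# a made-up path), and bash is at /usr/local/bin/bash as on FreeNAS.
install_temp_script() {
    src=$1         # file holding the pasted script text
    dest_dir=$2    # directory on the pool where the script should live
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/temps.sh" || return 1
    chmod 755 "$dest_dir/temps.sh"
    # This is the line to enter as the cron job's command in the GUI.
    echo "/usr/local/bin/bash $dest_dir/temps.sh"
}
```

For example: run cat > /tmp/temps.txt, paste the script, press Ctrl-D, then run install_temp_script /tmp/temps.txt /mnt/tank/scripts. Run the printed command once by hand to confirm the email arrives, then use it as the command of a new cron job in the GUI.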
 

craniu3000bis (Dabbler; joined Jun 5, 2017; 13 messages)
Hello,

I have read this topic line by line and managed to adjust this script to send a report to my email.
Unfortunately, one of my drives is missing its temperature number.
Can you please help me find out what the issue is?

Code:
#! /usr/local/bin/bash

# Write email header to temp file
(
  echo "Subject: System Temperatures INFO"
  echo " "
) > /var/temp_report

# Define adastat function, which writes drive activity to temp file
adastat () {
  CM=$(camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10}')
  if [ "$CM" = "FF" ] ; then
  echo " SPINNING" >> /var/temp_report
  elif [ "$CM" = "00" ] ; then
  echo " IDLE" >> /var/temp_report
  else
  echo " UNKNOWN ($CM)" >> /var/temp_report
  fi
}

# Write some general information
echo System Temperatures - `date` >> /var/temp_report
cat /etc/version >> /var/temp_report
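# uptime's field positions shift with the length of the uptime, so split on the "load averages:" label instead
uptime | awk -F'load averages?: ' '{ print "\nSystem Load:", $2, "\n" }' >> /var/temp_report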

# Write CPU temperatures
echo "CPU Temperature:" >> /var/temp_report
sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/temp_report
echo >> /var/temp_report

# Write HDD temperatures and status
echo "HDD Temperature:" >> /var/temp_report
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' )
do
echo -n $i: `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevVendor=$3; DevName=$4} \
END {printf "%s C - %s %s (%s) - ", DevTemp,DevVendor,DevName,DevSerNum }'` >> /var/temp_report;
adastat $i;
done

# Send status email
sendmail my_email_address@gmail.com < /var/temp_report
rm /var/temp_report
exit 0


The output is this:

System Temperatures - Wed Jul 5 22:49:27 EEST 2017
FreeNAS-11.0-U1 (aa82cc58d)

System Load: 0.70

CPU Temperature:
dev.cpu.0.temperature: 34.0C

HDD Temperature:
ada0: 44 C - WDC WD1600JB-00REA0 (WD-WMANM4587573) - SPINNING
ada1: 34 C - TOSHIBA MQ01ACF050 (65G8CAC4T) - SPINNING
ada2: 31 C - HGST HTS725050A7E630 (TF655AWH2DK8YL) - SPINNING
ada3: C - HGST HTS725050A7E630 (RCF50ACE27JNPM) - SPINNING
ada4: 34 C - WDC WD3200BEKT-60PVMT0 (WD-WX91A6367399) - SPINNING
ada5: 32 C - WDC WD3200BEKT-60PVMT0 (WD-WX91A6376433) - SPINNING
ada6: 34 C - WDC WD3200BEKT-60PVMT0 (WD-WX51A43X9510) - SPINNING

As you can see, the temperature number is missing for ada3.
 

styno (Patron; joined Apr 11, 2016; 466 messages)
What is the output of smartctl -a /dev/ada3?
 

craniu3000bis (Dabbler; joined Jun 5, 2017; 13 messages)
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Travelstar Z7K500
Device Model: HGST HTS725050A7E630
Serial Number: RCF50ACE27JNPM
LU WWN Device Id: 5 000cca 85edf9c2c
Firmware Version: GS2OA3C0
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 6 16:31:26 2017 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 45) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported. SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0025 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0023 240 100 033 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 094 094 000 Old_age Always - 10377
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x002f 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 831
10 Spin_Retry_Count 0x0033 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 67
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 070 055 045 Old_age Always - 30 (Min/Max 29/42)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 655370
193 Load_Cycle_Count 0x0032 089 089 000 Old_age Always - 111196
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 0
223 Load_Retry_Count 0x002a 100 100 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

styno (Patron; joined Apr 11, 2016; 466 messages)
There you go: for some reason this drive does not report Temperature_Celsius in its SMART output, only Airflow_Temperature_Cel.
You can change the script to match on Temperature_Ce, which covers both.
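Applied to craniu3000bis's script, that means changing /Temperature_Celsius/{DevTemp=$10;} to /Temperature_Ce/{DevTemp=$10;}. The sketch below isolates that awk logic in a hypothetical drive_temp helper. Because awk keeps the last match and smartctl lists attributes in ID order, attribute 194 (Temperature_Celsius) wins over 190 (Airflow_Temperature_Cel) when a drive reports both.

```shell
#!/bin/sh
# Hypothetical helper: pull the drive temperature out of smartctl -A/-a output.
# /Temperature_Ce/ matches both "Temperature_Celsius" (ID 194) and
# "Airflow_Temperature_Cel" (ID 190); field 10 is the RAW_VALUE column.
# The last matching line wins, so 194 is used when both are present.
drive_temp() {
    awk '/Temperature_Ce/ { temp = $10 } END { print temp }'
}

# On the NAS:
#   smartctl -A /dev/ada3 | drive_temp
```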
 

Eniac74 (Dabbler; joined Jan 9, 2015; 41 messages)
I implemented the script that craniu3000bis posted above, and it works fine. However, I have two questions:

1. Regarding the temperature reading from the SATA SSD (ada2 in the list below). Is this really correct, or is it just a dummy value returned for SSDs?
2. Is there a logical explanation for why one disk (ada1) stands out with a temperature 3 degrees above the rest? It seems constant, as it was the same yesterday when I ran the script.


HDD Temperature:
ada0: 38 C - HGST HUS724040ALE640 (PK1334PCK2X89S) - SPINNING
ada1: 41 C - HGST HUS724040ALE640 (PK1334PCK2WGBS) - SPINNING
ada2: 99 C - SATA SSD (67F407431F2400011724) - SPINNING
ada3: 38 C - HGST HUS724020ALA640 (PN2134P6KL6M1X) - SPINNING
ada4: 37 C - HGST HUS724020ALA640 (PN2134P6KK60EX) - SPINNING
 
(Joined Apr 9, 2015; 1,258 messages)
Simply put, a drive's location in the case determines its temperature; it's not an issue with the script. Less airflow means a higher temperature, and they are all pretty warm anyway IMHO. I get warnings if any drive reaches 38 C.
 

Spearfoot (He of the long foot, Moderator; joined May 13, 2015; 2,478 messages)

lbakyl (Cadet; joined Aug 26, 2018; 3 messages)
Hi,

I have modified the script above to run as a cron job that only sends an email if a temperature goes above a set threshold. I prefer to be notified only when something is going on, rather than receiving regular emails that require no action.

Code:
#!/bin/bash
# Optimized for FreeBSD (FreeNAS etc.)
# Save it on your share (not on the system drive!) and run as a regular task (cron) from FreeNAS GUI (as /bin/bash /path/to/script.sh), e.g. every 30 minutes.
# Email will be sent only if a threshold is met; otherwise there is no output.
# The script was verified against https://www.shellcheck.net/ to ensure correct syntax.

#To-do: Display more about the drive, such as model and serial number, so it is easier to identify it!

# Original source that was heavily edited: https://www.reddit.com/r/freenas/comments/6pmewl/temps_and_fan_speeds_in_freenas/

HDD_ALERT_LEVEL="35"
CPU_ALERT_LEVEL="43"
NOTIFY_EMAIL_ADDRESS="abc@efg.com"

#Pre-defining variables
cpu_alert_trigger=0
hdd_alert_trigger=0

# Write email header to temp file
	(
  echo "To: $NOTIFY_EMAIL_ADDRESS"
  echo "Subject: CPU/HDD Temperature Warning"
  echo " "
  echo "Warning! The following components in your system are above the set temperature threshold!"
  echo " "
	) > /tmp/cpu_hdd_temp_check.log

# Check CPU temperature
COUNT_CPU=$(sysctl -a | grep -c -E "dev\.cpu\.[0-9]+\.temperature")
	# Save each CPU's temperature to an array indexed from 1 (not 0), so the loop runs from 1 to COUNT_CPU
for ((cput=1; cput<=COUNT_CPU; cput++))
do
	CPU_TEMP[$cput]=$(sysctl dev.cpu | grep temperature | awk '{print $2}' | awk -F'[^0-9]*' '$0=$1' | awk "NR==$cput")
	#echo "CPU no.$cput has temperature of ${CPU_TEMP[$cput]}"
	CPU_DETAILED_REPORT="$CPU_DETAILED_REPORT
	CPU no.$cput = ${CPU_TEMP[$cput]} Celsius"
	if [ "${CPU_TEMP[$cput]}" -ge "$CPU_ALERT_LEVEL" ]; then
		#echo "CPU number $cput is at or over the limit!"
		cpu_alert_trigger="1"
	fi
done


# Check HDD temperature
for disk in $(sysctl -n kern.disks)
do
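	# Match both Temperature_Celsius and Airflow_Temperature_Cel, keeping only one value (the last match)
	HDTEMP=$(smartctl -A /dev/"$disk" | awk '/Temperature_Ce/ {t=$10} END {print t}')
	#echo "Temp of $disk is $HDTEMP."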
	
	if [[ "$HDTEMP" -ge 1 ]]; then
		DETAILED_REPORT="$DETAILED_REPORT
	Temperature of $disk is $HDTEMP."
		if [[ "$HDTEMP" -ge "$HDD_ALERT_LEVEL" ]]; then
			DRIVES_OVER_LIMIT="$DRIVES_OVER_LIMIT $disk"
			hdd_alert_trigger="1"
		fi
	fi
done

# Set alert text if one of the CPU temperatures was reached.
if [ $cpu_alert_trigger -eq "1" ]; then
		echo "CPU temperature is at or over the limit of $CPU_ALERT_LEVEL:$CPU_DETAILED_REPORT" >> /tmp/cpu_hdd_temp_check.log
		echo " " >> /tmp/cpu_hdd_temp_check.log
fi

# Set alert text if one of the HDDs temperature was reached.
if [ $hdd_alert_trigger -eq "1" ]; then
	echo "These HDDs are at or over the temperature limit of $HDD_ALERT_LEVEL:$DRIVES_OVER_LIMIT" >> /tmp/cpu_hdd_temp_check.log
	echo "Temperatures of all drives:$DETAILED_REPORT" >> /tmp/cpu_hdd_temp_check.log
	echo " " >> /tmp/cpu_hdd_temp_check.log
fi

# Send out an email if one of the checked parameters were reached.
if [ $cpu_alert_trigger -eq "1" ] || [ $hdd_alert_trigger -eq "1" ]; then
	#echo "Sending out email..."
	sendmail -t < /tmp/cpu_hdd_temp_check.log
fi

# Clean-up
rm /tmp/cpu_hdd_temp_check.log
exit
 

lbakyl (Cadet; joined Aug 26, 2018; 3 messages)
(Quoting Eniac74's post above: the 99 C reading for the SATA SSD ada2, and ada1 running 3 degrees warmer than the rest.)

Hi Ed, it looks like the script does not take SSDs into account.

As for the higher temperature of ada1, I have seen drives run hotter as they age (the older, the hotter). It also depends on where the drive sits in the case, whether a fan is blowing on the others but not on this one, etc. Try to keep the temperature below 40 C for the longest drive life.
 

diedrichg (Wizard; joined Dec 4, 2012; 1,319 messages)
(Quoting lbakyl's script above:)
Code:
# Check CPU temperature
COUNT_CPU=$(sysctl -a | grep -c "dev.cpu.[0-9].\temperature")
    # Save CPU's temp to a variable array starting from 1 (not zero), so the count is increase by 1
for ((cput=1;cput<"$COUNT_CPU+1";cput++))
Thank you for this. However, I'm getting a "CPU_Temp-monitor.sh: 31: Syntax error: Bad for loop variable" error when trying to run it. I'm on 11.2-U3 and my processor is a Xeon E3-1270 v3. Any suggestions?
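"Bad for loop variable" is the error FreeBSD's /bin/sh gives for bash's C-style for ((...)) loop, so the script is probably being run by sh rather than bash (for example via sh CPU_Temp-monitor.sh, or a cron command that does not call bash). Running it as /usr/local/bin/bash /path/to/script.sh should avoid the error. Alternatively, the CPU part can be written in plain POSIX sh; the sketch below is one way to do it, with check_cpu_temps a hypothetical helper that prints the per-CPU report and returns success if any CPU is at or over the limit.

```shell
#!/bin/sh
# Hypothetical POSIX-sh replacement for the bash-only CPU loop.
# Reads "dev.cpu.N.temperature: 41.0C" lines on stdin, prints one report
# line per CPU (numbered from 1), and returns 0 if any whole-degree
# reading is >= the limit given as $1, otherwise 1.
check_cpu_temps() {
    limit=$1
    over=1                      # becomes 0 (success) once a CPU hits the limit
    cput=1
    while IFS= read -r line; do
        case $line in
            dev.cpu.*.temperature:*) ;;
            *) continue ;;
        esac
        temp=${line##*: }           # "41.0C"
        temp=${temp%%[!0-9]*}       # "41": keep the leading digits only
        echo "CPU no.$cput = $temp Celsius"
        if [ "$temp" -ge "$limit" ]; then
            over=0
        fi
        cput=$((cput + 1))
    done
    return "$over"
}
```

In the script it could replace the COUNT_CPU block, e.g. if sysctl dev.cpu | check_cpu_temps "$CPU_ALERT_LEVEL" > /tmp/cpu_report.txt; then cpu_alert_trigger=1; fi. Returning a status instead of setting a variable matters here: the function runs inside a pipeline subshell, so a variable it set would be lost.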
 

Freenasboy (Cadet; joined May 21, 2017; 6 messages)
Sorry to necro this thread, but I'm having a hard time editing the script to show the temperatures of NVMe disks.
 

styno (Patron; joined Apr 11, 2016; 466 messages)
Do they report smartctl output?
 

Freenasboy (Cadet; joined May 21, 2017; 6 messages)
Running the old script outputs

HDD Temperature:
nvd0

So I tried to use
smartctl -a /dev/nvd0
which gives this output:
/dev/nvd0: To monitor NVMe disks use /dev/nvme* device names

I don't know how to change the script so that it takes the "0" from "nvd0" and then inserts it into the script, which would run "smartctl -a /dev/nvme0".

Someone please help...
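smartctl's message points the way: keep nvd0 in the list of disks, but query /dev/nvme0. The sketch below maps the name with sh parameter expansion and reads the temperature from the NVMe health section, assuming the "Temperature:   37 Celsius" line format that smartctl prints for NVMe drives. nvme_dev_for and nvme_temp are hypothetical helpers, and the nvdN to nvmeN mapping holds for typical single-namespace drives but is not guaranteed on every system.

```shell
#!/bin/sh
# Hypothetical helper: map a kern.disks name to the node smartctl should use.
# ASSUMPTION: nvdN sits on controller nvmeN (usual for one-namespace drives).
nvme_dev_for() {
    case $1 in
        nvd[0-9]*) echo "/dev/nvme${1#nvd}" ;;   # nvd0 -> /dev/nvme0
        *)         echo "/dev/$1" ;;              # ada0 etc. unchanged
    esac
}

# Hypothetical helper: print the number from the first line that starts with
# "Temperature:" (lines such as "Temperature Sensor 1:" are skipped).
nvme_temp() {
    awk '/^Temperature:/ { print $2; exit }'
}

# On the NAS, for the NVMe disks only:
#   for disk in $(sysctl -n kern.disks); do
#       case $disk in
#           nvd*) echo "$disk: $(smartctl -a "$(nvme_dev_for "$disk")" | nvme_temp) C" ;;
#       esac
#   done
```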
 