How to monitor system (CPU, HDD, mobo, GPU) temperatures on FreeNAS 8?

Jacopx · Apr 14, 2016

cyberjock said:
No clue what you're doing wrong. It works for me. /srhug.

I have solved the problem! How can I calculate the average temperature of my 8-core CPU? Can someone help me?
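The sketch below is one way to compute the average. It reads the "dev.cpu.N.temperature: 34.0C" lines that sysctl prints, as shown in the reports in this thread, and prints their mean. The function name avg_cpu_temp is hypothetical. It also assumes the coretemp (Intel) or amdtemp (AMD) driver is loaded, because those drivers provide the dev.cpu.N.temperature values.

```shell
#!/bin/sh
# avg_cpu_temp (hypothetical helper): average of all dev.cpu.N.temperature
# lines on stdin, e.g. "dev.cpu.0.temperature: 34.0C", printed with one decimal.
avg_cpu_temp() {
  awk -F': ' '
    /^dev\.cpu\.[0-9]+\.temperature:/ {
      t = $2
      sub(/C$/, "", t)   # drop the trailing "C" unit
      sum += t
      n++
    }
    END { if (n > 0) printf "%.1f\n", sum / n }'
}

# Usage on FreeNAS/FreeBSD:
#   sysctl dev.cpu | avg_cpu_temp
```

If the hardware threads of each core report identical values, averaging over threads gives the same result as averaging over physical cores.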

Jacopx · Apr 29, 2016

I have changed my hardware, and now my mail report shows 16 CPU temperatures. I only have 8 cores, though; 16 is the number of threads. The temperatures are also identical in pairs. Any help?
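This is most likely expected. coretemp reads a per-core sensor, so both hardware threads of a core report the same value. The sketch below keeps one reading per physical core. The function per_core_temps is hypothetical, and it assumes FreeBSD numbers the threads of a core consecutively (cpu0/cpu1 = core 0, cpu2/cpu3 = core 1, and so on). sysctl kern.sched.topology_spec should show the actual grouping on your machine.

```shell
#!/bin/sh
# per_core_temps [THREADS_PER_CORE] (hypothetical helper, default 2)
# Reads "dev.cpu.N.temperature: T" lines on stdin and keeps one line per
# physical core. ASSUMPTION: the logical CPUs of a core are numbered
# consecutively (cpu0+cpu1 = core 0, cpu2+cpu3 = core 1, ...).
per_core_temps() {
  awk -F': ' -v tpc="${1:-2}" '
    $1 ~ /^dev\.cpu\.[0-9]+\.temperature$/ {
      split($1, part, ".")        # dev / cpu / N / temperature
      cpu = part[3] + 0
      if (cpu % tpc == 0)         # first logical CPU of each core
        printf "core %d: %s\n", cpu / tpc, $2
    }'
}

# Usage on FreeNAS/FreeBSD:
#   sysctl dev.cpu | per_core_temps 2
```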

xxxGODxxx · Jun 29, 2016

What should I change in the script? I keep getting "Syntax error: word unexpected (expecting ")")". I just copied the script and changed the email address to mine. The script works when I paste it into the shell, but not as a cron job.

This is what I receive by email when I run the script as a cron job:

Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" ( echo "To: ***@gmail.com " echo "Subject: System Temperatures INFO" echo " " ) > /var/cover adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; } >> /var/cover echo echo System Temperatures - `date` >> /var/cover cat /etc/version >> /var/cover uptime | awk '{ print "\nSystem Load:",$8,$9,$10,"\n" }' >> /var/cover echo "CPU Temperature:" >> /var/cover sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/cover echo echo "Drive Activity Status" >> /var/cover for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ); do echo -n $i:; adastat $i; done; echo ; echo >> /var/cover echo "HDD Temperature:" >> /var/cover for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ) do echo $i `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$1

and a follow up message of

Syntax error: word unexpected (expecting ")")
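In the cron email above, the whole script arrives as a single line. The most likely cause is that the line breaks were lost when the script was pasted into the one-line cron command, so loops such as "for i in ... do" no longer parse. The approach suggested later in this thread avoids this: save the script to a file and have cron run that file with bash. The sketch below shows the problem. It uses bash -n, which makes bash parse a file without executing it. The helper name check_syntax is hypothetical.

```shell
#!/bin/sh
# check_syntax FILE (hypothetical helper): ask bash to parse FILE without
# executing any commands (-n). Returns 0 if the file parses, non-zero otherwise.
check_syntax() {
  bash -n "$1" 2>/dev/null
}

# Demo: a tiny loop parses fine as a multi-line file...
demo=$(mktemp)
printf 'for i in a b\ndo\n  echo "$i"\ndone\n' > "$demo"
check_syntax "$demo" && echo "multi-line file: OK"

# ...but fails once its newlines are squashed into one line, which is what
# happens to a script pasted into a single-line cron command.
tr '\n' ' ' < "$demo" > "$demo.flat"
check_syntax "$demo.flat" || echo "flattened version: syntax error"
rm -f "$demo" "$demo.flat"
```

The cron command then becomes just the interpreter and the file, e.g. /usr/local/bin/bash /mnt/tank/scripts/temp_report.sh. The path is a placeholder. Use the bash location on your system; this thread uses both /usr/local/bin/bash and /bin/bash.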

Fox · Jun 29, 2016

I know you guys are looking at CPU temps, but here is a thread with a Python script that reads HDD temps and other SMART info.

Python Script to Monitor Drive Temps: https://forums.freenas.org/index.php?threads/python-script-to-monitor-drive-temps.22794/

It should be made to output JSON so the data could be stored and graphed over time; I just never had time to do it. I had hoped someone else would.
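Here is a minimal sketch of the JSON idea, written in shell rather than Python to match the other scripts in this thread. The function temps_to_json is hypothetical. It turns "disk temperature" pairs into one JSON object per run, which can be appended to a file (one object per line) and graphed later. The output schema is an assumption, not something from Fox's script.

```shell
#!/bin/sh
# temps_to_json EPOCH (hypothetical helper): reads "disk temperature" pairs on
# stdin (e.g. "ada0 44") and prints one JSON object, e.g.
#   {"time":1500000000,"drives":{"ada0":44}}
# Lines without a temperature (only one field) are skipped so the output stays
# valid JSON. Assumes disk names need no escaping and temperatures are numbers.
temps_to_json() {
  awk -v ts="$1" '
    NF == 2 { out = out sep sprintf("\"%s\":%s", $1, $2); sep = "," }
    END     { printf "{\"time\":%s,\"drives\":{%s}}\n", ts, out }'
}

# Usage sketch (FreeNAS/FreeBSD; the output path is a placeholder):
#   for d in $(sysctl -n kern.disks); do
#     echo "$d $(smartctl -A /dev/$d | awk '/Temperature_Celsius/ {print $10}')"
#   done | temps_to_json "$(date +%s)" >> /mnt/tank/drive_temps.jsonl
```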

drwoodcomb · Nov 27, 2016

Hi everyone,

I have been looking for a way to check my hard drive temperatures, and I see you have been working on one. Would one of you mind telling me (or pointing me to somewhere that shows) how to use this script? I don't think I can just open the Shell and copy and paste it, but I don't know enough about FreeNAS to figure out how to run it.

craniu3000bis · Jul 5, 2017

Hello,

I have read this topic line by line and managed to adapt the script to send a report to my email.
Unfortunately, one of my drives is missing its temperature.
Can you please help me find out what the issue is?

Code:

#! /usr/local/bin/bash

# Write email header to temp file
(
  echo "Subject: System Temperatures INFO"
  echo " "
) > /var/temp_report

# Define adastat function, which writes drive activity to temp file
adastat () {
  CM=$(camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10}')
  if [ "$CM" = "FF" ] ; then
    echo " SPINNING" >> /var/temp_report
  elif [ "$CM" = "00" ] ; then
    echo " IDLE" >> /var/temp_report
  else
    echo " UNKNOWN ($CM)" >> /var/temp_report
  fi
}

# Write some general information
echo System Temperatures - `date` >> /var/temp_report
cat /etc/version >> /var/temp_report
uptime | awk '{ print "\nSystem Load:",$10,$11,$12,"\n" }' >> /var/temp_report

# Write CPU temperatures
echo "CPU Temperature:" >> /var/temp_report
sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/temp_report
echo >> /var/temp_report

# Write HDD temperatures and status
echo "HDD Temperature:" >> /var/temp_report
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' )
do
echo -n $i: `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevVendor=$3; DevName=$4} \
END {printf "%s C - %s %s (%s) - ", DevTemp,DevVendor,DevName,DevSerNum }'` >> /var/temp_report;
adastat $i;
done

# Send status email
sendmail my_email_address@gmail.com < /var/temp_report
rm /var/temp_report
exit 0

The output is this:

System Temperatures - Wed Jul 5 22:49:27 EEST 2017
FreeNAS-11.0-U1 (aa82cc58d)

System Load: 0.70

CPU Temperature:
dev.cpu.0.temperature: 34.0C

HDD Temperature:
ada0: 44 C - WDC WD1600JB-00REA0 (WD-WMANM4587573) - SPINNING
ada1: 34 C - TOSHIBA MQ01ACF050 (65G8CAC4T) - SPINNING
ada2: 31 C - HGST HTS725050A7E630 (TF655AWH2DK8YL) - SPINNING
ada3: C - HGST HTS725050A7E630 (RCF50ACE27JNPM) - SPINNING
ada4: 34 C - WDC WD3200BEKT-60PVMT0 (WD-WX91A6367399) - SPINNING
ada5: 32 C - WDC WD3200BEKT-60PVMT0 (WD-WX91A6376433) - SPINNING
ada6: 34 C - WDC WD3200BEKT-60PVMT0 (WD-WX51A43X9510) - SPINNING

As you can see, the temperature for ada3 is missing.

styno · Jul 6, 2017

What is the output of smartctl -a /dev/ada3?

craniu3000bis · Jul 6, 2017

Code:

smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Travelstar Z7K500
Device Model: HGST HTS725050A7E630
Serial Number: RCF50ACE27JNPM
LU WWN Device Id: 5 000cca 85edf9c2c
Firmware Version: GS2OA3C0
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 6 16:31:26 2017 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 45) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported. SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0025 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0023 240 100 033 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 094 094 000 Old_age Always - 10377
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x002f 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 831
10 Spin_Retry_Count 0x0033 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 67
183 Runtime_Bad_Block 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 070 055 045 Old_age Always - 30 (Min/Max 29/42)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 3
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 655370
193 Load_Cycle_Count 0x0032 089 089 000 Old_age Always - 111196
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 0
223 Load_Retry_Count 0x002a 100 100 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

styno · Jul 6, 2017

There you go: for some reason this drive does not report Temperature_Celsius in its SMART output, only Airflow_Temperature_Cel.
You can change the pattern in the script from Temperature_Celsius to Temperature_Ce to cover both.
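Matching on Temperature_Ce works because it is a substring of both attribute names. If you would rather choose explicitly, here is a minimal sketch that prefers Temperature_Celsius (ID 194) and falls back to Airflow_Temperature_Cel (ID 190). The function drive_temp is hypothetical. Like the script above, it takes the raw value from the 10th column, which matches the attribute table posted for ada3.

```shell
#!/bin/sh
# drive_temp (hypothetical helper): reads `smartctl -A` output on stdin and
# prints the raw temperature, preferring attribute 194 Temperature_Celsius and
# falling back to 190 Airflow_Temperature_Cel. Prints nothing if neither exists.
drive_temp() {
  awk '
    $2 == "Temperature_Celsius"     { t194 = $10 }
    $2 == "Airflow_Temperature_Cel" { t190 = $10 }
    END {
      if (t194 != "")      print t194
      else if (t190 != "") print t190
    }'
}

# Usage: smartctl -A /dev/ada3 | drive_temp
```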

craniu3000bis · Jul 6, 2017

Worked like a charm, thank you very much.
The answer was right in front of me :)

Eniac74 · Jul 8, 2017

I implemented the script that craniu3000bis posted above, and it works fine. However, I have two questions:

1. Regarding the temperature reading from the SATA SSD (ada2 in the list below). Is this really correct, or is it just a dummy value returned for SSDs?
2. Is there a logical explanation for why one disk (ada1) runs 3 degrees hotter than the rest? This seems to be constant; it was the same yesterday when I ran the script.

HDD Temperature:
ada0: 38 C - HGST HUS724040ALE640 (PK1334PCK2X89S) - SPINNING
ada1: 41 C - HGST HUS724040ALE640 (PK1334PCK2WGBS) - SPINNING
ada2: 99 C - SATA SSD (67F407431F2400011724) - SPINNING
ada3: 38 C - HGST HUS724020ALA640 (PN2134P6KL6M1X) - SPINNING
ada4: 37 C - HGST HUS724020ALA640 (PN2134P6KK60EX) - SPINNING

nightshade00013 · Jul 8, 2017

Simply put, the drive's location determines its temperature; it's not an issue with the script. Less airflow means a higher temperature, and they are all pretty warm anyway IMHO. I get warnings if any drive reaches 38 C.

Spearfoot · Jul 8, 2017

FWIW, I've written a set of scripts for FreeNAS, including one that reports CPU and drive temperatures; details are in the Resources section:

"Github repository for FreeNAS scripts, including disk burnin"

lbakyl · Aug 26, 2018

Hi,

I have modified the script above so it can be set up as a cron job that only sends an email if the temperature goes above a set threshold. I prefer to be notified only when something is going on, rather than receiving regular emails that require no action.

Code:

#!/bin/bash
# Optimized for FreeBSD (FreeNAS etc.)
# Save it on your share (not on the system drive!) and run as a regular task (cron) from FreeNAS GUI (as /bin/bash /path/to/script.sh), e.g. every 30 minutes.
# Email will be sent only if a threshold is met; otherwise there is no output.
# The script was verified against https://www.shellcheck.net/ to ensure correct syntax.

#To-do: Display more about the drive, such as model and serial number, so it is easier to identify it!

# Original source that was heavily edited: https://www.reddit.com/r/freenas/comments/6pmewl/temps_and_fan_speeds_in_freenas/

HDD_ALERT_LEVEL="35"
CPU_ALERT_LEVEL="43"
NOTIFY_EMAIL_ADDRESS="abc@efg.com"

#Pre-defining variables
cpu_alert_trigger=0
hdd_alert_trigger=0

# Write email header to temp file
	(
  echo "To: $NOTIFY_EMAIL_ADDRESS"
  echo "Subject: CPU/HDD Temperature Warning"
  echo " "
  echo "Warning! The following components in your system are above the set temperature threshold!"
  echo " "
	) > /tmp/cpu_hdd_temp_check.log

# Check CPU temperature
COUNT_CPU=$(sysctl -a | grep -cE 'dev\.cpu\.[0-9]+\.temperature')
	# Save each CPU's temperature in an array indexed from 1 (not zero), so loop from 1 up to and including COUNT_CPU
for ((cput=1; cput<=COUNT_CPU; cput++))
do
	CPU_TEMP[$cput]=$(sysctl dev.cpu | grep temperature | awk '{print $2}' | awk -F'[^0-9]*' '$0=$1' | awk "NR==$cput")
	#echo "CPU no.$cput has temperature of ${CPU_TEMP[$cput]}"
	CPU_DETAILED_REPORT="$CPU_DETAILED_REPORT
	CPU no.$cput = ${CPU_TEMP[$cput]} Celsius"
	if [ "${CPU_TEMP[$cput]}" -ge "$CPU_ALERT_LEVEL" ]; then
		#echo "CPU number $cput is at or over the limit!"
		cpu_alert_trigger="1"
	fi
done


# Check HDD temperature
for disk in $(sysctl -n kern.disks)
do
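	# Prefer 194 Temperature_Celsius, fall back to 190 Airflow_Temperature_Cel (a plain grep can return two values)
	HDTEMP=$(smartctl -A /dev/"$disk" | awk '$2 == "Temperature_Celsius" {t = $10} $2 == "Airflow_Temperature_Cel" {a = $10} END {if (t != "") print t; else print a}')
	#echo "Temp of $disk is $HDTEMP."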
	
	if [[ "$HDTEMP" -ge 1 ]]; then
		DETAILED_REPORT="$DETAILED_REPORT
	Temperature of $disk is $HDTEMP."
		if [[ "$HDTEMP" -ge "$HDD_ALERT_LEVEL" ]]; then
			DRIVES_OVER_LIMIT="$DRIVES_OVER_LIMIT $disk"
			hdd_alert_trigger="1"
		fi
	fi
done

# Set alert text if a CPU reached the temperature limit.
if [ $cpu_alert_trigger -eq "1" ]; then
		echo "CPU temperature is at or over the limit of $CPU_ALERT_LEVEL:$CPU_DETAILED_REPORT" >> /tmp/cpu_hdd_temp_check.log
		echo " " >> /tmp/cpu_hdd_temp_check.log
fi

# Set alert text if an HDD reached the temperature limit.
if [ $hdd_alert_trigger -eq "1" ]; then
	echo "These HDDs are at or over the temperature limit of $HDD_ALERT_LEVEL:$DRIVES_OVER_LIMIT" >> /tmp/cpu_hdd_temp_check.log
	echo "All drive temperatures:$DETAILED_REPORT" >> /tmp/cpu_hdd_temp_check.log
	echo " " >> /tmp/cpu_hdd_temp_check.log
fi

# Send out an email if any of the checked limits was reached.
if [ $cpu_alert_trigger -eq "1" ] || [ $hdd_alert_trigger -eq "1" ]; then
	#echo "Sending out email..."
	sendmail -t < /tmp/cpu_hdd_temp_check.log
fi

# Clean-up
rm /tmp/cpu_hdd_temp_check.log
exit
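The script's to-do asks for the model and serial number. Here is a minimal sketch that pulls both out of smartctl -i output. The function drive_id is hypothetical. The "Device Model:" and "Serial Number:" labels match the smartctl output posted earlier in this thread. I believe smartctl prints "Model Number:" for NVMe drives instead, so treat that part as an assumption.

```shell
#!/bin/sh
# drive_id (hypothetical helper): reads `smartctl -i` output on stdin and prints
# "MODEL (SERIAL)". "Model Number:" (NVMe) is an assumed alternative label.
drive_id() {
  awk '
    /^(Device Model|Model Number):/ { sub(/^[^:]*:[ \t]*/, ""); model = $0 }
    /^Serial Number:/               { sub(/^[^:]*:[ \t]*/, ""); serial = $0 }
    END { printf "%s (%s)\n", model, serial }'
}

# Usage sketch inside the HDD loop of the script above:
#   model=$(smartctl -i /dev/"$disk" | drive_id)
#   DETAILED_REPORT="$DETAILED_REPORT
#   Temperature of $disk [$model] is $HDTEMP."
```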

lbakyl · Aug 27, 2018

Eniac74 said:
I implemented the script that craniu3000bis posted above, and it works fine. However, I have two questions:

1. Regarding the temperature reading from the SATA SSD (ada2 in the list below). Is this really correct, or is it just a dummy value returned for SSDs?
2. Is there a logical explanation to why one disk (ada1) can stand out with a temperature 3 degrees above the rest? This seems to be a constant as it was the same yesterday when I ran the script.

HDD Temperature:
ada0: 38 C - HGST HUS724040ALE640 (PK1334PCK2X89S) - SPINNING
ada1: 41 C - HGST HUS724040ALE640 (PK1334PCK2WGBS) - SPINNING
ada2: 99 C - SATA SSD (67F407431F2400011724) - SPINNING
ada3: 38 C - HGST HUS724020ALA640 (PN2134P6KL6M1X) - SPINNING
ada4: 37 C - HGST HUS724020ALA640 (PN2134P6KK60EX) - SPINNING

Hi Ed, it looks like the script does not take SSDs into account.

As for the higher temperature of ada1, I have seen drives run hotter as they age (the older, the hotter). It also depends on where the drive sits in the case, for example whether a fan blows on the other drives but not on this one. Try to keep drives below 40 C for the longest life.

diedrichg · Apr 21, 2019

lbakyl said:
Hi,
I have modified the script above so it can be set up as a cron job that only sends an email if the temperature goes above a set threshold. I prefer to be notified only when something is going on, rather than receiving regular emails that require no action.

Code:
# Check CPU temperature
COUNT_CPU=$(sysctl -a | grep -c "dev.cpu.[0-9].\temperature")
# Save CPU's temp to a variable array starting from 1 (not zero), so the count is increase by 1
for ((cput=1;cput<"$COUNT_CPU+1";cput++))

Thank you for this. However, I'm getting the error "CPU_Temp-monitor.sh: 31: Syntax error: Bad for loop variable" when I try to run it. I'm on 11.2-U3 and my processor is a Xeon E3-1270 v3. Any suggestions?
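That message most likely means the script is being run by /bin/sh rather than bash. for ((...)) is bash syntax, and the script's header says to run it as /bin/bash /path/to/script.sh. Alternatively, here is a minimal POSIX sh sketch of the CPU check that avoids both the C-style loop and arrays. The function cpu_temp_check is hypothetical. It reads sysctl once instead of once per CPU, and it numbers CPUs from 0 as sysctl does, unlike the original, which counts from 1.

```shell
#!/bin/sh
# cpu_temp_check LIMIT (hypothetical helper, POSIX sh: no arrays, no for ((...)))
# Reads "dev.cpu.N.temperature: T.TC" lines on stdin, prints one report line
# per CPU (numbered as in sysctl, starting at 0) and returns 0 if any CPU is
# at or over LIMIT degrees C, 1 otherwise.
cpu_temp_check() {
  limit=$1
  over=1
  while IFS= read -r line; do
    case $line in
      dev.cpu.*.temperature:*) ;;   # keep only temperature sysctls
      *) continue ;;
    esac
    cpu=${line#dev.cpu.}             # "0.temperature: 34.0C"
    cpu=${cpu%%.*}                   # "0"
    temp=${line##*: }                # "34.0C"
    temp=${temp%C}                   # "34.0"
    temp=${temp%%.*}                 # "34" (integer part for the -ge test)
    echo "CPU no.$cpu = $temp Celsius"
    if [ "$temp" -ge "$limit" ]; then
      over=0
    fi
  done
  return "$over"
}

# Usage on FreeNAS/FreeBSD:
#   sysctl dev.cpu | cpu_temp_check 43 && echo "CPU over limit"
```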

Freenasboy · Apr 22, 2021

Sorry to necro this thread, but I'm having a hard time editing the script to show the temperatures of NVMe disks.

styno · Apr 22, 2021

Freenasboy said:
Sorry to necro this thread but I'm having a hard time editing the script to show temperatures of nvme disks

Do they report smartctl output?

Freenasboy · Apr 22, 2021

Running the old script outputs

HDD Temperature:
nvd0

So I tried
smartctl -a /dev/nvd0
which gives this output:
/dev/nvd0: To monitor NVMe disks use /dev/nvme* device names

I don't know how to modify the script to take the "0" from "nvd0" and insert it into the command, so that it runs "smartctl -a /dev/nvme0".

Someone please help...

Freenasboy · Apr 22, 2021

styno said:
Do they report smartctl output?

Yes they do
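smartctl itself points at the /dev/nvme* devices, so one option is to map the name inside the loop. Strip the nvd prefix with shell parameter expansion and build /dev/nvmeN. Below is a minimal sketch with two hypothetical helpers, smart_dev_for and nvme_temp. It rests on two assumptions. First, each nvdN disk corresponds to controller nvmeN, which is typical for single-namespace drives; nvmecontrol devlist shows the actual mapping. Second, smartctl -a prints the NVMe temperature on a line like "Temperature:   35 Celsius". Check both against your own output.

```shell
#!/bin/sh
# smart_dev_for DISK (hypothetical helper): device path smartctl should open.
# ASSUMPTION: nvdN corresponds to controller nvmeN (one namespace per drive).
smart_dev_for() {
  case $1 in
    nvd*) echo "/dev/nvme${1#nvd}" ;;   # nvd0 -> /dev/nvme0
    *)    echo "/dev/$1" ;;             # ada0 -> /dev/ada0
  esac
}

# nvme_temp (hypothetical helper): reads `smartctl -a` NVMe output on stdin and
# prints the value from the "Temperature:  NN Celsius" line (ignores the
# separate "Temperature Sensor N:" lines).
nvme_temp() {
  awk '$1 == "Temperature:" && $3 == "Celsius" { print $2; exit }'
}

# Usage sketch inside the existing loop:
#   for i in $(sysctl -n kern.disks); do
#     case $i in
#       nvd*) echo "$i: $(smartctl -a "$(smart_dev_for "$i")" | nvme_temp) C" ;;
#     esac
#   done
```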
