How to monitor system (CPU, HDD, mobo, GPU) temperatures on FreeNAS 8?

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Everything is in the topic.
I would like to make a script to report all of this monitoring.

I've found the command
Code:
sysctl -a |egrep -E "cpu\.[0-9]+\.temp"
for CPU temp, but sysctl seems to report the wrong temperature (around 40°C).
I'm pretty sure that my CPU (i3 2100) runs hotter than 40°C, since it's fanless cooled (Thermalright HR-02).


For HDD temp I found the command
Code:
for i in $(sysctl -n kern.disks)
do printf "%s\t%s\t%s\n" $i $(smartctl -a /dev/$i | awk '/Serial Number/{x=$NF}$2~^Temperature/&&x{print $10"C",x}')
done

But it outputs an error that I can't solve:
Code:
awk: syntax error at source line 1
 context is
        /Serial >>>  Number/{x(NF)}$2~^ <<< Temperature/&&x{print $10"C",x}
awk: bailing out at source line 1
ada3
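
The syntax error most likely comes from the pattern `$2~^Temperature/`, which lacks the opening slash of the regular expression. A minimal sketch of the corrected filter follows. The function name `temp_and_serial` is made up for illustration, and the sketch assumes that the raw temperature is the 10th field of the SMART attribute line, as in the original command.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` output on stdin and prints
# "<temp>C <serial>". Assumes the raw value is the 10th field of the
# attribute line whose name starts with "Temperature".
temp_and_serial() {
    awk '/Serial Number/ { x = $NF }
         $2 ~ /^Temperature/ && x { print $10 "C", x }'
}

# Intended use on the NAS (not run here):
#   for i in $(sysctl -n kern.disks); do
#       printf "%s\t%s\t%s\n" "$i" $(smartctl -a "/dev/$i" | temp_and_serial)
#   done
```

Devices that report no SMART temperature attribute simply produce an empty result.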


For mobo and GPU temp, I didn't find anything.


Any help appreciated! :)
 

Custler

Cadet
Joined
Aug 14, 2011
Messages
4
For HDD temp I found the command
Code:
for i in $(sysctl -n kern.disks)
do printf "%s\t%s\t%s\n" $i $(smartctl -a /dev/$i | awk '/Serial Number/{x=$NF}$2~^Temperature/&&x{print $10"C",x}')
done

I rewrote it to be simpler to understand:
Code:
#! /bin/sh

for i in $(sysctl -n kern.disks)
do
        DevTemp=`smartctl -a /dev/$i | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'`
        DevSerNum=`smartctl -a /dev/$i | awk '/Serial Number:/{print $0}' | awk '{print $3}'`
        DevName=`smartctl -a /dev/$i | awk '/Device Model:/{print $0}' | awk '{print $3}'`
        echo $i $DevTemp $DevSerNum $DevName
done
 

thomasdk81

Guest
Everything is in the topic.
I would like to make a script to report all of this monitoring.

I've found the command
Code:
sysctl -a |egrep -E "cpu\.[0-9]+\.temp"
for CPU temp, but sysctl seems to report the wrong temperature (around 40°C).
I'm pretty sure that my CPU (i3 2100) runs hotter than 40°C, since it's fanless cooled (Thermalright HR-02).

I have an i3-2100T using the default fan:
[root@freenas] ~# sysctl -a |egrep -E "cpu\.[0-9]+\.temp"
dev.cpu.0.temperature: 50.0C
dev.cpu.1.temperature: 50.0C
dev.cpu.2.temperature: 46.0C
dev.cpu.3.temperature: 46.0C
 

Custler

Cadet
Joined
Aug 14, 2011
Messages
4
On "FreeNAS 8.0-RELEASE amd64" I got:
freenas# sysctl -a | egrep -E "cpu\.[0-9]+\.temp"
dev.cpu.0.temperature: 39.0C
dev.cpu.1.temperature: 38.0C
____________________________________
But nothing on "FreeNAS 8.0.1-RELEASE amd64", because of:

/# sysctl dev.cpu.0.temperature
sysctl: unknown oid 'dev.cpu.0.temperature'
____________________________________
And on FreeNAS-8.0.2-RELEASE-amd64 (8288) I got:

[root@freenas] ~# sysctl -a | egrep -E "cpu\.[0-9]+\.temp"
dev.cpu.0.temperature: 42.0C
dev.cpu.1.temperature: 39.0C
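
Since the OID can be missing on some releases, a monitoring script may want to cope with empty output. The sketch below is one way to do that. The name `max_cpu_temp` is hypothetical. On FreeBSD these OIDs normally come from the coretemp(4) or amdtemp(4) driver, so a missing OID may mean that the driver is not loaded. Whether that is what happened on 8.0.1 is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads lines such as "dev.cpu.0.temperature: 42.0C"
# on stdin and prints the hottest core, or "unavailable" if none is found.
max_cpu_temp() {
    awk '/cpu\.[0-9]+\.temp/ {
             t = $NF + 0          # "42.0C" converts to 42 in numeric context
             if (n == 0 || t > max) max = t
             n++
         }
         END {
             if (n) printf "%.1fC\n", max
             else   print "unavailable"
         }'
}

# Intended use (not run here):
#   sysctl -a | max_cpu_temp
```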
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
Custler, while reviewing my topics to save the interesting stuff, I saw your post and realized I must have skipped it at the time.
So thanks, it does work, but when I launch it, it takes a while before displaying the info for each disk, as if it made them work a lot... never mind.

So my final script for system temps is:
Code:
#! /bin/sh
echo "CPU temp :"
sysctl -a |egrep -E "cpu\.[0-9]+\.temp"
echo
echo "HDD temp :"
for i in $(sysctl -n kern.disks)
do
        DevTemp=`smartctl -a /dev/$i | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'`
        DevSerNum=`smartctl -a /dev/$i | awk '/Serial Number:/{print $0}' | awk '{print $3}'`
        DevName=`smartctl -a /dev/$i | awk '/Device Model:/{print $0}' | awk '{print $3}'`
        echo $i $DevTemp $DevSerNum $DevName
done


Returns:
Code:
CPU temp :
dev.cpu.0.temperature: 36.0C
dev.cpu.1.temperature: 36.0C
dev.cpu.2.temperature: 35.0C
dev.cpu.3.temperature: 35.0C

HDD temp :
ada7 26C S2H7J9AB807313 SAMSUNG
ada6 28C S2H7J9AB807309 SAMSUNG
ada5 28C S2H7J9AB807310 SAMSUNG
ada4 29C S2H7J1BB208475 SAMSUNG
ada3 37C 5XW1J1S1 ST32000542AS
ada2 39C 5XW1EXHR ST32000542AS
ada1 41C 5YD5RZNG ST2000DL003-9VT166
ada0 45C WD-WCAVY2756609 WDC
da0



First step is done.
Now, if you know a way to watch these temps and send an email alert when they reach a threshold... that would be the second step.
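
One possible shape for that second step is sketched here. It assumes the report lines look like the HDD lines above (`<disk> <temp>C ...`). The function name `over_limit`, the script name `hdd_temps.sh`, and the mail address are placeholders, and outgoing mail must already be configured on the box.

```shell
#!/bin/sh
# Hypothetical helper: reads "<disk> <temp>[C] ..." lines on stdin and prints
# only the entries whose temperature is above the limit given as $1.
over_limit() {
    awk -v limit="$1" 'NF >= 2 && $2 + 0 > limit { print $1, $2 + 0 }'
}

# Intended use, e.g. from cron (not run here):
#   hot=$(./hdd_temps.sh | over_limit 40)
#   [ -n "$hot" ] && echo "$hot" | mail -s "HDD temperature alert" you@example.com
```

Lines without a numeric second field (such as the lone `da0`) are skipped, so the report can be piped in unchanged.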
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
FreeNAS CPU / Hard Drive Temperature Script

Nice Job djoole ... thanks for the inspiration.... I cleaned up the awk script in the for loop a bit so that it is only necessary to call smartctl once for each disk drive to save the unnecessary overhead.

Code:
#! /bin/bash
adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; }
echo 
echo System Temperatures  - `date`
cat /etc/version.freenas
uptime | awk '{ print "\nSystem Load:",$8,$9,$10,"\n" }'
echo "CPU Temperature:"
sysctl -a | egrep -E "cpu\.[0-9]+\.temp"
echo
echo "Drive Activity Status"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if (match($i, '/ada/')) print $i }'); do    echo -n $i:; adastat $i; done; echo ; echo
echo "HDD Temperature:"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if (match($i, '/ada/')) print $i }') 
do
   echo $i `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevName=$3} END { print DevTemp,DevSerNum,DevName }'`
done
echo


I also included/changed the following:
  • a few cosmetic labels
  • included system load to correlate with the CPU temperature
  • list the system disks in ascending order (ada0-adaX)
  • omitted any disks other than adaX since they didn't apply on my system ( da0 = boot flash drive - no temp sensor).
  • +++show drive status (00=stopped, FF=spinning) prior to temperature reading to determine if disk was active (smartctl causes disk to start spinning)

+++ Credit for the drive spin down status belongs to the authors of this thread
http://forums.freenas.org/showthrea...ing-down-properly&highlight=camcontrol+cmd+-a

Hope someone finds this useful and can improve it some more.
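
One possible improvement concerns the load line. The fixed fields `$8,$9,$10` only line up with the load averages for some `uptime` formats. Once the uptime includes a "N days" part, for example, the fields shift. A sketch that counts from the end of the line instead (the name `load_avg` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: reads `uptime` output on stdin and prints the three
# load averages, taken from the last three fields of the line.
load_avg() {
    awk 'NF >= 3 {
             for (i = NF - 2; i <= NF; i++) sub(/,$/, "", $i)  # drop trailing commas
             print $(NF-2), $(NF-1), $NF
         }'
}

# Intended use (not run here):
#   uptime | load_avg
```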
 

arryo

Dabbler
Joined
May 5, 2012
Messages
42
Your script wakes my HDDs up to check the temperature, so I added -n standby so that it only checks the drives that are active.
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
Modification of Monitor Script / Hard Drive Wake-up Now Optional

Good point, arryo... I modified the script so that it doesn't do that by default; it now just says the temperature is unavailable and the drives stay idle. If you pass the -w or -W option, it wakes the disks up so you can get a snapshot of the resting temperature.

Code:
#! /bin/bash
#
#  Usage: lstemp [ -w or -W]
#  -w / -W: Wake up a sleeping drive to take its temperature
#
adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; }
echo 
echo System Temperatures  - `date`
cat /etc/version.freenas
uptime | awk '{ print "\nSystem Load:",$(NF-2),$(NF-1),$(NF),"\n" }'
echo "CPU Temperature:"
sysctl -a | egrep -E "cpu\.[0-9]+\.temp"
echo
echo "Drive Activity Status"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if (match($i, '/ada/')) print $i }'); do    echo -n $i:; adastat $i; done; echo ; echo
smartopt=`echo $@ | awk '{opt="-n standby"; if(match(tolower($0),'/-w/')) opt=""; print opt; }'`
echo "HDD Temperature:"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if (match($i, '/ada/')) print $i }') 
do
   echo $i `smartctl -a  $smartopt /dev/$i | awk 'BEGIN { DevName="N/A - Drive in standby mode" } /Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevName=$3} END { print DevTemp,DevSerNum,DevName }'`
done
echo
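
The option check in the script relies on awk matching the substring "-w" anywhere in the arguments. A plain `case` statement is an alternative that only accepts the exact flags. This is a sketch with a hypothetical function name `smart_opts`, not the script's own code.

```shell
#!/bin/sh
# Hypothetical helper: prints the extra smartctl options for the given
# script arguments. Without -w/-W it prints "-n standby" so that sleeping
# drives are skipped; with -w or -W it prints nothing.
smart_opts() {
    for arg in "$@"; do
        case "$arg" in
            -w|-W) return 0 ;;
        esac
    done
    echo "-n standby"
}

# Intended use inside the script (not run here):
#   smartopt=$(smart_opts "$@")
#   smartctl -a $smartopt /dev/ada0
```

Unlike the substring match, an argument such as `--what` does not count as `-w` here.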
 

tingo

Contributor
Joined
Nov 5, 2011
Messages
137
Interestingly enough, the script thinks that two of my drives always are in standby:
Code:
[root@kg-f3] ~# /mnt/zstore/home-tingo/bin/lstemp

System Temperatures - Tue May 22 21:42:53 CEST 2012
FreeNAS-8.0.4-RELEASE-p2-x64 (11367)

System Load: 0.00, 0.00, 0.00 

CPU Temperature:

Drive Activity Status
ada0:FF  ada1:FF  ada2:FF  ada3:FF  ada4:00  ada5:00  

HDD Temperature:
ada0 35 S2R8J9HB911201 SAMSUNG
ada1 36 S2R8J9HB911056 SAMSUNG
ada2 35 S2R8J9HB911213 SAMSUNG
ada3 36 S2R8J9HB904753 SAMSUNG
ada4 N/A - Drive in standby mode
ada5 N/A - Drive in standby mode

If I use '-w' it reports ok:
Code:
[root@kg-f3] ~# /mnt/zstore/home-tingo/bin/lstemp -w

System Temperatures - Tue May 22 21:45:28 CEST 2012
FreeNAS-8.0.4-RELEASE-p2-x64 (11367)

System Load: 0.00, 0.00, 0.00 

CPU Temperature:

Drive Activity Status
ada0:FF  ada1:FF  ada2:FF  ada3:FF  ada4:00  ada5:00  

HDD Temperature:
ada0 35 S2R8J9HB911201 SAMSUNG
ada1 36 S2R8J9HB911056 SAMSUNG
ada2 35 S2R8J9HB911213 SAMSUNG
ada3 36 S2R8J9HB904753 SAMSUNG
ada4 36 S2R8J9HB911216 SAMSUNG
ada5 35 S2R8J9HB911211 SAMSUNG

And if I run 'lstemp' again, it still thinks ada4 and ada5 are in standby, which isn't very likely, since I have just talked to the drives and all drives are part of a raidz:
Code:
[root@kg-f3] ~# zpool status
  pool: zstore
 state: ONLINE
 scrub: resilver completed after 2h32m with 0 errors on Mon May 21 01:15:45 2012
config:

	NAME        STATE     READ WRITE CKSUM
	zstore      ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada0p2  ONLINE       0     0     0
	    ada1p2  ONLINE       0     0     0
	    ada2p2  ONLINE       0     0     0
	    ada3p2  ONLINE       0     0     0  344G resilvered
	    ada4p2  ONLINE       0     0     0
	    ada5p2  ONLINE       0     0     0

errors: No known data errors

FWIW, this is FreeNAS-8.0.4-RELEASE-p2-x64 (11367).
 

NASbox

Guru
Joined
May 8, 2012
Messages
644
Help Using camcontrol????

The problem has got to be with the camcontrol statement. Try typing this into a bash shell session:

Code:
adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; }
echo `adastat ada4`


just to confirm that I'm right, but I'm sure the problem is with the camcontrol statement.

My best guess is that it has something to do with crossing the binary boundary from 3 to 4
but I don't understand what the control string does (just copied it) so I'm hoping
we have a hardware expert that understands camcontrol that can help out.
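
For reference, E5 is the ATA CHECK POWER MODE command, and the drive reports its state in the sector count register: 00 for standby, 80 for idle, FF for active or idle. The `awk '{print $10}'` in `adastat` appears to pick that register out of the result bytes that camcontrol prints. The field position is a reading of camcontrol's output and should be checked on your own system. A small sketch that turns the code into a name (`power_mode_name` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: maps the value printed by adastat to a power-mode
# name, following the ATA CHECK POWER MODE (E5h) result codes.
power_mode_name() {
    case "$1" in
        00)    echo "standby" ;;
        80)    echo "idle" ;;
        FF|ff) echo "active or idle" ;;
        *)     echo "unknown ($1)" ;;
    esac
}

# Intended use (not run here):
#   power_mode_name "$(adastat ada4 | tr -d ' ')"
```

This only decodes the value. It does not explain why a controller or drive would report 00 while the disk is in use.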
 

klint76

Cadet
Joined
Jul 23, 2012
Messages
6
Hi Guys

Love the script, I got:


system Temperatures - Mon Jul 30 22:32:54 WST 2012
cat: /etc/version.freenas: No such file or directory

System Load: 0.37, 0.41, 0.41

CPU Temperature:
dev.cpu.0.temperature: 47.0C
dev.cpu.1.temperature: 47.0C
dev.cpu.2.temperature: 50.0C
dev.cpu.3.temperature: 47.0C

Drive Activity Status
ada0:FF ada1:FF ada2:FF ada3:FF ada4:FF ada5:FF

HDD Temperature:
ada0 36 Z1E0VA9Y ST2000DM001-9YN164
ada1 29 Z1E0V690 ST2000DM001-9YN164
ada2 35 Z1E0VAAD ST2000DM001-9YN164
ada3 29 Z1E0V688 ST2000DM001-9YN164
ada4 29 Z1E0VA1G ST2000DM001-9YN164
ada5 29 Z1E0V66G ST2000DM001-9YN164


Question: what is a good temperature and what is a bad one for an HDD?

Klint
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
According to the Google white paper "Failure Trends in a Large Disk Drive Population", the best temperature range for hard drive lifespan is 30 to 40C. At 40-45C the 3-year failure rate is about 11%, and above 45C it is about 12-20%!

The white paper is a good read if you are interested.
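
The ranges quoted above can be turned into a simple label for each reading. The sketch below does that. The function name `classify_hdd_temp` and the label wording are made up for illustration, and temperatures below 30C are only reported as outside the quoted ranges.

```shell
#!/bin/sh
# Hypothetical helper: labels a drive temperature (whole degrees C) using
# the ranges quoted above: 30-40 best, 40-45 elevated, above 45 high.
classify_hdd_temp() {
    t=${1%C}                     # accept "37" or "37C"
    if   [ "$t" -gt 45 ]; then echo "high"
    elif [ "$t" -gt 40 ]; then echo "elevated"
    elif [ "$t" -ge 30 ]; then echo "best range"
    else                       echo "below quoted ranges"
    fi
}

# Intended use (not run here):
#   classify_hdd_temp 36
```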
 

madmax

Explorer
Joined
Aug 31, 2012
Messages
64
Does anyone have a script for 8.2?

I'm having a problem with HDD temps; I get this error:

HDD Temperature:
awk: newline in regular expression Device Num... at source line 2
context is
BEGIN { DevName="N/A - Drive in standby mode" } /Temperature_Celsius/{DevTemp=$10;} /Se
<<<
awk: syntax error at source line 2
awk: bailing out at source line 2
ada0
awk: newline in regular expression Device Num... at source line 2
context is
 

ithank

Cadet
Joined
Aug 19, 2012
Messages
7
@MadMax: The script presented worked for me after changing the call to version from 'version.freenas' to just 'version' and some fiddling with spaces in the 'for' statements:

Code:
#! /bin/bash
adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; }
echo 
echo System Temperatures  - `date`
cat /etc/version
uptime | awk '{ print "\nSystem Load:",$(NF-2),$(NF-1),$(NF),"\n" }'
echo "CPU Temperature:"
sysctl -a | egrep -E "cpu\.[0-9]+\.temp"
echo
echo "Drive Activity Status"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ); do    echo -n $i:; adastat $i; done; echo ; echo
echo "HDD Temperature:"
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ) 
do
   echo $i `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevName=$3} END { print DevTemp,DevSerNum,DevName }'`
done
echo
 

madmax

Explorer
Joined
Aug 31, 2012
Messages
64
@MadMax: The script presented worked for me after changing the call to version from 'version.freenas' to just 'version' and some fiddling with spaces in the 'for' statements:

You're right. It works now in both 8.2 and 8.3.1. Thanks!


I used the esmart script and put this script together to email myself this information:
Code:
#! /usr/local/bin/sh

(
echo "To: your email address "
echo "Subject: System Temperatures INFO"
echo " "
) > /var/cover


adastat () { echo -n `camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10 " " ; }'` " " ; } >> /var/cover
echo
echo System Temperatures - `date` >> /var/cover
cat /etc/version >> /var/cover
uptime | awk '{ print "\nSystem Load:",$(NF-2),$(NF-1),$(NF),"\n" }' >> /var/cover
echo "CPU Temperature:" >> /var/cover
sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/cover
echo
echo "Drive Activity Status" >> /var/cover
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ); do echo -n $i:; adastat $i; done; echo ; echo >> /var/cover
echo "HDD Temperature:" >> /var/cover
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' )
do
echo $i `smartctl -a /dev/$i | awk '/Temperature_Celsius/{DevTemp=$10;} /Serial Number:/{DevSerNum=$3}; /Device Model:/{DevName=$3} END { print DevTemp,DevSerNum,DevName }'` >> /var/cover
done
echo

sendmail -t < /var/cover
exit 0


When I run it, though, I get an echo of the hard drive letters. I'm not sure which echo is the cause or whether I can stop it from printing. Otherwise the email looks good.
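
The drive letters most likely come from the `echo -n $i:` inside the "Drive Activity Status" loop. On that line only the final `echo` carries `>> /var/cover`, so everything before it still goes to the terminal. One way to avoid per-line redirection is to wrap the whole report in a single pipeline, as sketched below. The names `make_message` and `report` and the address are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: wraps whatever arrives on stdin in minimal mail
# headers for the recipient given as $1.
make_message() {
    echo "To: $1"
    echo "Subject: System Temperatures INFO"
    echo
    cat
}

# Intended use (not run here); report() stands for the echo/sysctl/smartctl
# lines of the script, written without any ">> /var/cover":
#   report | make_message you@example.com | sendmail -t
```

With this layout nothing is written to the terminal unless a command prints to stderr, and no temporary file is needed.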
 
Joined
Nov 3, 2012
Messages
3
Very useful script, thanks.

My only question (so far) is how to access the data for LSI/IBM-M1015 card's drives?

I have 10 3TB drives - 2 on the MB's SATA3 ports, and 8 on the M1015 card. Running the script shows only the two MB drives (ada0 and ada1). The 8 M1015 drives (da0-da7) are not shown.

The M1015 is flashed with LSI "IT" f/w.

Thanks, Karl

Output...

System Temperatures - Sat Nov 3 15:15:04 EDT 2012
FreeNAS-8.3.0-RELEASE-x64 (r12701M)

System Load: 0.34, 0.35, 0.37

CPU Temperature:
dev.cpu.0.temperature: 45.0C
dev.cpu.1.temperature: 39.0C
dev.cpu.2.temperature: 33.0C
dev.cpu.3.temperature: 36.0C

Drive Activity Status
ada0:FF ada1:FF

HDD Temperature:
ada0 31 Z1F155F4 ST3000DM001-9YN166
ada1 34 Z1F188NZ ST3000DM001-9YN166
 

Yell

Explorer
Joined
Oct 24, 2012
Messages
74
Very useful script, thanks.

My only question (so far) is how to access the data for LSI/IBM-M1015 card's drives?

I have 10 3TB drives - 2 on the MB's SATA3 ports, and 8 on the M1015 card. Running the script shows only the two MB drives (ada0 and ada1). The 8 M1015 drives (da0-da7) are not shown.

The M1015 is flashed with LSI "IT" f/w.

Thanks, Karl

Output...

System Temperatures - Sat Nov 3 15:15:04 EDT 2012
FreeNAS-8.3.0-RELEASE-x64 (r12701M)

System Load: 0.34, 0.35, 0.37

CPU Temperature:
dev.cpu.0.temperature: 45.0C
dev.cpu.1.temperature: 39.0C
dev.cpu.2.temperature: 33.0C
dev.cpu.3.temperature: 36.0C

Drive Activity Status
ada0:FF ada1:FF

HDD Temperature:
ada0 31 Z1F155F4 ST3000DM001-9YN166
ada1 34 Z1F188NZ ST3000DM001-9YN166

That's because of the ada filter:

Code:
if(match($i, '/ada/')) print $i }' 


We did this because "da" normally refers to the USB drive, which has no SMART values ;)
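
For systems where the "da" devices are real disks behind an HBA, the filter can be widened. The sketch below keeps both adaN and daN names and, like the awk loop in the script, walks the list from the last field to the first. The function name `list_disks` is hypothetical. A USB boot stick that shows up as da0 would be included too and would simply report no SMART temperature.

```shell
#!/bin/sh
# Hypothetical helper: reads the kern.disks list on stdin and prints the
# adaN and daN devices one per line, in reverse of the listed order.
list_disks() {
    awk '{ for (i = NF; i > 0; i--) if ($i ~ /^a?da[0-9]+$/) print $i }'
}

# Intended use (not run here):
#   for i in $(sysctl -n kern.disks | list_disks); do ...; done
```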
 
Joined
Nov 3, 2012
Messages
3
That's because of the ada filter:

Code:
if(match($i, '/ada/')) print $i }' 


We did this because "da" normally refers to the USB drive, which has no SMART values ;)

Ah yes, I guess I should have read the script a little more carefully... :o It's been a looonnng time since I've played with awk, so I was a little lazy about it.

I can pull all 10 drive temps now.

Can't get drive act status off the LSI card, but I don't care as much about that.

Thanks!
 

Yell

Explorer
Joined
Oct 24, 2012
Messages
74
Ah yes, I guess I should have read the script a little more carefully... :o It's been a looonnng time since I've played with awk, so I was a little lazy about it.

I can pull all 10 drive temps now.

Can't get drive act status off the LSI card, but I don't care as much about that.

Thanks!

Do you mean the "Drive Activity Status"?
If so, sorry to shatter your dreams, but there is a second ada filter xD
Code:
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) if(match($i, '/ada/')) print $i }' ); do    echo -n $i:; adastat $i; done; echo ; echo


OK, seriously: this calls the "adastat" function, which calls "camcontrol" for each disk found. You may want to check the output of this tool against your drives on the LSI.
 