How to monitor system (CPU, HDD, mobo, GPU) temperatures on FreeNAS 8?

R.G. · Nov 30, 2014

Just got my first 10-disk system set up and running, and wanted to dink with predicted temperature rises and final temp prediction.

Most simple thermal processes settle toward a final temp that you can estimate by assuming simple exponential settling. It's a lot more complex than that, since everything about heat transfer changes with everything else, but a simple exponential rise isn't a bad approximation.

It occurred to me that for setting up a new system, one could have the system tell you when it thought it was *going to* exceed some temp after it settled. This would be a pure PITA to be running all the time, but it's remarkably useful when you're setting up a new system to have it squawk at you that it thinks it's under-cooled.

I found the scripts to email temps, and then realized that this kind of output could be logged into a temp log file, then examined for prognostications about things going south. A cron job to sample temps every few minutes and simply write them onto the end of a log file would not be that hard to do (and I'm working on that now), but having it do a sanity check for "hey, how hot am I going to get if this keeps up?" so it can send off a red-flag email would be a really nice system burn-in thing to have.
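A minimal sketch of the logging half of that idea, assuming the same `smartctl` and `sysctl -n kern.disks` calls used elsewhere in this thread. The function names, the log format (epoch seconds, device, temperature) and the paths in the comments are made up for illustration.

```shell
#!/bin/sh
# Pull the raw temperature (10th column) out of `smartctl -A` output on stdin.
smart_temp() {
  awk '/Temperature_Celsius/ { print $10; exit }'
}

# Append one "<epoch seconds> <disk> <temp>" line per disk to the given log.
# Disks that report no Temperature_Celsius attribute are skipped.
log_disk_temps() {
  logfile=$1
  now=$(date +%s)
  for disk in $(sysctl -n kern.disks); do
    temp=$(smartctl -A "/dev/$disk" | smart_temp)
    if [ -n "$temp" ]; then
      echo "$now $disk $temp" >> "$logfile"
    fi
  done
}

# Hypothetical use from a cron job run as root every 5 minutes:
#   log_disk_temps /mnt/tank/logs/disk_temps.log
```

Keeping the parsing in its own small function means it can be checked against a saved `smartctl` listing without touching the drives.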

I speculate so, at least.

Working...

Whattteva · Dec 15, 2014

R.G. said:
It occurred to me that for setting up a new system, one could have the system tell you when it thought it was *going to* exceed some temp after it settled. This would be a pure PITA to be running all the time, but it's remarkably useful when you're setting up a new system to have it squawk at you that it thinks it's under-cooled.

In case you didn't know, the existing SMART service on the system already does this for hard drives. You just have to set up a threshold temperature, and it will automatically email you whenever any of the drives crosses it.

R.G. · Dec 16, 2014

Actually, I did know about that, but I was on a somewhat different point.

When you're first starting a system, testing cooling vs loads, and trying out your CPU heat sinking and fan setup for the first time, you're - well I am at least - wondering what temperatures might go to, not what they are right now. And also how fast they're rising.

It happens that you can model temperature rise as a simple first-order declining exponential rise to an asymptote for small pulses of heat and small temp differences. That means if you know the temperature at a few points on the rise, you can estimate both the final temperature and how fast it will get there. That's not much help on things like the interior of a firecracker where the temp rise is so very fast and large, but it's a reasonable thing to do where the temperature rise times are on the order of seconds and minutes, and where you also have both pre-existing instrumentation (the SMART temperature readings and the internal CPU temp sense) and native computing power to do the math.
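As a rough sketch of that estimate: for three readings T0, T1, T2 taken at equal intervals on a first-order rise, successive differences shrink by a constant ratio r = (T2 - T1)/(T1 - T0). That gives a final temperature of T1 + (T2 - T1)/(1 - r) and a time constant of -interval/ln(r). The function name and argument order below are made up for illustration. Because SMART temperatures are whole degrees, the result will be noisy unless the readings are spaced well apart.

```shell
#!/bin/sh
# predict_settle T0 T1 T2 INTERVAL_SECONDS
# Prints "<final temp> <time constant in seconds>", or "n/a" if the three
# readings are not a decelerating rise (flat, falling, or speeding up).
predict_settle() {
  awk -v t0="$1" -v t1="$2" -v t2="$3" -v dt="$4" 'BEGIN {
    d1 = t1 - t0; d2 = t2 - t1
    if (d1 <= 0 || d2 <= 0 || d2 >= d1) { print "n/a"; exit }
    r = d2 / d1                      # equals exp(-dt / tau) for this model
    printf "%.1f %.0f\n", t1 + d2 / (1 - r), -dt / log(r)
  }'
}
```

For example, `predict_settle 20 30 35 60` prints `40.0 87`: readings of 20, 30 and 35 C taken a minute apart point to a 40 C asymptote with a time constant of about 87 seconds.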

So you could (I'm still tinkering with it and don't have it running yet) read some temps at intervals and compute an estimate of (1) whether this setup will exceed X temperature in the future, and (2) if so, how soon. The difference is between a warning that something is already overtemp and a warning that it is likely to go overtemp soon; in the second case you have time to do something about it.
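The two questions can be sketched the same way, under the same first-order assumption: fit the asymptote from three equally spaced readings, then solve the exponential for the moment it crosses the limit. The function name, argument order and the "never"/"unknown" outputs are illustrative choices, not anything from an existing script.

```shell
#!/bin/sh
# seconds_to_limit T0 T1 T2 INTERVAL_SECONDS LIMIT
# Prints the estimated seconds after the last reading until LIMIT is reached,
# "0" if already there, "never" if the predicted final temperature stays
# below LIMIT, or "unknown" if the readings are not a decelerating rise.
seconds_to_limit() {
  awk -v t0="$1" -v t1="$2" -v t2="$3" -v dt="$4" -v lim="$5" 'BEGIN {
    if (t2 >= lim) { print 0; exit }
    d1 = t1 - t0; d2 = t2 - t1
    if (d1 <= 0 || d2 <= 0 || d2 >= d1) { print "unknown"; exit }
    r = d2 / d1                          # decay ratio per interval
    tf = t1 + d2 / (1 - r)               # predicted final temperature
    if (tf <= lim) { print "never"; exit }
    printf "%.0f\n", dt * log((tf - t2) / (tf - lim)) / -log(r)
  }'
}
```

With readings of 20, 30 and 35 C a minute apart, `seconds_to_limit 20 30 35 60 37.5` prints `60`, while a limit of 45 prints `never` because the predicted asymptote is 40 C.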

R.G. · Dec 20, 2014

I have found out that I have forgotten and need to re-learn everything I thought I knew about scripting. :)

But I did turn up some useful stuff. I'm seeing a time constant of something like 3 minutes on disk drives. This seems to vary depending on whether the drive is on the outside of a stack of drives or between two other drives (well, duuhhh.) :)

I also suspect my enclosure is overcooled for collecting this kind of data. I'm seeing rises of only 2 to 3 C in the first 45 minutes of a scrub.

PenalunWil · Mar 12, 2015

I just ran those earlier scripts on my NAS....

[root@willsnas ~]# sysctl -a |egrep -E "cpu\.[0-9]+\.temp"
dev.cpu.0.temperature: 26.0C
dev.cpu.1.temperature: 27.0C
[root@willsnas ~]# #! /bin/sh
[root@willsnas ~]#
[root@willsnas ~]# for i in $(sysctl -n kern.disks)
> do
> DevTemp=`smartctl -a /dev/$i | awk '/Temperature_Celsius/{print $0}' | awk '{print $10 "C"}'`
> DevSerNum=`smartctl -a /dev/$i | awk '/Serial Number:/{print $0}' | awk '{print $3}'`
> DevName=`smartctl -a /dev/$i | awk '/Device Model:/{print $0}' | awk '{print $3}'`
> echo $i $DevTemp $DevSerNum $DevName
> done
da6
da5 19C WD-WCC4JDNXK7L2 WDC
da4 19C WD-WCC4J1730341 WDC
da3 21C WD-WCC4J1755125 WDC
da2 23C WD-WCC4J1755028 WDC
da1 23C WD-WCC4J1775678 WDC
da0 22C WD-WCC4J1724714 WDC
[root@willsnas ~]#

I'm thinking that it may be a good idea to remove some of my cooling fans so that the temperature averages somewhere between 30 to 40 deg.C.

My NAS box is set up in the basement which isn't heated and the ambient temperature at the moment is something like 9 deg.C.

I recall reading somewhere that Cyberjock wrote that a low temperature under 30 deg. C is as detrimental as a high temperature as far as disk failure is concerned.

Bidule0hm · Mar 12, 2015

"it may be a good idea to remove some of my cooling fans so that the temperature averages somewhere between 30 to 40 deg.C." Yes, I agree. Running drives cold (below 30 C) isn't as bad as running them hot (above 40 C), but it still increases the failure rate significantly ;)

BTW if someone is still interested in an answer to the title question, they can look at the "Useful scripts" link in my signature :)

Whattteva · Mar 12, 2015

I suppose it's just like your car engine. An overheated engine is obviously worse, but a cold engine is pretty bad also, which is why they always advise you to bundle up all your trips into one long trip instead of making several short trips that are separated by enough time to cool your engine down. Your engine also operates more efficiently at its proper operating temperature range.

Bidule0hm · Mar 12, 2015

Yep, every mechanical system has a certain temperature range where it works better. For the drives, "better" means "lower failure rate", because performance isn't affected (except maybe in some extreme cases).

And the drives don't like being power cycled either; it's better to leave them on 24/7 than to power cycle the server once a day, for example ;)

Z300M · Mar 12, 2015

Whattteva said:
I suppose it's just like your car engine. An overheated engine is obviously worse, but a cold engine is pretty bad also, which is why they always advise you to bundle up all your trips into one long trip instead of making several short trips that are separated by enough time to cool your engine down. Your engine also operates more efficiently at its proper operating temperature range.

And that's why remote starters for motor vehicles are a bad idea: they may make for greater driver comfort, but the best thing for the engine is to give it some work to do; IOW, get in and drive it.

PenalunWil · Mar 12, 2015

Bidule0hm said:
And the drives don't like being power cycled either; it's better to leave them on 24/7 than to power cycle the server once a day, for example ;)

All my disks are WD Red 24/7 so no Widdleing here, but is power cycling turned off as default in FreeNAS 9.3?

My next job is to run some short SMART tests from the CLI to see if there has been any long-term effect on the drives, as they have been running at these low temperatures since I got them.

I noticed in a thread someplace that Cyberjock was going to post some information on how to read SMART reports... I'm going to have to hunt that one down and bookmark it.

Bidule0hm · Mar 12, 2015

Given how hard a time some users have getting their drives to spin down, it's unlikely they'll spin down on their own, and it's never going to happen if you've set "HDD Standby" to "Always on" in the GUI ;)

"Cyberjock was going to post some information on how to read SMART reports" Not finished yet unfortunately. But it's not that difficult, if you look at the summary table in my SMART script you'll see what values are important to read ;)

JayG30 · Mar 16, 2015

KarlBlomquist said:
Very useful script, thanks.

My only question (so far) is how to access the data for LSI/IBM-M1015 card's drives?

Hello,

I wanted to perhaps shed some light onto this issue for others.
The issue I believe is related to SAS controllers/expanders. When you are using these the disks show as daX (SCSI) instead of adaX (ATA). The temperature and serial number camcontrol commands in the scripts presented will work fine because there is no difference in how that is accomplished.

HOWEVER, trying to get disk STATUS (ie. ready, standby, etc) will NOT work with the commands written. The issue is this line;

Code:

camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r -

If you look at man pages for camcontrol cmd you will see the following info;

Code:

 cmd     Allows the user to send an arbitrary ATA or SCSI CDB to any
         device.  The cmd function requires the -c argument to specify
         SCSI CDB or the -a argument to specify ATA Command Block
         registers values.  Other arguments are optional, depending on
         the command type.  The command and data specification syntax
         is documented in cam_cdbparse(3).  NOTE: If the CDB specified
         causes data to be transferred to or from the SCSI device in
         question, you MUST specify either -i or -o.

         -a cmd [args]   This specifies the content of 12 ATA Command
                         Block registers (command, features, lba_low,
                         lba_mid, lba_high, device, lba_low_exp,
                         lba_mid_exp.  lba_high_exp, features_exp,
                         sector_count, sector_count_exp).

         -c cmd [args]   This specifies the SCSI CDB.  SCSI CDBs may
                         be 6, 10, 12 or 16 bytes.

Note that it says "send an arbitrary ATA or SCSI CDB". The scripts are sending ATA commands, as is evident from the -a switch. To send commands to daX SCSI devices you would need to use the -c switch and specify a SCSI CDB. However, I do not know what the CDB should be for getting disk activity status.

Wolfeman0101 · Aug 24, 2015

Any idea why when I run sysctl -a |egrep -E "cpu\.[0-9]+\.temp" I get nothing?

cyberjock · Aug 25, 2015

Worked for me just now on a TrueNAS 9.3 system. So I tend to think you have/had a typo, your version of FreeNAS 8 doesn't support the commands, or your hardware isn't compatible. Many AMD boards will not report a temperature in FreeBSD/FreeNAS (just one more reason we aren't AMD fans here).
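One more thing that may be worth ruling out: on FreeBSD the `dev.cpu.N.temperature` sysctls come from the coretemp(4) driver on Intel CPUs or amdtemp(4) on AMD, so they are absent if neither driver is active. The helper below is a hypothetical sketch that scans `kldstat` output for either module. Whether a given FreeNAS release loads one by default is not something this thread establishes, and a driver built into the kernel may not appear in this listing.

```shell
#!/bin/sh
# Reads `kldstat` output on stdin and prints the name of each loaded CPU
# temperature driver (coretemp or amdtemp), or "none" if neither is listed.
temp_driver_loaded() {
  awk '/coretemp\.ko|amdtemp\.ko/ { sub(/\.ko$/, "", $NF); print $NF; found = 1 }
       END { if (!found) print "none" }'
}

# Hypothetical use on the NAS itself:
#   kldstat | temp_driver_loaded
# If it prints "none", loading the matching driver as root
# (kldload coretemp, or kldload amdtemp) and re-running the sysctl
# command is a reasonable next test.
```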

aTwonk · Feb 10, 2016

Digging this thread up to add a version of the script that should work with SCSI (daX) drives as well as ATA (adaX)

Code:

#! /usr/local/bin/bash

# Write email header to temp file
(
  echo "To: you@yourEmailAddress"
  echo "Subject: System Temperatures INFO"
  echo " "
) > /var/cover

# Define (a)dastat functions, which writes drive activity to temp file
dastat () {
  CM=$(camcontrol tur $1 | awk '{print $3}')
  if [ "$CM" = "ready" ] ; then
  echo " SPINNING" >> /var/cover
  elif [ "$CM" = "not" ] ; then
  echo " IDLE" >> /var/cover
  else
  echo " UNKNOWN ($CM)" >> /var/cover
  fi
}

adastat () {
  CM=$(camcontrol cmd $1 -a "E5 00 00 00 00 00 00 00 00 00 00 00" -r - | awk '{print $10}')
  if [ "$CM" = "FF" ] ; then
  echo " SPINNING" >> /var/cover
  elif [ "$CM" = "00" ] ; then
  echo " IDLE" >> /var/cover
  else
  echo " UNKNOWN ($CM)" >> /var/cover
  fi
}

# Write some general information
echo System Temperatures - `date` >> /var/cover
cat /etc/version >> /var/cover
uptime | awk '{ print "\nSystem Load:",$10,$11,$12,"\n" }' >> /var/cover

# Write CPU temperatures
echo "CPU Temperature:" >> /var/cover
sysctl -a | egrep -E "cpu\.[0-9]+\.temp" >> /var/cover
echo >> /var/cover

# Write HDD temperatures and status
echo "HDD Temperature:" >> /var/cover
for i in $(sysctl -n kern.disks | awk '{for (i=NF; i!=0 ; i--) print $i }' )
do
  # Write "<disk>: <temp> C - <vendor> <model> (<serial>) -" without a newline
  echo -n "$i: " >> /var/cover
  smartctl -a /dev/$i | awk '
    /Temperature_Celsius/ { DevTemp = $10 }
    /Serial Number:/      { DevSerNum = $3 }
    /Device Model:/       { DevVendor = $3; DevName = $4 }
    END { printf "%s C - %s %s (%s) -", DevTemp, DevVendor, DevName, DevSerNum }' >> /var/cover
  # Append drive activity: ATA check-power-mode for adaX, TEST UNIT READY for daX
  if [[ $i == ada* ]]; then
    adastat $i
  elif [[ $i == da* ]]; then
    dastat $i
  fi
done

# Send status email
sendmail -t < /var/cover
exit 0

Jacopx · Mar 15, 2016

Hi, I've just put my script in a folder (scripts/). If I run it directly from the shell with the command sudo bash scripts/temp.sh, it asks me for the password and then works great. But if I try to run the script without sudo, it doesn't work. I don't know how to execute this script from cron without writing sudo. Can anyone help me?

cyberjock · Mar 15, 2016

Jacopx said:
Hi, I've just put my script in a folder (scripts/). If I run it directly from the shell with the command sudo bash scripts/temp.sh, it asks me for the password and then works great. But if I try to run the script without sudo, it doesn't work. I don't know how to execute this script from cron without writing sudo. Can anyone help me?

Just choose the "root" user when creating the cronjob in the GUI...

Does that not accomplish the goal?

TAC · Mar 15, 2016

Whattteva said:
I suppose it's just like your car engine. An overheated engine is obviously worse, but a cold engine is pretty bad also, which is why they always advise you to bundle up all your trips into one long trip instead of making several short trips that are separated by enough time to cool your engine down. Your engine also operates more efficiently at its proper operating temperature range.

I've worked on the engine oil life algo for an automotive manufacturer. At 90 deg C one engine revolution counts as one rev, but at low and high temps the penalty factor goes up and one rev can count as much as 9 revolutions when computing the oil life remaining.

Jacopx · Mar 18, 2016

cyberjock said:
Just choose the "root" user when creating the cronjob in the GUI...

Does that not accomplish the goal?

No, I already tried! :(

cyberjock · Mar 24, 2016

No clue what you're doing wrong. It works for me. /shrug.
