Temperature monitoring

Joined
Jul 13, 2013
Messages
286
"strict" doesn't occur in any of the files in the graphing tool repository.

It ought to be in rrd-graph.sh; that's where the "rrdtool graph" invocations are that generate the png files. I'm guessing "--rigid" or maybe "--alt-autoscale" are relevant, but haven't dug into rrdtool docs.

[EDIT: Yep, taking off --rigid should let it expand to what the data needs.]
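
For reference, a minimal sketch of what that change amounts to in an "rrdtool graph" call. The limits (20 and 50), the time range, the data source name ada0, and the file names are illustrative assumptions, not values taken from rrd-graph.sh. With --rigid the y-axis is pinned to the given limits; without it, the limits are only a minimum range and autoscaling can stretch the axis when the data goes outside them.

```shell
# Print the options for an "rrdtool graph" call.
# Usage: graph_opts <rigid|expand>
# All values here are illustrative, not copied from rrd-graph.sh.
graph_opts() {
    opts="--start -1d --lower-limit 20 --upper-limit 50"
    # --rigid pins the y-axis to the limits; leaving it off lets
    # autoscaling widen the axis when readings fall outside them.
    if [ "$1" = "rigid" ]; then
        opts="$opts --rigid"
    fi
    printf '%s\n' "$opts"
}

# Example call (needs rrdtool and an existing temps-5min.rrd with a DS named ada0):
#   rrdtool graph temps.png $(graph_opts expand) \
#       DEF:t=temps-5min.rrd:ada0:AVERAGE LINE1:t#ff0000:ada0
```

Keeping the option list in one function means the rigid and expanding variants differ by a single flag, which makes the effect easy to compare on the same data.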

Also, it's spelled "celsius".

[EDIT: Attached changes01.txt is the limits changes, spelling change, and error redirection changes I've mentioned here lately, which I am currently running. I can't quite make a real pull request since my personal work isn't in an open repository anywhere, and the server I'm using doesn't have a way to make only certain pieces readable. But there's a full patch in there, so it can just be applied from that, should you wish.]
 

Attachments

  • changes01.txt
    2.8 KB · Views: 654

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,456
[EDIT: Yep, taking off --rigid should let it expand to what the data needs.]
That's what happens when I go from memory on my iPad. Yes, --rigid was what I was thinking of. Sorry for the error.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
Also, it's spelled "celsius".
Ooh, pedantry! :D

Actually, it's degree Celsius. Celsius stays capitalized because the unit's name is "degree" and Celsius is the proper name qualifying it, whereas units named directly after a person are lowercase when spelled out, like tesla or newton. So degree Celsius and degree Fahrenheit have this sort of camelCase going on (though it should really be dromedaryCase, since there's only one hump).
 
Joined
Jul 13, 2013
Messages
286
That's what happens when I go from memory on my iPad. Yes, --rigid was what I was thinking of. Sorry for the error.

It told me there was something worth looking for, and gave enough of a hint to let me recognize it, so you're still solidly in plus territory there :smile:.
 
Joined
Jul 13, 2013
Messages
286
Ooh, pedantry! :D

Hey, I have professional copy-editors in my weekly social circles. Plural. You can't scare me! :smile: (So I have plenty of practice getting corrected.)

My particular pedantry was about the spelling, which I got right. When I changed it in the code, I didn't change the casing. But I didn't know the principles behind handling those casing cases, so that was useful.
 

Jacopx

Patron
Joined
Feb 19, 2016
Messages
367
Is it possible that, with this script, my drives can't go into idle mode? If so, is there a way to solve it?
 
Joined
Jul 13, 2013
Messages
286
Yes, smartctl calls, which this script makes, can block idle mode. (However, even without this script I've never managed to get any disk on a FreeNAS box to go into idle mode.)

smartctl has an option to skip querying a drive that is in a low-power state: --nocheck=<powermode>, where powermode is one of never, sleep, standby, or idle. See the smartctl man page. This script doesn't use it. I've been looking at adding it myself, but since I can't get any drive to go idle anyway, I couldn't test it, and it's a low priority.
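
A minimal sketch of how that option could be wrapped, assuming the temperature is SMART attribute 194 with its raw value in the 10th column. The function name is made up for illustration, and the "Device is in ... mode" text it looks for is my recollection of smartctl's wording, so treat that check as an assumption to verify against your smartctl version.

```shell
# Print the temperature of /dev/$1 without waking the drive.
# $2 is the smartctl power mode to respect: never, sleep, standby, or idle
# (default: standby). Prints "asleep" when smartctl skipped the drive and
# "unknown" when no attribute 194 line came back.
temp_unless_asleep() {
    dev="$1"
    mode="${2:-standby}"
    # smartctl exits non-zero when it skips a drive, so ignore the status here.
    out=$(smartctl --nocheck="$mode" -A "/dev/$dev" 2>/dev/null) || true
    temp=$(printf '%s\n' "$out" | awk '$1 == 194 { print $10; exit }')
    if [ -n "$temp" ]; then
        printf '%s\n' "$temp"
    elif printf '%s\n' "$out" | grep -q 'Device is in .* mode'; then
        # Assumed wording, e.g. "Device is in STANDBY mode, exit(2)".
        printf 'asleep\n'
    else
        printf 'unknown\n'
    fi
}

# Example: temp_unless_asleep ada0 standby
```

Telling a sleeping drive apart from a missing one matters for graphing: both produce no reading, but only the second is a reason to worry.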

I remember reading comments suggesting that sharing via Samba will block idle due to something-or-other, and that's the whole reason I'm running FreeNAS boxes, so I haven't been able to test with unshared filesystems. I am, in any case, interested in the whole question of letting disks spin down.
 
Joined
Jul 13, 2013
Messages
286
A drive got detached from my array (not sure why; that particular drive identifier, ada4, has shown problems across three physical drives, but the slot isn't hot or strange and it's not the only drive on that controller; I've got some debugging to do).

And that resulted in the graphing data-gathering script failing each run with the message "ERROR: temps-5min.rrd: expected 16 data source readings (got 15) from N".

I'm not even sure this is "wrong"; however, it does mean that when my array is more than usually at risk, I suddenly lose my temperature monitoring. I'm pretty sure this is not the ideal outcome. (I haven't dug into the code; I rather think it's falling afoul of finding from defined names that there were 16 things to read from, and having one of them fail to respond, and deciding that's anomalous and it's not confident of the data and refusing to risk messing things up by updating the RRD file.)
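
As far as I can tell, the message comes from "rrdtool update" itself, which rejects an update whose number of values differs from the number of data sources in the file. A minimal sketch of one way around it, assuming the gathering script can hand over one reading per data source with an empty string for any that failed (the function name is hypothetical): send U, rrdtool's marker for an unknown value, in place of the missing reading so the count still matches.

```shell
# Build an "rrdtool update" argument from one reading per data source.
# Each argument is a reading; an empty argument means the sensor or
# drive did not answer and is recorded as U (rrdtool's "unknown").
# The leading N is the timestamp and means "now".
build_update() {
    line="N"
    for reading in "$@"; do
        line="$line:${reading:-U}"
    done
    printf '%s\n' "$line"
}

# Example: the middle drive is missing, but three fields are still sent.
#   rrdtool update temps-5min.rrd "$(build_update 36 "" 35)"
```

With this, a detached drive shows up as a gap in its own line on the graph while every other drive keeps being recorded.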
 
Joined
Jul 13, 2013
Messages
286
Improved code to get the drive temp info via smartctl.

This code improves three aspects.
  1. It avoids spinning up drives that have been spun down for power-saving just to sample their temperature (the --nocheck switch).
  2. It returns "U" (rrdtool's marker for an unknown value) when the drive being queried doesn't exist or doesn't answer. This should solve the problem I reported this afternoon with temp graphing failing when I have a failed drive in the array.
  3. It might be more efficient (using smartctl -A to get less data to grep through, using cut rather than awk to pick out the 10th field). (This is probably of no importance; the amount of cpu burned running this every 5 minutes is completely irrelevant on a modern processor. Old habits die hard :smile: .)
I've got this set up as a standalone shell script; it might integrate by becoming a bash function or something.

Code:
#!/bin/bash
# Print the temperature (SMART attribute 194) of one drive, or "U" if unavailable.
# Usage: pass the device name without /dev/, e.g. ada0

dev="$1"
devpath="/dev/$dev"

# old version
#  DevTemp=`/usr/local/sbin/smartctl -a /dev/$i | grep '194 *Temperature_Celsius' | awk '{print $10}'`;

# --nocheck=standby: skip a spun-down drive instead of waking it.
# cut -w (split on whitespace) is a FreeBSD extension.
temp=$( smartctl --nocheck=standby -A "$devpath" | grep '^194' | cut -w -f 10 )
# If no temp, indicate "unknown" to RRD.
# ("N" is only valid as the timestamp in an rrdtool update, where it means "now".)
if [ -z "$temp" ]; then
    temp="U"
fi
echo "$temp"


Note that there's still a longer-term problem with the number of drives -- if you add a drive while the graphing app is running, it will also start failing I believe. Deleting the .rrd data file will resolve that, but at the cost of throwing out your existing data. Since the drive ids that the graphing app uses are inherently dynamic, and can be different on each boot, some kind of complete rethink might be in order here, but I'm not sure what the right approach is. Meanwhile, this works fine most of the time, and can be started over fine by just deleting the data file, so that's pretty functional.
 
Joined
Jul 13, 2013
Messages
286
So here's my new version of temps-rrd-format.sh, integrating the above external script (it's now internal). Apologies for the ".txt" at the end, uploading a .sh file is not permitted. The file should be saved as temps-rrd-format.sh just like the original version.

[EDIT: Something upstream from this doesn't like what it's doing. I've deleted the file for now, will post again if I get it actually working. :oops: ]
 
Joined
Jul 13, 2013
Messages
286
Okay, this time for sure. Attached file is improved temps-rrd-format.sh (with extension ".txt" added because txt is on the list of allowed upload formats). Description of changes in previous two messages.
 

Attachments

  • temps-rrd-format.sh.txt
    1.7 KB · Views: 733

metalsaber

Cadet
Joined
Jun 12, 2016
Messages
2
New to FreeNAS and the forums. I downloaded these scripts as they will be useful to me. However, I get the following when running the initial scripts in the shell:
Code:
/mnt/vol1/Dataset1/temperature/rrd.sh /mnt/vol1/Dataset1/temperature/temps-1min.rrd


Returns:
Code:
[root@freenas ~]# /mnt/vol1/Dataset1/temperature/rrd.sh /mnt/vol1/Dataset1/temperature/temps-1min.rrd
ERROR: /mnt/vol1/Dataset1/temperature/temps-1min.rrd: expected 6 data source readings (got 5) from N


If I run temps-simple.sh I get the following:
Code:
[root@freenas ~]# /mnt/vol1/Dataset1/temperature/temps-simple.sh               
dev.cpu.1.temperature: 37.0C                                                   
dev.cpu.0.temperature: 35.0C                                                   
da0 -                                                                          
ada0 - 36C                                                                     
ada1 - 37C                                                                     
ada2 - 35C                                                                     
[root@freenas ~]#      


It appears it's counting da0 as the 6th, but da0 has no value, so the update fails. Any thoughts?
 
Joined
Jul 13, 2013
Messages
286
I've seen that message if the number of disks changes since the data file (.rrd) was created. Since that's a supported, not too exceptional, thing in a fileserver, that's kind of a problem. I've made revisions to the scripts to sort-of fix this (it's not a good full fix, but it avoids the error and keeps graphing disks that it knows about and which still exist, while ignoring disks it didn't originally know about or which do not still exist).

I think I may have posted that version earlier in this or some temp monitoring thread...yes, I think it's the version just before your message, in fact.
 

metalsaber

Cadet
Joined
Jun 12, 2016
Messages
2
I downloaded the temps-rrd-format.sh.txt file. I assume that's the one you meant? If so, I uploaded that to my directory (after removing the .txt extension). I'm still getting the same error.

Correction: I'm now getting a different count. Instead of 5 readings out of the expected 6, it's down to 2.

Code:
[root@freenas ~]# /mnt/vol1/Dataset1/temperature/rrd.sh /mnt/vol1/Dataset1/temperature/temps-1min.rrd
ERROR: /mnt/vol1/Dataset1/temperature/temps-1min.rrd: expected 6 data source readings (got 2) from N
[root@freenas ~]#
 
Joined
Jul 13, 2013
Messages
286
Progress! ;)

You may be the first person to use the version I uploaded here, so it's possible I forgot some bit in another file or something. If you like I can send you an image of my current repository, which will at least guarantee you have all the bits in sync and matching what I'm doing. As I hope I'm making clear now (and tried to before), these changes of mine are quick hacks to get things running for my personal needs, and I'm not currently in a position to commit the time to taking this over and pushing it up to higher quality standards (if the original author was even willing for that to happen). (If you want my current image, I'd need a way to contact you, by preference email. My email is quite easy to find, and is dd-b "at" dd-b "daht" net, translated to simple ascii, if you see what I mean.)

Short form: Willing to share what I've got, not able to promise good support :rolleyes: .

One area I know there is incompleteness in is the code for recognizing which drives to track. If you have different kinds of controller than I and the original author do, that may play into this.

(First run when no .rrd file exists creates an rrd file based on assumptions about what drives to track. Future runs query the drives listed in that rrd file only, and know how to return "no data" if necessary. Therefore what you report happening "can't happen". Therefore the problem is likely to be something on the order of the two different places that recognize what a "drive" is aren't fully in sync, or something. That's a classic example of why that sort of thing maybe shouldn't be replicated in two places, of course.)
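
One way to avoid having two places that each decide what a "drive" is: treat the .rrd file as the only source of truth on later runs. A minimal sketch, assuming "rrdtool info" prints one line of the form ds[NAME].type = "GAUGE" per data source (the function name is hypothetical, and the data source names in the example are made up):

```shell
# Read "rrdtool info" output on stdin and print each data source name,
# one per line, in the order the file defines them.
# Relies on lines of the form: ds[NAME].type = "GAUGE"
ds_names() {
    sed -n 's/^ds\[\([^]]*\)]\.type = .*/\1/p'
}

# Example (needs rrdtool and an existing file):
#   rrdtool info temps-5min.rrd | ds_names
# might print, for a file created with two CPU cores and one disk:
#   cpu0
#   cpu1
#   ada0
```

The gathering script could then loop over exactly these names and produce one value per name, so the number of readings always matches what the file expects.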
 

captain118

Dabbler
Joined
Oct 1, 2014
Messages
21
Is there a concern if the script runs while a scheduled smart test (long or short) is in progress?
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
Excellent scripts, thanks.

Just one question: what is the best way to output to a different directory (i.e., a subdirectory called 'output' underneath the main scripts directory)? I want to serve these publicly and don't want to share the whole script folder.

/edit Never mind, figured it out. Just added the directory after the '$BASEDIR' part in the graph script (right near the bottom). Now I can share just the output graph directory.
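
A minimal sketch of that change, assuming the graph script builds its output file name from $BASEDIR. The helper name and the example file name are hypothetical; only $BASEDIR and the 'output' subdirectory come from the post.

```shell
# Print the path a graph file should be written to.
# $1: base directory of the scripts, $2: graph file name.
# "output" is the subdirectory name used in the post above.
graph_path() {
    printf '%s/output/%s\n' "$1" "$2"
}

# Example:
#   mkdir -p "$BASEDIR/output"
#   rrdtool graph "$(graph_path "$BASEDIR" temps.png)" <graph options>
```

Only the output directory then needs to be shared or served; the scripts and the .rrd data files stay private.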

Thanks for the awesome scripts.
 
