Temperature monitoring

pirateghost · Unintelligible Geek · Joined Feb 29, 2012 · 4,219 messages
i3luefire said: This seems pretty freaking awesome, thanks. But I get this error:
Code:
./rrd.sh: line 24: $1: unbound variable
Based on the script contents, you need to supply a filename to output the data to.

Edit: have a look at crontab.txt for an example of how it should be run.
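The message itself comes from bash's nounset option (set -u): expanding a positional parameter that was never passed aborts the script. A minimal reproduction, assuming rrd.sh runs under set -u (the error text suggests it does, but the script isn't shown here); "rrd.sh" below is just the name handed to $0:

```shell
#!/usr/bin/env bash
# Sketch: reproduce "$1: unbound variable" with bash's nounset option.
# ASSUMPTION: rrd.sh runs under `set -u`; the error text suggests it, but the
# script itself isn't shown. "rrd.sh" here is only the name given to $0.

# No arguments: referencing $1 aborts the inner shell with the error.
without_arg=$(bash -u -c 'echo "rrd file: $1"' rrd.sh 2>&1 || true)
echo "$without_arg"   # the message ends in "$1: unbound variable"

# With a filename argument, the same one-liner runs normally.
with_arg=$(bash -u -c 'echo "rrd file: $1"' rrd.sh /tmp/temps-5min.rrd)
echo "$with_arg"      # prints: rrd file: /tmp/temps-5min.rrd
```

So the fix on the user side is simply to pass the .rrd path, the way the cron entry does.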
 

Seren · Dabbler · Joined Feb 18, 2016 · 22 messages
Sure, it's attached since it's too big to post inline.

Yeah, it looks like I missed a section in the graphing script that was only looking for ada devices. I've pushed a change and it should work now.
 

Seren · Dabbler · Joined Feb 18, 2016 · 22 messages
@Jacopx: @danb35 is correct about how to install it. I'll update the Readme to be a bit clearer.

@i3luefire: @pirateghost is correct: You are running the script with no arguments. You need to give it a path to your .rrd file that will contain the temperature data (it will be created if it doesn't already exist). I've updated the script so you should get a better error message (it was exiting before saying what the problem was).
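A minimal sketch of the kind of check described here, not Seren's actual code: refuse to run without a path, print the reason on stderr, and otherwise decide whether the .rrd file still has to be created. rrd_action is a hypothetical helper; in a real script the create branch would call rrdtool create and the update branch rrdtool update.

```shell
#!/usr/bin/env bash
# Sketch only: an argument check in the spirit of the updated rrd.sh.
# rrd_action is a hypothetical helper, not code from the actual script.
set -u

# Print "create" if the .rrd file still has to be made, "update" if it exists.
# With no path at all, explain why on stderr and return 64 (EX_USAGE).
rrd_action() {
  if [ "$#" -lt 1 ] || [ -z "$1" ]; then
    echo "usage: ${0##*/} /path/to/temps.rrd (created if it does not exist)" >&2
    return 64
  fi
  if [ -e "$1" ]; then
    echo update
  else
    echo create
  fi
}
```

At the top of a script, something like action=$(rrd_action "$@") || exit $? prints the usage message instead of dying at the first reference to $1, which is the "exiting before saying what the problem was" behaviour described above.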
 
Joined Jul 13, 2013 · 286 messages
They don't appear in the web GUI.
Thanks for saying that explicitly. It saves me wondering and stumbling around. Since they use the same tool and look the same, it's easy to assume they become part of the "reporting" system. Not that anything claims that; the mistake would be entirely on my part.

They're still very cool and useful!
 

i3luefire · Explorer · Joined Jan 4, 2014 · 69 messages
I updated to the most recent commit. It's working nicely now, with all drives showing. Not much data yet, but I can see all the drives there now. Thanks for this, Seren!
 
Joined Jul 13, 2013 · 286 messages
The updated README looks entirely clear. I seem to have it installed, though so far only the data-gathering script has run, so I don't have anything to actually look at yet.

For home use (in my case semi-pro video production and lots of photography) or small-business use, environmental conditions are one of the key risks, so anything that helps me monitor them contributes to the safety of my data.

Thank you!
 
Joined Jul 13, 2013 · 286 messages
In rrd.sh, all the error cases use "echo" to output messages, then exit with a non-zero status.

However, the installation instructions tell me to check the "redirect STDOUT" box when creating the tasks that run these scripts. This appends " > /dev/null " to the crontab command. So, as I understand it, any error messages your scripts echo to STDOUT are being thrown away.

It seems like you should be redirecting your echo output to STDERR? (I haven't actually had any errors, so this is theory, not tested practice.)
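A minimal sketch of that suggestion, assuming the scripts are plain sh/bash: send error text to file descriptor 2 with >&2. The cron job's "> /dev/null" only discards stdout, and cron mails whatever output the job still produces (here, stderr) to its owner. die and note are hypothetical helper names, not functions from rrd.sh.

```shell
#!/usr/bin/env bash
# Sketch: keep error messages on stderr so "> /dev/null" in the cron job
# only hides normal output. die and note are hypothetical helper names.

# Print an error to stderr (file descriptor 2) and stop with a non-zero status.
die() {
  echo "${0##*/}: $*" >&2
  exit 1
}

# Ordinary progress messages stay on stdout, which "> /dev/null" discards.
note() {
  echo "$*"
}
```

In a script this would read, for example, [ -n "${1:-}" ] || die "no .rrd file given", so the reason for the failure still reaches root's mailbox.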
 
Joined Jul 13, 2013 · 286 messages
Yep, working away happy as a clam. Thanks!
 

danb35 · Hall of Famer · Joined Aug 16, 2011 · 15,504 messages
Seren said: Yeah, it looks like I missed a section in the graphing script that was only looking for ada devices. I've pushed a change and it should work now.
This seems to have done the trick. Excellent!
 

Jacopx · Patron · Joined Feb 19, 2016 · 367 messages
What is wrong? I have already checked the dataset permissions! :(
 

Attachment: Screenshot 2016-05-06 10.17.16.png (203.4 KB)

Jacopx · Patron · Joined Feb 19, 2016 · 367 messages
My Xeon has 8 cores with Hyper-Threading (16 threads total), and the graphs are really hard to read. Assuming that 95% of the time the two threads on the same core report the same temperature, we could use only one of the two values (or their average) to remove some lines from the graphs. Does that sound good, or am I crazy? :)
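A minimal sketch of the averaging idea. It assumes logical CPUs 2k and 2k+1 are the two Hyper-Threading siblings of physical core k, which is common but not guaranteed; on FreeBSD the actual layout can be checked with sysctl kern.sched.topology_spec. core_averages is a hypothetical helper, not part of the scripts in this thread.

```shell
#!/usr/bin/env bash
# Sketch: collapse per-thread CPU temperatures into one value per physical core.
# ASSUMPTION: logical CPUs 2k and 2k+1 are Hyper-Threading siblings on core k.

# Each argument is one logical CPU's temperature (e.g. "45.0C"), in CPU order.
# Prints "coreN <average>" for each pair; an unpaired last value is ignored.
core_averages() {
  printf '%s\n' "$@" | awk '
    { t = $1; sub(/C$/, "", t); sum += t }             # strip trailing "C", accumulate
    NR % 2 == 0 { printf "core%d %.1f\n", NR / 2 - 1, sum / 2; sum = 0 }
  '
}
```

On FreeBSD with the coretemp driver loaded, the per-thread readings come from sysctl, e.g. core_averages $(sysctl -n dev.cpu.0.temperature dev.cpu.1.temperature dev.cpu.2.temperature dev.cpu.3.temperature). For a 16-thread Xeon this would leave 8 lines on the graph instead of 16.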

 
Joined Jul 13, 2013 · 286 messages
This has gotta be me somehow. But I'm stumped; maybe somebody else using this code can see what I got wrong?

Your graphing scripts are working fine on one server (fsfs), but I can't get them to work on another (rebma). Both are running the latest 9.10-STABLE (well, I haven't actually checked yet today, but they were yesterday; they could be one release out of date now, I suppose).

On Rebma, the cron jobs are running:

Code:
From /var/log/cron:
May  7 15:35:00 rebma /usr/sbin/cron[9361]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /mnt/z01/info/temps/rrd-graph.ssh /mnt/z01/info/temps/temp-5min.rrd)
May  7 15:35:00 rebma /usr/sbin/cron[9363]: (root) CMD (PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /mnt/z01/info/temps/rrd.sh /mnt/z01/info/temps/temp-5min.rrd)


Note I'm currently not redirecting stdout to /dev/null; however, no email is turning up at the root email address.

Here's the directory the scripts are in (and hence where the PNGs should appear):

Code:
[ddb@rebma ~]$ ls -al /mnt/z01/info/temps/
total 104
drwxr-xr-x  2 root  wheel       7 May  7 14:03 .
drwxr-xr-x  3 root  wheel       3 May  7 13:22 ..
-rwxr-xr-x  1 root  wheel    6033 May  7 13:35 rrd-graph.sh
-rwxr-xr-x  1 root  wheel    2264 May  7 13:35 rrd.sh
-rw-r--r--  1 root  wheel  389264 May  7 14:04 temps-5min.rrd
-rwxr-xr-x  1 root  wheel    1184 May  7 13:35 temps-rrd-format.sh
-rwxr-xr-x  1 root  wheel     754 May  7 13:35 temps-simple.sh


The .rrd file that is present (with the 14:04 timestamp) resulted from my running rrd.sh manually (as root) once. That worked, and created the file. But there are no .png files, and this has been running for over an hour.

Permissions look right to me. There's no "+" after the mode strings, so there are no ACLs to confuse things.

Oh, and temps-simple.sh runs manually fine.

EDITED: I do see that the rrd file from my manual run has the wrong name, so it's no surprise that the graphing run isn't picking that up. But this does nothing I can see to explain why rrd.sh isn't generating a file.

EDITED: kept thrashing, restarted, thrashed some more, slept. Working now. Wish I knew what had been wrong.
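One cheap check when a cron job silently produces nothing is to confirm that every path in the logged command exists. In the log above, the graphing job is invoked as rrd-graph.ssh and both jobs point at temp-5min.rrd, while the directory listing shows rrd-graph.sh and temps-5min.rrd; the thread doesn't establish whether that was the actual cause. check_script and check_rrd are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the paths a cron job refers to before blaming permissions.
# check_script and check_rrd are hypothetical helpers.

# The script named in the cron command must be a regular, executable file.
check_script() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    return 0
  fi
  echo "not an executable file: $1" >&2
  return 1
}

# A missing .rrd file is only a note, not a failure, since rrd.sh is
# described as creating it on first run.
check_rrd() {
  if [ ! -e "$1" ]; then
    echo "note: $1 does not exist yet (rrd.sh should create it)" >&2
  fi
}
```

For example: check_script /mnt/z01/info/temps/rrd-graph.ssh; check_rrd /mnt/z01/info/temps/temp-5min.rrd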
 
Joined Jul 13, 2013 · 286 messages
So, how much data is there supporting your line at 40C for "max safe temperature" (drives)?

[Graph: temps-5min-drives.png]


Among many other things, these graphs point out differences in cooling in different spots in the cases, too :)
 

danb35 · Hall of Famer · Joined Aug 16, 2011 · 15,504 messages
The 40C max has been discussed quite a bit around here; IIRC it comes from a study Google did of hard drive reliability, where it was noted that failure rates increased significantly when the drives were run above that temp.
 
Joined Jul 13, 2013 · 286 messages
The Google study finds some effects at 45C for newer drives, maybe at 40C for older drives. Backblaze, on the other hand, finds few correlations (weak ones for some Seagate drives); however they're mostly looking at lower temperatures, so that doesn't directly refute the Google findings.

I guess it remains complicated. No real surprise there.
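Since the thread mentions both 40C and 45C as candidate limits, one option is to make the threshold a parameter rather than a fixed line. A minimal sketch, assuming ATA/SATA drives, where smartctl -A reports attribute 194 (Temperature_Celsius) with the raw value in the tenth column; SAS drives report temperature differently, and the scripts in this thread may not read it this way at all. smart_temp and drive_status are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: read a drive temperature from "smartctl -A" output and compare it
# with a configurable limit (40 and 45 are the values discussed above).
# smart_temp and drive_status are hypothetical helpers; ATA/SATA drives assumed.

# Print the raw value of SMART attribute 194 (Temperature_Celsius) from stdin.
# In smartctl's ATA attribute table, RAW_VALUE is the tenth column.
smart_temp() {
  awk '$1 == 194 { print $10; exit }'
}

# Print "ok" if the temperature (whole degrees C) is at or below the limit,
# otherwise "over".
drive_status() {
  if [ "$1" -le "$2" ]; then
    echo ok
  else
    echo over
  fi
}
```

For example: t=$(smartctl -A /dev/ada0 | smart_temp); echo "ada0: ${t}C, $(drive_status "$t" 40) at 40C, $(drive_status "$t" 45) at 45C"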
 
Joined Jul 13, 2013 · 286 messages
And maybe the scale on the CPU temperature graph should be changed. At least according to this page, temperatures well above 50C are quite ordinary, and "idle" temperatures are often around 35C.

Which is good since one of mine is running consistently above 50C.
[Graph: temps-5min-cpus.png]

(By poking at it with the temps-simple.sh script, I see that cpu1 occasionally drops below 50C; the other three cores do not.)
 

danb35 · Hall of Famer · Joined Aug 16, 2011 · 15,504 messages
Yes, I noticed the same thing. Look through the scripts for the command that makes the graph, and remove the --strict parameter from it.
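As far as I can tell from the rrdgraph documentation, rrdtool graph has no --strict option; the one that pins the y-axis is -r/--rigid, so that is presumably the parameter meant. Without it, --upper-limit and --lower-limit only set the starting scale, and the axis grows to fit values outside them. A minimal sketch with hypothetical file names and a hypothetical data-source name (cpu0); RRDTOOL can be overridden (e.g. with echo) to print the command instead of running it:

```shell
#!/usr/bin/env bash
# Sketch: an rrdtool graph call for CPU temperatures with an optional --rigid.
# The .rrd path, output file, and data-source name "cpu0" are hypothetical.
cpu_graph() {
  rrd=$1 out=$2
  if [ "${RIGID:-0}" = 1 ]; then rigid=--rigid; else rigid=; fi
  # --upper-limit/--lower-limit are only a starting range; rrdtool widens the
  # axis for values outside it unless --rigid is present.
  # $rigid is deliberately unquoted so that it disappears when empty.
  ${RRDTOOL:-rrdtool} graph "$out" --start -1d --vertical-label "deg C" \
    --lower-limit 30 --upper-limit 50 $rigid \
    "DEF:cpu0=$rrd:cpu0:AVERAGE" "LINE1:cpu0#ff0000:cpu0"
}
```

Calling cpu_graph temps-5min.rrd cpus.png lets a core running above 50C stay visible on the graph, while RIGID=1 cpu_graph ... reproduces the clipped behaviour reported above.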
 