Scripts to report SMART, ZPool and UPS status, HDD/CPU T°, HDD identification and backup the config

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
You can calculate the test timestamp from the test POH (power-on hours), the current POH and the current timestamp, but it's too much trouble for me to add because it's not useful enough.
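For anyone who does want it, here is a minimal sketch of that calculation. It assumes the drive stayed powered on between the test and now (any power-off time makes the estimate later than the real test time), and because POH is an hourly counter the result is only accurate to about an hour. `estimate_test_epoch` is a hypothetical name, not part of the report script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the report script): estimate when a
# SMART self-test ran, as a Unix timestamp.
# Assumes the drive was powered on continuously since the test; otherwise
# the estimate is later than the actual test time. Precision: ~1 hour.
#
# usage: estimate_test_epoch CURRENT_POH TEST_POH NOW_EPOCH
estimate_test_epoch() {
    current_poh=$1   # current Power_On_Hours raw value
    test_poh=$2      # LifeTime(hours) column of the self-test log entry
    now=$3           # current time, e.g. from: date +%s
    echo $(( now - (current_poh - test_poh) * 3600 ))
}
```

On FreeNAS (FreeBSD), `date -r <seconds>` can then turn the result into a readable date.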

I try to keep the script as simple as possible (KISS principle): since it's a monitoring script, the last thing you want is a script that fails and doesn't properly monitor what it should. It's also why the if/else statements are written the way they are: they default to the worst case in case something goes really wrong ;)
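As an illustration of that fail-safe pattern (a sketch, not the script's actual code), a health check can report OK only when smartctl explicitly says PASSED, so that empty output, unexpected text or a failed command all fall through to the alarm branch. `health_status` is a hypothetical name; its argument is assumed to be the last word of the `SMART overall-health self-assessment test result:` line printed by `smartctl -H`.

```shell
#!/bin/sh
# Hypothetical fail-safe check: anything other than an explicit "PASSED"
# (including an empty string when smartctl failed) is reported as a failure.
# usage: health_status "$(smartctl -H /dev/ada0 | awk '/overall-health/ {print $NF}')"
health_status() {
    if [ "$1" = "PASSED" ]; then
        echo "OK"
    else
        echo "FAIL"
    fi
}
```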
 

hyram (Dabbler; joined Jan 16, 2015; 10 messages)
Ok... figured out a way to do what I want. Not sure how "clean" it is.

Don't use the built-in SMART tests; instead, write your own script that runs the SMART test and saves to a separate file the completion time that smartctl prints when it starts the test:

Code:
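#!/bin/bash

### Parameters ###
timelogfile="/tmp/smart_time_log.tmp"
drives="ada0 ada1 ada2 ada3 ada4 ada5"

### Delete any old log file (-f: no error if it doesn't exist yet) ###
rm -f "${timelogfile}"

###### Run tests ######
for drive in $drives
do
    (
        # Label, e.g. "ada0 Short test completed "
        printf '%s Short test completed ' "${drive}"
        # smartctl prints "Test will complete after <date>"; keep the 24-char date + newline
        smartctl -t short "/dev/${drive}" | grep 'after' | tail -c 25
    ) >> "${timelogfile}"
done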


You need separate scripts for long and short tests. Add them as Cron jobs in place of the SMART tests.
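If you'd rather share code between the long and short scripts, the log-line formatting could be factored into a function like the one below. This is a sketch: `format_log_line` is a hypothetical name, and it assumes smartctl prints a line of the form `Test will complete after <date>` (the same line the script above greps for).

```shell
#!/bin/sh
# Hypothetical helper: turn the text printed by `smartctl -t short|long`
# (read on stdin) into one log line: "<drive> <Type> test completed <date>".
# Prints nothing if the "complete after" line is missing.
# usage: smartctl -t long /dev/ada0 | format_log_line ada0 Long >> /tmp/smart_time_log.tmp
format_log_line() {
    drive=$1
    type=$2
    # Keep only the date that follows "complete after "
    when=$(sed -n 's/.*complete after //p')
    if [ -n "$when" ]; then
        printf '%s %s test completed %s\n' "$drive" "$type" "$when"
    fi
}
```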

Then modify Bidule0hm's script to read this file and append the time to each drive's info (added lines 6 and 21 below):

Code:
.
.
.
### Parameters ###
logfile="/tmp/smart_report.tmp"
timelogfile="/tmp/smart_time_log.tmp"
.
.
.
###### for each drive ######
for drive in $drives
do
    brand=`smartctl -i /dev/${drive} | grep "Model Family" | awk '{print $3, $4, $5}'`
    serial=`smartctl -i /dev/${drive} | grep "Serial Number" | awk '{print $3}'`
    (
        echo ""
        echo "########## SMART status report for ${drive} drive (${brand}: ${serial}) ##########"
        smartctl -n never -H -A -l error /dev/${drive}
        smartctl -n never -l selftest /dev/${drive} | grep "# 1 \|Num" | cut -c6-
        echo ""
        grep "^${drive} " "${timelogfile}" | cut -d " " -f2-
        echo ""
    ) >> ${logfile}
done
.
.
.


The output would look like this:

Code:
.
.
.
No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%      2337         -

Short test completed Mon Mar 23 17:49:48 2015
.
.
.


I suppose this falls into the "to each his own" category.
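One detail worth checking in the lookup: a plain `grep ${drive}` for `ada1` would also match the lines for `ada10`, `ada11`, etc. on systems with ten or more drives. A minimal sketch of a stricter lookup, anchored on the drive name followed by a space (the layout the test script writes), could look like this; `completion_time` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the completion text recorded for one drive.
# The pattern is anchored ("^drive ") so ada1 doesn't also match ada10.
# usage: completion_time DRIVE LOGFILE
completion_time() {
    grep "^$1 " "$2" | cut -d " " -f2-
}
```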
 

ovizii (Patron; joined Jun 30, 2014; 435 messages)
I'm getting an error running the zpool script on a new FreeNAS install. Is this because I haven't run any scrubs yet?

Code:
freenas# ./zpool-report.sh
Failed conversion of ``2015-Mar-30_19:25:25'' using format ``%Y-%b-%e_%H:%M:%S''
date: illegal time format
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[[[cc]yy]mm]dd]HH]MM[.ss]] [+format]
./zpool-report.sh: arithmetic expression: expecting primary: "((1428229269 - ) + 43200) / 86400"
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
Nope, the script tests for that, and I tested it on a non-scrubbed pool (it's even in the example result).

The error clearly says that it can't convert 2015-Mar-30_19:25:25 with the %Y-%b-%e_%H:%M:%S format. Why? I don't know; everything seems right.

Can you post the output of cat -e zpool-report.sh please? (in between code tags or in a pastebin)
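Whatever the root cause of the conversion failure turns out to be, the second error shows what happens next: the failed `date` call leaves the scrub timestamp empty, so the shell ends up evaluating `((1428229269 - ) + 43200) / 86400`. A sketch of a guard in the same worst-case spirit, refusing to do the arithmetic on an empty or non-numeric value, could look like this. `days_between` is a hypothetical name; the `+ 43200` / `86400` rounding to the nearest day is taken from the expression in the error message.

```shell
#!/bin/sh
# Hypothetical guard: whole days between two Unix timestamps, rounded to
# the nearest day (+43200 s = half a day), as in the failing expression.
# Prints "unknown" instead of evaluating a broken arithmetic expression
# when the older timestamp is empty or not a plain number.
# usage: days_between NOW_EPOCH THEN_EPOCH
days_between() {
    now=$1
    then_ts=$2
    case "$then_ts" in
        ''|*[!0-9]*)
            echo "unknown"
            ;;
        *)
            echo $(( ((now - then_ts) + 43200) / 86400 ))
            ;;
    esac
}
```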
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
Yeah, I don't see anything wrong.

What version of FreeNAS do you use? (the complete one, including the update number; you can see it in the System tab of the GUI)
 

ikonspirasi (Cadet; joined Apr 3, 2015; 9 messages)
Wow, many thanks for the scripts! I tried the "Display CPU and HDD temperatures" one; here are the results:
Code:
freenas# sh cpudisplay.sh

cpu.0.temperature: 47 C
cpu.1.temperature: 46 C
cpu.2.temperature: 48 C
cpu.3.temperature: 48 C

ada0 WD-WCC4ME0TENL1: 36 C
ada1 WD-WCC4M7L93SC4: 37 C
ada2 WD-WCAZAD467709: 35 C
ada3 WD-WMC4N1223410: 36 C
ada4 Z4Z1XZFF       : 38 C
 

depasseg (FreeNAS Replicant; joined Sep 16, 2014; 2,874 messages)
Any chance you'd be willing to point me in the right direction for handling SAS drives? Temperatures and serial numbers aren't showing.

Code:
[root@freenas1] ~# sh temps.sh

cpu.0.temperature: 47 C
cpu.1.temperature: 41 C
cpu.2.temperature: 45 C
cpu.3.temperature: 42 C
cpu.4.temperature: 45 C
cpu.5.temperature: 42 C
cpu.6.temperature: 40 C
cpu.7.temperature: 42 C

da0                :  C
da1                :  C
da2                :  C
da3                :  C
da4                :  C
da5                :  C
da6                :  C
da7                :  C

[root@freenas1] ~# smartctl -i /dev/da1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              WD4001FYYG-01SL3
Revision:             VR07
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x50000c0f01e1273c
Serial number:        WMC1F8675309
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Apr  6 17:54:45 2015 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

[root@freenas1] ~# smartctl -A /dev/da1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     30 C
Drive Trip Temperature:        69 C

Manufactured in week 10 of year 2014
Specified cycle count over device lifetime:  1048576
Accumulated start-stop cycles:  415
Specified load-unload count over device lifetime:  1114112
Accumulated load-unload cycles:  0
Elements in grown defect list: 0

[root@freenas1] ~# 
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
As cyberjock said in one of his posts in this thread, SAS drives report different SMART info. However, if you just want the temperature it's pretty easy; replace these lines:
Code:
serial=`smartctl -i /dev/${drive} | grep "Serial Number" | awk '{print $3}'`
temp=`smartctl -A /dev/${drive} | grep "Temperature_Celsius" | awk '{print $10}'`


With these lines:
Code:
serial=`smartctl -i /dev/${drive} | grep "Serial number" | awk '{print $3}'`
temp=`smartctl -A /dev/${drive} | grep "Current Drive Temperature" | awk '{print $4}'`


As an aside, there's no need to use sh to execute the script; once it's executable (chmod +x), you can run ./your_script.sh directly ;)
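If you have a mix of SATA and SAS drives, a sketch that handles both output formats from the examples above in one place could look like this. The function names are hypothetical; they read `smartctl -A` / `smartctl -i` output on stdin, and the field positions ($10 for ATA attribute 194, $4 for the SAS `Current Drive Temperature:` line, $3 for the serial) are the same ones used above.

```shell
#!/bin/sh
# Hypothetical helpers: work on either ATA/SATA or SAS smartctl output.

# usage: smartctl -A /dev/da1 | drive_temp
drive_temp() {
    awk '
        $2 == "Temperature_Celsius"   { print $10; exit }  # ATA attribute 194
        /^Current Drive Temperature:/ { print $4;  exit }  # SAS
    '
}

# "Serial Number:" (ATA) vs "Serial number:" (SAS): match case-insensitively.
# usage: smartctl -i /dev/da1 | drive_serial
drive_serial() {
    awk 'tolower($0) ~ /^serial number:/ { print $3; exit }'
}
```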
 

depasseg (FreeNAS Replicant; joined Sep 16, 2014; 2,874 messages)
Thx! I forgot the capital "N" in number, and I didn't know to change the print $10 to $4 (I did change the grep term correctly, though). What are the odds of two failures? :smile: And yes, I'm familiar with ./script.sh; I was just too lazy at the time to chmod it.

Thanks for the help!
 

ovizii (Patron; joined Jun 30, 2014; 435 messages)
Sorry for the delay, I was out of the house. By the way, this used to work for me; I recently messed up my configuration and had to reinstall FreeNAS, and since then I've had this issue.

Build: FreeNAS-9.3-STABLE-201503270027
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
I'm on an "old" update (I stopped updating because of the big mess in the updates a few weeks ago, and I haven't had time to update since...), so maybe that's the difference; I can't test it for now.

It could also be a corrupted date binary.
 

ovizii (Patron; joined Jun 30, 2014; 435 messages)
I've gone to System => Update => Verify Install and it all seems OK.
Hm, I can wait until the next update and see if it starts working again.
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
Ok, thanks for the feedback. There's something really weird going on and unless I can reproduce it, I can't fix it. Let's say it's ok for now ;)
 

Bidule0hm (Server Electronics Sorcerer; joined Aug 5, 2013; 3,710 messages)
I've added the SAS version of the temperature and drive identification scripts ;)

The SAS version of the SMART script is on the way :)
 

Trapizomba (Dabbler; joined Mar 23, 2015; 24 messages)
Hi...

I made a script that alerts on high HDD temperatures based on S.M.A.R.T. data... The goal of my script is to be as hardware-independent as possible...

Unfortunately, I haven't tested it with SAS disks.

Code:
#!/bin/bash

### Alert parameters ###
temp_max=35  # Temperature limit in degrees Celsius
email="your_email_address"  # Recipient of the alert e-mail
assunto="FreeNAS: HDD temperature alert" # E-mail subject

# Degrees Celsius sign (U+2103)
CEL=$'\xe2\x84\x83'

# Build the array of system disks: take the "(...)" list at the end of each
# camcontrol devlist line, split it on commas and drop the passN entries
arrdiscs=($(camcontrol devlist | sed -n 's/.*(\([^)]*\)).*/\1/p' | tr ',' '\n' | grep -v '^pass'))

# Array of disks to report in the alert
alertdiscs=()

# Get each disk's temperature (raw value of SMART attribute 194)
for i in "${arrdiscs[@]}"
do
    temp=$(smartctl -A "/dev/$i" | awk '$1 == 194 {print $10}')
    if [ -n "$temp" ] && [ "$temp" -gt "$temp_max" ]; then
        alertdiscs+=("[$i]: $temp$CEL")
    fi
done

# Compare against temp_max: send the alert e-mail only if some disk is above it
if [ ${#alertdiscs[@]} -gt 0 ]; then
    printf "%b\n" "Disks above the temperature limit ($temp_max$CEL):\n${alertdiscs[@]}" | /usr/bin/mail -s "$assunto" "$email"
fi


So... no need to define the names of the HDD devices... :)
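The device discovery is what makes this work without a hard-coded drive list. A slightly more defensive sketch of that step, as a stand-alone function reading `camcontrol devlist` output on stdin, is below. It takes everything inside the trailing parentheses, splits it on commas and drops the `passN` entries, so it doesn't depend on the order of the names or on how many digits the pass number has. `list_disks` is a hypothetical name, and non-disk devices (e.g. an enclosure) can still show up; in the script they'd simply be skipped because smartctl returns no temperature for them.

```shell
#!/bin/sh
# Hypothetical helper: read `camcontrol devlist` output on stdin and print
# one device name per line, dropping the passN pass-through entries.
# Assumes each line ends with "(name1,name2,...)", as camcontrol prints.
# usage: camcontrol devlist | list_disks
list_disks() {
    sed -n 's/.*(\([^)]*\)).*/\1/p' | tr ',' '\n' | grep -v '^pass'
}
```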

With this script, an e-mail is sent ONLY if the temperature of one or more HDDs is higher than the one set in the temp_max variable. Only the drives with high temperatures are listed in the message body.
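The alert rule itself can be sketched separately from smartctl, which makes it easy to try with made-up readings. `over_limit` is a hypothetical function that reads "device temperature" pairs on stdin and prints only the devices above the limit, skipping devices with no numeric reading, much like the script's `-n` check.

```shell
#!/bin/sh
# Hypothetical sketch of the alert rule: read "device temperature" pairs
# on stdin and print "[device]: temperature" for readings above MAX.
# Devices with a missing or non-numeric reading are skipped.
# usage: printf 'ada0 36\nada1 34\n' | over_limit 35
over_limit() {
    max=$1
    while read -r dev temp; do
        case "$temp" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$temp" -gt "$max" ]; then
            echo "[$dev]: $temp"
        fi
    done
}
```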

Hope it helps someone... ;)
 

alexg (Contributor; joined Nov 29, 2013; 197 messages)
@Bidule0hm

Thank you for these scripts. I had to make a minor change in your zpool report script: there is no "-p" option, and the percent sign needs to be removed:


Code:
used=`zpool list -H -o capacity ${pool} | cut -d'%' -f1`
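The same result can also be had without the extra `cut` process, using POSIX parameter expansion to drop a trailing `%`. A sketch (`strip_pct` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: remove one trailing "%" if present, so "42%" -> "42".
# Values without a "%" are returned unchanged.
# usage: used=$(strip_pct "$(zpool list -H -o capacity "$pool")")
strip_pct() {
    value=${1%\%}
    printf '%s\n' "$value"
}
```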
 