Scripts to report SMART, ZPool and UPS status, HDD/CPU T°, HDD identification and backup the config

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,477
Both commands give the same output. Strange, I've never had those errors before.

/dev/xpt0 control device couldn't opened: Permission denied

Unable to get CAM device list

/dev/ada0: Unable to detect device type

Please specify device type with the -d option.


Use smartctl -h to get a usage summary

I should say I am not running them as root, but the temps.sh file is owned by the user trying to execute it. Is root required to access SMART data from the drives regardless?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
IIRC you need root permissions to read the SMART data. Can you retry the command as root or with sudo?
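
If you want the script itself to catch this, here is a minimal sketch of a root check placed near the top. The function name require_root is made up for illustration and is not part of the original scripts; the optional argument exists only so the check can be tried without actually switching user.

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): fail early when the
# script is not running as root. The UID can be passed in as $1 for testing,
# otherwise it is read from id -u.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "This script needs root to read SMART data (try sudo)." >&2
        return 1
    fi
    return 0
}

# Typical use at the top of a script:
# require_root || exit 1
```

With that in place the script stops with one clear message instead of a list of smartctl errors.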
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,477
That was it. I knew it as soon as I typed it. Thanks @Bidule0hm, your contribution to this forum is noted!

sudo worked.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You're welcome ;)
 
Joined
Oct 2, 2014
Messages
925
@Bidule0hm, is it possible to make the "Display drives identification infos" and "Display CPU and HDD temperatures" scripts send their output by email like the zpool status and SMART report scripts do?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yes, it's pretty simple: instead of echoing things to the terminal, you echo them into a temp file and then use sendmail to send the content of that file. Look at how I did it in the other scripts ;)
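
A minimal sketch of that idea, not the exact code from the other scripts: the recipient address, the subject, and the function name build_report are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: the address, subject and build_report name are assumptions.
email="root@localhost"
subject="Drive temperatures"

# Write the mail headers, a blank line, then the report body to the file in $1.
build_report() {
    {
        echo "To: ${email}"
        echo "Subject: ${subject}"
        echo ""                       # blank line separates headers from body
        echo "Report generated on $(date)"
        # Put the echo lines of the display script here instead of
        # printing them to the terminal.
    } > "$1"
}

# Usage (uncomment to actually send the mail):
# tmpfile="$(mktemp /tmp/report.XXXXXX)"
# build_report "$tmpfile"
# sendmail -t < "$tmpfile"           # -t takes the recipient from the To: header
# rm -f "$tmpfile"
```

Grouping the echo commands in braces means the file is opened once, rather than appending with >> on every line.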
 

Johannez

Explorer
Joined
Jan 25, 2016
Messages
59
Bidule0hm, thank you for updating the zpool script. It now also works when a scrub has been cancelled.

I have a question: I am using your scripts for HDD temperature and device ID, but I don't understand why an email is sent only when the cron job starts at the scheduled time.
There is no email information in the script. When I run the script with "Run Now" it does not send an email.

I am trying to understand this process a bit more.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
It's not the script but FreeNAS that sends the mail. It depends on whether the script prints something and on whether the output redirection options are checked in the GUI for this cron job.
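
To illustrate the redirection part, here is a small sketch, assuming the usual cron behaviour of mailing whatever output a job leaves unredirected. The function noisy_job is a made-up stand-in for a report script.

```shell
#!/bin/sh
# Hypothetical stand-in for a report script: one line on stdout, one on stderr.
noisy_job() {
    echo "normal report line"
    echo "error line" >&2
}

# Only stdout is discarded: the error line is still output, so cron has
# something to mail.
# noisy_job > /dev/null

# Both streams are discarded: nothing is left for cron to mail.
# noisy_job > /dev/null 2>&1
```

So a script that prints nothing, or whose output is fully redirected, produces no mail at all.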
 

Johannez

Explorer
Joined
Jan 25, 2016
Messages
59
Thank you, that clears it up a bit. Weird though that I don't get a mail when I use "Run Now", but I know where to look now, so I will check that out further.
 

Johannez

Explorer
Joined
Jan 25, 2016
Messages
59
Bidule0hm, after changing the script it works, but I get two mails: one with the working zpool information, showing that the scrub has been cancelled, and one with this in it:

Code:
date: illegal option -- -
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[[[cc]yy]mm]dd]HH]MM[.ss]] [+format]

Subject of the email:
Code:
Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" sh /root/scripts/zpool.sh > /dev/null


A new scrub will start soon, so the problem will probably disappear then, but I still wanted to share this.
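
One possible explanation, and this is only a guess since the modified script isn't shown: date prints "illegal option -- -" when it is handed an argument that starts with a dash, which can happen if the scrub date parsed from the zpool status output comes back empty or malformed. A minimal sketch of a guard follows; the function name safe_for_date, the variable names, and the date format in the usage comment are assumptions, not taken from zpool.sh.

```shell
#!/bin/sh
# Hypothetical guard (not from zpool.sh): succeed only if the value is
# non-empty and does not start with a dash, so date cannot mistake it
# for an option.
safe_for_date() {
    case "$1" in
        ""|-*) return 1 ;;
        *)     return 0 ;;
    esac
}

# Usage sketch; the variable names and the format string are assumed:
# if safe_for_date "$scrubDate"; then
#     scrubTS="$(date -j -f "%Y-%b-%e_%H:%M:%S" "$scrubDate" "+%s")"
# else
#     scrubTS=""
# fi
```

The second mail arrives because the cron command only redirects stdout to /dev/null, so the error message on stderr is still mailed.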
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
I have a couple of questions.

First off, I changed tempWarn and tempCrit to 45 and 50 respectively, but when the report runs it still shows '?' for all the drives, even the ones that are within range (both the old and the new range), i.e.:

Code:
########## SMART status report summary for all drives ##########

+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+
|Device|Serial         |Temp|Power|Start|Spin |ReAlloc|Current|Offline |UDMA  |Seek  |High  |Command|Last|
|      |               |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|CRC   |Errors|Fly   |Timeout|Test|
|      |               |    |Hours|Count|Count|       |Sectors|Sectors |Errors|      |Writes|Count  |Age |
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+
|ada1 ?|CVLT6AAAAAAAAAAAAAA|  |  104|     |     |      0|       |        |     0|   N/A|   N/A|    N/A|   4|
|da0 ? |WD-AAAAAAAAAAAA| 41 |  922|   25|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da1 ? |WD-AAAAAAAAAAAA| 36 |  656|   16|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da2 ? |WD-AAAAAAAAAAAA| 39 |  658|   20|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da3 ? |WD-AAAAAAAAAAAA| 38 |  300|   13|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   5|
|da4 ? |WD-AAAAAAAAAAAA| 39 |  299|    6|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   5|
|da5 ? |WD-AAAAAAAAAAAA| 43 |  658|   17|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da6 ? |WD-AAAAAAAAAAAA| 43 |  923|   22|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da7 ? |WD-AAAAAAAAAAAA| 41 |  300|    9|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   5|
|da8 ? |WD-AAAAAAAAAAAA| 41 |  658|   17|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da9 ? |WD-AAAAAAAAAAAA| 44 |  659|   20|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da10 ?|WD-AAAAAAAAAAAA| 45 |  922|   20|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   9|
|da11 ?|WD-AAAAAAAAAAAA| 41 |  300|   17|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   5|
|da12  |               |  |     |     |     |       |       |        |      |   N/A|   N/A|    N/A|   0|
|da13  |               |  |     |     |     |       |       |        |      |   N/A|   N/A|    N/A|   0|
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+


Also, as you can see, ada1 is an SSD, but it's not reporting the temperature or other values correctly, this is the output from the SSD (ada1, Intel):

Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       104
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   020   035   000    Old_age   Always       -       20 (Min/Max 15/35)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       10162
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       10162
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       20290
249 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       199
252 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1


I believe the SSD issues are related to the 'Unknown_SSD_Attribute' entries and to the temperature attribute being named 'Airflow_Temperature_Cel'.
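
For what it's worth, a rough sketch of how the temperature could be read from either attribute, assuming the attribute table layout shown above (raw value in the tenth column) and assuming ID 194 (Temperature_Celsius) should win over ID 190 when both exist. The function name extract_temp is made up, and this is not how the report script does it.

```shell
#!/bin/sh
# Hypothetical helper: read a SMART attribute table on stdin and print the
# raw temperature, preferring attribute 194 and falling back to 190.
extract_temp() {
    awk '
        $1 == 194 { t194 = $10 }
        $1 == 190 { t190 = $10 }
        END {
            if (t194 != "") print t194
            else if (t190 != "") print t190
        }
    '
}

# Usage sketch (needs root):
# smartctl -A /dev/ada1 | extract_temp
```

Only the first token of the raw value is taken, so the "(Min/Max 15/35)" suffix on the Intel SSD line is ignored.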
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
That's because the last test age value is over the threshold :)

The script isn't designed for SSDs and I'll probably not have the time to make that kind of changes, sorry.
 

Xyrgh

Explorer
Joined
Apr 11, 2016
Messages
69
Thanks! The server has been off for a week so that makes sense.

And no worries with the SSD, I just chucked it in there just because. I'll leave it in there purely for the reallocated sector count.

Thanks for the prompt response.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You're welcome ;)
 

lazybones

Dabbler
Joined
Jan 16, 2016
Messages
45
Hi there!

I have replaced the drives section with the adaptive lines, but when I run the script I get a "?" beside the device name. This kinda makes me wonder if something is wrong?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
That means at least one parameter is over its warning threshold; quote from the OP:
If a drive is over a chosen limit temperature, or has any reallocated, pending or uncorrectable sectors, or if the last test age is over testAgeWarn then the chosen warning symbol will be added to the end of the device name. If it is over the critical temperature or if any of the reallocated, pending or uncorrectable sectors value is over sectorsCrit then it's the critical symbol that will be added instead of the warning symbol.

NB: by default the warning symbol is '?' and the critical symbol is '!'.
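
The rule quoted above can be sketched roughly like this. The names tempWarn, tempCrit, sectorsCrit and testAgeWarn come from the script, but the values given here, the function name status_symbol, and the choice of a strict "greater than" comparison are assumptions for illustration, not the script's actual code.

```shell
#!/bin/sh
# Placeholder thresholds; the real script's defaults may differ.
tempWarn=45
tempCrit=50
sectorsCrit=10
testAgeWarn=1
warnSymbol="?"
critSymbol="!"

# Hypothetical function. Args: temp reAlloc pending offlineUnc testAge
# Prints the critical symbol, the warning symbol, or nothing.
status_symbol() {
    temp=$1; reAlloc=$2; pending=$3; offlineUnc=$4; testAge=$5
    if [ "$temp" -gt "$tempCrit" ] || [ "$reAlloc" -gt "$sectorsCrit" ] ||
       [ "$pending" -gt "$sectorsCrit" ] || [ "$offlineUnc" -gt "$sectorsCrit" ]; then
        echo "$critSymbol"
    elif [ "$temp" -gt "$tempWarn" ] || [ "$reAlloc" -gt 0 ] ||
         [ "$pending" -gt 0 ] || [ "$offlineUnc" -gt 0 ] ||
         [ "$testAge" -gt "$testAgeWarn" ]; then
        echo "$warnSymbol"
    fi
}

# Example: a healthy 41 degree drive whose last test age is 9
# status_symbol 41 0 0 0 9
```

With these placeholder thresholds, that example prints the warning symbol because of the test age alone, even though the temperature and sector counts are fine.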
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Would be cool if these scripts were implemented in FreeNAS 10 (when it comes) via the GUI... just tickboxes etc. You know, for idiots. Like me :)
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
NB: for now these scripts haven't been tested with FreeNAS 10, so if you use them on FN 10, be careful and test them to be sure they work.
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Twould be even better if implemented in the FreeNAS 9.10 stable build GUI.....
 