Scripts to report SMART, ZPool and UPS status, HDD/CPU T°, HDD identification and backup the config

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

BigMike

Dabbler
Joined
Jan 23, 2017
Messages
28
Would it be good to add a new section to the scripts, to monitor FreeNAS server memory? We know it's important to use ECC memory, and users do generally test it when first setting up a new server (and then suspect it when weird problems pop up), but does anyone actually scan logs routinely looking for indicators that ECC memory has detected an error? Wouldn't it feel good if this script looked for ECC memory problems?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I haven't actually had a memory error on my system so I don't know exactly how FreeNAS would respond, but I would expect it to give you an error message in a similar fashion to what it does when we have a hard drive failure.

 

BigMike

Dabbler
Joined
Jan 23, 2017
Messages
28
I'm not sure that FreeNAS would alert me of ECC memory problems like it would with a disk failure. I've been assuming it was up to the administrator to review logs looking for warning signs, which means I looked once, but I'm not going to keep that up ... I just looked at this oldish thread:

https://lists.freebsd.org/pipermail/freebsd-performance/2012-April/004585.html

where they discuss what kind of things might be logged to indicate ECC memory problems. If FreeNAS isn't already looking for these indicators, it would be a great addition!
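A rough sketch of what such a check could look like, as a starting point only. It counts log lines that look like machine-check or ECC reports. The pattern list (`MCA:`, `machine check`, `ECC error`) is an assumption, not a list of what FreeNAS or any given board actually logs, and the function names are made up for illustration.

```shell
#!/bin/sh
# Count log lines that look like machine-check (MCA) or ECC reports.
# ASSUMPTION: the pattern list below is a guess; adjust it to what your
# hardware and OS version actually write to the log.
count_mem_errors() {
    logfile="$1"
    # -c prints the number of matching lines, -i ignores case, -E allows
    # alternation. grep exits non-zero on zero matches, hence "|| true".
    grep -ciE 'MCA:|machine check|ECC error' "$logfile" || true
}

# Print a one-line summary that could be appended to an email report.
mem_error_summary() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "memory check: cannot read $logfile"
        return 1
    fi
    n="$(count_mem_errors "$logfile")"
    if [ "$n" -gt 0 ]; then
        echo "memory check: WARNING, $n suspicious line(s) in $logfile"
    else
        echo "memory check: OK, no suspicious lines in $logfile"
    fi
}
```

Calling `mem_error_summary /var/log/messages` from one of the report scripts would add the line to the report. Rotated logs are not covered by this sketch.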
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Where I work, we have a server room with a couple hundred servers, and in the past six years only 3 or 4 sticks of memory have needed to be replaced. Memory errors are not common with quality memory. The main reason for ECC memory is to make you aware of errors so you can take action, because without ECC there is no way to know they are even happening until your system crashes. The other reason is to keep your system from crashing to begin with, because a single-bit error will be corrected on the fly.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
I'd never seen this thread before, looks really useful.
When I get back I'll configure the temperature one. I've been trying to get that looked at for a long time.
https://redmine.ixsystems.com/issues/23360 (new job, old job deleted sadly)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have a system at work that has no access to email so I modified the script to just write a file where I can go check it.
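For anyone wanting to do the same, here is a minimal sketch of the idea, not the actual modification described above. It assumes the report is available on stdin and simply saves it to a dated file instead of piping it to sendmail. The function name and the paths in the usage note are hypothetical.

```shell
#!/bin/sh
# Write a report read from stdin to a dated file in the given directory
# instead of mailing it. Prints the path of the file that was written.
# ASSUMPTION: the calling script produces the whole report on stdout.
save_report() {
    outdir="$1"
    name="$2"
    mkdir -p "$outdir" || return 1
    outfile="$outdir/${name}_$(date +%Y-%m-%d).txt"
    cat > "$outfile" || return 1
    echo "$outfile"
}
```

Usage would be something like `zpool status | save_report /mnt/tank/reports zpool`. A second run on the same day overwrites that day's file.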
 

karearea

Cadet
Joined
Nov 13, 2016
Messages
1
Thanks, I didn't look in the discussion there.

These are the two lines of the script that need to be changed:
Code:
#	scrubErrors="$(zpool status "$pool" | grep "scan" | awk '{print $8}')"
	scrubErrors="$(zpool status "$pool" | grep "scan" | awk '{print $10}')"

#	scrubDate="$(zpool status "$pool" | grep "scan" | awk '{print $15"-"$12"-"$13"_"$14}')"
	scrubDate="$(zpool status "$pool" | grep "scan" | awk '{print $17"-"$14"-"$15"_"$16}')"

Worked a treat, thanks Chris
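Since the field numbers moved between versions, a variant that looks for keywords instead of fixed positions might survive the next format change. This is only a sketch: it assumes the scan line still contains "with N errors on <weekday> <month> <day> <time> <year>", which is what both pairs of field numbers above imply. The function names are made up.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print the scrub error count,
# taken as the field that follows the word "with" on the "scan:" line.
scrub_errors() {
    awk '/scan:/ { for (i = 1; i < NF; i++) if ($i == "with") { print $(i + 1); exit } }'
}

# Print the scrub date as YEAR-MONTH-DAY_TIME (same layout as the original
# script), using the fields after the weekday that follows the word "on".
scrub_date() {
    awk '/scan:/ { for (i = 1; i + 5 <= NF; i++) if ($i == "on") { print $(i + 5) "-" $(i + 2) "-" $(i + 3) "_" $(i + 4); exit } }'
}
```

In the script this would be used as `scrubErrors="$(zpool status "$pool" | scrub_errors)"` and likewise for `scrubDate`.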
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Not sure if this would be a relatively simple script, but I am a huge noob so I wouldn't know how to go about writing it regardless of simplicity.

I have a dataset that I allotted to a friend to upload his pictures for me to edit in Lightroom, but since I am only mostly a nice guy, I gave him a size limit of 500 GB. Would it be possible to modify the zpool script to work for a specific dataset?

It would be handy to be able to send emails about the dataset's usage. Our FX cameras can chew through storage space in a hurry!
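One way this could be approached, sketched under assumptions rather than taken from the existing scripts: read the dataset's `used` and `quota` properties in bytes and turn them into a one-line summary for the email. The dataset name `tank/friend` and the function names are hypothetical, and the sketch assumes a plain `quota` (not `refquota`) is set on the dataset.

```shell
#!/bin/sh
# Print the percentage of a quota in use, given used and quota in bytes.
# Prints "none" when no quota is set (ASSUMPTION: reported as 0, none or -).
quota_percent() {
    used="$1"
    quota="$2"
    case "$quota" in
        0|none|-|'') echo "none"; return 0 ;;
    esac
    echo $(( $used * 100 / $quota ))
}

# Print a one-line usage summary, flagged when usage reaches warn percent.
dataset_usage_line() {
    dataset="$1"; used="$2"; quota="$3"; warn="$4"
    pct="$(quota_percent "$used" "$quota")"
    if [ "$pct" = "none" ]; then
        echo "$dataset: no quota set"
    elif [ "$pct" -ge "$warn" ]; then
        echo "$dataset: WARNING ${pct}% of quota used"
    else
        echo "$dataset: ${pct}% of quota used"
    fi
}

# Example of feeding it real values (dataset name is a placeholder):
#   used="$(zfs get -Hp -o value used tank/friend)"
#   quota="$(zfs get -Hp -o value quota tank/friend)"
#   dataset_usage_line tank/friend "$used" "$quota" 80
```

The resulting line could be appended to the body of the existing zpool report email.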
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Just to confirm: is the disk temp monitoring still possible with FreeNAS 11.1 U1? I really want better monitoring of my disk heat, a lot better. It's the #1 issue with my system.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Just to confirm the disk temp monitoring, this is still possible with FreeNAS 11.1 U1?
I don't know what temperature monitoring you are speaking of. I use a shell script, ./get_hdd_temp.sh, and that still works the same as ever.
Ex:
Code:
   da0:   29C [2.00TB] Z4Z2X			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da1:   28C [2.00TB] W4Z22			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da2:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da3:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da4:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da5:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da6:   29C [2.00TB] Z4Z3B			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da7:   29C [4.00TB] Z305P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da8:   30C [4.00TB] Z305M			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da9:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da10:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da11:   29C [2.00TB] W4Z29			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  ada0:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada1:   27C [5.00TB] W4J19			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada2:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada3:   28C [5.00TB] W4J1D			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada4:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS
  ada5:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS

Do you need the source code for that?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I haven't tested my scripts with FreeNAS 11, as I'm still on 9.3, but they should work: I get the info from SMART, not from FreeNAS, so it should be the same. Let me know if it doesn't.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377

Can that thing run every ... 30 minutes and dump it to some kind of CSV file or something so I can plot stuff or anything?
I have 6 disks in my system and I'd love to get a better idea of their heat throughout the day(s) especially in summer like it is now.

Also for goodness sakes, this should be baked in to the OS as default by now :(
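A sketch of how that could be done with cron, under a few assumptions: the drives report temperature as SMART attribute 194 with the raw value in the tenth column of `smartctl -A` output (some drives use attribute 190 instead), and the drive names, CSV path and script path below are placeholders. This is not the get_hdd_temp.sh script, just an illustration.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of
# attribute 194 (Temperature_Celsius), i.e. the tenth column.
# ASSUMPTION: the drive reports temperature as attribute 194.
parse_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Append one "timestamp,drive,temperature" row to a CSV file, writing a
# header line first if the file is missing or empty.
append_temp_row() {
    csv="$1"; drive="$2"; temp="$3"
    [ -s "$csv" ] || echo "timestamp,drive,temp_c" > "$csv"
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$drive,$temp" >> "$csv"
}

# Example driver loop (needs smartctl and root; names are placeholders):
#   for d in da0 da1 da2; do
#       t="$(smartctl -A "/dev/$d" | parse_temp)"
#       append_temp_row /mnt/tank/temps.csv "$d" "$t"
#   done
# Example cron entry to run the script every 30 minutes:
#   */30 * * * * /root/log_temps.sh
```

The resulting file opens directly in a spreadsheet or any plotting tool.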
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
There have been people asking for that to be in the GUI as a bar graph of some sort.

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Also for goodness sakes, this should be baked in to the OS as default by now :(
Yes, many people have complained that this isn't in the GUI. Many others have asked in response why it should be. "Synology does it" isn't really an adequate answer to this question--what value does it add to see a graph of disk temps? The design philosophy seems to be, set the alerts where you want them, and the system will alert you if the disks reach those temps.

OTOH, they've added a graph of CPU temps, so maybe they're moving on this issue.
 

jasonhalljax

Cadet
Joined
Dec 17, 2017
Messages
5
I find the scripts very helpful and use them all, but with the SMART script my drives are all in an always-warned state because the last test age keeps counting up. I assume this is the time since the last SMART test was done. Mine currently says 10 days per this morning's email, but I know two tests have run in the last 10 days (a short and a long test) based on my schedule, and the power-on hours are going up in the emails. Any idea what's going on here? Thanks.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
OTOH, they've added a graph of CPU temps, so maybe they're moving on this issue.

Wait, they have? In what version was this added? My test box is on 11.1 iirc and I don't recall seeing that... can't log in and check at the moment because I turned it off and am not home to turn it on and check lol.

As far as the drive temp issue goes, I am on the "give us an option to turn on a graph" side of the argument. I would be interested to see what temps my drives hit during heavy usage rather than just wait for my SMART emails...



 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
It's in 11.1, old GUI:
upload_2018-1-22_14-4-21.png
 

Green750one

Dabbler
Joined
Mar 16, 2015
Messages
36

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Crazy. I have been using 11.1 for a few weeks as I have set up ZFS replication from production box to backup box and I have never noticed that graph... crazy.

Thanks for the wake up call.


 