Scripts to report SMART, ZPool and UPS status, HDD/CPU T°, HDD identification and backup the config

Chris Moore · Dec 16, 2017

Amsoil_Jim said:
I recently updated to version 11.1, and this morning my cronjob ran and I received this email

The script you are using is slightly different from the one I have but I updated mine and posted it here:
https://github.com/ChrisMoore2/FreeNAS-scripts/blob/master/zpool_report.sh

You might be able to compare them and see what needs to be changed.

BigMike · Dec 17, 2017

Would it be good to add a new section to the scripts, to monitor FreeNAS server memory? We know it's important to use ECC memory, and users do generally test it when first setting up a new server (and then suspect it when weird problems pop up), but does anyone actually scan logs routinely looking for indicators that ECC memory has detected an error? Wouldn't it feel good if this script looked for ECC memory problems?

Chris Moore · Dec 17, 2017

BigMike said:
Would it be good to add a new section to the scripts, to monitor FreeNAS server memory? We know it's important to use ECC memory, and users do generally test it when first setting up a new server (and then suspect it when weird problems pop up), but does anyone actually scan logs routinely looking for indicators that ECC memory has detected an error? Wouldn't it feel good if this script looked for ECC memory problems?

I haven't actually had a memory error on my system so I don't know exactly how FreeNAS would respond, but I would expect it to give you an error message in a similar fashion to what it does when we have a hard drive failure.

BigMike · Dec 18, 2017

I'm not sure that FreeNAS would alert me to ECC memory problems the way it would to a disk failure. I've been assuming it was up to the administrator to review the logs for warning signs, which means I looked once, but I'm not going to keep that up ... I just looked at this oldish thread:

https://lists.freebsd.org/pipermail/freebsd-performance/2012-April/004585.html

where they discuss what kinds of things might be logged to indicate ECC memory problems. If FreeNAS isn't already looking for these indicators, it would be a great addition!
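
A minimal sketch of what such a check could look like, assuming the kernel reports machine-check events as lines containing `MCA:` in `/var/log/messages`. Both the pattern and the log path are assumptions to verify on your own hardware, and `count_mca` is a hypothetical helper name, not part of the existing scripts.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that look like machine-check (MCA) reports.
# Assumption: ECC events appear as lines containing "MCA:" in the given log file.
count_mca() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c 'MCA:' "$logfile"
}
```

A report script could call `count_mca` and add a warning line to the email whenever the result is not 0.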

Chris Moore · Dec 18, 2017

BigMike said:
I'm not sure that FreeNAS would alert me to ECC memory problems the way it would to a disk failure. I've been assuming it was up to the administrator to review the logs for warning signs, which means I looked once, but I'm not going to keep that up ... I just looked at this oldish thread:

https://lists.freebsd.org/pipermail/freebsd-performance/2012-April/004585.html

where they discuss what kinds of things might be logged to indicate ECC memory problems. If FreeNAS isn't already looking for these indicators, it would be a great addition!

Where I work, we have a server room with a couple hundred servers, and in the past six years only 3 or 4 sticks of memory have needed to be replaced. Memory errors are not common with quality memory. The main reason for ECC memory is to make you aware of errors so you can take action, because without ECC there is no way to know they are even happening until your system crashes. The other reason is to keep your system from crashing in the first place, because a single-bit error is corrected on the fly.

diskdiddler · Dec 18, 2017

I'd never seen this thread before, looks really useful.
When I get back I'll configure the temperature one. I've been trying to get that looked at for a long time.
https://redmine.ixsystems.com/issues/23360 (new job, old job deleted sadly)

Chris Moore · Dec 18, 2017

diskdiddler said:
I'd never seen this thread before, looks really useful.
When I get back I'll configure the temperature one. I've been trying to get that looked at for a long time.
https://redmine.ixsystems.com/issues/23360 (new job, old job deleted sadly)

I have a system at work that has no access to email so I modified the script to just write a file where I can go check it.
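
A rough sketch of that kind of change, assuming the report body is available on stdin at the point where the script would otherwise pipe it to sendmail. `write_report`, the directory, and the file naming are all hypothetical, not taken from the modified script.

```shell
#!/bin/sh
# Hypothetical helper: save a report to a timestamped file instead of emailing it.
# $1 = directory for reports (assumed path), $2 = short report name.
write_report() {
    report_dir="$1"
    name="$2"
    mkdir -p "$report_dir" || return 1
    out="$report_dir/${name}_$(date +%Y-%m-%d_%H%M).txt"
    cat > "$out"            # the report body arrives on stdin
    printf '%s\n' "$out"    # print the path so the caller knows where it went
}

# Example (placeholder path):
#   zpool status | write_report /mnt/tank/reports zpool
```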

karearea · Dec 19, 2017

Chris Moore said:
Thanks, I didn't look in the discussion there.

These are the two lines of the script that need to be changed:
Code:
# scrubErrors="$(zpool status "$pool" | grep "scan" | awk '{print $8}')"
scrubErrors="$(zpool status "$pool" | grep "scan" | awk '{print $10}')"
# scrubDate="$(zpool status "$pool" | grep "scan" | awk '{print $15"-"$12"-"$13"_"$14}')"
scrubDate="$(zpool status "$pool" | grep "scan" | awk '{print $17"-"$14"-"$15"_"$16}')"

Worked a treat, thanks Chris
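
The quoted fix moves every awk column two places to the right, which suggests the `scan:` line grew by two fields in 11.1. A sketch of one way to avoid counting columns: find the words `errors` and `on` and read the fields relative to them. `scrub_errors` and `scrub_date` are hypothetical names, this is not how the posted script does it, and it assumes the line ends in `with N errors on <weekday> <month> <day> <time> <year>`.

```shell
#!/bin/sh
# Both helpers read `zpool status "$pool"` output on stdin.

# Print the number that precedes the word "errors" on the scan: line.
scrub_errors() {
    awk '$1 == "scan:" {
        for (i = 2; i <= NF; i++)
            if ($i == "errors") { print $(i - 1); exit }
    }'
}

# Print the scrub date as year-month-day_time, same layout as the original script.
scrub_date() {
    awk '$1 == "scan:" {
        for (i = 1; i + 5 <= NF; i++)
            if ($i == "on") { print $(i + 5) "-" $(i + 2) "-" $(i + 3) "_" $(i + 4); exit }
    }'
}

# Example:
#   scrubErrors="$(zpool status "$pool" | scrub_errors)"
#   scrubDate="$(zpool status "$pool" | scrub_date)"
```

If a scrub is still running or has never run, the line has no `errors on` part and both helpers print nothing, so the caller should handle an empty value.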

LIGISTX · Dec 26, 2017

Not sure if this would be a relatively simple script, but I am a huge noob so I wouldn't know how to go about writing it regardless of simplicity.

I have a dataset that I allotted to a friend to upload his pictures for me to edit in Lightroom, but since I am only mostly a nice guy, I gave him a size limit of 500 GB. Would it be possible to modify the zpool script to work for a specific dataset?

It would be handy to be able to send emails about the dataset's usage. Our FX cameras can chew through storage space in a hurry!
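
A sketch of a per-dataset check, assuming the limit was set with the ZFS `quota` property (if `refquota` was used instead, swap the property name). `percent_used` and `dataset_report` are hypothetical helpers, and `tank/friend` is a placeholder dataset name.

```shell
#!/bin/sh
# percent_used USED QUOTA: integer percentage, both values in bytes.
# Prints "n/a" when the quota is 0 or not a number (no quota set).
percent_used() {
    case "$2" in
        ''|*[!0-9]*|0) echo "n/a"; return 0 ;;
    esac
    echo $(( $1 * 100 / $2 ))
}

# dataset_report DATASET: one summary line suitable for a report email.
dataset_report() {
    dataset="$1"
    # -H: no headers, -p: exact byte values, -o value: print only the value
    used="$(zfs get -Hp -o value used "$dataset")" || return 1
    quota="$(zfs get -Hp -o value quota "$dataset")" || return 1
    pct="$(percent_used "$used" "$quota")"
    if [ "$pct" = "n/a" ]; then
        printf '%s: %s bytes used, no quota set\n' "$dataset" "$used"
    else
        printf '%s: %s of %s bytes used (%s%%)\n' "$dataset" "$used" "$quota" "$pct"
    fi
}

# Example (placeholder dataset):
#   dataset_report tank/friend
```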

diskdiddler · Jan 19, 2018

Chris Moore said:
I have a system at work that has no access to email so I modified the script to just write a file where I can go check it.

Just to confirm: is the disk temp monitoring still possible with FreeNAS 11.1 U1? I really want better monitoring of my disk heat, a lot better. It's the #1 issue with my system.

Chris Moore · Jan 19, 2018

diskdiddler said:
Just to confirm: is the disk temp monitoring still possible with FreeNAS 11.1 U1?

I don't know what temperature monitoring you are speaking of. I use a shell script, ./get_hdd_temp.sh, and that still works the same as ever.
Ex:

Code:

   da0:   29C [2.00TB] Z4Z2X			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da1:   28C [2.00TB] W4Z22			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da2:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da3:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da4:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da5:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da6:   29C [2.00TB] Z4Z3B			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da7:   29C [4.00TB] Z305P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da8:   30C [4.00TB] Z305M			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da9:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da10:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da11:   29C [2.00TB] W4Z29			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  ada0:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada1:   27C [5.00TB] W4J19			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada2:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada3:   28C [5.00TB] W4J1D			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada4:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS
  ada5:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS

Do you need the source code for that?

Bidule0hm · Jan 20, 2018

I haven't tested my scripts with FreeNAS 11 as I'm still on 9.3, but they should work: I get the info from SMART, not from FreeNAS, so it should be the same. Let me know if it doesn't.

diskdiddler · Jan 21, 2018

Chris Moore said:

I don't know what temperature monitoring you are speaking of. I use a shell script, ./get_hdd_temp.sh, and that still works the same as ever.
Ex:

Code:

   da0:   29C [2.00TB] Z4Z2X			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da1:   28C [2.00TB] W4Z22			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da2:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da3:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da4:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da5:   29C [4.00TB] Z307P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da6:   29C [2.00TB] Z4Z3B			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
   da7:   29C [4.00TB] Z305P			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da8:   30C [4.00TB] Z305M			 Seagate Desktop HDD.15 (ST4000DM000-1F2168)
   da9:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da10:   29C [2.00TB] Z4Z3A			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  da11:   29C [2.00TB] W4Z29			 Seagate Barracuda 7200.14 (AF) (ST2000DM001-1ER164)
  ada0:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada1:   27C [5.00TB] W4J19			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada2:   27C [5.00TB] W4J1A			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada3:   28C [5.00TB] W4J1D			 Seagate Desktop HDD.15 (ST5000DM000-1FK178)
  ada4:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS
  ada5:   35C [40.0GB] NZ05T7725		 FUJITSU MHW2040BS

Do you need the source code for that?

Can that thing run every ... 30 minutes and dump it to some kind of CSV file or something so I can plot stuff or anything?
I have 6 disks in my system and I'd love to get a better idea of their heat throughout the day(s) especially in summer like it is now.

Also for goodness sakes, this should be baked in to the OS as default by now :(
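
A sketch of the CSV idea, assuming each drive reports its temperature as SMART attribute 194 (Temperature_Celsius) with the value in the tenth column of `smartctl -A` output; some drives use attribute 190 or a different layout. `temp_from_smart`, `log_temps`, the CSV path, and the device list are hypothetical, and this is not taken from get_hdd_temp.sh.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of attribute 194.
temp_from_smart() {
    awk '$1 == 194 { print $10; exit }'
}

# log_temps CSV_FILE DEV...: append one "timestamp,device,temperature" row per drive.
log_temps() {
    csv="$1"
    shift
    now="$(date '+%Y-%m-%d %H:%M')"
    for dev in "$@"; do
        t="$(smartctl -A "/dev/$dev" | temp_from_smart)"
        printf '%s,%s,%s\n' "$now" "$dev" "$t" >> "$csv"
    done
}

# Example (placeholder path and devices), run from a cron task every 30 minutes:
#   log_temps /mnt/tank/logs/hdd_temps.csv ada0 ada1 ada2
# Crontab schedule for "every 30 minutes":  */30 * * * *
```

The resulting file can be opened in any spreadsheet and plotted per device.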

Chris Moore · Jan 22, 2018

diskdiddler said:
Can that thing run every ... 30 minutes and dump it to some kind of CSV file or something so I can plot stuff or anything?
I have 6 disks in my system and I'd love to get a better idea of their heat throughout the day(s) especially in summer like it is now.

Also for goodness sakes, this should be baked in to the OS as default by now :(

There have been people asking for that to be in the GUI as a bar graph of some sort.

danb35 · Jan 22, 2018

diskdiddler said:
Also for goodness sakes, this should be baked in to the OS as default by now :(

Yes, many people have complained that this isn't in the GUI. Many others have asked in response why it should be. "Synology does it" isn't really an adequate answer to this question--what value does it add to see a graph of disk temps? The design philosophy seems to be, set the alerts where you want them, and the system will alert you if the disks reach those temps.

OTOH, they've added a graph of CPU temps, so maybe they're moving on this issue.

jasonhalljax · Jan 22, 2018

I find the scripts very helpful and use them all, but with the SMART script my drives are all in an always-warned state because the last test age keeps counting up. I assume this is the time since the last SMART test was done. Mine currently says 10 days per this morning's email, but I know two tests have run in the last 10 days (a short and a long test) based on my schedule, and the power-on hours are going up in the emails. Any idea what's going on here? Thanks.

LIGISTX · Jan 22, 2018

danb35 said:
OTOH, they've added a graph of CPU temps, so maybe they're moving on this issue.

Wait, they have? In what version was this added? My test box is on 11.1 iirc and I don’t recall seeing that... can’t log in and check at the moment because I turned it off and am not home to turn it on and check lol.

As far as the drive temp issue goes, I am on the “give us an option to turn on a graph” side of the argument. I would be interested to see what temps my drives hit during heavy usage rather than just wait for my SMART emails...

danb35 · Jan 22, 2018

It's in 11.1, old GUI:

Green750one · Jan 22, 2018

danb35 said:
It's in 11.1, old GUI:

That's CPU, not disks.

LIGISTX · Jan 22, 2018

Crazy. I have been using 11.1 for a few weeks as I have set up ZFS replication from production box to backup box and I have never noticed that graph... crazy.

Thanks for the wake up call.
