
multi_report.sh version for Core and Scale 3.0

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

Joe is the Eveready Bunny of scripting!

# v1.6d (05 October 2022)
# - Thanks goes out to ChrisRJ for offering some great suggestions to enhance and optimize the script.
# - Updated gptid text and help text areas (clarifying information)
# - Updated the -dump parameter to -dump [all] and included non-SMART attachments.
# - Added Automatic UDMA_CRC, MultiZone, and Reallocated Sector Compensation to -config advanced option K.
# - Fixed Warranty Date always showing as expired.
# - Added Helium and Raw Read Error Rates to...

Read the rest of this update entry...
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

Corrected Version 1.6d

# v1.6d (05 October 2022)
# - Thanks goes out to ChrisRJ for offering some great suggestions to enhance and optimize the script.
# - Updated gptid text and help text areas (clarifying information)
# - Updated the -dump parameter to -dump [all] and included non-SMART attachments.
# - Added Automatic UDMA_CRC, MultiZone, and Reallocated Sector Compensation to -config advanced option K.
# - Fixed Warranty Date always showing as expired.
# - Added Helium and Raw Read Error Rates to...

Read the rest of this update entry...
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

Corrected and Improved

# v1.6d-1 (08 October 2022)
# - Bug Fix for converting multiple numbers from Octal to Decimal. The previous process worked "most" of the time
# -- but we always aim for 100% working.
#
# The multi_report_config file is compatible with versions back to v1.6d.
#
# v1.6d (05 October 2022)
# - Thanks goes out to ChrisRJ for offering some great suggestions to enhance and optimize the script.
# - Updated gptid text and help text areas (clarifying information)
# - Updated the -dump...

Read the rest of this update entry...
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

More fixes from The Schmuck

# v1.6d-2 (09 October 2022)
# - Bug fix for NVMe power on hours.
# --- Unfortunately as the script gets more complex it's very easy to induce a problem. And since I do not have
# --- a lot of different hardware, I need the users to contact me and tell me there is an issue so I can fix it.
# --- It's unfortunate that I've had two bug fixes already but them's the breaks.
# - Updated to support more drives' Min/Max temps and display the non-existent value if nothing is obtained vice...

Read the rest of this update entry...
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Thanks Joe, just updated to the latest version and it's working well.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
What is the proper way to run this with TrueNAS?
I see the field to input the e-mail address, and I have e-mail notifications set up. I entered the e-mail address in that field. So my question is simply: how do I set the script to run, and where do I place it?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
What is the proper way to run this with TrueNAS?
I see the field to input the e-mail address, and I have e-mail notifications set up. I entered the e-mail address in that field. So my question is simply: how do I set the script to run, and where do I place it?
I created a folder for the script and have a cronjob running it every week.
Code:
######### INSTRUCTIONS ON USE OF THIS SCRIPT
#
# This script will perform three main functions:
# 1: Generate a report and send an email on your drive(s) status.
# 2: Create a copy of your Config File and attach to the same email.
# 3: Create a statistical database and attach to the same email.
#
# In order to configure the script properly read over the User-definable Parameters before making any changes.
# Make changes as indicated by the section instructions.
#
# To run the program from the command line, use ./program_name.sh [-h] for additional help instructions,
# and [-config] to run the configuration routine (highly recommended).
#
# If you create an external configuration file, you never have to edit the script,
# so how many times do I need to say it is highly recommended?  And I may force the
# change to require the external configuration file.
#
# You may need to make the script executable using "chmod +x program_name.sh"
#
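
A minimal sketch of that kind of setup, assuming a dataset path such as /mnt/tank/scripts (yours will differ) and a weekly schedule picked purely as an example:
Code:
# Illustrative only -- adjust the dataset path to your own pool layout.
mkdir -p /mnt/tank/scripts
cp multi_report.sh /mnt/tank/scripts/
chmod +x /mnt/tank/scripts/multi_report.sh

# Run the configuration routine once to create the external config file (recommended above).
/mnt/tank/scripts/multi_report.sh -config

# Then schedule it weekly, either as a Cron Job task in the TrueNAS web UI
# or with a crontab entry such as (every Monday at 02:00):
# 0 2 * * 1 /mnt/tank/scripts/multi_report.sh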
 
Last edited:

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Hi Joe,

Running the latest version, have just noticed the email reports for one of my servers report the pool size incorrectly. Please see screenshot attached.
 

Attachments

  • Screenshot 2022-10-26 202232.png
    Screenshot 2022-10-26 202232.png
    14.9 KB · Views: 82

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Hi Joe,

Running the latest version, have just noticed the email reports for one of my servers report the pool size incorrectly. Please see screenshot attached.
Can you provide me some details so I can fix it up? I need the files created this way so I can pass them through the script on my end to find out what the issue is; it can't be a cut/paste operation, as that at times will not be processed exactly the same. Sorry that I'm requesting a lot of data from you, but I haven't heard of anyone else having this issue, so I'm perplexed, especially if the other pools are reporting correctly.

I need to know the multi_report_config.txt value under General Settings -> pool_capacity="zfs" or ="zpool". The default is "zfs". You can change this value to "zpool" to see what the results are, but I prefer the "zfs" value as it lines up with the TrueNAS values. "zpool" was the older way this script displayed the data.
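
If you want to eyeball the difference between the two capacity sources yourself, something like this will show it (Pool2 is just the pool name used below; the column choices here are illustrative, not what the script itself queries):
Code:
# Capacity as zfs sees it (the script's default source):
zfs list -o name,used,available Pool2
# Capacity as zpool sees it (the older "zpool" style):
zpool list -o name,size,allocated,free,capacity Pool2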

The commands below will place the files in the location you run the commands from. You could place them in /tmp/ (e.g., /tmp/pool_status.txt) if you desire and they will be deleted upon reboot, but you need access to them to copy the files off the system. PM me if you need further assistance.

Code:
zpool status Pool2 > pool_status.txt
zpool list -H -p -o capacity Pool2 > pool_used.txt
zpool list -H -o size Pool2 > pool_size.txt
zpool list -H -o free Pool2 > pool_free.txt
zfs list Pool2 > zfs_list.txt

Then attach the file in the forums and I'll grab it. I might not be able to do anything until Friday, busy week at work so I'm getting home late.

If I need more data then I will PM you. Actually, I will PM you with an updated script when I fix the issue to make sure it works. If it does, it will be in the next version release, which I will likely make happen in the next month.

-Joe
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Please use >> instead of >.

>> will append data.
> will overwrite.
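
A trivial illustration, nothing specific to this script:
Code:
echo "first"  > demo.txt    # > creates or overwrites demo.txt
echo "second" >> demo.txt   # >> appends to demo.txt
cat demo.txt                # shows both lines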
 
Last edited:

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Please use >> instead of >.

>> will append data.
> will overwrite.
I do not want appended data. That does not help me. They need to be clean files for me to process them. Do it how I listed, please.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Ah. Sorry. Misread that. It's a one time action. Again: Sorry.
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Hi Joe,

I've attached the files requested.

In my config file pool_capacity="zfs"
 

Attachments

  • zfs_list.txt
    84 bytes · Views: 70
  • pool_free.txt
    6 bytes · Views: 74
  • pool_size.txt
    6 bytes · Views: 96
  • pool_used.txt
    2 bytes · Views: 69
  • pool_status.txt
    680 bytes · Views: 104

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I've attached the files requested.
Thanks. The data you provided looks correct; now I need to figure out what blasted math is wrong. I will be able to feed your exact data into the script to troubleshoot it. Math in BASH sucks!

Ah. Sorry. Misread that. It's a one time action. Again: Sorry.
No problem. The reason I ask for the data in this way is that when 'awk' looks through it, any special/hidden characters can throw me for a loop, so I need the data presented in its exact format. Cut and paste often interprets some characters and will rain hell all over me as I'm scratching my head trying to figure out why I can't replicate the problem. I've included the -dump parameter in the script so it will automatically dump the data I typically need (drive data), but it does not include zpool info, YET. It should on the next version, but I just hope I don't need to collect all that data many more times.
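
For anyone curious how such hidden characters show up, a quick illustrative way to inspect a captured file before awk ever sees it (file name taken from the commands above):
Code:
# Show non-printing characters:
cat -v pool_status.txt
# Or look at the raw byte-by-byte view:
od -c pool_status.txt | head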
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@Deeda You have a message and a file to see if it fixes the issue.
 

syruprise

Cadet
Joined
Jul 7, 2019
Messages
3
First off, thanks a hundred million, because this multi_report script has made keeping track of both my Core & Scale systems much nicer. These are all super minor things, but I figured I would share my system's quirks and such.

dax/daxx drives jumbled:

They appear jumbled in the HDD Summary Report and in the SMART summary report on my Core system. It can be fixed in statistical_data by just adding a zero in front so it reads da0x/da0xx. I don't know if that would work here with this email formatting; there is probably a more elegant way to do it, but I don't know.

Helium on Toshiba MG0# drive:
These Toshiba drives use SMART attribute IDs 23 & 24 for helium. They read from zero instead of one hundred, but it would still be nice to see them in the Summary Report for a quick glance in case of a statistical change. On the Core system.

Reserve NAND blocks on Micron/Crucial SSDs:
On Crucial MX / Micron SSDs, SMART attribute ID 180 is Unused Reserve NAND blocks. My understanding is that this counts down the overprovisioned NAND blocks left on the drive. It would be nice to have it on the SSD summary report on the SCALE system. It would be even better to have a system like you did with UDMA CRC errors, where you could set a number and warn if it has dropped. That might be too much of an edge case to put in all that work, I admit; SSDs in my experience just randomly die anyway.
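
For reference, pulling a single raw attribute value out of smartctl is roughly this (attribute ID 180 and /dev/da0 are just the examples from this post; column 10 is RAW_VALUE in smartctl's attribute table):
Code:
smartctl -A /dev/da0 | awk '$1 == 180 {print $10}'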

Commented email section:
I had a hell of a time getting the script to work without failing when I first tried it out. It took a hot minute, but eventually I figured out that email providers (gmail/outlook/etc.) were NOT liking the (from="TrueNAS@local.com") part. Once I switched the from section to (from="myemail@address.com") it has worked fine. You might put something in the comment section above, like: "The from address does not need to be changed, but if sending fails, just enter your own email address here as well." I have no picture of the failure, but I could maybe recreate it if needed.
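
In other words, something along these lines in the settings (the exact layout of that section in the script may differ; both addresses are the placeholders from this post):
Code:
# The from address usually does not need to be changed, but if sending fails,
# try your own address here as well:
# from="TrueNAS@local.com"        # rejected by some providers (gmail/outlook/etc.)
from="myemail@address.com"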

Many thanks!
 

Attachments

  • micron-ssd.png
    micron-ssd.png
    27.4 KB · Views: 76
  • toshiba-drives.png
    toshiba-drives.png
    31.9 KB · Views: 97
  • da-order.png
    da-order.png
    6.2 KB · Views: 94

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
These are all super minor things but figured i would share my systems quirks and stuff.
While they might be small, minor things, they are things nonetheless. To address some of these:

dax/daxx drives jumbled:
It can be fixed in statistical_data by just adding a zero in front so it reads da0x/da0xx.
Adding a leading zero would make the device name technically incorrect and would definitely cause confusion for anyone troubleshooting a drive. For example, if I have the script report that drive /dev/da01 has a bad sector and then manually run the command smartctl -a /dev/da01, it will return an error that the device does not exist. But I appreciate you trying to offer a solution; most people do not make that effort.

That is odd; for everyone else who has used it in both Core and Scale the drives are sorted, at least in the results people have sent me. I know early versions were not sorted. I will look at the sort routine to make sure I'm sorting properly and that I didn't break it at some point in time.
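
As a side note on the sorting itself: a plain lexical sort jumbles names like da2/da10, whereas a version sort keeps them in device order (assuming the system's sort supports -V, which recent FreeBSD and Linux versions do):
Code:
printf '%s\n' da0 da2 da10 da11 | sort      # lexical: da0 da10 da11 da2
printf '%s\n' da0 da2 da10 da11 | sort -V   # version sort: da0 da2 da10 da11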

Helium on Toshiba MG0# drive:
Reserve NAND blocks on Micron/Crucial SSDs:
When some data is listed as "unknown attribute" I can't guess what it pertains to. I do not have a table of drive makes/models to do this work; that is what I rely on 'smartctl' to decode. I would need the -dump command run and the selected drive files sent my way to add them. I need to test the code to make sure it works and I don't mess something up, which is very easy to do as this script has gotten more complex each month.

Commented email section:
That is an odd problem and the first time I've heard of it; I suspect your email server does not like that address. What email server/service do you use? I'm using msn.com (now called outlook.com) now, and have used hotmail.com and gmail.com in the past, but I have no idea if they would work today. I can add a comment to address it, though.

I will send you a Conversation request (PM). I'd like to collect your data in order to update the script.
 

Cuprum

Cadet
Joined
Aug 15, 2018
Messages
6
Hi everyone!

First, thank you so much to the team making this script possible, it helps a lot for doing the follow up of my server!

As you can see in my signature, my system uses two mirrored Kingston A400 SSDs as boot drives. They show the wear level with attribute ID 231 and the attribute name SSD_Life_Left, and both are currently at 99 (as per the raw value). The issue is that the report shows "Wear Level" at 1 when emailed. See below:

[Inline screenshot: Wear Level Report.png]


As per Kingston's SMART Attribute Details (link, PDF file): the attribute indicates the approximate SSD life left where 100 = best and 1 = worst.

Is there any workaround to set the right value of the wear level in the report? For reference, I'm attaching the smartctl --all output for the drives, but if a dump or additional info is needed, please let me know.
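
For comparison, the normalized and raw values for attribute 231 can be pulled directly like this (ada0 is just the boot device name from the attachments; columns 4 and 10 are VALUE and RAW_VALUE in smartctl's attribute table):
Code:
# Print normalized VALUE and RAW_VALUE for SSD_Life_Left (ID 231):
smartctl -A /dev/ada0 | awk '$1 == 231 {print $4, $10}'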

Thanks!
 

Attachments

  • ada0.txt
    6.4 KB · Views: 77
  • ada1.txt
    6.4 KB · Views: 67

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Thanks for reporting the error. It's difficult to try to get every version of every drive out there and make the SMART data work for you.

In the meantime you could manually edit the multi_report_config.txt file and look for the value 'wearLevelCrit=9' and change it to a value of '1'.

You can also change it using the '-config' option, then a -> a -> c, and then change it from 9 to 1.
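
If editing by hand is a pain, a one-liner along these lines should do the same thing (this assumes the value is currently 9 as described above; back up the config first, and use whatever path your config file actually lives at):
Code:
sed -i.bak 's/^wearLevelCrit=9/wearLevelCrit=1/' multi_report_config.txt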

I might have a fix today, but before I make a change I need to make sure I do not break something else. When I do have a fix I will PM you and attach the updated script for you to test. I would appreciate quick feedback on it if possible, since I'm about to release a new version any day now, and if this does fix the problem I'd like to include it. Additionally, I'd like to collect some data from you in a PM for my testing when updating the script. It's good to have '-dump' drive data since I do not have all the different types of drives at my fingertips.
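
For reference, that is the same -dump option from the v1.6d changelog earlier in the thread; invoking it looks roughly like this (illustrative only; exact options may differ by version), run from the directory the script lives in:
Code:
./multi_report.sh -dump all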

-Joe
 
Last edited: