multi_report.sh version for Core and Scale 3.0

garyc · May 26, 2023

joeschmuck said:
The All Good report is not optional for a normal run.
You can use the '-m' switch; however, I think it only sends an email for over-temperature situations, so you can "monitor" the temps if you have a questionable heat issue.

@daschmidt If you would prefer to not receive "All Good" emails, I would recommend you filter your email to delete emails with "SMART Testing Results for truenas - All is Good" in the subject line.

I like the "All is Good" report in the subject line. I know the script ran and the results passed, and I don't have to read the message. If I want to, it is still an option. Sometimes scripts fail to run for some other reason, and the only way I know that it didn't run is if I do not get the report. With 4 pools spread across 40+ drives, I kind of want to know.

CheeryFlame · Jun 18, 2023

Hey Joe! I just wanted to update the script and now it seems like it's stuck when I launch it;

Multi-Report v2.4.3 dtd:2023-06-16 (TrueNAS Scale 22.12.2)
Found Old Configuration File
Automatically updating configuration file...
Continuing to run script
Checking for Updates
Current Version 2.4.3 -- GitHub Version 2.4.3
No Update Required
Message from the Creator

Let me know if you've got any ideas! Thank you!

joeschmuck · Jun 19, 2023

gravelfreeman said:
Hey Joe! I just wanted to update the script and now it seems like it's stuck when I launch it;

Let me know if you've got any ideas! Thank you!

I need more details. At face value it looks like it's trying to update the configuration file, which is odd since the configuration file hasn't changed since version 2.4.1. Also, the change from v2.4.2 to v2.4.3 was minor and did not touch the update process. More details please: Is this running in an SSH window or from a CRON job? As a privileged user? If you examine the multi_report_config.txt file, was the date recently changed?

CheeryFlame · Jun 19, 2023

joeschmuck said:
I need more details. At face value it looks like it's trying to update the configuration file, which is odd since the configuration file hasn't changed since version 2.4.1. Also, the change from v2.4.2 to v2.4.3 was minor and did not touch the update process. More details please: Is this running in an SSH window or from a CRON job? As a privileged user? If you examine the multi_report_config.txt file, was the date recently changed?

I was very tired yesterday. My fault! Everything is working as intended, thank you!

joeschmuck · Jun 19, 2023

gravelfreeman said:
I was very tired yesterday. My fault! Everything is working as intended, thank you!

Whew, glad that was the case. Had me concerned.

In case anyone is curious, a minor version change is on GitHub. You can upgrade manually using the '-update' switch. If you have automatic updates set up, then it should update automatically to version 2.4.3. Dang it, I found another minor issue: the '-h' switch does not display the update command. It's there, I promise you. I will make the change today and it will become part of the next release, slated for August, or sooner if a problem needs to be fixed.

TooMuchData · Jul 8, 2023

TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

The Latest from Joe

### Changelog:
# V2.4.3 (16 June 2023)
# - Minor Update to recognize more SCSI drive Offline Uncorrectable Errors and Total Data Written.
# - Minor Update to recognize UDMA CRC Errors for some older Intel SSD's.
#
# V2.4.2 (19 May 2023)
# - Bug Fix to properly recognize Samsung HD103UJ HDD.
# - Bug fix to properly recognize/display more than 26 drives in Scale.
#
# V2.4.1 (29 April 2023)
# - Bugfix to allow script to be run outside the script directory.
# - Updated chmod 755 to...


Davvo · Jul 25, 2023

@joeschmuck I added new drives to my system and was wondering what I need to do to add their warranty dates to the script config file... my memory is a bit hazy regarding this procedure, and I haven't found much in the manual about it.

joeschmuck · Jul 25, 2023

You are correct, the user manual does not give specific instructions on how to add warranty dates. If you use the -config switch, go to 'Advanced Configuration Settings', then select 'K) Drive Errors and Custom Builds (Ignore Drives, UDMA CRC, MultiZone, Reallocated Sectors, ATA Errors, Warranty Expiration)' and step through the questions; hopefully it will be obvious. If you get stuck on something, please let me know and I will fix it.

The good thing about using the warranty date is that you will know when you might want to replace the drives, if you are a person who replaces drives when a warranty expires. For myself, it tells me how much time has passed since the warranty expired, and it makes me feel better. For example, three of my drives are 2 years, 9 months, 10 days past the warranty; this will have a yellow background. That makes me feel good. I did have one drive start failing a few months ago so I replaced it, and now I have 2y 8m 16d before it hits the warranty date; this will have a normal colored background. Your Power On Time is also affected in the same manner and keys off the warranty date.

Let me know how it works.

Davvo · Jul 25, 2023

joeschmuck said:
You are correct, the user manual does not give specific instructions on how to add warranty dates. If you use the -config switch, go to 'Advanced Configuration Settings', then select 'K) Drive Errors and Custom Builds (Ignore Drives, UDMA CRC, MultiZone, Reallocated Sectors, ATA Errors, Warranty Expiration)' and step through the questions; hopefully it will be obvious. If you get stuck on something, please let me know and I will fix it.

The good thing about using the warranty date is that you will know when you might want to replace the drives, if you are a person who replaces drives when a warranty expires. For myself, it tells me how much time has passed since the warranty expired, and it makes me feel better. For example, three of my drives are 2 years, 9 months, 10 days past the warranty; this will have a yellow background. That makes me feel good. I did have one drive start failing a few months ago so I replaced it, and now I have 2y 8m 16d before it hits the warranty date; this will have a normal colored background. Your Power On Time is also affected in the same manner and keys off the warranty date.

Let me know how it works.

I had read that but wasn't sure about it being the right option. Anyway, thank you for your guidance.

I successfully entered the two new drives' dates... and encountered a minor issue: even though I pressed "enter" when the script asked me whether I wanted to edit, delete or make no change to a drive, my two original drives' warranty dates were deleted.
I had to re-run the config option and re-enter all the drives' warranty dates, even the already registered ones.

Example after the first run (ada1 and ada2 are the original drives, do note the absence of warranty value):

Edit: just noticed it removed my SSD warranty as well. Well, just gotta rewrite everything.

joeschmuck · Jul 25, 2023

Davvo said:
Edit: just noticed it removed my SSD warranty as well. Well, just gotta rewrite everything.

Sorry about that. When you choose Edit, it should have cycled through each drive and asked you if you wanted to enter the data; it should not have deleted any data. I will have to test out the current version to see if I let a bug get in.

Just tested it out, and I should change this setup to ask how to handle the drives that are already listed. When I developed this, my mindset was that ALL the data would be entered at the same time; I was clearly wrong. I will make that change for the next version. I'm sure I made that assumption in other areas of the script as well, so I will need to step through every iteration to try to catch as many as I can.

Thank you for letting me know of the error and sorry for the inconvenience.

Additional Information: If you open the multi_report_config.txt file you can find a variable on line 379 called Drive_Warranty_List. It will either be set to "none" or list your drives, and there are directions on how to edit it. If I had told you this, it would have saved you some grief, but I had problems with people editing the config file and screwing up the format, so I tried to make it menu driven. If you have an older copy of the config file, you can just copy that line over and edit the new drive data into it, but I suspect you have already updated the warranty dates.
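For anyone who prefers to inspect that line before touching it, here is a minimal sketch. It assumes the config file is named multi_report_config.txt and that the variable starts at the beginning of a line; the function name show_warranty_line is made up for this example and is not part of multi_report.sh. The line number may differ from 379 between config versions, which is why the sketch searches by name.

```shell
#!/bin/sh
# Hypothetical helper, not part of multi_report.sh.
# Prints the line number and current value of Drive_Warranty_List
# from the config file given as the first argument.
show_warranty_line() {
    grep -n '^Drive_Warranty_List=' "$1"
}

# Example use (commented out so the sketch has no side effects):
#   cp multi_report_config.txt multi_report_config.txt.bak   # keep a backup first
#   show_warranty_line multi_report_config.txt
```

Keeping a backup copy before any manual edit makes it easy to restore the line if the format gets damaged.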

barny · Jul 26, 2023

Hey Joe,
I just started using your script. Thanks for all the work involved.
It did an update from version multi_report_v2.3_2023_04_14 to multi_report_v2.4.3_2023_06_16
It threw an error during the update but it does work. It might only be a local problem.

./multi_report.sh: line 3195: ( / 8760): syntax error: operand expected (error token is "/ 8760)")

Best,

BTW, I tried to email the dump but Hotmail kicked it back. I can attach the dump if needed.

joeschmuck · Jul 26, 2023

barny said:
./multi_report.sh: line 3195: ( / 8760): syntax error: operand expected (error token is "/ 8760)")

This is usually a divide by zero or divide by invalid character problem.
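The error text "( / 8760)" shows that nothing was left in front of the division sign, which is what shell arithmetic reports when a value expands to an empty string. As a rough illustration only, here is one way such a calculation could be guarded. The function name hours_to_years is hypothetical and this is not the code from line 3195 of multi_report.sh.

```shell
#!/bin/sh
# Hypothetical helper, not taken from multi_report.sh.
# Converts power-on hours to whole years (8760 hours per year).
# An empty or non-numeric input is replaced with 0 so the arithmetic
# never sees a missing operand. Leading zeros are not handled here.
hours_to_years() {
    hours=$1
    case $hours in
        ''|*[!0-9]*) hours=0 ;;   # empty or contains a non-digit
    esac
    echo $(( $hours / 8760 ))
}

hours_to_years 17520   # prints 2
hours_to_years ""      # prints 0 instead of "operand expected"
```

A default of 0 is only one possible choice; flagging the drive value as unreadable would be another.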

If this problem happens again, I would like to find out what is causing it. If you would like to, run the script using the -dump and you should get an email filled with attachments. If you could forward that email and attachments to joeschmuck2023@hotmail.com then I would have the same data as the -dump email. If you can get the attachments, then I'm not sure why the '-dump email' command does not work. Just to make sure you are entering the command correctly, it is: multi_report.sh -dump email, assuming you named the script 'multi_report.sh'.

barny said:
It did an update from version multi_report_v2.3_2023_04_14 to multi_report_v2.4.3_2023_06_16

That is the correct current version. I plan to push another update in August to fix a few little things. If you are still having this issue, I'd like to see if I can include that fix as well. But hopefully it was a one time issue. If you still have this, I suspect some drive value is not in the correct location on the SMART output, it happens all too frequently.

barny · Jul 26, 2023

joeschmuck said:
This is usually a divide by zero or divide by invalid character problem.

If this problem happens again, I would like to find out what is causing it. If you would like to, run the script using the -dump and you should get an email filled with attachments. If you could forward that email and attachments to joeschmuck2023@hotmail.com then I would have the same data as the -dump email. If you can get the attachments, then I'm not sure why the '-dump email' command does not work. Just to make sure you are entering the command correctly, it is: multi_report.sh -dump email, assuming you named the script 'multi_report.sh'.

That is the correct current version. I plan to push another update in August to fix a few little things. If you are still having this issue, I'd like to see if I can include that fix as well. But hopefully it was a one time issue. If you still have this, I suspect some drive value is not in the correct location on the SMART output, it happens all too frequently.

Thanks for the reply. I appreciate it.
I thought the error was weird as no one else seemed to report it in the forums.
I ran the -dump email command again and forwarded it to your Hotmail. The last time I tried it, Hotmail themselves kicked it back.
I ran multi_report.sh by itself a couple more times and it didn't kick the error, so all good. Probably a fluke like you said.
BTW, you will see the Seagate drives I have run hot for some reason. 46-49C is common. Been that way since purchase.
Love the script and thanks. It really helps keep an eye on the NAS.

joeschmuck · Jul 26, 2023

I received the two dumps. I will initially examine the data tomorrow and get back to you, then do some data analysis on Saturday.

The error message you received could come back. If it does and you happen to be there when it happens, please send me another dump immediately. I need the data that each drive is sending at the specific time, the script is failing on something that might be there every once in a blue moon. For example (this actually happened): The last SMART test run hour was in a different location depending on the SMART test run. So if you ran a Short test then all could be good, but if you ran a Long test the results would cause the script to burp. Hopefully I will notice what it is during the analysis.

As for the drive temp issues, if your drives are going to run that warm, I would recommend you change the drive temp threshold. You also are running SMART tests less often than daily so I would recommend you change the Warning Test Age to a value of the days between SMART tests +1.

You want to clear all the alarms so you have an email that tells you "All is Good", and this way when you get a warning or error, you will pay attention to it. After I look at your data I will email you an updated configuration file to address those issues, or you can do all that yourself through the -config option.

barny · Jul 27, 2023

joeschmuck said:
I received the two dumps. I will initially examine the data tomorrow and get back to you, then do some data analysis on Saturday.

The error message you received could come back. If it does and you happen to be there when it happens, please send me another dump immediately. I need the data that each drive is sending at the specific time, the script is failing on something that might be there every once in a blue moon. For example (this actually happened): The last SMART test run hour was in a different location depending on the SMART test run. So if you ran a Short test then all could be good, but if you ran a Long test the results would cause the script to burp. Hopefully I will notice what it is during the analysis.

As for the drive temp issues, if your drives are going to run that warm, I would recommend you change the drive temp threshold. You also are running SMART tests less often than daily so I would recommend you change the Warning Test Age to a value of the days between SMART tests +1.

You want to clear all the alarms so you have an email that tells you "All is Good", and this way when you get a warning or error, you will pay attention to it. After I look at your data I will email you an updated configuration file to address those issues, or you can do all that yourself through the -config option.

Thanks Joe. I had similar thoughts regarding the warnings but wanted to wait in case any errors in the script needed fixing first. Let me try to update the config file according to your recommendations, and if I run into trouble I can reach out. I certainly don't want to add to your workload.
10-4 on sending another dump if there is another error. Nothing so far.
Thanks again for the great script. I had been using the remote.sh and save-config-enc.sh scripts for a while. I like how this incorporates both.

barny · Jul 27, 2023

BTW- Just dawned on me.....
The SMART SHORT Tests are actually daily but I had the system down for a couple of days. Probably why it looked like the SMART tasks were more than 48 hours. My bad.
On a different note:
Are you aware of the large 'Raw_Read_Error_Rate' that Seagate drives record in the smart data?
Not sure if it is important but FWIW:

Brand new Seagate HDD has high raw read error rate (superuser.com)

NodeSupport / Guides / Internode Members Webspace End of Life | Internode (www.users.on.net)

I'm going back to WD_Red drives after these as they run cool and error rates are easy to interpret.
Best

joeschmuck · Jul 27, 2023

barny said:
The SMART SHORT Tests are actually daily but I had the system down for a couple of days. Probably why it looked like the SMART tasks were more than 48 hours. My bad.

Nope, maybe if you had it powered off to skip the SMART Test and powered on again. The Test Age is calculated from the drive Power On Hours divided by 24 = One 24 hour period (day). If you did just have the system powered off then you may actually have a scheduling issue with the SMART tests.
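To make the arithmetic concrete, here is a small sketch of a test-age calculation based on power-on hours. It assumes the age is the difference between the drive's current Power On Hours and the hour recorded for the last SMART test, divided by 24; the function name and the sample numbers are illustrative and not taken from the script.

```shell
#!/bin/sh
# Hypothetical sketch; names and numbers are illustrative only.
# Usage: test_age_days CURRENT_POWER_ON_HOURS LAST_TEST_HOURS
# Prints the age of the last SMART test in whole 24-hour periods.
test_age_days() {
    echo $(( ($1 - $2) / 24 ))
}

test_age_days 20100 20040   # 60 powered-on hours since the test, prints 2
```

Because both inputs count powered-on hours, time spent powered off does not increase the result.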

I do not know how you run your system, if you power it off each night or normally let it run. The script default values are based around a 24/7 running system. These values can be adjusted for any setup, well I'm sure someone can find something I didn't think of.

So watch the alarms, they should tell you truthful data. And I'm just now getting to examine your data so I will send you the results and recommendations based off your data directly to your email address.

barny said:
I'm going back to WD_Red drives after these as they run cool and error rates are easy to interpret.

My script adjusts for the Seagate drives that "seem" to report high Raw Read Error Rate values. There is some math involved to read this data correctly, which is why the script chart shows a value of "0" instead of what SMART appears to indicate. Yeah, I do not like what Seagate did either, but we have to live with it.
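As an illustration of the kind of math involved, the sketch below uses the commonly described community interpretation of Seagate's 48-bit raw value: the upper 16 bits are taken as the error count and the lower 32 bits as the number of operations. That interpretation is an assumption here, the function name is made up, and the script itself may do this differently.

```shell
#!/bin/sh
# Hypothetical sketch; assumes the widely quoted 16-bit/32-bit split
# of the 48-bit raw value. Not taken from multi_report.sh.
# Usage: split_seagate_raw RAW_VALUE
split_seagate_raw() {
    errors=$(( ($1 >> 32) & 0xFFFF ))     # upper 16 bits
    operations=$(( $1 & 0xFFFFFFFF ))     # lower 32 bits
    echo "errors=$errors operations=$operations"
}

split_seagate_raw 200000000    # prints errors=0 operations=200000000
split_seagate_raw 4294967301   # prints errors=1 operations=5
```

Under that reading, a large raw number such as 200000000 still corresponds to zero errors, which matches the "0" shown in the chart.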

barny · Jul 27, 2023

joeschmuck said:
Nope, maybe if you had it powered off to skip the SMART Test and powered on again. The Test Age is calculated from the drive Power On Hours divided by 24 = One 24 hour period (day). If you did just have the system powered off then you may actually have a scheduling issue with the SMART tests.

I do not know how you run your system, if you power it off each night or normally let it run. The script default values are based around a 24/7 running system. These values can be adjusted for any setup, well I'm sure someone can find something I didn't think of.

So watch the alarms, they should tell you truthful data. And I'm just now getting to examine your data so I will send you the results and recommendations based off your data directly to your email address.

My script adjusts for the Seagate drives that "seem" to report high Raw Read Error Rate values. There is some math involved to read this data correctly, which is why the script chart shows a value of "0" instead of what SMART appears to indicate. Yeah, I do not like what Seagate did either, but we have to live with it.

barny · Jul 27, 2023

Thanks Joe. Good to know.
I only power it off if something needs changing- so rarely. Runs 24/7.
I had 2 old WD drives that were sitting in the case and wanted to remove them without disturbing anything running.
Although I have been using TrueNAS since the FreeNAS 9 days, I am still a novice.

joeschmuck · Jul 27, 2023

barny said:
Although I have been using TrueNAS since the FreeNAS 9 days, I am still a novice.

We are all still learning. I've been around since FreeNAS 0.7 came out. FreeNAS 8.0 was the iXsystems version. We could edit and compile the source code, making improvements for the community. I stopped compiling when Corral (10) came out and died quickly. The format was not something I wanted to learn. But I still learn because I definitely don't know much either. Just today I learned that an idiot is born every day. Thankfully that idiot was not me but someone who works for me.
