
multi_report.sh version for Core and Scale 3.0

garyc

Cadet
Joined
Dec 1, 2015
Messages
9
The "All Good" report is not optional for a normal run.
You can use the '-m' switch; however, I think it only sends an email for over-temperature situations, so you can "monitor" the temps if you have a questionable heat issue.

@daschmidt If you would prefer to not receive "All Good" emails, I would recommend you filter your email to delete emails with "SMART Testing Results for truenas - All is Good" in the subject line.
I like the "all is good" report in the subject line. I know the script ran and the results passed, and I don't have to read the message, although that is still an option if I want to. Sometimes scripts fail to run for some other reason, and the only way I know that the script didn't run is if I do not get the report. With 4 pools spread across 40+ drives, I kind of want to know.
 

CheeryFlame

Contributor
Joined
Nov 21, 2022
Messages
184
Hey Joe! I just wanted to update the script and now it seems like it's stuck when I launch it;

Multi-Report v2.4.3 dtd:2023-06-16 (TrueNAS Scale 22.12.2)
Found Old Configuration File
Automatically updating configuration file...
Continuing to run script
Checking for Updates
Current Version 2.4.3 -- GitHub Version 2.4.3
No Update Required
Message from the Creator

Let me know if you've got any ideas! Thank you!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Hey Joe! I just wanted to update the script and now it seems like it's stuck when I launch it;



Let me know if you've got any ideas! Thank you!
I need more details. At face value it looks like it's trying to update the configuration file, which is odd since the configuration file hasn't changed since version 2.4.1. Also, the change from v2.4.2 to v2.4.3 was minor and did not change the update process. More details please: Is this running in an SSH window or from a CRON job? As a privileged user? If you examine the multi_report_config.txt file, was the date recently changed?
 

CheeryFlame

Contributor
Joined
Nov 21, 2022
Messages
184
I was very tired yesterday. My fault! Everything is working as intended, thank you!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I was very tired yesterday. My fault! Everything is working as intended, thank you!
Whew, glad that was the case. Had me concerned.

In case anyone is curious, a minor version change is on GitHub; you can upgrade manually using the '-update' switch. If you have automatic updates set up, then it should update automatically to version 2.4.3. Dang it, I found another minor issue: '-h' does not display the update command. It's there, I promise you. I will make the change today and it will become part of the next release, slated for August, or sooner if a problem needs to be fixed.
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

The Latest from Joe

### Changelog:
# V2.4.3 (16 June 2023)
# - Minor Update to recognize more SCSI drive Offline Uncorrectable Errors and Total Data Written.
# - Minor Update to recognize UDMA CRC Errors for some older Intel SSD's.
#
# V2.4.2 (19 May 2023)
# - Bug Fix to properly recognize Samsung HD103UJ HDD.
# - Bug fix to properly recognize/display more than 26 drives in Scale.
#
# V2.4.1 (29 April 2023)
# - Bugfix to allow script to be run outside the script directory.
# - Updated chmod 755 to...

 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
@joeschmuck I added new drives to my system and was wondering what I need to do to add their warranty dates to the script config file... my memory is a bit hazy regarding this procedure, and I haven't found much in the manual about it.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
You are correct, the user manual does not give specific instructions on how to add warranty dates. If you use the -config switch, go to 'Advanced Configuration Settings', then select 'K) Drive Errors and Custom Builds (Ignore Drives, UDMA CRC, MultiZone, Reallocated Sectors, ATA Errors, Warranty Expiration)' and step through the questions; hopefully it will be obvious. If you get stuck on something, please let me know and I will fix it.

The good thing about using the warranty date is that you will know when you might want to replace the drives, if you are a person who replaces drives when the warranty expires. For myself, it tells me how much time has passed since the warranty expired, and it makes me feel better. For example, three of my drives are 2 years, 9 months, 10 days past the warranty; this will have a yellow background. That makes me feel good. I did have one drive start failing a few months ago so I replaced it, and now I have 2y 8m 16d before I hit the warranty date; this will have a normal colored background. Your Power On Time is also affected in the same manner and keys off the warranty date.
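
To illustrate the before/after rule described above, here is a minimal sketch. It is not the script's actual code. It assumes warranty dates are written as zero-padded YYYY-MM-DD strings (the real config format may differ), which lets a plain string comparison order them correctly. The function name `warranty_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# Prints "expired" when today is past the warranty date, otherwise "active".
# Assumes both dates are zero-padded YYYY-MM-DD strings.
warranty_status() {
    local warranty_date="$1"
    local today="${2:-$(date +%Y-%m-%d)}"   # optional second argument overrides today
    if [[ "$today" > "$warranty_date" ]]; then
        echo "expired"   # the report would show this case with a yellow background
    else
        echo "active"    # normal colored background
    fi
}

warranty_status "2020-09-20" "2023-06-30"   # prints "expired"
warranty_status "2026-03-16" "2023-06-30"   # prints "active"
```

Passing the comparison date as a second argument is only there to make the sketch easy to try out with fixed dates.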

Let me know how it works.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I had read that but wasn't sure about it being the right option. Anyway, thank you for your guidance.

I successfully entered the two new drives' dates... and encountered a minor issue: even though I just pressed "enter" when the script asked me whether I wanted to edit, delete or not change a drive, my two original drives' warranty dates were deleted.
I had to re-run the config option and re-enter all the drives' warranty dates, even the already registered ones.

Example after the first run (ada1 and ada2 are the original drives, do note the absence of warranty value):
Screenshot_2.png


Edit: just noticed it removed my SSD warranty as well. Well, just gotta rewrite everything.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Edit: just noticed it removed my SSD warranty as well. Well, just gotta rewrite everything.
Sorry about that. When you choose Edit, it should have cycled through each drive and asked you if you wanted to enter the data; it should not have deleted any data. I will have to test the current version to see if I let a bug get in.

Just tested it out, and I should change this setup to ask how to handle the drives that are already listed. When I developed this, my mindset was that ALL the data would be entered at the same time; I was clearly wrong. I will make that change for the next version. I'm sure I made that assumption in other areas of the script as well, so I will need to step through every iteration to try to catch as many as I can.

Thank you for letting me know of the error and sorry for the inconvenience.

Additional Information: If you open the multi_report_config.txt file you can find a variable on line 379 called Drive_Warranty_List, which will either be set to "none" or list your drives, and there are directions on how to edit it. If I had told you this, it would have saved you some grief, but I had problems with people editing the config file and screwing up the format, so I tried to make it menu driven. If you have an older copy of the config file, you can just copy that line over and edit the new drive data into it, but I suspect you have already updated the warranty dates.
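
If you do edit the file by hand, a small sketch like the following can help avoid the formatting accidents mentioned above: it saves a copy first, then shows where the variable lives. The helper name `show_warranty_line` and the `.bak` suffix are assumptions for illustration, and the line number in your copy may not be 379.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the config file, then print the
# Drive_Warranty_List line together with its line number.
show_warranty_line() {
    local config="$1"
    if [[ ! -f "$config" ]]; then
        echo "config not found: $config" >&2
        return 1
    fi
    cp -p -- "$config" "$config.bak" || return 1     # keep a copy before editing
    grep -n -- '^Drive_Warranty_List=' "$config"     # prints e.g. 379:Drive_Warranty_List="none"
}

# Usage (the path is an example, adjust it to where your config lives):
#   show_warranty_line ./multi_report_config.txt
```

With the backup in place, the whole line can be restored later by copying it back from the `.bak` file.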
 

barny

Dabbler
Joined
Feb 4, 2015
Messages
15
Hey Joe,
I just started using your script. Thanks for all the work involved.
It did an update from version multi_report_v2.3_2023_04_14 to multi_report_v2.4.3_2023_06_16
It threw an error during the update but it does work. It might only be a local problem.

./multi_report.sh: line 3195: ( / 8760): syntax error: operand expected (error token is "/ 8760)")

Best,

BTW, I tried to email the dump but Hotmail kicked the email back. I can attach the dump if needed.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
./multi_report.sh: line 3195: ( / 8760): syntax error: operand expected (error token is "/ 8760)")
This is usually a divide by zero or divide by invalid character problem.
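
For anyone curious about the message itself: bash prints "operand expected" with an expression like `( / 8760)` when a variable inside `$(( ... ))` expands to nothing, so the division has no left-hand value. The sketch below shows one way to guard against that. It is a generic illustration, not the code at line 3195 of the script. The function name `hours_to_years` is hypothetical, and 8760 is simply 24 x 365.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# Converts an hour count to whole years, returning 0 when the input is
# empty or not a plain non-negative integer, so the arithmetic never sees
# an expression like "( / 8760)".
hours_to_years() {
    local hours="$1"
    if [[ "$hours" =~ ^[0-9]+$ ]]; then
        echo $(( 10#$hours / 8760 ))   # 10# forces base 10 even with leading zeros
    else
        echo 0
    fi
}

hours_to_years 17520   # prints 2
hours_to_years ""      # prints 0 instead of raising a syntax error
```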

If this problem happens again, I would like to find out what is causing it. If you would like to, run the script using the -dump and you should get an email filled with attachments. If you could forward that email and attachments to joeschmuck2023@hotmail.com then I would have the same data as the -dump email. If you can get the attachments, then I'm not sure why the '-dump email' command does not work. Just to make sure you are entering the command correctly, it is: multi_report.sh -dump email, assuming you named the script 'multi_report.sh'.

It did an update from version multi_report_v2.3_2023_04_14 to multi_report_v2.4.3_2023_06_16
That is the correct current version. I plan to push another update in August to fix a few little things. If you are still having this issue, I'd like to see if I can include that fix as well. But hopefully it was a one time issue. If you still have this, I suspect some drive value is not in the correct location on the SMART output, it happens all too frequently.
 

barny

Dabbler
Joined
Feb 4, 2015
Messages
15
Thanks for the reply. I appreciate it.
I thought the error was weird as no one else seemed to report it in the forums.
I ran the -dump email command again and forwarded it to your Hotmail. The last time I tried it, Hotmail themselves kicked it back.
I ran multi_report.sh by itself a couple more times and it didn't kick the error, so all good. Probably a fluke like you said.
BTW- you will see the Seagate drives I have run hot for some reason. 46-49C is common. Been that way since purchase.
Love the script and thanks. It really helps keep an eye on the NAS.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I received the two dumps. I will initially examine the data tomorrow and get back to you, then do some data analysis on Saturday.

The error message you received could come back. If it does and you happen to be there when it happens, please send me another dump immediately. I need the data that each drive is sending at the specific time, the script is failing on something that might be there every once in a blue moon. For example (this actually happened): The last SMART test run hour was in a different location depending on the SMART test run. So if you ran a Short test then all could be good, but if you ran a Long test the results would cause the script to burp. Hopefully I will notice what it is during the analysis.

As for the drive temp issues, if your drives are going to run that warm, I would recommend you change the drive temp threshold. You also are running SMART tests less often than daily so I would recommend you change the Warning Test Age to a value of the days between SMART tests +1.
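
As a worked example of the "+1" rule above, here is a tiny sketch. The function name `recommended_warning_age` is hypothetical, and the value it prints is what you would then enter for Warning Test Age through the -config menus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the number of days between scheduled SMART
# tests, print the suggested Warning Test Age (interval + 1 day).
recommended_warning_age() {
    local interval_days="$1"
    [[ "$interval_days" =~ ^[0-9]+$ ]] || return 1   # reject empty or non-numeric input
    echo $(( 10#$interval_days + 1 ))
}

recommended_warning_age 1   # daily tests  -> prints 2
recommended_warning_age 7   # weekly tests -> prints 8
```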

You want to clear all the alarms so you have an email that tells you "All is Good", and this way when you get a warning or error, you will pay attention to it. After I look at your data I will email you an updated configuration file to address those issues, or you can do all that yourself through the -config option.
 

barny

Dabbler
Joined
Feb 4, 2015
Messages
15
Thanks Joe. I had similar thoughts regarding the warnings but wanted to wait in case there were errors in the script to fix first. Let me try to update the config file according to your recommendations, and if I run into trouble I can reach out. I certainly don't want to add to your workload.
10-4 on sending another dump if there is another error. Nothing so far.
Thanks again for the great script. I had been using the remote.sh and save-config-enc.sh scripts for a while. I like how this incorporates both.
 

barny

Dabbler
Joined
Feb 4, 2015
Messages
15
BTW- Just dawned on me.....
The SMART SHORT Tests are actually daily but I had the system down for a couple days. Probably why it looked like the SMART tasks were more than 48 hours. My bad.
On a different note:
Are you aware of the large 'Raw_Read_Error_Rate' that Seagate drives record in the smart data?
Not sure if it is important but FWIW:
I'm going back to WD_Red drives after these as they run cool and error rates are easy to interpret.
Best
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
The SMART SHORT Tests are actually daily but I had the system down for a couple days. Probably why it looked like the SMART tasks were more than 48 hours. My bad.
Nope, unless the system was powered off when the SMART test was due, so the test was skipped, and was then powered on again. The Test Age is calculated from the drive's Power On Hours divided by 24, so one day is one 24-hour period of power-on time. If you simply had the system powered off, then you may actually have a scheduling issue with the SMART tests.
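
Here is a rough sketch of that calculation, under stated assumptions. It assumes the age is the difference between the drive's current Power On Hours and the lifetime-hours stamp that the SMART self-test log records for the last test, divided by 24. The function name `test_age_days` is hypothetical and this is not the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# Prints the age of the last SMART test in whole days of power-on time.
#   $1 = current Power On Hours
#   $2 = lifetime hours recorded for the last self-test
test_age_days() {
    local power_on_hours="$1" last_test_hours="$2"
    [[ "$power_on_hours" =~ ^[0-9]+$ && "$last_test_hours" =~ ^[0-9]+$ ]] || return 1
    echo $(( (10#$power_on_hours - 10#$last_test_hours) / 24 ))
}

test_age_days 30050 30000   # 50 power-on hours since the test -> prints 2
```

Because only power-on hours are counted, time spent powered off does not add to the age in this model.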

I do not know how you run your system, if you power it off each night or normally let it run. The script default values are based around a 24/7 running system. These values can be adjusted for any setup, well I'm sure someone can find something I didn't think of.

So watch the alarms, they should tell you truthful data. And I'm just now getting to examine your data so I will send you the results and recommendations based off your data directly to your email address.

I'm going back to WD_Red drives after these as they run cool and error rates are easy to interpret.
My script adjusts for the Seagate drives that "seem" to report high Raw Read Error Rate values. There is some math involved to read this data correctly; that is why the script chart shows a value of "0" instead of what SMART appears to indicate. Yeah, I do not like what Seagate did either, but we have to live with it.
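
For the curious, here is a sketch of the kind of math involved. It relies on a widely repeated community interpretation, which is an assumption here and not something Seagate documents or this thread confirms: the raw value is a 48-bit number whose upper 16 bits hold the actual error count and whose lower 32 bits count operations. The function name `seagate_read_errors` is hypothetical and the script may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# ASSUMPTION: the Seagate raw value packs the error count into the bits
# above the lower 32, which hold an operation counter.
seagate_read_errors() {
    local raw="$1"
    [[ "$raw" =~ ^[0-9]+$ ]] || return 1
    echo $(( 10#$raw >> 32 ))   # drop the lower 32 bits, keep the error count
}

seagate_read_errors 123456789    # large-looking raw value, prints 0
seagate_read_errors 8589934597   # (2 << 32) + 5, prints 2
```

Under that assumption, a large raw number with nothing in the upper bits corresponds to zero errors, which matches the "0" shown in the chart.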
 

barny

Dabbler
Joined
Feb 4, 2015
Messages
15
Thanks Joe. Good to know.
I only power it off if something needs changing- so rarely. Runs 24/7.
I had 2 old WD drives that were sitting in the case and wanted to remove them without disturbing anything running.
Although I have been using TrueNAS since the FreeNAS 9 days, I am still a novice.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Although I have been using TrueNAS since the FreeNAS 9 days, I am still a novice.
We are all still learning. I've been around since FreeNAS 0.7 came out. FreeNAS 8.0 was the iXsystems version. We could edit and compile the source code, making improvements for the community. I stopped compiling when Coral (10) came out and died quickly. The format was not something I wanted to learn. But I still learn because I definitely don't know much either. Just today I learned that an idiot is born every day. Thankfully that idiot was not me but someone who works for me.
 