multi_report.sh version for Core and Scale 2.5

SoraKagami

Cadet
Joined
Aug 27, 2022
Messages
3

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

Major Enhancements (thanks to Joe Schmuck)

With version 1.6c you may use an external configuration file. Use [-h] to read the Help Section.
Run the program with [-config] to create a default configuration file in the directory the script is run from.

v1.6c (28 August 2022)
- Supports external configuration file (but not required).
- Completely configurable by running the script with the -config parameter (this took a lot of work).
- Added HDD/SSDmaxtempovrd variables to combat some bogus SSD values.
- Added TLER (SCT) support.
- Added...

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Raw Read Error Rates: Hex#01 (Decimal #01)
Just to follow up: I looked back at the code again. I do cover Raw Read Error Rates, but the handling is not customizable.

The rule is: if the error rate is above 900, treat it as invalid. If ignoreSeekError is not true (it is true by default) and the error rate is above 200 but not above 900, a warning message is issued.
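
For anyone following along, here is a minimal sketch of that rule in bash. It is not the script's actual code: the function name and the ok/warning/invalid labels are made up for illustration, and only ignoreSeekError and the 200/900 thresholds come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the rule; not taken from multi_report.sh.
# Usage: classify_raw_read_rate RATE [IGNORE_SEEK_ERROR]
classify_raw_read_rate() {
  local rate="$1" ignoreSeekError="${2:-true}"   # true is the stated default
  if (( rate > 900 )); then
    echo "invalid"      # treated as a bogus reading, not as an alarm
  elif [[ "$ignoreSeekError" != "true" ]] && (( rate > 200 )); then
    echo "warning"      # only reached when alarms are not suppressed
  else
    echo "ok"
  fi
}

classify_raw_read_rate 150 false    # below both limits
classify_raw_read_rate 450 false    # warning range, alarms enabled
classify_raw_read_rate 450 true     # warning range, alarms suppressed
classify_raw_read_rate 5000 false   # above 900, treated as invalid
```

With the default ignoreSeekError="true", the warning branch is never taken, which matches the behaviour described here.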

Why is it done this way? It was a shortcut, to be honest. Some drives report actual Raw Read Error Rates, and some drives pack two pieces of data into the bit pattern of the reported value. So the first thing I did was disable the reporting of an alarm condition with the ignoreSeekError="true" line. Why? Because there are other key indicators to alert a person to a drive failure. If you are truly having this kind of error, it means the controller for the head positioning is failing, and that will manifest in other places as well.

I may update the code to fix this issue, but I need to get my thoughts straight on the format of the data and on whether it is the same for each drive manufacturer or unique to each. I know there is a command to output the bits; I have seen it and used it on an old drive a long time ago, so I should have what I need. I actually do want to fix it. If I do, I will add variables for rawreadrateCrit and Warn and change ignoreSeekError to false by default.
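
To illustrate what "two pieces of data via bit patterns" can look like, here is a rough sketch. It assumes a layout that is often described for some Seagate drives: a 48-bit raw value with an error count in the upper 16 bits and an operation count in the lower 32 bits. That layout is an assumption, not something confirmed in this thread, and split_raw_value is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the 16/32-bit split is an assumed layout, not a
# documented one, and may not apply to a given drive or manufacturer.
# Prints "<errors> <operations>" for a decimal raw value.
split_raw_value() {
  local raw="$1"
  local errors=$(( (raw >> 32) & 0xFFFF ))     # assumed upper 16 bits
  local operations=$(( raw & 0xFFFFFFFF ))     # assumed lower 32 bits
  echo "$errors $operations"
}

split_raw_value 123456        # small value: no bits set in the upper field
split_raw_value 8589934597    # 2 * 2^32 + 5: upper field 2, lower field 5
```

If that layout holds for a drive, a large raw number can still mean zero actual errors, which is why a fixed threshold on the raw value misfires.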

So while I do not feel it's that important to have this data (and that is just me), others would like it and I will figure out how to provide it correctly.

So this message is just to say I was wrong and that I see room for improvement.
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Hi,

I had a couple of questions about the script. I have two TrueNAS servers (one running TrueNAS-12.0-U8.1 and the other TrueNAS-13.0-U2).

On the server running TrueNAS-13.0-U2, the script runs, but the email that it sends has *WARNING* in the subject line. I'm not sure what the warning is referring to. I've attached the output to this post. Also, the email that is sent doesn't include the config file as an attachment. Is that something that needs to be specifically enabled in the script?

On the server running TrueNAS-12.0-U8.1, the script doesn't complete, and shows the error:

Code:
root@truenas[/tmp]# ./multi_report_v1.6c.sh
Multi-Report v1.6c dtd 28 Aug 2022 (TrueNAS Core 12.0-U8.1)
No Config File Exists
Checking for a valid email within the script...
Valid email within the script = xxx@xxx.net, using script parameters...

./multi_report_v1.6c.sh: line 1630: 10#-: syntax error: operand expected (error token is "-")
root@truenas[/tmp]#
 

Attachments

  • email.pdf
    675.8 KB · Views: 148

awasb

Patron
Joined
Jan 11, 2021
Messages
402
re 13-U2: The WARN level refers to the fact that some of your SMART tests appear to be stuck on the drives with the red marks. smartctl -X da[0-3] (and smartctl -t short da[0-3] afterwards) should do the trick for the next report.
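
Spelled out as a loop, that could look like the sketch below. The device names da0 to da3 and the /dev/ prefix are assumptions for a Core system, and the run wrapper and DRY_RUN variable are hypothetical additions so the commands are only printed unless you opt in.

```shell
#!/usr/bin/env bash
# Prints the commands by default; set DRY_RUN=0 to really run them (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_selftests() {
  local dev
  for dev in "$@"; do
    run smartctl -X "/dev/$dev"         # abort a self-test that is still running
    run smartctl -t short "/dev/$dev"   # then start a fresh short self-test
  done
}

reset_selftests da0 da1 da2 da3
```

Give the short tests a few minutes to finish before running the report again.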

re 12-U8.1: Check the script at line 1630. Mine reads:

Code:
lastTestHours="$((10#$(echo "$smartdata" | grep "# 1" | awk '{print $9}' )))"


Maybe you inserted something by accident?
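
If the line is unchanged, the error text itself shows what bash tried to evaluate: 10#-, meaning the command substitution returned a bare "-" instead of a number. Why a drive would report "-" in that field is not established here. Below is a minimal sketch of a guard for that case; to_base10 is a hypothetical helper, not part of multi_report.sh, and falling back to 0 is just one possible choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: only hand digits to the 10# conversion.
to_base10() {
  local value="$1"
  if [[ "$value" =~ ^[0-9]+$ ]]; then
    echo "$(( 10#$value ))"   # 10# keeps leading zeros from being read as octal
  else
    echo "0"                  # "-", empty, or text: no usable number
  fi
}

to_base10 "00123"   # digits with leading zeros
to_base10 "-"       # the value that triggers "operand expected" when unguarded
to_base10 ""        # empty field
```

Checking the value first turns a script abort into a default that the rest of the report can handle.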
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
root@truenas[/tmp]# ./multi_report_v1.6c.sh
Multi-Report v1.6c dtd 28 Aug 2022 (TrueNAS Core 12.0-U8.1)
No Config File Exists
Checking for a valid email within the script...
Valid email within the script = xxx@xxx.net, using script parameters...

./multi_report_v1.6c.sh: line 1630: 10#-: syntax error: operand expected (error token is "-")
root@truenas[/tmp]#
My advice is to use the external configuration file rather than editing the script directly; in fact, I may just remove the ability to use the script alone. Why? Because when you edit the script, it is very possible to make a change for the worse, and the line numbers may no longer match the code as posted. Also, while the error is reported at line 1630, the actual cause may be somewhere just before or just after that area. Assuming the script was not damaged, I would guess you have a drive value that the script does not like. This is the problem with drive manufacturers not all using the same definitions. You have my personal email address, so I'd recommend you forward me a copy of the full output from a normal run and a copy of the full output using the -dump option.

What I find very odd is the fact that there is no data for most of the charts: no pending sectors, no reallocated sectors, nothing. I really would like to see the -dump data to find out whether there is data that I need to grab from these drives or whether the data really isn't there.

I am already 90% done with a newer version of the script, which adds Raw Read Rates and adjusts for Seek Error Rates on Seagate drives; hopefully it will work as planned. There are some other updates as well, including a better (I hope) and easier-to-use -config option. I'm not sure it will be ready for prime time: I have tested it heavily on Core, but not on Scale yet. That newer version would not correct a problem where the drive type is not recognized, though. The -dump output is likely the data I will need to rectify it. You have my personal email address, use it.

On the server running TrueNAS-13.0-U2, the script runs, but the email that it sends has *WARNING* in the subject line. I'm not sure what the warning is referring to.
Read the output; it clearly states that the test age is above the threshold. Either run another SMART test or change the threshold.

EDIT: Sorry, thought you had my personal email. Since you do not, try the new beta script below and report back. PM me with your email address if you would like to troubleshoot directly.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Here is the latest version, v1.6d-beta. I have tested it mostly on Core and a little bit on Scale. While it may run well, I still have other improvements for it so that is the only reason it's still in beta. A few instructions:

1) If you have been using an external configuration file, run the script with the -config switch (for example, "multi_report_v1.6d_beta_11_Sep_2022.txt -config") and select Update configuration file to add the new variables used by the new script. If you have made any manual changes to the multi_report_config.txt file other than changing variable values, they will be lost when the updated configuration file overwrites it.

2) If this is your first time using an external configuration file, run the script with the -config switch and select New configuration file. You can also automatically set up any drive error offsets (UDMA_CRC, MultiZone, Bad Sectors), which is a nice feature.

3) If you have any problems and you already have my personal email address, run the script using the -dump switch and then forward the entire email to me. This should give me the data I need to troubleshoot any problem. The only personal data included is your email address, and if you are emailing it to me, then I guess it's not a secret to me. But I will not share it with anyone of course.

4) If you are using the statistical data file then two new columns will be added automatically and your old data will remain preserved.

5) If you have a simple question, post it here in the forums and someone should be able to answer it, maybe even me.

6) Use the -h switch to view the Help information.

If you have any suggestions on how to make the -config better, please let me know. I'm so close to it that I may not see obvious issues. If there are other suggestions to make it better, let me know and I will evaluate the suggestion. Unfortunately I cannot make everyone happy but if the suggestion is doable, then I will try to make it happen.

Again, this is still a beta but should be 100% functional.
 

Attachments

  • multi_report_v1.6d_beta_11_Sep_2022.txt
    288.5 KB · Views: 133

awasb

Patron
Joined
Jan 11, 2021
Messages
402
Works like a charm!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Works like a charm!
Thanks. Are you still using Core 13-U2? I think that is what you were using before and what my main system uses right now.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
402
Exactly.
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
What I find very odd is the fact that there is no data for most of the charts: no pending sectors, no reallocated sectors, nothing. I really would like to see the -dump data to find out whether there is data that I need to grab from these drives or whether the data really isn't there.

My TrueNAS servers use SAS drives, is this likely to have an impact on the data collection?

Also, I may have misinterpreted some of the documentation for this script, is it meant to attach a copy of your TrueNAS config file when it emails you?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
My TrueNAS servers use SAS drives, is this likely to have an impact on the data collection?
Yes, SAS drives are not very consistent in their data format, and from the data you posted, it does look like SAS drives to me. But that doesn't mean I cannot add them to the script so that they are recognized correctly, which is one of my goals, without breaking the rest of the script, that is.

Also, I may have misinterpreted some of the documentation for this script, is it meant to attach a copy of your TrueNAS config file when it emails you?
Yes, the default setup is to attach the TrueNAS config file and the statistical_data_file.csv to the email on Mondays. That is the default because that is my personal preference. You can change that using the -config option to every day, once a month (first of the month), or a different weekday.
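
As a rough illustration of that kind of schedule check (not the script's actual logic), the sketch below decides whether to attach files on a given day. The function name, its arguments, and the schedule values All, Month, and three-letter weekday names are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# Usage: should_attach SCHEDULE WEEKDAY DAY_OF_MONTH
#   SCHEDULE: "All" (every day), "Month" (first of the month), or a weekday
#   such as "Mon". Returns success (0) when attachments should be sent.
should_attach() {
  local schedule="$1" today="$2" day_of_month="$3"
  case "$schedule" in
    All)   return 0 ;;
    Month) [[ "$day_of_month" == "01" ]] ;;
    *)     [[ "$schedule" == "$today" ]] ;;
  esac
}

# Example: the Monday default, checked against today's date.
if should_attach "Mon" "$(LC_ALL=C date +%a)" "$(date +%d)"; then
  echo "attach config and statistics files"
else
  echo "send report without attachments"
fi
```

In the real script you would pick the schedule through the -config option rather than by editing anything like this.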

So, if you could do a -dump and then attach four of the drive files (two each for two drives) here for me to download and test, maybe we can fix the SAS issues you are having. I need the files for da0-a and da0-x, and da7-a and da7-x. I want to look at files from one drive that has had a SMART test run on it and one that hasn't. Once I have those files, I can start incorporating the drive recognition, and I will post an updated version for you to test; I just need to know whether it works.
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Thanks mate. I've made the dumps. I'll PM you.
 

Malpractis

Cadet
Joined
Feb 16, 2016
Messages
7
Thanks for the script, it's great!

I've been running it (1.6c) with the stats file for about three months now. Just yesterday I started intermittently getting "multi_report.sh: line 1800: [[: 36,018: value too great for base (error token is "018")". I took a look at line 1800, and it looks fine to me:
Code:
# Some drives do not report test age after 65536 hours.
if [[ $onHours -gt "65536" ]] && [[ $lastTestHours -gt "0" && $lastTestHours -lt "65536" ]]; then lastTestHours=$(($lastTestHours + 65536)); fi


Any ideas what might be going wrong?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
I will PM you. Odds are it's not line 1800. I will send you the new version of the script, which I just finished tonight but have not yet tested enough on Scale.
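
A note on that message for anyone who hits the same thing: inside [[ ... -gt ... ]], bash evaluates each operand as an arithmetic expression, so a value such as 36,018 is split at the comma and the remaining 018 is rejected as an invalid octal number. Where the comma comes from is not established in this thread. The sketch below assumes the value arrives with a thousands separator and normalizes it before the comparison; clean_number and the sample values are hypothetical, not part of multi_report.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only digits, then force base 10.
clean_number() {
  local digits="${1//[^0-9]/}"     # drop commas and any other non-digits
  echo "$(( 10#${digits:-0} ))"    # 10# stops "018" from being read as octal
}

clean_number "36,018"              # the value from the error message

# Sample values (made up) for the rollover adjustment quoted in the report.
onHours=$(clean_number "70,000")
lastTestHours=$(clean_number "4,400")

# Same comparison as the quoted line, now fed with clean integers.
if [[ $onHours -gt 65536 ]] && [[ $lastTestHours -gt 0 && $lastTestHours -lt 65536 ]]; then
  lastTestHours=$(( lastTestHours + 65536 ))
fi
echo "$lastTestHours"
```

Normalizing where the value is first read, rather than at every comparison, keeps the fix in one place.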
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Here is Multi-Report Version 1.6d. There have been some bug fixes and enhancements.

Below is the changelog for this version. As usual, if you note a problem or feel there is a reasonable enhancement I could make, feel free to post a comment. I will try to fix errors rapidly and will listen to enhancement suggestions.

If you do have a problem with the script, please send me a PM, or you can post it here in the discussion. If the issue is a script failure, I will need you to provide the output of the generated email, the error message you received (if it wasn't part of the email), and a copy of the attachments (at a minimum, the file for the suspect drive) from the script when run using the -dump parameter.

The one thing I do not desire is to make this script too busy. The original intent was to have a place to quickly assess the health of your hard drives (you should have seen the original version). If there was an indication of a problem then it was up to the end user to address the situation.

### Changelog:
# v1.6d (01 October 2022)
# - Thanks goes out to ChrisRJ for offering some great suggestions to enhance and optimize the script.
# - Updated gptid text and help text areas (clarifying information)
# - Updated the -dump parameter to -dump [all] and included non-SMART attachments.
# - Added Automatic UDMA_CRC, MultiZone, and Reallocated Sector Compensation to -config advanced option K.
# - Fixed Warranty Date always showing as expired.
# - Added Helium and Raw Read Error Rates to statistical data file.
# - Added Raw Read Error Rates chart column.
# - Added compensation for Seagate Seek Error Rates and Raw Read Error Rates.
# - Added Automatic Configuration File Update feature.
# - Added selection between ZFS Pool Size or Zpool Pool Size. ZFS is representative of the actual storage capacity
# -- and updated the Pool Status Report Summary chart.
# - Added ATA Error Log Silencing (by special request).
# - Added 0.1 second delay after writing "$logfile" to eliminate intermittent file creation errors.
# - Fixed Text Report -> Drive Model Number not showing up for some drives.
# -- Future Work
# ---- Change all the -config dialog to be consistent.
# ---- Optimizing Code


EDIT: One note: If you have an older configuration file, you may see four lines of error messages as the script reads it. This is normal and will only happen once, as the configuration file will be updated automatically and the script will not display those errors again. Thanks to @Davvo for pointing that out to me. At times I'm just too close to see what might alarm some folks.
 

Attachments

  • multi_report_v1.6d_Final.txt
    254 KB · Views: 89

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Thanks again for the updates! If we already have an external config file, do we just replace the older script and that's it?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Thanks again for the updates! If we already have an external config file, do we just replace the older script and that's it?
Yes, just replace the older script and when the new script is run the first time, it will update the older configuration file if required.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
One last thing: I have also posted the script on GitHub, per @ChrisRJ's recommendation. If you want the most current versions (my beta versions) as they come out, you can grab them from https://github.com/JoeSchmuck/Multi-Report.git and use or test them. Understand that a version is a beta if "beta" is in the version number shown when you run the script. Generally I do not push updates to GitHub unless they at least partly work. Right now I'm waiting on comments on the new version 1.6d so I can fix any issues that come up.

I will not post further beta versions on the TrueNAS forum; versions posted here will be final, released only after at least some testing and once I feel the script is fairly solid. I will be happy to forward beta versions to folks who request them. For right now, look at the GitHub location for the most current versions. And don't expect daily updates; it's more like weekly or every two weeks, and only if there is something to do.

Please provide feedback, good, bad, whatever. That is the only way to improve the script. And feel free to rate this resource, again good, bad, whatever. This is good for the community.
 