
multi_report.sh version for Core and Scale 3.0

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
As per Kingston's SMART Attribute Details (link, PDF file): the attribute indicates the approximate SSD life left where 100 = best and 1 = worst.
That is true, but Kingston SF-2000 references the Normalized Value. This is the column of data called "VALUE", not "RAW". I do have other Kingston drive data (see below) for my testing to make sure the script works properly.

TEST CODE:
Code:
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A120G
Serial Number:    REDACTED
LU WWN Device Id: 5 0026b7 23b097fd7
Firmware Version: 60AABBF0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug  6 16:18:40 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x7d) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Abort Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (  48) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x0025)    SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/76189640
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   070   070   000    Old_age   Always       -       26689h+49m+55.240s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1243
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       656
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       4
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   033   067   000    Old_age   Offline      -       33 (Min/Max -21/67)
194 Temperature_Celsius     0x0022   033   067   000    Old_age   Always       -       33 (Min/Max -21/67)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/76189640
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/76189640
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/76189640
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   099   099   010    Pre-fail  Always       -       1
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       10673
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       10579
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       10579
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       12937

SMART Error Log not supported


You can see that the test data comes from an SSD with fewer hours, and its Normalized Value is "99".
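As a rough illustration of the VALUE versus RAW distinction, the sketch below pulls both columns for ID 231 out of two rows copied from the test data above. It is not code from multi_report.sh; the variable names are made up for the example, and a real run would read the output of `smartctl -A` for the drive instead of a hard-coded sample.

```shell
#!/usr/bin/env bash
# Two attribute rows copied from the SandForce test data above.
# Assumption: a real script would read `smartctl -A <device>` instead.
sample='230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   099   099   010    Pre-fail  Always       -       1'

# Field 4 is VALUE (normalized), field 10 is RAW_VALUE.
# Adding 0 in awk turns "099" into the number 99.
life_value=$(printf '%s\n' "$sample" | awk '$1 == 231 { print $4 + 0 }')
life_raw=$(printf '%s\n' "$sample" | awk '$1 == 231 { print $10 }')

echo "SSD_Life_Left normalized=${life_value} raw=${life_raw}"
```

For this SandForce sample the normalized column carries the life-left figure (99) and the raw column holds 1.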

YOUR CODE:
Code:
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37120G
Serial Number:    [[REDACTED FOR PUBLIC SHARING]]
LU WWN Device Id: 5 0026b7 7824bf724
Firmware Version: SBFKB1E1
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Nov  5 13:19:09 2022 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (65535) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  30) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   000   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       31581
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       124
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       14
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/10
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       5 (Average 2)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   068   062   000    Old_age   Always       -       32 (Min/Max 21/38)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   001   001   000    Old_age   Offline      -       99
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       166
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       89
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       435
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       2
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       5
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       10872
247 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       10872

SMART Error Log Version: 1
No Errors Logged


Your Normalized and RAW values are just the opposite of the test data, which made me perk up.
Now looking at the column called WORST you will see a value of "001" for ID 231.
Going back to the Kingston data "1 = Worst = Insufficient Flash blocks remain in service for proper SSD operation"

BUT you do not have a SandForce drive which is the link you provided, you have a Phison drive so that makes a big difference. Unfortunately, Kingston's reference for the Phison data is not clear, but I do believe the data is backwards for your drive. I will need to differentiate between Phison and any other drive.
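One way such a differentiation could look is sketched below. This is only a guess at the shape of the fix, not the change that went into the script: the function name `ssd_life_left` is hypothetical, and the rule "Phison reports life left in RAW" is an assumption drawn from the two dumps in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from multi_report.sh.
# Args: model family, normalized VALUE, RAW_VALUE of attribute 231.
ssd_life_left() {
    local family=$1 value=$2 raw=$3
    case $family in
        *Phison*) echo "$raw" ;;             # assumption: life left is in RAW
        *)        echo "$((10#$value))" ;;   # force base 10: 099 -> 99
    esac
}

ssd_life_left "SandForce Driven SSDs" 099 1
ssd_life_left "Phison Driven SSDs"    001 99
```

With the values from the two drives above, both calls report 99, which matches the conclusion that neither drive is close to failure.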

So the first part is done. I had to prove to myself that your drive was not actually close to failure, because the reference you provided said it was almost dead. I like to verify my facts.

I will get to work on the fix, but since I need to make a fairly significant change (not a lot of code, I hope, but it will impact many other users) I will have to have you test it. If that works, I will need to pass it to a few people to verify that it works on their systems and that I didn't introduce an error.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
PM coming your way, fixed the script and just need you to test it. It works on my end.
 

Cuprum

Cadet
Joined
Aug 15, 2018
Messages
6
Hi Joe,

Thanks for reporting the error. It's difficult to try to get every version of every drive out there and make the SMART data work for you.
Indeed it's a difficult task. Thanks for what you're doing with this script :wink:

You can also change it using '-config' option then a -> a -> c and then change it from 9 to 1
Ok! I just did this and updated my config file to stop triggering a warning.

I'd like to collect some data from you in a PM for my testing when updating the script.
Sure, I will pass you the dump in a PM. Sorry if I have not been fast enough.

BUT you do not have a SandForce drive which is the link you provided, you have a Phison drive so that makes a big difference.
What a dumb mistake from my side. Sorry! I did not notice it.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Sure, I will pass you the dump in a PM. Sorry if I have not been fast enough.
You have been plenty fast. It's nice to deal with someone responding quickly like you are. It's good to get fast feedback when I'm in the mood to work the script. I just like helping out I guess.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
@joeschmuck last time when my crontask was run (looked fine, I got the report and everything) it also sent me a mail with the following text:
./scripts/multi_report_v1.6d-2: line 1471: [[: reset): syntax error in expression (error token is ")")
./scripts/multi_report_v1.6d-2: line 1471: [[: reset): syntax error in expression (error token is ")")
I opened the script file with nano and jumped to the 1471 line, it has the following code in it:
if [[ "$altlastTestHours" -gt "0" ]]; then convert_to_decimal $altlastTestHours; altlastTestHours=$Return_Value; fi

Tell me if you need anything.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Tell me if you need anything.
I need these little problems to stop showing up.

If you can repeat the problem, immediately run the script with the -dump parameter and then send me a PM with all the files attached, if it's not too many files. Or you can email me the entire thing. I will provide my email address if you don't have it already. But the problem must be happening when you create the dump. This is the second time I've heard of this problem now. The first one disappeared. I'd like to solve it.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
If you can repeat the problem, immediately run the script with the -dump parameter and then send me a PM with all the files attached, if it's not too many files.
I can't seem to reproduce it for now. I ran it manually from both the terminal and the UI but did not get the standard error report.
Black forces at work.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I opened the script file with nano and jumped to the 1471 line, it has the following code in it:
if [[ "$altlastTestHours" -gt "0" ]]; then convert_to_decimal $altlastTestHours; altlastTestHours=$Return_Value; fi
To explain what this line of code does: When I look for the last short or long test that was run, depending on how SMART reports it (because there is no real standard), I check whether the variable holds a value that is greater than zero. If it does, I pass it to a routine which converts the number into a decimal number and returns it for use later in the code. There is no magic going on with this line so I don't understand the syntax error. Also, there is no ")" in the line to generate the error. So the error probably isn't actually at line 1471. It's a fishing expedition.

Since this was the second time I've heard of this line giving problems, I hope it can be repeated and I can get some data back in order to find out what is causing the issue.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I can't seem to reproduce it for now. I ran it manually from both the terminal and the UI but did not get the standard error report.
Black forces at work.
Sent you a PM. Maybe your data could still help, but I don't know.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Okay, so that was a simple fix and will be in the next release. The issue was the drives were in the middle of a SMART test that was reset by the host. This ended up having the message " reset)" captured where the hours value should normally be. The ")" is the culprit that caused the issue. I now filter out the characters "()%/" and that took care of it. The script still ran even with the error message but it's a nuisance. It's fixed.
The new script should be posted tomorrow.
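The filtering described above can be sketched roughly as follows. The helper name `strip_smart_noise` is made up for this example and the actual change in the script may look different; only the character set "()%/" and the captured text " reset)" come from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper name; the characters removed are the ones named in the post.
strip_smart_noise() {
    printf '%s' "$1" | tr -d '()%/'
}

captured=' reset)'    # what landed where the hours value should be
altlastTestHours=$(strip_smart_noise "$captured")

# With ")" gone the test no longer raises a syntax error. The leftover
# word " reset" is treated by bash as an unset variable name, i.e. 0.
if [[ "$altlastTestHours" -gt 0 ]]; then
    echo "last test at ${altlastTestHours} hours"
else
    echo "no usable hours value"
fi
```

A stricter alternative would be to accept the value only when it consists entirely of digits, but that goes beyond what the post describes.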
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh version for Core and Scale with a new update entry:

Fixes and Additions from the great Joe Schmuck

v1.6e (11 November 2022)
- Fixed gptid not showing in the text section for the cache drive (Scale only affected).
- Fixed Zpool "Pool Size" - Wasn't calculating correctly under certain circumstances.
- Added Toshiba MG07+ drive Helium value support.
- Added Alphabetizing Zpool Names and Device ID's.
- Added No HDD Chart Generation if no HDD's are identified (nice for SSD/NVMe Only Systems).
- Added Warranty Column to chart (by request and must have a value in the Drive_Warranty variable).
-...

 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
v1.6e - TrueNAS Scale
I'm getting:
/mnt/Daedalus/superadmin/scripts/report.sh: 165: Bad substitution
/mnt/Daedalus/superadmin/scripts/report.sh: 538: [[: not found
/mnt/Daedalus/superadmin/scripts/report.sh: 633: Syntax error: Bad for loop variable
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
v1.6e - TrueNAS Scale
I'm getting:
Did you have this error on the previous versions? And does an email report get generated?
Additionally, I'm running Scale 22.02.4 without error. Is the error reproducible? If it is, then I will PM you with my email address and you can send me the data you have. If I can't reproduce it, I'm not sure I can fix it. This is the first time I've seen these errors so it's just odd.
 

Deeda

Explorer
Joined
Feb 16, 2021
Messages
65
Just updated to the latest version, and it appears to be working fine :smile:
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just updated to the latest version, and it appears to be working fine :smile:
Thank you for the feedback. It's appreciated.
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
Findings:
I ran the report with the -config option. All went well.
I then ran ./multi_report.sh manually and got the email as expected with no errors. The errors I reported earlier were user error: I was trying to run the script from a cron job with "sh /path/to/script/multi_report.sh"
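For anyone hitting the same three messages: they are what a plain POSIX sh (such as dash) typically prints when it is handed bash-only syntax. The sketch below is not from multi_report.sh; it shows the kind of construct that usually sits behind each message, plus a hypothetical guard that stops early when the interpreter is not bash.

```shell
#!/usr/bin/env bash
# Hypothetical guard: stop with a clear message when not run by bash.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script needs bash; run it directly or with: bash $0" >&2
    exit 1
fi

# The constructs below are bash features. Under a plain sh each one
# usually fails with the message noted in its comment.
name="multi_report"
prefix=${name:0:5}                     # substring expansion: "Bad substitution"
[[ $prefix == multi ]] && match=yes    # [[ ]] test: "[[: not found"
total=0
for (( i = 1; i <= 3; i++ )); do       # C-style loop: "Bad for loop variable"
    total=$(( total + i ))
done
echo "$prefix $match $total"
```

Run under bash this prints `multi yes 6`. Calling the script by its path lets the shebang line pick the interpreter, which is why dropping the leading `sh` makes the errors go away.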

With that said, how do I run scripts in a cron job in Scale?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
With that said, how do I run scripts in a cron job in Scale?
In the Scale GUI -> System Settings -> Advanced -> Cron Jobs -> Add -> Command -> "/path/to/script/multi_report.sh" (without the quotes of course and no preceding sh command). Run As User = root, Hide Standard Output and Enabled both checked, Hide Standard Error unchecked.

See my attached screen shot.

Please let me know if this works or not.

EDIT: I am using Scale 22.02.4, not Bluefin RC. I will test out Bluefin once it's a final release version. You never know what will change.
 

Attachments

  • Screenshot 2022-11-19 112842.jpg

Joined
Nov 29, 2022
Messages
4
Hi, thanks for the script. Is there any way to ignore "Last Age Test" for my SSD? I can't find a setting for this in the config. I'm sure it's sensible information to know, but I have no intention of acting on it for my boot drive.
 