multi_report.sh version for Core and Scale 2.5

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Thanks. This is a new problem that looks to be related to the scan in progress (I hope), but I think it is still a line-length issue. I will be comparing it against my scrub message as well. I hope that I can make allowances for all of the different formatting. I can't get to it tonight as I have other stuff to do, but hopefully tomorrow night.

I will PM you my personal email address so we can exchange the script without cluttering up the discussion thread. When it's fixed, I guess we will have version 1.4d.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
That was an easy fix. I was originally going to post that you should add a second set of brackets around the IF statement, but then I changed my mind because it's easier if I just troubleshoot the script myself. Yes, add another set of square brackets, so change line 727 to read
Code:
if [[ $scrubDays == "days" ]]; then
and that should fix it. Attached is the most current version now, 1.4d dated 31 March 2022.
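
For anyone wondering why the extra brackets matter, here is a minimal sketch. It assumes the original line used single brackets with an unquoted variable, which the thread does not actually show. The function names and sample values are made up for illustration; only scrubDays comes from the post.

```shell
#!/usr/bin/env bash
# Sketch: compare [ ] and [[ ]] when the tested variable is empty or multi-word.
# "scrubDays" mirrors the name in the post; everything else is hypothetical.

single_bracket_status() {
    local scrubDays=$1
    # Unquoted expansion inside [ ] is word-split, so an empty or multi-word
    # value changes the argument count and the test fails with an error.
    [ $scrubDays == "days" ] 2>/dev/null
    echo $?
}

double_bracket_status() {
    local scrubDays=$1
    # [[ ]] is shell syntax: the variable is not word-split, so an empty or
    # multi-word value is simply compared as one string.
    [[ $scrubDays == "days" ]]
    echo $?
}

for value in "days" "" "3 days"; do
    echo "value='$value' single=$(single_bracket_status "$value") double=$(double_bracket_status "$value")"
done
```

With single brackets, the empty and multi-word values produce an error status rather than a clean "false", which is the kind of failure the double brackets avoid.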

EDIT: I have now tested this version on TrueNAS Scale (Debian), and since it's going to be a new version, I added another feature that I hope not too many people will have to take advantage of: a BADSECTORS offset. This new section is meant for testing with drives that are likely old and possibly failing, but good enough to test with. I'm sure someone will offset a drive they plan to use in an operational NAS, but that is up to them. The new feature supports up to four drives. Sure, it could be updated to support more, but I'm trying to be realistic. The new version is dated 1 April 2022. Yes, I was tempted to create a special April Fools version that would provide random errors on 1 April, but I have things to do today. Maybe later.
 

Attachments

  • multi_report_v1.4d_1April2022.txt
    79.6 KB · Views: 231

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
Hey joeschmuck.

I'm thinking some processor and memory information. Then I think staple my tongue.

Ever seen my HomeAssistant Servers page?
 

Attachments

  • HASS Servers 1of2.jpg
    300.4 KB · Views: 216
  • HASS Servers 2of2.jpg
    178.2 KB · Views: 182

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Looks like a good project for you; looking at those screenshots, you have the skills. The only work I plan to do right now is fixing the script if someone reports that it is not working for them. I am in the process of rewriting the entire script because I dislike having a lot of processing within the AWK section, and I want to pull that out. This will make it easier to segment the sections, and I also think the few issues with specific hard drive serial number recognition may go away as well. But this is a project that will take me maybe months to complete, as summer is here and that means being outside more, not stuck inside the house, except when we have thunderstorms. And of course I have my day job, where I don't get time to work on the script.

Now to humor you... What data points for the processor and memory information do you plan to take? In other words, what snapshot in time are you sampling? Taking a snapshot only while the script is running does not seem useful, though maybe the memory could be. I have not researched whether you can get the data from the command line, which is what you would need to do. Maybe collect the min/max/avg of each over the previous 24 hours? Is that possible? That is the only useful information I would desire, if I were collecting data, that is. If you wanted, you could create a script to collect that data, place it in a CSV file periodically, and then email it at a defined interval. I think you can get the memory and maybe the CPU data, but will it cover the past 24 hours? I don't know. While I want to look into it, I will hold off since I'm at work right now.
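
As a rough illustration of the "collect it into a CSV periodically" idea, here is a minimal sketch. It is not part of multi_report.sh. The function names, the CSV layout, and the choice of load average as the sampled value are all assumptions, and memory figures would need a platform-specific source that is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: append one timestamped load-average sample to a CSV file.
# Function names and column layout are hypothetical.

# Reduce a line of `uptime` output to "load1,load5,load15".
# Handles "load average: a, b, c" and "load averages: a, b, c".
parse_load() {
    sed -E 's/.*load averages?: *//; s/,? +/,/g'
}

# Append one sample; write the header first if the file does not exist yet.
log_sample() {
    local csv=$1
    [ -f "$csv" ] || echo "timestamp,load1,load5,load15" > "$csv"
    echo "$(date +%s),$(uptime | parse_load)" >> "$csv"
}

# Example (run from cron every few minutes):
#   log_sample /path/to/load_history.csv
```

Each run only records the moment it executes, so the 24-hour picture comes from running it on a schedule and summarizing the file afterwards.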

Have fun with your new project.
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
Definitely staple my tongue.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Sorry, I have no idea how to obtain a 24 hour period of time to get the data you would desire, with min/max/avg data points. I think that would require use of an external program to track that and record the historical data. If we went to those extremes then I'd also want to see the offending programs that cause usage above a specified threshold. This would be great as a diagnostic tool. So yup, it's your project if you want to take it on, not something I would desire to do. I have 'top' and 'htop', and several other commands that can help me troubleshoot when/if I need to.
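
One possible shape for that external tracking, sketched under assumptions: if samples were already being logged to a CSV with a header row and an epoch-seconds timestamp in the first column (a hypothetical layout, nothing the script produces today), the min/max/avg over the trailing 24 hours could be computed with awk. The function name and arguments are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: min/max/avg of one CSV column over a trailing time window.
# Assumes a header row and epoch-seconds timestamps in column 1.

# summarize CSV_FILE COLUMN_NUMBER NOW_EPOCH [WINDOW_SECONDS]
summarize() {
    local csv=$1 col=$2 now=$3 window=${4:-86400}
    awk -F, -v col="$col" -v since="$((now - window))" '
        NR == 1 || $1 < since { next }   # skip the header and samples older than the window
        {
            v = $col + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "min=%.2f max=%.2f avg=%.2f n=%d\n", min, max, sum / n, n
        }
    ' "$csv"
}

# Example: summarize /path/to/load_history.csv 2 "$(date +%s)"
```

This covers only the statistics; identifying which programs caused usage above a threshold would need per-process data that this sketch does not collect.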

Let me know what you come up with :wink:
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
I sent you an email offline about the home-assistant displays (didn't want to clutter up this conversation). In addition to sensors, etc., home-assistant allows statistics to be tracked on any already defined entity (sensor) at various intervals.

For the purposes of multi_report, I suggest only a snapshot at the moment the script is run of cpu busy and temp, etc (that kind of simple stuff available via ssh commands). Maybe fan rpm (why?). ;-}

Anyone who wants serious tracking and realtime analysis should install the Netdata plugin. Does it have an API?
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
Another possibility for inclusion is to test the status of SCT Error Recovery Control for each HDD/SSD

line 1226:
Code:
#echo "<br>"
echo " "
# SCT Error Recovery Control Report
scterc="$(smartctl -l scterc /dev/"$drive" | tail -3 | head -2)"
echo "SCT Error Recovery Control: "$scterc
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I noticed that Samsung SSDs are reporting 190 Airflow_Temperature_Cel instead of 194 Temperature_Celsius.

Any chance that can be accounted for... I couldn't easily see where that's being grabbed to try to shoehorn it in.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
@sretalla
Here is the latest version of the script that I'm running, note that it now has separate temperature alarm points for SSD. As for the part where the temperature is pulled out, look at line 1020 of this script. If this script does not fix the issue you are having, feel free to adjust the section of code I've pointed you to. If you run into difficulties, email me your emailed output so I can see what adjustments I can make without breaking it completely. I will PM you my email address in a minute. Please note that while this has run on Scale, I personally do not use Scale so I have less runtime on this version but I'm happy to take comments back.

That is interesting. I took a look at the code and rearranged the order in which the temperature value is obtained, so the last one checked is now "Temperature_Celsius". Give it a try; I hope it works.
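
To illustrate the general idea of falling back between the two attributes, here is a minimal sketch. It is not the code from the script: the function name and the preference order (194 first, then 190) are assumptions, and it only reads the first token of the RAW_VALUE column from smartctl attribute rows.

```shell
#!/usr/bin/env bash
# Sketch: pick a temperature from `smartctl -A` style attribute rows on stdin.
# Preference order (194, then 190) is an assumption for illustration only.

drive_temp() {
    awk '
        $1 == 194 { t194 = $10 }   # Temperature_Celsius, first token of RAW_VALUE
        $1 == 190 { t190 = $10 }   # Airflow_Temperature_Cel, first token of RAW_VALUE
        END {
            if (t194 != "")      print t194
            else if (t190 != "") print t190
            else                 print "N/A"
        }
    '
}

# Example: smartctl -A /dev/ada0 | drive_temp
```

Drives that report neither attribute fall through to "N/A", which makes an unsupported drive easy to spot in a report.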

With so many non-standards, it's difficult to make it all work perfectly for everyone, well with this script but we can try.
 

Attachments

  • multi_report_v1.4d_22June2022.txt
    83.2 KB · Views: 133

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
That's great. I see temps for all my SSDs now.

Thanks.
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188
TooMuchData updated multi_report.sh versions for Core and Scale with a new update entry:

Substantial Rewrite by Joe Schmuck (my hero)

### Changelog v1.6: (31 July 2022)
# - Complete rewrite of the script. More organized and easier for future updates.
# - Almost completely got rid of using AWK, earlier versions had way too much programming within the AWK structure.
# - Reads the drives much less often (3 times each I believe).
# - Added test input file to parse txt files of smartctl -a output. This will allow for a single drive entry and ability
# -- for myself or any script writer to identify additional...

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
You are way too funny my friend.

If anyone using this script finds drives that do not report correctly, please follow the instructions in the script and provide me with the requested information. I will update the script if that data is available (there is no real standard the manufacturers live by, which is why this is so tricky) and I can make it work without breaking too many other things. That is really the trick: not breaking something else. Also, if the script just has some flaw, or you would like to see an improvement, I'm willing to listen. But please understand, I can't say yes to everything just to make a custom build for one person; if many could benefit, I will entertain that. It takes time to write and even more time to test, lots more time.
 
Joined
Jan 27, 2020
Messages
577
Nice work as always! So I'll be the first to report an issue:

Script runs fine, mail gets sent, everything seems OK, but running it from the CLI with the requested syntax and input file leaves the file empty and gives the following error:

Code:
root@truenas[~]# /mnt/tank/scripts/Multi_Report_Script/run_multi_report.sh SSD /mnt/tank/scripts/Multi_Report_Script/ssd_smart_data.txt
Multi-Report v1.6 (31 July 2022) for TrueNAS Core (TrueNAS-13.0-U1.1)
HDD List : da0 da1 da2 da3 da4 da5
SSD Test File
SSD TEST FILE ROUTINE
NVME List:  nvme0
Statistical Datafile Exists
Collecting Drive Information
Writing Statistical Data
Modified smartdata=
/mnt/tank/scripts/Multi_Report_Script/run_multi_report.sh: line 1728: ( / 8760): syntax error: operand expected (error token is "/ 8760)")
Running Purging Routine
Creating Detailed Report
Sending Email


None of the 4 SSDs are displayed in the summary, although I have SSD detection set to "true".
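
For context on the "operand expected" message: bash prints it when an arithmetic expression ends up with nothing on one side of an operator, typically because a variable expanded to an empty string. The empty "Modified smartdata=" line suggests that may be what happened here, though the thread does not confirm it. Below is a minimal sketch of guarding such a division. The function name is hypothetical, and the parsing of values like "26685h+13m+02.380s" is an assumption about what a fix might need, not the script's actual logic.

```shell
#!/usr/bin/env bash
# Sketch: guard integer arithmetic against an empty or non-numeric operand.
# Function name is hypothetical; 8760 is the hours-per-year divisor seen
# in the error message.

power_on_years() {
    local raw=$1
    local hours=${raw%%[!0-9]*}    # keep leading digits: "26685h+13m+02.380s" -> 26685
    hours=${hours:-0}              # empty input becomes 0 instead of a syntax error
    echo $(( 10#$hours / 8760 ))   # 10# forces base 10 even with leading zeros
}

for raw in "19968" "26685h+13m+02.380s" ""; do
    echo "raw='$raw' years=$(power_on_years "$raw")"
done
```

Defaulting to 0 keeps the run going, but a real fix would more likely flag the drive as unreadable than report zero years.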

smartctl data following:

Code:
root@truenas[~]# smartctl -a /dev/da6
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SSD PLUS 480GB
Serial Number:    2014HA482513
LU WWN Device Id: 5 001b44 4a76b6670
Firmware Version: UG5100RL
User Capacity:    480,113,590,272 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug  6 11:39:08 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  32) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       19968
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       24
165 Total_Write/Erase_Count 0x0032   100   100   000    Old_age   Always       -       11465
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       47
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       0
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       98
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       523
170 Unknown_Marvell_Attr    0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   000    Old_age   Always       -       47
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       4
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   065   000    Old_age   Always       -       33 (Min/Max 9/65)
199 SATA_CRC_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   000    Old_age   Always       -       8267 2344 8267
232 Perc_Avail_Resrvd_Space 0x0033   100   100   005    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       23987
234 Perc_Write/Erase_Ct_BC  0x0032   100   100   000    Old_age   Always       -       128979
241 Total_Writes_GiB        0x0030   100   100   000    Old_age   Offline      -       35638
242 Total_Reads_GiB         0x0030   100   100   000    Old_age   Offline      -       5481
244 Thermal_Throttle        0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     19944         -
# 2  Short offline       Completed without error       00%     19920         -
# 3  Short offline       Completed without error       00%     19896         -
# 4  Short offline       Completed without error       00%     19872         -
# 5  Short offline       Completed without error       00%     19848         -
# 6  Extended offline    Completed without error       00%     19844         -
# 7  Extended offline    Self-test routine in progress 10%     19844         -
# 8  Extended offline    Self-test routine in progress 30%     19843         -
# 9  Extended offline    Self-test routine in progress 50%     19842         -
#10  Extended offline    Self-test routine in progress 60%     19842         -
#11  Extended offline    Self-test routine in progress 60%     19842         -
#12  Extended offline    Self-test routine in progress 90%     19841         -
#13  Extended offline    Self-test routine in progress 90%     19841         -
#14  Short offline       Completed without error       00%     19824         -
#15  Short offline       Self-test routine in progress 30%     19824         -
#16  Short offline       Completed without error       00%     19800         -
#17  Short offline       Self-test routine in progress 60%     19800         -
#18  Short offline       Completed without error       00%     19776         -
#19  Short offline       Completed without error       00%     19752         -
#20  Short offline       Completed without error       00%     19728         -
#21  Short offline       Completed without error       00%     19704         -

Selective Self-tests/Logging not supported

Code:
root@truenas[~]# smartctl -a /dev/da7
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     CT480BX200SSD1
Serial Number:    1603F0161945
LU WWN Device Id: 5 00a075 1f0161945
Firmware Version: MU02.6
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug  6 11:41:11 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  171) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       1
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       9626
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       1114
160 Uncorrectable_Error_Cnt 0x0000   100   100   000    Old_age   Offline      -       0
161 Valid_Spare_Block_Cnt   0x0000   100   100   000    Old_age   Offline      -       45
163 Initial_Bad_Block_Count 0x0000   100   100   000    Old_age   Offline      -       583
148 Total_SLC_Erase_Ct      0x0000   100   100   000    Old_age   Offline      -       460886
149 Max_SLC_Erase_Ct        0x0000   100   100   000    Old_age   Offline      -       6329
150 Min_SLC_Erase_Ct        0x0000   100   100   000    Old_age   Offline      -       6235
151 Average_SLC_Erase_Ct    0x0000   100   100   000    Old_age   Offline      -       6313
164 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       334715
165 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       307
166 Min_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       208
167 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       261
169 Remaining_Lifetime_Perc 0x0000   100   100   001    Old_age   Offline      -       75
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       2
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       2
192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       31
194 Temperature_Celsius     0x0000   100   100   070    Old_age   Offline      -       26 (33 33 33 33 0)
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       0
232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
241 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Offline      -       1597341
242 Host_Reads_32MiB        0x0000   100   100   000    Old_age   Offline      -       1782614
245 TLC_Writes_32MiB        0x0000   100   100   000    Old_age   Offline      -       3953820
246 SLC_Writes_32MiB        0x0000   100   100   000    Old_age   Offline      -       1843544
247 Raid_Recoverty_Ct       0x0000   100   100   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Invalid Error Log index = 0x0b (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       154         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
    7        0    65535  Read_scanning was completed without error
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:
root@truenas[~]# smartctl -a /dev/ada0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A120G
Serial Number:    50026B723B097FD7
LU WWN Device Id: 5 0026b7 23b097fd7
Firmware Version: 60AABBF0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug  6 11:41:44 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0025) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/75946030
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   070   070   000    Old_age   Always       -       26685h+13m+02.380s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1243
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       656
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       4
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   033   067   000    Old_age   Offline      -       33 (Min/Max -21/67)
194 Temperature_Celsius     0x0022   033   067   000    Old_age   Always       -       33 (Min/Max -21/67)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/75946030
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/75946030
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/75946030
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   099   099   010    Pre-fail  Always       -       1
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       10672
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       10577
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       10577
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       12937

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     26661         -
# 2  Short offline       Completed without error       00%     26637         -
# 3  Short offline       Completed without error       00%     26613         -
# 4  Short offline       Completed without error       00%     26589         -
# 5  Short offline       Completed without error       00%     26565         -
# 6  Extended offline    Completed without error       00%     26558         -
# 7  Short offline       Completed without error       00%     26541         -
# 8  Short offline       Completed without error       00%     26517         -
# 9  Short offline       Completed without error       00%     26493         -
#10  Short offline       Completed without error       00%     26469         -
#11  Short offline       Completed without error       00%     26445         -
#12  Short offline       Completed without error       00%     26421         -
#13  Short offline       Completed without error       00%     26397         -
#14  Extended offline    Completed without error       00%     26390         -
#15  Short offline       Completed without error       00%     26373         -
#16  Short offline       Completed without error       00%     26349         -
#17  Short offline       Completed without error       00%     26325         -
#18  Short offline       Completed without error       00%     26301         -
#19  Short offline       Completed without error       00%     26277         -
#20  Short offline       Completed without error       00%     26253         -
#21  Short offline       Completed without error       00%     26229         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Code:
root@truenas[~]# smartctl -a /dev/ada1
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A120G
Serial Number:    50026B723B097F14
LU WWN Device Id: 5 0026b7 23b097f14
Firmware Version: 60AABBF0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug  6 11:42:24 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0025) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       0/76456475
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       2
  9 Power_On_Hours_and_Msec 0x0032   064   064   000    Old_age   Always       -       31822h+58m+11.310s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1458
171 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       1
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       83
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       6
181 Program_Fail_Count      0x000a   100   100   000    Old_age   Always       -       1
182 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   032   062   000    Old_age   Offline      -       32 (Min/Max 15/62)
194 Temperature_Celsius     0x0022   032   062   000    Old_age   Always       -       32 (Min/Max 15/62)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/76456475
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       2
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/76456475
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/76456475
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   098   098   010    Pre-fail  Always       -       4294967296
233 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       16745
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       12633
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       12633
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       14639

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31799         -
# 2  Short offline       Completed without error       00%     31775         -
# 3  Short offline       Completed without error       00%     31751         -
# 4  Short offline       Completed without error       00%     31727         -
# 5  Short offline       Completed without error       00%     31703         -
# 6  Extended offline    Completed without error       00%     31696         -
# 7  Short offline       Completed without error       00%     31679         -
# 8  Short offline       Completed without error       00%     31655         -
# 9  Short offline       Completed without error       00%     31631         -
#10  Short offline       Completed without error       00%     31607         -
#11  Short offline       Completed without error       00%     31583         -
#12  Short offline       Completed without error       00%     31559         -
#13  Short offline       Completed without error       00%     31535         -
#14  Extended offline    Completed without error       00%     31528         -
#15  Short offline       Completed without error       00%     31511         -
#16  Short offline       Completed without error       00%     31487         -
#17  Short offline       Completed without error       00%     31463         -
#18  Short offline       Completed without error       00%     31439         -
#19  Short offline       Completed without error       00%     31415         -
#20  Short offline       Completed without error       00%     31391         -
#21  Short offline       Completed without error       00%     31367         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

TooMuchData

Contributor
Joined
Jan 4, 2015
Messages
188

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Script runs fine, mail gets sent, everything seems OK, but running it from the CLI with the requested syntax and input file leaves the file empty and gives an error as follows:
When you say the script runs okay, does it also provide you all the drives you are missing when you just run it normally, without trying to use the testing capabilities?

The input file must be created in a specific way and only one drive can be tested at a time. Also, that input file will not be reported in the Text Section of the email output. This option was really just for me to test out new drives that were not being fully recognized, which meant I had to modify the code to support the drive in question. It became very helpful to me just to ask someone to send me the text file so I could add the drive(s) in question.

The line that states "Modified smartdata=" should have a copy of the input file contents listed. This tells me that the input file was either empty, did not exist, or something else weird is going on.

The input file must be created by using the command
Code:
smartctl -a /dev/xxx > outputfile.txt
and then it should work. Note that we are not doing a cut and paste here; that fails to provide the data correctly.
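As a rough sketch of how such an input file could be sanity-checked before handing it to the script: the function name `check_smart_dump` is hypothetical and not part of multi_report.sh. It only assumes that a redirected smartctl dump is non-empty and contains the "START OF INFORMATION SECTION" header seen in the outputs posted in this thread.

```shell
# Hypothetical helper, not part of multi_report.sh:
# sanity-check a smartctl dump before using it as a test input file.
check_smart_dump() {
    # The file must exist and must not be empty.
    if [ ! -s "$1" ]; then
        echo "empty or missing: $1"
        return 1
    fi
    # A redirected smartctl dump should contain this section header.
    if ! grep -q "START OF INFORMATION SECTION" "$1"; then
        echo "no smartctl information section found: $1"
        return 1
    fi
    echo "ok: $1"
}

# Example use (as root on the NAS; /dev/ada1 is only an example device):
#   smartctl -a /dev/ada1 > outputfile.txt
#   check_smart_dump outputfile.txt
```

An empty "Modified smartdata=" line would then be easier to trace back to an empty or missing file rather than to the script itself.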

But to be truthful, I have not tested the script on TrueNAS 13. I'm still running 12.0-U8.1, although I really doubt that would cause the issue.

I will PM you, then you can send me a copy of the text files you created and I can test it on my system. If we are missing something then I will add it.
 
Joined
Jan 27, 2020
Messages
577
The input file must be created by using the command
Code:
smartctl -a /dev/xxx > outputfile.txt
and then it should work. Note that we are not doing a cut and paste here; that fails to provide the data correctly.
Ah well, that wasn't clearly defined in the script comments; I thought just "SSD" would cut it. Calling the exact device makes much more sense.

When you say the script runs okay, does it also provide you all the drives you are missing when you just run it normally, without trying to use the testing capabilities?
Except for the 4 SSDs it runs perfectly, but these are missing completely: the SSD block in the mail summary is just empty. Running the script with or without the input file option doesn't change the error. Although the error appears, the script finishes just fine.
Interestingly the NVME summary works well.
 

SoraKagami

Cadet
Joined
Aug 27, 2022
Messages
3
Thank you very much @TooMuchData, @joeschmuck and everyone that contributed to this script.
Great work!

I would like to suggest the addition of two (optional) stats to help with some of the drives out there that report them and can benefit from them:

Relevant info:
multi_report.sh v1.6 (05 August 2022)
Tested on: TrueNAS 13.0 U1.1

Raw Read Error Rates: Hex #01 (Decimal #01)

From the looks of the discussion history this stat used to be present?
Some drives may benefit from having this enabled and listed.
For example: Western Digital Green/Reds (2TB onwards) usually report 0 for this stat. When it increases beyond 0, this can serve as an early warning sign. Or at least, for my failed/failing drives, this stat appears to correlate well with an upcoming problem, so it helps to prepare in advance.
For Seagate drives this stat hasn't been of much use for me so far.

Helium Remaining: Hex #16 (Decimal #22)

For Western Digital and Hitachi (HGST) Helium drives, this stat should report 100 out of the box.
Anything less than 100 would be indicative of a potential leak and needs monitoring.

The reported smart description (smartctl report) for Hex#16 reads as "Helium_Level" for WD, and "Unknown_Attribute" for my HGST drives.
In case this helps, for my own smart reporting script (based on the one by joeschmuck, Bidelu0hm and melp), I modified the awk call with the following to deal with these two variations:
Code:
/Power-Off_Retract_Count/{pOffRetract=$10} \
 /22 Unknown_Attribute/{heliumLevel2=$10} \
 
Then later did a comparison to determine which value to show (if any):
Code:
 if (heliumLevel == "") if (heliumLevel2 != "") heliumLevel=heliumLevel2;
 if (heliumLevel == "") heliumColour = bgColor; else if((heliumLevel + 0) == 100) heliumColour = bgColor; else heliumColour = warnColor;
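For illustration only, the same idea can be sketched as a small stand-alone shell function. The name `helium_level` is hypothetical and the code is not taken from either script. It assumes the usual smartctl attribute table layout, where column 1 is the ID and column 10 is the raw value, and it compares the numeric ID rather than matching the text "22 Unknown_Attribute", so that IDs such as 122 or 222 cannot match by accident.

```shell
# Hypothetical helper: print the raw value of attribute ID 22 from a saved
# smartctl dump, whether it is labelled Helium_Level or Unknown_Attribute.
helium_level() {
    awk '
        # Match on the numeric ID in column 1 so that IDs such as 122 or 222 cannot match.
        $1 == 22 && ($2 == "Helium_Level" || $2 == "Unknown_Attribute") { level = $10 }
        # Print nothing at all when the drive does not report the attribute.
        END { if (level != "") print level }
    ' "$1"
}

# Example use with a file created by: smartctl -a /dev/xxx > outputfile.txt
#   helium_level outputfile.txt
```

Whether ID 22 really means helium level on a drive that labels it "Unknown_Attribute" still has to be confirmed per drive model; the sketch only shows the matching.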


Other relevant information:
In your script there were references to two NVMe stats and commented with "haven't found this on an NVME drive yet" and by default set to "false":
NVM_Drive_Temp_Min
NVM_Drive_Temp_Max

I have only seen these two stats on my two Intel 670p NVMe drives.
Could be an OEM only stat as these two drives came out of ASUS laptops that were recently purchased.

This was my mistake. (Post #100)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
I would like to suggest the addition of two (optional) stats to help with some of the drives out there that report them and can benefit from them:
I appreciate the suggestions.

Raw Read Error Rates were intentionally removed quite a while ago because for many drives they are unreliable for diagnosing a drive failure. Why is that? Because on many drives the values that do not show "0" and change rapidly are actually encoded data; to read them properly, the data bytes need to be split and then deciphered. It's not impossible, but it is more involved to ensure we are using the correct decoding for all drive manufacturers. For drives that typically report "0", a jump in the value would generally indicate a mechanical failure. Fortunately there are other items we do watch for that also indicate mechanical failure very well. I will consider adding the statistic to the chart, but there is more to it, as I have to think about the drives that do not report it clearly.
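To make "split and then deciphered" a little more concrete, here is a minimal sketch, not what Multi-Report does. The function name `split_raw_read_error` is hypothetical. The first branch handles values that smartctl has already split, like the "0/76456475" shown for the SandForce SSD earlier in this thread. The second branch ASSUMES an often-cited 48-bit layout (upper 16 bits are the error count, lower 32 bits are the operation count); that layout is not confirmed here for any particular manufacturer.

```shell
# Hypothetical helper: split a Raw_Read_Error_Rate raw value into
# "errors operations".
split_raw_read_error() {
    raw="$1"
    case "$raw" in
        */*)
            # Already split by smartctl, for example "0/76456475".
            echo "${raw%%/*} ${raw#*/}"
            ;;
        *)
            # ASSUMPTION: 48-bit value, upper 16 bits = errors,
            # lower 32 bits = operations. Verify per drive family.
            echo "$(( $raw >> 32 )) $(( $raw & 4294967295 ))"
            ;;
    esac
}

# Example use:
#   split_raw_read_error "0/76456475"
#   split_raw_read_error 8589934597
```

Under that assumption, a large and rapidly changing raw number can still mean zero errors, which is why the undecoded value is a poor failure indicator on such drives.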

I like the addition of the Helium Level value, but I cannot use "Unknown_Attribute" as a search parameter. Many drives state this, but I see you actually used "22 Unknown_Attribute" so this could be helpful.

Tell you what, I will send you a Personal Message (PM) and provide you my email address and a current copy of Multi-Report which has a lot of added features. You send me what I need and I will try to incorporate the Helium Level correctly, where I can. If I am unable to clearly define the attribute then I will not add it. If this attribute is custom and I could match it to a drive model, that might be possible.

If you have a way to dump the smartctl -x of the Intel 670p NVMe drives, that would be helpful to check the script, but if they are in a laptop, maybe not depending on the OS.
 

SoraKagami

Cadet
Joined
Aug 27, 2022
Messages
3
Other relevant information:
In your script there were references to two NVMe stats and commented with "haven't found this on an NVME drive yet" and by default set to "false":
NVM_Drive_Temp_Min
NVM_Drive_Temp_Max

I have only seen these two stats on my two Intel 670p NVMe drives.
Could be an OEM only stat as these two drives came out of ASUS laptops that were recently purchased.
Looks like I made a mistake with this comment above. I mistook two other stats (Warning/Critical Comp. Temp Time) for these two while testing these drives & working on my smart reporting script.
Neither of these stats shows up on any of my current NVMe drives.

My sincere apologies, sorry.
 