SMART Test Not Happening on SSD

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
I'm trying to run a SMART test on my SSD boot drive. I've tried running this from the command line and from the GUI, and whichever way I try, it suggests it will be finished in a couple of minutes. I've then tried to query the SMART results from the command line (smartctl -a /dev/ada0) to see the results and... it tells me I haven't run a test since the drive was 255 hours old (it's now 15k hours old). The results of the test are below.

Has anyone any ideas or suggestions as to what's going on here? I'm not really sure where to begin troubleshooting on this one. Maybe the SSD is old and it's time to replace it (I've backups of my config), but it's a curious situation so I'm interested to hear what might be happening here.

Code:
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model:     CT120BX300SSD1
Serial Number:    XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Firmware Version: M2CR010
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 14 20:07:44 2021 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x03) Offline data collection activity
                                        is in progress.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  180) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       7318
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       15359
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   096   096   000    Old_age   Always       -       141
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       43
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   096   096   001    Old_age   Offline      -       4
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       17106167346
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       150243444
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       22995958

SMART Error Log Version: 1
Invalid Error Log index = 0x0f (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       255         -
# 2  Short offline       Completed without error       00%       255         -
# 3  Short offline       Completed without error       00%       255         -
# 4  Offline             Completed without error       00%       254         -
# 5  Extended offline    Completed without error       00%       217         -
# 6  Short offline       Completed without error       00%       216         -
# 7  Short offline       Completed without error       00%       211         -
# 8  Short offline       Completed without error       00%       187         -
# 9  Short offline       Completed without error       00%       163         -
#10  Offline             Completed without error       00%       139         -
#11  Short offline       Completed without error       00%       116         -
#12  Short offline       Completed without error       00%       115         -
#13  Short offline       Completed without error       00%        91         -
#14  Offline             Completed without error       00%        67         -
#15  Short offline       Completed without error       00%        53         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
    6        0    65535  Read_scanning is in progress
Selective self-test flags (0x18):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What happens if you launch the test manually?

smartctl -t short /dev/ada0
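Once the wait that smartctl prints has passed, the self-test log can be read on its own with `smartctl -l selftest /dev/ada0` and compared with the log from before the test. As a rough sketch only, the log lines could be parsed like this to see whether a new entry appeared. The helper `parse_selftest_log` and the `Entry` tuple are hypothetical names, and the regular expression assumes nothing beyond the column layout shown in the output above.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int


# Matches rows such as:
# "# 1  Short offline       Completed without error       00%       255         -"
LINE = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+\S+\s*$")


def parse_selftest_log(text):
    """Return one Entry per self-test row found in smartctl output text."""
    entries = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            entries.append(
                Entry(int(m[1]), m[2], m[3], int(m[4]), int(m[5]))
            )
    return entries


# Sample rows copied from the self-test log posted earlier in the thread.
sample = """\
# 1  Short offline       Completed without error       00%       255         -
# 4  Offline             Completed without error       00%       254         -
# 5  Extended offline    Completed without error       00%       217         -
"""

entries = parse_selftest_log(sample)
for e in entries:
    print(e.num, e.description, e.status, e.lifetime_hours)
```

Running this against the log before and after a test, and comparing the first entry of each, would show whether the drive recorded the new run and at what LifeTime value.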
 

horse_porcupine

This is what I'm seeing. Just the regular message about how long it will take to complete:

Code:
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Apr 15 11:50:55 2021 PDT
Use smartctl -X to abort test.
 

horse_porcupine

Oh, what the hell... So I decided to try running the command you suggested above @sretalla (which I'm CERTAIN I've run before) and now it shows more recent runs in my SMART results. But they're showing as occurring when the drive was 15 hours old. The question is what did I do 15 hours ago... I think in the TrueNAS GUI I toggled off and then on the SMART reporting for the SSD around that time. So maybe that's fixed it somewhat. Although now the drive thinks it's much younger than it is...

Code:
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron BX/MX1/2/3/500, M5/600, 1100 SSDs
Device Model:     CT120BX300SSD1
Serial Number:    xxxxxx
Firmware Version: M2CR010
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 15 11:52:43 2021 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 246) Self-test routine in progress...
                                        60% of test remaining.
Total time to complete Offline
data collection:                (   60) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       7318
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       15375
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   096   096   000    Old_age   Always       -       141
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       2
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       43
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   096   096   001    Old_age   Offline      -       4
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       17118413322
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       150357124
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       23062960

SMART Error Log Version: 1
Invalid Error Log index = 0x12 (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress 60%        15         -
# 2  Short offline       Completed without error       00%        15         -
# 3  Extended offline    Completed without error       00%        15         -
# 4  Short offline       Completed without error       00%       255         -
# 5  Short offline       Completed without error       00%       255         -
# 6  Short offline       Completed without error       00%       255         -
# 7  Offline             Completed without error       00%       254         -
# 8  Extended offline    Completed without error       00%       217         -
# 9  Short offline       Completed without error       00%       216         -
#10  Short offline       Completed without error       00%       211         -
#11  Short offline       Completed without error       00%       187         -
#12  Short offline       Completed without error       00%       163         -
#13  Offline             Completed without error       00%       139         -
#14  Short offline       Completed without error       00%       116         -
#15  Short offline       Completed without error       00%       115         -
#16  Short offline       Completed without error       00%        91         -
#17  Offline             Completed without error       00%        67         -
#18  Short offline       Completed without error       00%        53         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
    6        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
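
One purely arithmetic observation, offered as a guess rather than a diagnosis: every LifeTime value in the self-test log is 255 or less, and the drive's Power_On_Hours (15359 in the first output, 15375 here) reduced modulo 256 gives 255 and 15, the same values the newest log entries show in each output. That would be consistent with the firmware storing only the low 8 bits of the hour counter in the log, though nothing in this thread confirms it. The wrap value of 256 and the helper name `logged_hours` below are assumptions for illustration.

```python
# Assumption: the self-test log keeps only the low 8 bits of Power_On_Hours.
LOG_WRAP = 256


def logged_hours(power_on_hours, wrap=LOG_WRAP):
    """Hour value the log would show if the counter wrapped at `wrap`."""
    return power_on_hours % wrap


# Power_On_Hours values taken from the two smartctl outputs in this thread.
print(logged_hours(15359))
print(logged_hours(15375))
```

If this reading is right, the tests logged at 255 hours in the first output may have been the recent runs, not ones from early in the drive's life.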
 

sretalla

OK. I think there's something really not good going on with that drive.

Your read error count is quite high and you're showing an average erase count of 141, meaning the entire drive has, on average, been written and erased 141 times: over 16 TB written to a 120 GB drive!
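
The 16 TB figure follows from multiplying the average erase count by the drive's capacity. Here is a back-of-the-envelope check using the capacity and attribute values from the smartctl output above. It is only an approximation, since the raw NAND capacity is not reported, and treating Total_LBAs_Written as a count of 512-byte logical sectors is an assumption because the unit of that attribute is vendor-specific.

```python
# Values copied from the second smartctl output in this thread.
CAPACITY_BYTES = 120_034_123_776      # User Capacity
AVG_BLOCK_ERASE_COUNT = 141           # attribute 173, raw value
TOTAL_LBAS_WRITTEN = 17_118_413_322   # attribute 246, raw value

# Assumption: attribute 246 counts 512-byte logical sectors.
LOGICAL_SECTOR_BYTES = 512

# Rough NAND-level writes: each block erased about 141 times on average.
nand_written_tb = AVG_BLOCK_ERASE_COUNT * CAPACITY_BYTES / 1e12

# Host-level writes as reported by the drive, under the unit assumption above.
host_written_tb = TOTAL_LBAS_WRITTEN * LOGICAL_SECTOR_BYTES / 1e12

print(f"NAND-level estimate: {nand_written_tb:.1f} TB")
print(f"Host writes:         {host_written_tb:.1f} TB")
```

Under those assumptions the erase-count estimate comes to about 16.9 TB and the host-reported writes to about 8.8 TB.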

I suspect that the controller is having some trouble reading the blocks containing the SMART data, hence the crazy numbers for things including drive age of the last SMART test.

I would replace it if you can.
 

horse_porcupine

I'm definitely considering replacing it. There's something fundamentally not right here and I don't want to wait for it to go bang!

I don't think an Average Block Erase Count of 141 is that problematic, though. SSDs should be able to handle much more than 16 TB of writes in a lifetime... For comparison, I have two SSDs in my jails pool that are regularly copying data back and forth. They're both up at ~250 and seem healthy. Well, they're not causing me headaches yet at least :grin:

Thanks for the input @sretalla !
 