SMART: self-assessment test result FAILED, but tests are passing

emarj

Dabbler
Joined
Feb 7, 2018
Messages
23
Hi, I need some help with understanding this S.M.A.R.T. "situation" on the boot SSD.

I got the scary message in the console: FAILED SMART self-check. BACK UP DATA NOW!

But the tests (short/long) pass, and I don't understand which metric is triggering this alert. In fact, it says: No failed Attributes found.

Could someone help me?

Code:
root@host: smartctl -a /dev/ada3
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT240BX500SSD1
Serial Number:    2028E4076B04
LU WWN Device Id: 0 000000 000000000
Firmware Version: M6CR022
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct 13 11:36:19 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status:  (0x02)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  120) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0002)    Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   000   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2501
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       491
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   001   001   000    Old_age   Always       -       35
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       77
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       227
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   043   000    Old_age   Always       -       33 (Min/Max 9/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   000   000   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       4150537437
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       129704294
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       177253984

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2476         -
# 2  Short offline       Completed without error       00%      2452         -
# 3  Short offline       Completed without error       00%      2428         -
# 4  Short offline       Completed without error       00%      2404         -
# 5  Short offline       Completed without error       00%      2379         -
# 6  Short offline       Completed without error       00%      2355         -
# 7  Short offline       Completed without error       00%      2331         -
# 8  Short offline       Completed without error       00%      2307         -
# 9  Short offline       Completed without error       00%      2283         -
#10  Short offline       Completed without error       00%      2259         -
#11  Short offline       Completed without error       00%      2235         -
#12  Short offline       Completed without error       00%      2211         -
#13  Short offline       Completed without error       00%      2187         -
#14  Short offline       Completed without error       00%      2163         -
#15  Short offline       Completed without error       00%      2140         -
#16  Short offline       Completed without error       00%      2116         -
#17  Short offline       Completed without error       00%      2092         -
#18  Short offline       Completed without error       00%      2068         -
#19  Short offline       Completed without error       00%      2044         -
#20  Short offline       Completed without error       00%      2020         -
#21  Short offline       Completed without error       00%      1996         -

Selective Self-tests/Logging not supported
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Look at attribute 202, Percent_Lifetime_Remain. It says there's only 1% lifetime remaining. This is what's likely triggering the alarm.
 

emarj

Thanks for the reply.

As far as I understand, that attribute should work the opposite way [source], so it should mean 99% life remaining (the SSD in question is almost new).
From what I read online, it seems that at some point this attribute was called Percent_Lifetime_Used instead of Percent_Lifetime_Remain, and worked the way you described. Maybe this is confusing smartmontools.

Moreover, if this attribute were failing, I believe there would be an alert in the WHEN_FAILED column.

Might this be a bug on the smartmontools side?
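
The WHEN_FAILED reasoning above can be sketched in a few lines. This is only an illustration, assuming smartctl flags an attribute when its normalized VALUE (or WORST) is at or below THRESH and treats a threshold of 0 as "always passing"; the function names are hypothetical and this is not smartmontools' actual code. The rows are copied from the output above. Note that the overall-health line is reported by the drive itself (SMART RETURN STATUS), so it can say FAILED even when no attribute row does.

```python
# Sketch of the assumed WHEN_FAILED rules; not smartmontools' real implementation.
ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   000   100   000    Pre-fail  Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   000   000   000    Old_age   Always       -       0
"""


def when_failed(value: int, worst: int, thresh: int) -> str:
    """Return the WHEN_FAILED text for one attribute (assumed rules)."""
    if thresh == 0:
        # Assumption: a threshold of 0 means the attribute can never fail.
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


def check(rows: str) -> dict:
    """Map attribute ID to its WHEN_FAILED text for each table row."""
    result = {}
    for line in rows.splitlines():
        fields = line.split()
        attr_id = int(fields[0])
        # Columns 3..5 are VALUE, WORST, THRESH (after ID, NAME, FLAG).
        value, worst, thresh = (int(f) for f in fields[3:6])
        result[attr_id] = when_failed(value, worst, thresh)
    return result


states = check(ROWS)
print(states)
```

Under these assumptions none of the rows is flagged, including the ones with a VALUE of 000, because their threshold is also 0. That would be consistent with "No failed Attributes found."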
 

emarj

Thanks for the clarification.
In fact, in the drivedb.h header (link) the attribute is correctly labeled (for Crucial/Micron Client SSDs):
Code:
 "-v 202,raw48,Percent_Lifetime_Remain " // norm = max(100-raw,0); raw = percent_lifetime_used
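
The comment in that drivedb.h line can be checked with a small worked example. This is a minimal sketch that only restates the formula from the comment; the function name is made up for illustration.

```python
def normalized_202(raw_percent_used: int) -> int:
    """Normalized value of attribute 202 per the drivedb.h comment:
    norm = max(100 - raw, 0), where raw is the percentage of lifetime used."""
    return max(100 - raw_percent_used, 0)


# Raw values to try: 1 is the RAW_VALUE from the output above;
# 150 only shows the clamp at zero.
for raw in (1, 9, 150):
    print(f"raw={raw:3d} -> normalized={normalized_202(raw)}")
```

With a raw value of 1 the formula gives 99, which matches the VALUE column (099) in the smartctl output, so the raw 1 reads as 1% used rather than 1% remaining.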

Apparently there are no firmware updates on the Crucial website. Not sure what I should do from here...
 

emarj

Since I have another Crucial BX500 connected to a Windows machine (not mounted, though), I installed "Crucial Storage Executive" (Crucial's tool for managing disks and firmware), and it does not detect that drive. I restarted the application several times, and the drive appeared only once, for a split second, showing "firmware error".

The disk is working flawlessly and Windows detects it with no problem (even though it is not mounted).

Maybe these cheap disks are a bit quirky in general.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I use the MX units as a set of mirrored vdevs without much in the way of an issue. (I do have to add "-c 0" to the SMART options for a firmware issue; this is not the issue you are having.)
202 Percent_Lifetime_Remain 0x0030 091 091 001 Old_age Offline - 9
is what I get, with the value gradually reducing from 100 downwards and the RAW_VALUE counting upwards.

That's the same behavior the OP is getting.

I didn't think the BX's had the endurance to be used this way - but whatever works
 

emarj

I use the MX units as a set of mirrored vdevs without much in the way of an issue. (I do have to add "-c 0" to the SMART options for a firmware issue; this is not the issue you are having.)
202 Percent_Lifetime_Remain 0x0030 091 091 001 Old_age Offline - 9
is what I get, with the value gradually reducing from 100 downwards and the RAW_VALUE counting upwards.
Thanks for the confirmation.

I didn't think the BX's had the endurance to be used this way - but whatever works
I originally bought them for another reason, but one of them ended up inside my NAS. I know I cannot expect it to last very long, but as a boot drive (with no important data on it) I thought it would behave reasonably well.
 

NugentS

For a boot drive they will be fine
 

Samuel Tai

@emarj, are you running your system dataset on the boot pool, or your main pool? That's the only scenario I can think of wherein you'd see a lot of writes on the boot pool, as the system dataset contains the logs and the collectd RRD graph data for the GUI reporting.
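
To put a rough number on the writes in question, the counters from the smartctl output at the top of the thread can be converted as below. This is a back-of-the-envelope sketch with two assumptions: that attribute 246 counts 512-byte LBAs, and that write amplification can be estimated as (host pages + FTL pages) / host pages from attributes 247 and 248. Neither is confirmed in this thread for the BX500.

```python
# Values copied from the smartctl output in the first post.
SECTOR_BYTES = 512               # assumption: attribute 246 is in 512-byte LBAs
total_lbas_written = 4150537437  # attribute 246, Total_LBAs_Written
host_pages = 129704294           # attribute 247, Host_Program_Page_Count
ftl_pages = 177253984            # attribute 248, FTL_Program_Page_Count
power_on_hours = 2501            # attribute 9, Power_On_Hours

# Host writes in bytes, then in decimal terabytes.
bytes_written = total_lbas_written * SECTOR_BYTES
tb_written = bytes_written / 1e12

# Average host write rate per power-on hour, in decimal gigabytes.
gb_per_hour = bytes_written / 1e9 / power_on_hours

# Assumed write-amplification estimate: total NAND page programs per host page.
waf = (host_pages + ftl_pages) / host_pages

print(f"{tb_written:.2f} TB written")
print(f"{gb_per_hour:.2f} GB per power-on hour")
print(f"write amplification ~{waf:.2f}")
```

Under those assumptions this works out to roughly 2.1 TB of host writes over 2501 power-on hours, or about 0.85 GB per hour, with a write amplification of about 2.4.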
 