Critical error: Device: /dev/ada2, Failed SMART usage Attribute: 194 Temperature_Celsius.

Networker_77 (Cadet, joined Jul 9, 2022, 1 message)
Hi everyone.

I've been running a FreeNAS (now TrueNAS) server since 2014. Let me say up front that this was my first-ever server build, so it doesn't use the recommended hardware.

Yesterday I received a notification of the error in the subject line, similar to the one in this post: https://www.truenas.com/community/t...sage-attribute-194-temperature_celsius.77881/. The error concerns my boot drive, a 2-year-old Crucial BX500 120GB 3D NAND SATA SSD (CT120BX500SSD1).

Output of smartctl -a /dev/ada2:

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT120BX500SSD1
Serial Number:    2038E4107E62
LU WWN Device Id: 0 000000 000000000
Firmware Version: M6CR013
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul  9 10:11:06 2022 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       13452
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       4
171 Program_Fail_Count      0x0032   100   100   050    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   050    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   050    Old_age   Always       -       1
174 Unexpect_Power_Loss_Ct  0x0032   100   100   050    Old_age   Always       -       2
180 Unused_Reserve_NAND_Blk 0x0032   100   100   050    Old_age   Always       -       100
183 SATA_Interfac_Downshift 0x0032   100   100   050    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   050    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   050    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   055   048   050    Old_age   Always   In_the_past 45 (Min/Max 32/52)
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   050    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       100
206 Write_Error_Rate        0x002e   100   100   050    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   050    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       38461451
247 Host_Program_Page_Count 0x0032   100   100   050    Old_age   Always       -       1201920
248 FTL_Program_Page_Count  0x0032   100   100   050    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
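For anyone wanting to check the offending row programmatically, here is a minimal Python sketch that splits the attribute 194 line into its columns. It assumes the whitespace-separated column layout printed above (ID, name, flag, VALUE, WORST, THRESH, type, updated, WHEN_FAILED, raw value); `parse_row` and `parse_temperature_raw` are hypothetical helper names, not part of smartmontools.

```python
import re

# Attribute 194 row exactly as printed in the smartctl output in this post.
ROW = ("194 Temperature_Celsius     0x0022   055   048   050    "
       "Old_age   Always   In_the_past 45 (Min/Max 32/52)")

# ASSUMED layout: ID, name, flag, VALUE, WORST, THRESH, type, updated,
# WHEN_FAILED, then the raw value (which may contain spaces).
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)


def parse_row(line):
    """Split one attribute row into a dict; numeric columns become ints."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"not an attribute row: {line!r}")
    d = m.groupdict()
    for key in ("id", "value", "worst", "thresh"):
        d[key] = int(d[key])  # "055" -> 55
    return d


def parse_temperature_raw(raw):
    """Split a raw value like '45 (Min/Max 32/52)' into (current, min, max)."""
    m = re.match(r"(\d+)(?:\s*\(Min/Max\s+(\d+)/(\d+)\))?", raw)
    if m is None:
        raise ValueError(f"unrecognised raw value: {raw!r}")
    current, low, high = m.groups()
    return int(current), (int(low) if low else None), (int(high) if high else None)


row = parse_row(ROW)
current, t_min, t_max = parse_temperature_raw(row["raw"])
print(row["value"], row["worst"], row["thresh"], row["when_failed"])
print(current, t_min, t_max)
```

On this row it yields VALUE 55, WORST 48, THRESH 50, WHEN_FAILED `In_the_past`, and a raw reading of 45 °C with a recorded range of 32 to 52 °C. Note that only WORST sits below THRESH.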
      


The other post said that the error was a firmware problem because the drive is an SSD. Is the same thing happening here, or should I worry?

Thanks.
 

rcaron (Cadet, joined Jul 19, 2016, 6 messages)
The other post also showed a past maximum of 61°C. That's toasty and will reduce the drive's life. Your past maximum is 52°C, which isn't nearly as bad. I think the "FAILING NOW" declaration in the other post may just be premature, or a firmware bug.
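The distinction between "FAILING NOW" and a past failure comes down to which normalized column has crossed THRESH. Below is a simplified Python sketch of that rule as I understand smartctl applies it (VALUE at or below THRESH means FAILING_NOW, WORST at or below THRESH means In_the_past, THRESH 0 means no threshold); it skips smartmontools' other edge cases and is not its actual source. The `normalized_from_celsius` mapping (normalized = 100 minus °C) is purely an assumption inferred from this drive's numbers (100 - 45 = 55, 100 - 52 = 48), not documented Crucial behaviour.

```python
def when_failed(value, worst, thresh):
    """Simplified WHEN_FAILED logic (an approximation of smartctl's rule)."""
    if thresh == 0:        # no threshold defined: never reported as failed
        return "-"
    if value <= thresh:    # current normalized value is at/below threshold
        return "FAILING_NOW"
    if worst <= thresh:    # only the worst-ever value crossed the threshold
        return "In_the_past"
    return "-"


def normalized_from_celsius(temp_c):
    """ASSUMPTION: inferred from this one drive's output, not a documented formula."""
    return 100 - temp_c


# Attribute 194 from the output above: current 45 deg C, peak 52 deg C, THRESH 050.
value = normalized_from_celsius(45)   # 55, matches the VALUE column
worst = normalized_from_celsius(52)   # 48, matches the WORST column
print(when_failed(value, worst, 50))  # In_the_past

# Under the same assumption, a current reading of 50 deg C or more
# would push VALUE to 50 or below, i.e. "FAILING_NOW".
print(when_failed(normalized_from_celsius(50), worst, 50))  # FAILING_NOW
```

If that mapping holds, THRESH 050 on this drive corresponds to roughly 50 °C, so a brief 52 °C peak is enough to leave a permanent `In_the_past` mark even though the drive currently reads 45 °C.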

If you're OK with the reduced lifespan (and it's just a boot drive, so you should be, provided you have a backup of your config), then I'd check the Disk Edit screen and see what your informational and critical temperatures are set to.
 

NugentS (MVP, joined Apr 16, 2020, 2,947 messages)
I would have thought 52 degrees, whilst warm, is OK for an SSD, since SSDs:
1. Are generally rated to 70 °C.
2. Will thermally throttle if they think it's an issue.

The problem is that by default TN (TrueNAS) treats all disks the same, with the same temperature limits, and those limits are set for HDDs, not SSDs. You need to set specific limits for your SSDs to avoid spurious and irrelevant warning messages, which is what this one is.
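The idea of per-type limits can be illustrated with a small Python sketch: each reading is checked against limits for its own disk type instead of one HDD-oriented pair. The limit values in `LIMITS` are hypothetical placeholders (the SSD critical value just reuses the roughly 70 °C rating mentioned above), not TrueNAS defaults; in TrueNAS itself you would set the informational and critical temperatures in the UI rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TempLimits:
    informational: int  # deg C: log a notice at or above this
    critical: int       # deg C: raise an alert at or above this


# HYPOTHETICAL example limits, not TrueNAS defaults.
LIMITS = {
    "hdd": TempLimits(informational=40, critical=45),
    "ssd": TempLimits(informational=60, critical=70),
}


def classify(temp_c, disk_type):
    """Return 'ok', 'informational' or 'critical' for a reading in deg C."""
    limits = LIMITS[disk_type]
    if temp_c >= limits.critical:
        return "critical"
    if temp_c >= limits.informational:
        return "informational"
    return "ok"


# The boot SSD's recorded peak of 52 deg C under each set of limits.
print(classify(52, "hdd"))  # critical
print(classify(52, "ssd"))  # ok
```

With a single HDD-style limit the 52 °C peak raises an alert; with SSD-specific limits the same reading is unremarkable, which is the spurious-warning situation described above.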

Lastly, @Networker_77: saying "my hardware isn't recommended" is NOT the same as posting hardware details. Please see my sig, or @rcaron's sig, for an example of what to do.
 