CRITICAL Device: /dev/ada1, 1 Currently unreadable (pending) sectors.

F0urq

Cadet
Joined
May 25, 2022
Messages
4
I am currently receiving a notification on TrueNAS with the following message:

CRITICAL

Device: /dev/ada1, 1 Currently unreadable (pending) sectors.

All short tests passed on all drives. When trying to run an extended/long test on ada1 I get that message.

I was wondering if anyone could help interpret the logs for me and let me know what exactly is going on. I have a lot of "Pre-fail" entries on all my drives, but I'm not sure if it's something to be concerned about or not.

I'll post the logs for ada1 below:

If Selective self-test is pending on power-up, resume after 0 minute delay.
root@truenas[~]# smartctl -a /dev/ada1
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RC6 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Ultrastar A7K2000
Device Model:     Hitachi HUA722020ALA331
Serial Number:    YAJT219Z
LU WWN Device Id: 5 000cca 221e71f94
Firmware Version: JKAOA3NH
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed May 25 09:53:28 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 119) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (23506) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 392) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       65536
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       109
  3 Spin_Up_Time            0x0007   138   138   024    Pre-fail  Always       -       619 (Average 431)
  4 Start_Stop_Count        0x0012   098   098   000    Old_age   Always       -       11247
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       18
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       36
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       18911
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       5321
192 Power-Off_Retract_Count 0x0032   090   090   000    Old_age   Always       -       12464
193 Load_Cycle_Count        0x0012   090   090   000    Old_age   Always       -       12464
194 Temperature_Celsius     0x0002   105   105   000    Old_age   Always       -       57 (Min/Max 17/70)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%     18911         879358545
# 2  Short offline       Completed without error       00%     18903         -
# 3  Extended offline    Completed: read failure       70%     18894         879335827
# 4  Short offline       Completed without error       00%     18892         -
# 5  Short offline       Completed without error       00%     18879         -
# 6  Short offline       Completed without error       00%     18855         -
# 7  Extended offline    Completed without error       00%     18252         -
# 8  Short offline       Completed without error       00%     18220         -
# 9  Short offline       Completed without error       00%     18196         -
#10  Short offline       Completed without error       00%     18172         -
#11  Short offline       Completed without error       00%     18148         -
#12  Short offline       Completed without error       00%     18124         -
#13  Extended offline    Completed without error       00%     18084         -
#14  Short offline       Completed without error       00%     18052         -
#15  Short offline       Completed without error       00%     18028         -
#16  Short offline       Completed without error       00%     18004         -
#17  Short offline       Completed without error       00%     17980         -
#18  Short offline       Completed without error       00%     17956         -
#19  Extended offline    Completed without error       00%     17918         -
#20  Short offline       Completed without error       00%     17884         -
#21  Short offline       Completed without error       00%     17860         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
 
Joined
Jun 2, 2019
Messages
591
Welcome!

That's an extremely large power cycle count. What the heck are you doing?
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 5321

Temp seems a bit high. You might want to improve cooling to improve longevity.
194 Temperature_Celsius 0x0002 105 105 000 Old_age Always - 57 (Min/Max 17/70)

Time for a new drive.
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 18
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 18
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 1
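
For anyone who wants to pull those three rows out of `smartctl` output automatically, here is a minimal Python sketch. It assumes the attribute table layout shown in this thread (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value). The helper names `parse_attributes` and `flag_watched` are made up for illustration, and treating any non-zero raw value as worth a look is a rule of thumb, not something smartmontools defines.

```python
import re

# Attribute IDs singled out in the reply above. Flagging any non-zero raw
# value is an assumption of this sketch, not a smartmontools rule.
WATCHED_IDS = {5, 196, 197}

# One row of the "Vendor Specific SMART Attributes" table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value_string)} for each table row."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), match.group(9).strip())
    return attrs


def flag_watched(attrs):
    """List (id, name, count) for watched attributes with a non-zero raw value."""
    flagged = []
    for attr_id in sorted(WATCHED_IDS):
        if attr_id in attrs:
            name, raw = attrs[attr_id]
            count = int(raw.split()[0])  # raw may carry trailing notes
            if count > 0:
                flagged.append((attr_id, name, count))
    return flagged


# Rows copied from the output posted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       18
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for attr_id, name, count in flag_watched(parse_attributes(SAMPLE)):
        print(f"{attr_id:>3} {name}: raw value {count}")
```

In practice the text would come from running `smartctl -A /dev/ada1` rather than from the hard-coded `SAMPLE` string.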


F0urq

Cadet
Joined
May 25, 2022
Messages
4
Thank you for the quick reply! The drives I am using are all used drives, unfortunately, so I'll definitely be replacing them with new ones going forward. That's probably why the power cycle count is so high.

Thanks for all the information. I'll definitely be looking into getting a new HDD extremely soon as well as a potentially better cooling option.

I hope my other drives are okay :oops:
 

F0urq

Cadet
Joined
May 25, 2022
Messages
4
What are the most important things to monitor on SMART tests that are a good indication that a drive needs to be replaced right away?

From your response it looks like the Reallocated_Sector_Ct, Reallocated_Event_Count, and the Current_Pending_Sector?

I'm a little newer to all of this so just trying to get a better understanding.

Thank you so much for the help!
 
Joined
Jun 2, 2019
Messages
591
When you buy replacement drives, stay away from Shingled Magnetic Recording (SMR) drives.

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Temp seems a bit high.
No, the temperature is extremely high. And since the min/max values represent actual readings during the current power cycle, @F0urq, you're cooking your drives. This drive has seen 70°C since the last time it was powered on, and the current value is 57°C. Best longevity is with 40°C or less. This calls for drastic action ASAP.
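
As a rough illustration of reading that attribute, the Python sketch below splits a raw value such as `57 (Min/Max 17/70)` into current, minimum, and maximum readings and compares them against the 40°C figure mentioned above. The function names are hypothetical, the 40°C limit is this post's rule of thumb rather than a vendor specification, and the `(Min/Max a/b)` format is assumed from this drive's output; other drives may report temperature differently.

```python
import re

# 40 C is the rule of thumb quoted in this thread, not a vendor limit.
TARGET_MAX_C = 40

# Matches "57" or "57 (Min/Max 17/70)" as printed for attribute 194 here.
TEMP_RAW = re.compile(r"^(\d+)(?:\s+\(Min/Max\s+(-?\d+)/(-?\d+)\))?")


def parse_temperature(raw):
    """Return (current, minimum, maximum); min/max are None if not reported."""
    match = TEMP_RAW.match(raw.strip())
    if not match:
        raise ValueError(f"unrecognised temperature raw value: {raw!r}")
    current = int(match.group(1))
    low = int(match.group(2)) if match.group(2) is not None else None
    high = int(match.group(3)) if match.group(3) is not None else None
    return current, low, high


def temperature_notes(raw, limit=TARGET_MAX_C):
    """Return human-readable notes for readings above the chosen limit."""
    current, _low, high = parse_temperature(raw)
    notes = []
    if current > limit:
        notes.append(f"current {current} C is above {limit} C")
    if high is not None and high > limit:
        notes.append(f"recorded maximum {high} C is above {limit} C")
    return notes


if __name__ == "__main__":
    for note in temperature_notes("57 (Min/Max 17/70)"):
        print(note)
```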

As to the SMART results, I agree that the reallocated sector count/events would likely have me replacing the drive (values in the low single digits probably wouldn't, but you're into two digits), but the other factor is the SMART self-tests--this drive is consistently failing the long self-tests. That's definitely a reason to replace it.
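
To make the self-test point concrete, here is a small Python sketch that scans a SMART self-test log for extended tests that ended in a failure. It assumes the column-aligned layout `smartctl` normally prints (fields separated by two or more spaces); a log flattened to single spaces would need a different split. `parse_selftest_log` and `failed_extended_tests` are illustrative names, not part of any tool.

```python
import re


def parse_selftest_log(text):
    """Parse '# N  Description  Status  Remaining  Hours  LBA' rows into dicts."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("#"):
            continue
        # Columns are assumed to be separated by runs of 2+ spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 6:
            continue
        num, description, status, remaining, hours, lba = fields
        entries.append({
            "num": int(num.lstrip("#").strip()),
            "description": description,
            "status": status,
            "remaining": remaining,
            "hours": int(hours),
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return entries


def failed_extended_tests(entries):
    """Return (power_on_hours, first_error_lba) for failed extended tests."""
    return [
        (entry["hours"], entry["first_error_lba"])
        for entry in entries
        if entry["description"].startswith("Extended")
        and "failure" in entry["status"]
    ]


# Rows copied from the log posted at the top of this thread.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%     18911         879358545
# 2  Short offline       Completed without error       00%     18903         -
# 3  Extended offline    Completed: read failure       70%     18894         879335827
# 7  Extended offline    Completed without error       00%     18252         -
"""

if __name__ == "__main__":
    for hours, lba in failed_extended_tests(parse_selftest_log(SAMPLE_LOG)):
        print(f"extended test failed at {hours} power-on hours, first bad LBA {lba}")
```

On this drive the two most recent extended tests both stopped with a read failure, while the extended test at 18252 hours and the short tests completed cleanly.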
 

F0urq

Cadet
Joined
May 25, 2022
Messages
4
When you buy replacement drives, stay away from Shingled Magnetic Recording (SMR) drives.

Thanks! I was just reading up on this. Looks like WD and Seagate have been moving a lot of their drives over to SMR now >.>

I've just purchased two new 2TB drives which I believe are CMR. They are 2TB 5400RPM Western Digital WD20EURX drives.

Thanks so much for your input! I've just purchased another fan for my tower to hopefully help bring down the temperature of the drives in there. My other 2TB drive is currently running at about 48C, with a max temp of 63C.

Crazy though, because I have two other drives in there that both had a temp of 43C. All of the current drives are 7200rpm. The new ones I've purchased are 5400rpm, so hopefully that helps reduce the temperature.


Thanks for that info as well! Really appreciate all the input.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
As @danb35 said - more cooling, a lot more cooling
 