SMART error - current_pending_sector and offline_uncorrectable

Status
Not open for further replies.

ethajn

Dabbler | Joined: Jan 5, 2013 | Messages: 19
So, I got the dreaded SMART error email today. I've got 15 Current_Pending_Sector and 15 Offline_Uncorrectable sectors. I've been following this procedure to scan and try to fix it; so far it doesn't seem to be working.

Any tips would be appreciated. Here's my most recent output from smartctl -a.

The errors toward the end seem to have occurred while I was trying to scan and fix.

Code:
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda 7200.14 (AF)
Device Model:    ST3000DM001-1CH166
Serial Number:    <redacted>
LU WWN Device Id: 5 000c50 04e2acf1e
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 22 17:26:04 2014 CST
 
==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
 
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  1) minutes.
Extended self-test routine
recommended polling time:        ( 338) minutes.
Conveyance self-test routine
recommended polling time:        (  2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       215395677
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       20338659
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11770
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       40
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always       -       6
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   065   057   045    Old_age   Always       -       35 (Min/Max 28/43)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2878
194 Temperature_Celsius     0x0022   035   043   000    Old_age   Always       -       35 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       15
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       15
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2983h+11m+29.436s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       20530174758
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20851656301
 
SMART Error Log Version: 1
ATA Error Count: 5
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 5 occurred at disk power-on lifetime: 11768 hours (490 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00  1d+12:52:04.880  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:52:04.854  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:52:01.992  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:52:01.937  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:59.079  READ FPDMA QUEUED
 
Error 4 occurred at disk power-on lifetime: 11768 hours (490 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00  1d+12:52:01.992  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:52:01.937  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:59.079  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:51:59.053  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:56.199  READ FPDMA QUEUED
 
Error 3 occurred at disk power-on lifetime: 11768 hours (490 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00  1d+12:51:59.079  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:51:59.053  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:56.199  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:51:56.119  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:52.953  READ FPDMA QUEUED
 
Error 2 occurred at disk power-on lifetime: 11768 hours (490 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00  1d+12:51:56.199  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  1d+12:51:56.119  READ LOG EXT
  60 00 01 ff ff ff 4f 00  1d+12:51:52.953  READ FPDMA QUEUED
  b0 d1 01 01 4f c2 40 00  1d+12:36:39.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 40 00  1d+12:36:38.931  SMART READ DATA
 
Error 1 occurred at disk power-on lifetime: 11768 hours (490 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00  1d+12:51:52.953  READ FPDMA QUEUED
  b0 d1 01 01 4f c2 40 00  1d+12:36:39.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 40 00  1d+12:36:38.931  SMART READ DATA
  ec 00 01 00 00 00 40 00  1d+12:36:38.929  IDENTIFY DEVICE
  b0 d5 01 01 4f c2 40 00  1d+12:31:44.262  SMART READ LOG
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     11769         4142374576
# 2  Extended offline    Completed: read failure       30%     11765         4142374576
# 3  Extended offline    Aborted by host               90%     11759         -
# 4  Short offline       Completed without error       00%      6766         -
# 5  Short offline       Completed without error       00%      6765         -
# 6  Short offline       Completed without error       00%      6764         -
# 7  Short offline       Completed without error       00%      6763         -
# 8  Short offline       Completed without error       00%      6762         -
# 9  Short offline       Completed without error       00%      6761         -
#10  Short offline       Completed without error       00%      6760         -
#11  Short offline       Completed without error       00%      6759         -
#12  Short offline       Completed without error       00%      6758         -
#13  Short offline       Completed without error       00%      6757         -
#14  Short offline       Completed without error       00%      6756         -
#15  Short offline       Completed without error       00%      6756         -
#16  Short offline       Completed without error       00%      6754         -
#17  Short offline       Completed without error       00%      6753         -
#18  Short offline       Completed without error       00%      6752         -
#19  Short offline       Completed without error       00%      6751         -
#20  Short offline       Completed without error       00%      6750         -
#21  Short offline       Completed without error       00%      6749         -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Yatti420

Wizard | Joined: Aug 12, 2012 | Messages: 1,437
How is your pool configured? Do you have redundancy? The drive is failing; get rid of it.

What hardware/specs? What procedure did you run to "fix" these errors? I'm confused: did you run a scrub (hopefully not without ECC RAM) to attempt to "fix" the files?

I noticed smartctl indicated a firmware update is available for the Seagate. If you have more of these, see http://forums.freenas.org/index.php?threads/updating-seagate-hard-drive-firmware.14561/
 

Yatti420

You also have reallocated sectors. I consider a drive not worthy of NAS use at that point.
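
This rule of thumb (any reallocated, pending or offline-uncorrectable sectors disqualify a NAS drive) is easy to check mechanically against the attribute table in the first post. A minimal Python sketch, assuming the column layout smartctl printed there; `parse_attributes`, `sector_warnings` and the choice of watched IDs 5, 197 and 198 are illustrative, not part of smartctl or FreeNAS.

```python
import re
from dataclasses import dataclass


@dataclass
class Attribute:
    """One row of smartctl's "Vendor Specific SMART Attributes" table."""
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


# ID, name, FLAG, VALUE, WORST, THRESH, then TYPE, UPDATED and WHEN_FAILED (skipped),
# then RAW_VALUE, which may contain spaces, e.g. "35 (Min/Max 28/43)".
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.+?)\s*$"
)

# Reallocated, pending and offline-uncorrectable sector counts.
WATCHED = (5, 197, 198)


def parse_attributes(text: str) -> dict[int, Attribute]:
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id = int(m.group(1))
            attrs[attr_id] = Attribute(attr_id, m.group(2), int(m.group(3)),
                                       int(m.group(4)), int(m.group(5)), m.group(6))
    return attrs


def sector_warnings(attrs: dict[int, Attribute]) -> list[str]:
    warnings = []
    for attr_id in WATCHED:
        attr = attrs.get(attr_id)
        # For these IDs the leading number of the raw value is a sector count.
        if attr is not None and int(attr.raw.split()[0]) > 0:
            warnings.append(f"{attr.name} = {attr.raw}")
    return warnings


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       15
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       15
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(sector_warnings(parse_attributes(SAMPLE)))
```

On these rows from the original output it prints `['Reallocated_Sector_Ct = 8', 'Current_Pending_Sector = 15', 'Offline_Uncorrectable = 15']`: all three counters are nonzero.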
 

ethajn

True. I was just trying to see if there was anything else I could try. The drive has been in use for over a year, but it's still young enough that it shouldn't be failing. I guess I'll rip it out, run SeaTools on it, and start an RMA. With the other 2 drives, I might just reconfigure them as a mirror; that will be plenty of space for the present.
 

Yatti420

I noticed the self-test log shows no SMART tests between 6766 hours of lifetime and the recent ones starting at 11759 hours. Did you have SMART disabled?
 

ethajn

No, I never disabled SMART. I honestly have no idea why there's such a large gap in the log. The gap is nearly 7 months, which is longer than this build has even been running.

If it helps, this drive was part of a previous FreeNAS build that I scrapped for parts. That may have been a mistake; the old one worked better.

I don't know enough about how SMART works, but could those earlier tests in the log have come from the older build? Does the drive itself store the SMART self-test log? I would have thought that would be stored with the OS.
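
If the self-test log lives on the drive, its LifeTime(hours) column spans every system the disk has been in, so the gap can be measured directly. A minimal Python sketch that finds the widest interval between logged tests, assuming the column layout of the output in the first post; only a few rows are reproduced as sample input, and `lifetimes` and `largest_gap` are hypothetical helpers.

```python
import re

# Matches a self-test log row and captures Num and LifeTime(hours),
# i.e. the first integer after the "NN%" Remaining column.
ROW = re.compile(r"^#\s*(\d+)\s.*?\d+%\s+(\d+)")


def lifetimes(log: str) -> list[int]:
    """Power-on hours of every logged self-test, oldest first."""
    found = []
    for line in log.splitlines():
        m = ROW.match(line)
        if m:
            found.append(int(m.group(2)))
    return sorted(found)


def largest_gap(times: list[int]) -> tuple[int, int]:
    """Consecutive pair of test times with the widest interval between them."""
    return max(zip(times, times[1:]), key=lambda pair: pair[1] - pair[0])


LOG = """\
# 1  Extended offline    Completed: read failure       90%     11769         4142374576
# 2  Extended offline    Completed: read failure       30%     11765         4142374576
# 3  Extended offline    Aborted by host               90%     11759         -
# 4  Short offline       Completed without error       00%      6766         -
# 5  Short offline       Completed without error       00%      6765         -
"""

start, end = largest_gap(lifetimes(LOG))
hours = end - start
days = hours / 24
months = days / (365.25 / 12)
print(start, end, hours, round(days), round(months, 1))   # 6766 11759 4993 208 6.8
```

The gap is 11759 - 6766 = 4993 power-on hours, about 208 days or 6.8 months of powered-on time (calendar time is at least that long), consistent with the "nearly 7 months" estimate.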
 

cyberjock

Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
SMART self-test results are stored on the drive itself.

You are correct. What probably happened is that when you put the drives in the FreeNAS build, you never set up SMART testing. Bad boy! No cookie for you!
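
On systems that run smartd directly, self-tests are scheduled with the `-s` directive in smartd.conf, which matches a regular expression against a `T/MM/DD/d/HH` string (test type, month, day of month, weekday from 1 = Monday to 7 = Sunday, hour). A minimal Python sketch of that matching rule, using the example schedule given in the smartd.conf man page (a short test every day in the 02 hour, a long test on Saturdays in the 03 hour). `due` is a hypothetical helper; Python's `re.fullmatch` stands in for the POSIX extended regex smartd uses, which behaves the same for this simple pattern, and the whole string has to match.

```python
import re
from datetime import datetime

# Example "-s" schedule from the smartd.conf man page: a short test (S) every day
# in the 02 hour and a long test (L) on Saturdays (weekday 6) in the 03 hour.
SCHEDULE = r"(S/../.././02|L/../../6/03)"


def due(test_type: str, when: datetime, schedule: str = SCHEDULE) -> bool:
    """Return True if `schedule` selects `test_type` for the hour containing `when`."""
    # The key has the form "T/MM/DD/d/HH", where d runs from 1 (Monday) to 7 (Sunday).
    key = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
    # The pattern has to match the whole key, not just a substring of it.
    return re.fullmatch(schedule, key) is not None


# Saturday, Feb 22 2014 (the date on the smartctl output above).
print(due("L", datetime(2014, 2, 22, 3, 15)))   # True
print(due("S", datetime(2014, 2, 22, 3, 15)))   # False
print(due("S", datetime(2014, 2, 23, 2, 0)))    # True: short tests run every day
```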
 

ethajn

I may in fact be a bad boy and undeserving of a cookie. So then why did I get the email that alerted me to the problem in the first place? Did smartd just happen to come across the error without running a scan?
 

cyberjock

Don't confuse SMART testing with SMART monitoring.

SMART monitoring will find errors when the disk comes across them through usage.

SMART testing actually looks for errors that might not be found through normal usage.
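
The monitoring half can be illustrated with a toy version: compare successive attribute readings and alert when a sector counter grows, roughly in the spirit of smartd's attribute tracking. It only reflects sectors that have already been accessed; a self-test is what forces a full surface read. A minimal Python sketch; `increases` is a hypothetical helper and the readings are made up, loosely modeled on the 16 to 158 jump reported later in this thread.

```python
def increases(before: dict[int, int], after: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Return {attribute ID: (old, new)} for raw values that grew between two readings."""
    return {
        attr_id: (before[attr_id], new)
        for attr_id, new in after.items()
        if attr_id in before and new > before[attr_id]
    }


NAMES = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}

# Hypothetical readings taken a day apart (raw values keyed by attribute ID).
yesterday = {5: 8, 197: 16, 198: 15}
today = {5: 8, 197: 158, 198: 15}

for attr_id, (old, new) in sorted(increases(yesterday, today).items()):
    print(f"{NAMES[attr_id]}: {old} -> {new}")   # Current_Pending_Sector: 16 -> 158
```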
 

ethajn

Well, the number of unreadable sectors jumped from 16 to 158 overnight, so that drive definitely seems to be circling the drain. I have taken the offending disk out of commission and rebuilt the array (instead of rebuilding it as raid3, I reconfigured the remaining disks as a mirror and restored from backup). So now I guess I'm going to run SeaTools and start an RMA.

Thanks for the advice, everyone.
 

Scharbag

Guru | Joined: Feb 1, 2012 | Messages: 620
ethajn said:
True. I was just trying to see if there was anything else I can try. The drive has been in use over a year, but it's still young enough it shouldn't be failing. I guess I'll rip it out, run seatools on it and start an RMA. With the other 2 drives, I might just reconfigure them as a mirror; that will be plenty of space for the present.


I have had drives fail within weeks of going into service. Seagate is pretty good at taking drives back without going through the drive-tools process. I just print a copy of the SMART test results and include it in the RMA box. I have replaced 5 of my 13 drives in the last 18 months due to errors similar to the ones you are seeing. It is a pain.
 