Does FreeNAS write to known Offline Uncorrectable Sectors?

Status
Not open for further replies.

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
Backup Build

Supermicro X10SDV-4C-TLN2F
16 GB ECC supported RAM
3 x 4TB WD Green Drives in Raidz
EVGA 500W PSU

Hey guys, I recently built a backup box for my main box to finally practice the backup that I preach. I tried to save some money on this and used WD Greens that I pulled out of external hard drives. I ran WDIDLE3 on these drives and tried to enable TLER (no dice with mine, but worth a shot). I then did my normal Memtest testing of the memory and initial SMART testing of the drives. The oldest of the 4 TB drives had 5 offline uncorrectable sectors; all the others checked out great. I ran badblocks destructively, writing to and reading from the whole drive 8 times while monitoring the write speed. The afflicted drive was no slower, and I was not able to elicit any errors from badblocks or any subsequent SMART errors. In fact, all subsequent SMART tests (long, short, and conveyance) showed no issues at all. The number of known offline uncorrectable sectors has not increased, even after loading up this backup with 6 TB of data. I have scrubbed this data and the pool is healthy.

My question is: does my disk know that these sectors are bad and just prevent any data from being written there? If not, does FreeNAS know to avoid these sectors? I will be watching this drive carefully - the server emails me SMART status every day thanks to some wonderful scripts on this forum. If the number rises at all, the drive will be replaced. But for now, barring any increase (I know that this drive is more likely to fail than my other drives), is all good? Has any data been written to those bad sectors?

Thanks so much all.
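
For anyone wanting to automate the "replace it if the number rises" rule, here is a minimal sketch, not the forum scripts mentioned above. It assumes the attribute table has the layout that `smartctl -A` prints (the sample rows are copied from the smartctl output posted later in this thread), and the names `parse_raw_values` and `attributes_that_rose` are made up for this illustration.

```python
import re

# SMART attribute IDs discussed in this thread:
# 5 = Reallocated_Sector_Ct, 196 = Reallocated_Event_Count,
# 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable.
WATCHED_IDS = (5, 196, 197, 198)

# Matches one row of the attribute table and captures the ID, the name,
# and the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"   # ID#, ATTRIBUTE_NAME, FLAG
    r"\s+\d+\s+\d+\s+\d+"                   # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"                   # TYPE, UPDATED, WHEN_FAILED
    r"\s+(\d+)"                             # RAW_VALUE
)


def parse_raw_values(smartctl_text):
    """Return {attribute_id: raw_value} for every attribute row found."""
    values = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            values[int(match.group(1))] = int(match.group(3))
    return values


def attributes_that_rose(baseline, current, watched=WATCHED_IDS):
    """Return {attribute_id: (old, new)} for watched attributes that increased."""
    return {
        attr_id: (baseline.get(attr_id, 0), current[attr_id])
        for attr_id in watched
        if attr_id in current and current[attr_id] > baseline.get(attr_id, 0)
    }


# Rows copied from the smartctl output posted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
"""

if __name__ == "__main__":
    baseline = parse_raw_values(SAMPLE)
    today = dict(baseline)
    today[198] = 6  # hypothetical future reading, not data from this thread
    for attr_id, (old, new) in attributes_that_rose(baseline, today).items():
        print(f"attribute {attr_id} rose from {old} to {new}")
```

In practice the text would come from running smartctl against the drive and the baseline would be saved somewhere between runs; both of those parts are left out here on purpose.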
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Bad blocks in SATA drives will be spared out when you write to them.
So running a destructive badblocks test will have caused any bad blocks
not already spared out to be spared out.
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
@Arwen Thanks for the clarification. So if more sectors become offline uncorrectable, will data be moved from them only during the next scrub (assuming parity info is all good), or will data be moved to other sectors following a SMART test (I don't think so)? Or will data be moved away from that sector immediately upon being written, since ZFS's copy-on-write system would show a failure when reading / writing data? Thanks.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Arwen said: "Bad blocks in SATA drives will be spared out when you write to them."
Ideally, but not necessarily. It may be that writes to the bad sectors succeeded, and that subsequent reads could fail. Take a look at the number of reallocated sectors on that specific drive for confirmation (SMART attribute #5).
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
@Robert Trevellyan I get 0 for the reallocated sectors on the offending drive, which makes me think it just marked them all as bad and couldn't / didn't reallocate the data there (there was no real data there).
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
Thanks for your input. The SMART status data still indicates that there are 5 offline uncorrectable sectors, which leads me to believe that they are still bad. Do those numbers ever reset?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
What do you see in attribute #196? Can you post the SMART data (between CODE tags)?
fricker_greg said: "Do those numbers ever reset?"
Pending can reset; I'm not sure about uncorrectable, but I would guess not. I'm just going by what I read on Wikipedia.
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
If it won't write to an uncorrectable sector then it can't find out later that that sector works. So no.


 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
I think we all agree that these sectors are just bad and will not be written to at all. I'm including the SMART data just for completeness, but I think we have hammered this one.

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E3DXPS34
LU WWN Device Id: 5 0014ee 260f580f3
Firmware Version: 80.00A80
User Capacity:    4,000,753,476,096 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun 24 01:36:28 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249)    Self-test routine in progress...
                    90% of test remaining.
Total time to complete Offline
data collection:         (53520) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 535) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x7035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       267
  3 Spin_Up_Time            0x0027   195   181   021    Pre-fail  Always       -       7208
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1090
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1027
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1090
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       640
193 Load_Cycle_Count        0x0032   191   191   000    Old_age   Always       -       28101
194 Temperature_Celsius     0x0022   119   094   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
ATA Error Count: 14 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 14 occurred at disk power-on lifetime: 838 hours (34 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:02:50.076  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.054  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.031  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.017  SMART WRITE LOG
  80 45 01 01 44 57 a0 00      00:02:50.012  [VENDOR SPECIFIC]

Error 13 occurred at disk power-on lifetime: 838 hours (34 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:02:50.054  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.031  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.017  SMART WRITE LOG
  80 45 01 01 44 57 a0 00      00:02:50.012  [VENDOR SPECIFIC]
  ec 44 01 01 00 00 a0 00      00:02:49.977  IDENTIFY DEVICE

Error 12 occurred at disk power-on lifetime: 838 hours (34 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:02:50.031  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:50.017  SMART WRITE LOG
  80 45 01 01 44 57 a0 00      00:02:50.012  [VENDOR SPECIFIC]
  ec 44 01 01 00 00 a0 00      00:02:49.977  IDENTIFY DEVICE
  80 44 10 00 44 57 a0 00      00:02:45.830  [VENDOR SPECIFIC]

Error 11 occurred at disk power-on lifetime: 838 hours (34 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:02:50.017  SMART WRITE LOG
  80 45 01 01 44 57 a0 00      00:02:50.012  [VENDOR SPECIFIC]
  ec 44 01 01 00 00 a0 00      00:02:49.977  IDENTIFY DEVICE
  80 44 10 00 44 57 a0 00      00:02:45.830  [VENDOR SPECIFIC]
  b0 d6 01 e0 4f c2 a0 00      00:02:45.825  SMART WRITE LOG

Error 10 occurred at disk power-on lifetime: 838 hours (34 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:02:45.825  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:02:45.803  SMART WRITE LOG
  80 45 01 01 44 57 a0 00      00:02:45.787  [VENDOR SPECIFIC]
  ec 44 01 01 00 00 a0 00      00:02:45.670  IDENTIFY DEVICE
  80 44 10 00 44 57 a0 00      00:02:33.198  [VENDOR SPECIFIC]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       896         -
# 2  Conveyance offline  Aborted by host               70%       887         -
# 3  Conveyance offline  Completed without error       00%       887         -
# 4  Short offline       Completed without error       00%       887         -
# 5  Extended offline    Completed without error       00%       849         -
# 6  Short offline       Completed without error       00%       838         -
# 7  Conveyance offline  Completed without error       00%       838         -
# 8  Conveyance offline  Completed: read failure       90%       808         214085121
# 9  Short offline       Completed: read failure       90%       808         214085120
#10  Extended offline    Completed: read failure       90%       789         214085121
3 of 3 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Offline uncorrectable sectors are just that: offline and uncorrectable. They are dead; you can't read or write them.

Pending sectors are sectors with a read error that are waiting for a write. If the write succeeds, the sector is removed from the pending sectors list and is now like any other good sector. If the write fails, the sector is remapped and added to the reallocated sectors list ;)
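
The pending / reallocated bookkeeping described above can be written down as a toy model. This is only a sketch of the rules as stated in this post, not a description of what any particular drive firmware does; `DriveModel` and its method names are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DriveModel:
    """Toy bookkeeping for pending and reallocated sectors (illustrative only)."""

    pending: set = field(default_factory=set)      # sectors with a read error, awaiting a write
    reallocated: set = field(default_factory=set)  # sectors remapped after a failed write

    def read_error(self, lba):
        """A read of this sector failed: it becomes pending."""
        self.pending.add(lba)

    def write(self, lba, write_succeeds):
        """Apply a write; only pending sectors change state."""
        if lba not in self.pending:
            return "ok"
        self.pending.discard(lba)
        if write_succeeds:
            return "cleared"      # back to being an ordinary good sector
        self.reallocated.add(lba)
        return "reallocated"      # remapped to a spare


if __name__ == "__main__":
    drive = DriveModel()
    drive.read_error(214085120)   # LBA taken from the self-test log in this thread
    print(drive.write(214085120, write_succeeds=True))
    print(len(drive.pending), len(drive.reallocated))
```

In this toy model, a pending sector whose rewrite succeeds leaves both the pending and the reallocated counts at zero.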
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
So why would one have 0 reallocated sectors and 5 offline uncorrectable like this drive then? Thanks.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
# 8  Conveyance offline  Completed: read failure       90%       808         214085121
# 9  Short offline       Completed: read failure       90%       808         214085120
#10  Extended offline    Completed: read failure       90%       789         214085121
Looks like the drive is faulty to me.
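
As a side note on those two LBAs, the arithmetic below is a small sketch of where they sit on the disk. It uses the sector sizes from the posted smartctl output (512 bytes logical, 4096 bytes physical) and assumes that physical sectors are aligned to LBA 0, which the output does not confirm. The function name `locate` is made up for this example.

```python
LOGICAL_SECTOR_BYTES = 512     # from "Sector Sizes" in the posted smartctl output
PHYSICAL_SECTOR_BYTES = 4096   # from the same line


def locate(lba):
    """Return (byte_offset, physical_sector, index_within_physical_sector).

    Assumes physical sectors start at LBA 0 (an assumption, not from the thread).
    """
    byte_offset = lba * LOGICAL_SECTOR_BYTES
    physical_sector, remainder = divmod(byte_offset, PHYSICAL_SECTOR_BYTES)
    return byte_offset, physical_sector, remainder // LOGICAL_SECTOR_BYTES


if __name__ == "__main__":
    for lba in (214085120, 214085121):
        print(lba, locate(lba))
    # 214085120 (109611581440, 26760640, 0)
    # 214085121 (109611581952, 26760640, 1)
```

Under that alignment assumption, both failing LBAs fall inside the same 4096-byte physical sector, roughly 110 GB into the drive.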
 

fricker_greg

Explorer
Joined
Jun 4, 2016
Messages
71
Yeah, though those were the errors that prompted me to ask this question, since all later tests showed no errors, as the SMART self-test log shows.
 