CRITICAL: Device: /dev/ada3, 9 Currently unreadable (pending) sectors

jay-zzz

Dabbler
Joined
Oct 24, 2014
Messages
27
I have a FreeNAS system running with the following hardware:

FreeNAS-9.3-STABLE-201604150515
Intel Pentium CPU G3220 @ 3.00 GHz Dual-core Intel HD
Motherboard: SuperMicro MBD-X10SLL+ F-O UATX
Memory 16318MB (16GB DDR3)
Storage / HDs = (6) 3TB drives with
750 W Power Supply
OS running from a USB stick mounted directly on the motherboard.
I believe it's using RAID-Z2.

As you can tell, I'm on an older version of FreeNAS. I would like to update to the latest version if my current system can support it. However, I'd like to first address these alerts from the Alert menu:
  • CRITICAL: Device: /dev/ada3, 9 Currently unreadable (pending) sectors
  • CRITICAL: Device: /dev/ada3, new Self-Test Log error at hour timestamp 42660
  • WARNING: The 9.3-STABLE Train is now in Maintenance Mode. No new features or non-essential changes will be made. Please switch to the 9.10-STABLE train for active support.
  • WARNING: New feature flags are available for volume media121. Refer to the "Upgrading a ZFS Pool" section of the User Guide for instructions.
The one that caused me the most concern is the first alert. I did a little reading up on this issue and came across this article: http://bytesandbolts.com/fixing-freenas-error-currently-unreadable-pending-sectors/. While the article is pretty straightforward about what to do, I am not comfortable with the process, as it wants me to zero the non-working sectors. I know nothing about this and am concerned that I may do something incorrectly and cause data loss.

I performed the first 2 commands from the article and here are the results of the test:
Code:
[root@NAS-Server] ~# smartctl -a /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T1031262
LU WWN Device Id: 5 0014ee 602eae4f9
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Apr 16 17:37:52 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)    The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline
data collection:         (39720) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 399) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x70bd)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       317
  3 Spin_Up_Time            0x0027   184   176   021    Pre-fail  Always       -       5758
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       122
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42668
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       113
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       70
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       51
194 Temperature_Celsius     0x0022   120   098   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       9
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       27

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed: read failure       90%     42660         2286744854
# 2  Short offline       Completed: read failure       60%     42345         1565516848
# 3  Short offline       Completed: read failure       60%     42273         1565516848
# 4  Extended offline    Completed: read failure       90%     42202         2739818888
# 5  Short offline       Completed: read failure       10%     42129         1565516848
# 6  Extended offline    Completed: read failure       10%     41875         1483231208
# 7  Short offline       Completed: read failure       60%     41795         1565516848
# 8  Extended offline    Completed: read failure       10%     41657         1483193496
# 9  Short offline       Completed: read failure       60%     41576         1565516848
#10  Short offline       Completed: read failure       20%     41384         1565516848
#11  Extended offline    Completed: read failure       10%     41320         1143781384
#12  Short offline       Completed: read failure       40%     41240         1565516848
#13  Short offline       Completed: read failure       60%     41049         1565516848
#14  Short offline       Completed: read failure       60%     40977         1565516848
#15  Extended offline    Completed: read failure       50%     40909         2744268752
#16  Short offline       Completed: read failure       60%     40833         1565516848
#17  Short offline       Completed: read failure       60%     40641         1565516848
#18  Extended offline    Completed: read failure       10%     40577         1484303448
#19  Short offline       Completed: read failure       60%     40497         1565516848
#20  Short offline       Completed: read failure       60%     40305         1565516848
#21  Short offline       Completed: read failure       60%     40233         1565516848

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I need some direction and advice on how to solve the alerts. After the HD issue has been resolved, I would like to upgrade the OS to the latest version. Any direction on how to best accomplish this would also be great. Thanks in advance for your input and guidance.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Well, I am not sure exactly what the guy in the link you provided was doing. It seems it is just overwriting those sectors so they don't look faulty.
With the following command:
sysctl kern.geom.debugflags=16
he is literally bypassing ZFS and writing directly to the disk. If you were to mess this up, you could make things worse for yourself.

I don't know, but I think that is somewhat dubious. One sector is fine, redundancy is there to keep you safe, but you have 9 sectors which might indicate your drive is starting to go bad.

Right now, I don't think you should worry yourself with the bad sectors, but I would definitely keep an eye on them.
I would also suggest that, before you do any upgrade or update, you run a scrub first to make sure your pool is fine.
Nothing is worse than going through an update and finding out the pool is resilvering for no apparent reason. It could be a driver issue, or a decryption issue if your pool is encrypted...
With a clean scrub, at least you know where you stand.

You have RAID-Z2, so it is not a big problem. If a second drive fails in the meantime, you will still be OK, but you will no longer have any redundancy.
Personally, I don't know if migration from 9.3 directly to 11.2 is doable. It might require incremental updates.
 

jay-zzz

@Apollo thanks for your input. I was also thinking the solution provided in the article may not be the best way to move forward.
Just so I understand you correctly, are you saying that I don't have to worry about the Critical Alert for a failing drive until a second drive also shows the same error?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
jay-zzz said: "As you can tell, I'm on an older version of FreeNAS. I would like to update to the latest version if my current system can support it."
Should be fine, but you will have a little trouble... You might want to look at this thread to see if that is something you feel like attempting:

FreeNAS 9.3 upgrade not working - fix to get to 9.10:
https://www.ixsystems.com/community...file-from-web-ixsystems-com.75437/post-524208

Even if you don't want that, you can still upgrade. Just reply back on this and someone can help you. Your system is not that old and can absolutely support the latest version of FreeNAS.
jay-zzz said: "The one that caused me the most concern is the first alert. I did a little reading up on this issue and came across this article: http://bytesandbolts.com/fixing-freenas-error-currently-unreadable-pending-sectors/. While the article is pretty straightforward about what to do, I am not comfortable with the process, as it wants me to zero the non-working sectors. I know nothing about this and am concerned that I may do something incorrectly and cause data loss."
Just don't do that. It is not a good idea.
Code:
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42668
You have more than gotten your hours out of this drive. That is about 4.9 years...
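As a quick check of that figure, the raw Power_On_Hours value can be converted to years. This is only a minimal sketch: the function name is made up for illustration, and the 365.25-day year is an assumption of mine, not something smartctl reports.

```python
# Assumption: an average year of 365.25 days (leap days included).
HOURS_PER_YEAR = 24 * 365.25


def power_on_years(raw_hours: int) -> float:
    """Convert a SMART Power_On_Hours raw value into years of powered-on time."""
    return raw_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # 42668 is the raw value of attribute 9 in the smartctl output above.
    print(f"{power_on_years(42668):.2f} years")
```

For the value posted above this comes out just under five years of continuous running.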
Code:
SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed: read failure       90%     42660         2286744854
# 2  Short offline       Completed: read failure       60%     42345         1565516848
# 3  Short offline       Completed: read failure       60%     42273         1565516848
# 4  Extended offline    Completed: read failure       90%     42202         2739818888
# 5  Short offline       Completed: read failure       10%     42129         1565516848
# 6  Extended offline    Completed: read failure       10%     41875         1483231208
# 7  Short offline       Completed: read failure       60%     41795         1565516848
# 8  Extended offline    Completed: read failure       10%     41657         1483193496
# 9  Short offline       Completed: read failure       60%     41576         1565516848
#10  Short offline       Completed: read failure       20%     41384         1565516848
#11  Extended offline    Completed: read failure       10%     41320         1143781384
#12  Short offline       Completed: read failure       40%     41240         1565516848
#13  Short offline       Completed: read failure       60%     41049         1565516848
#14  Short offline       Completed: read failure       60%     40977         1565516848
#15  Extended offline    Completed: read failure       50%     40909         2744268752
#16  Short offline       Completed: read failure       60%     40833         1565516848
#17  Short offline       Completed: read failure       60%     40641         1565516848
#18  Extended offline    Completed: read failure       10%     40577         1484303448
#19  Short offline       Completed: read failure       60%     40497         1565516848
#20  Short offline       Completed: read failure       60%     40305         1565516848
#21  Short offline       Completed: read failure       60%     40233         1565516848
The first time this drive failed a self test you should have got an alert and changed the drive then. This drive has been fully failed for a long time. If it is still in warranty, send it back, otherwise it is just time for a new drive.
 

Chris Moore

jay-zzz said: "Just so I understand you correctly, are you saying that I don't have to worry about the Critical Alert for a failing drive until a second drive also shows the same error?"
No, that is not true at all. Every drive needs to be tended to individually.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
The failing smart tests for the past 3 months are not a red flag for anyone? Why not just replace the disk?
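To put a rough number on that, the LifeTime(hours) column of the self-test log can be compared between the oldest and newest failed entries. The sketch below is just one way to do it: the regular expression and function names are my own, and it assumes the row layout matches the log posted above. Only three of the 21 rows are embedded to keep it short.

```python
import re

# Rows #1, #2 and #21 copied from the self-test log posted above.
SAMPLE_LOG = """\
# 1  Extended offline    Completed: read failure       90%     42660         2286744854
# 2  Short offline       Completed: read failure       60%     42345         1565516848
#21  Short offline       Completed: read failure       60%     40233         1565516848
"""

# Assumed row layout: "#NN  description  status  NN%  hours  LBA-or-dash".
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)


def parse_self_tests(text):
    """Return one dict per self-test log row found in `text`."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            tests.append({
                "num": int(m["num"]),
                "description": m["desc"],
                "status": m["status"].strip(),
                "hours": int(m["hours"]),
                "lba": None if m["lba"] == "-" else int(m["lba"]),
            })
    return tests


def failure_span_days(tests):
    """Days of power-on time between the oldest and newest failed test."""
    failed = [t["hours"] for t in tests if "failure" in t["status"]]
    if not failed:
        return 0.0
    return (max(failed) - min(failed)) / 24


if __name__ == "__main__":
    tests = parse_self_tests(SAMPLE_LOG)
    print(f"failing for at least {failure_span_days(tests):.0f} days of power-on time")
```

Since the log shown holds only 21 entries and every one of them is a read failure, this is a lower bound; the first failure may be older than the oldest row.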
 

jay-zzz

SweetAndLow said: "The failing smart tests for the past 3 months are not a red flag for anyone? Why not just replace the disk?"
@SweetAndLow thanks for your reply. I would like to replace the drive, but I have the following questions first:
1. Which drive is ada3? I have 6 drives. Do I count up from the lowest drive, or do I start from the top and count down?
2. When I replace the drive, will I need to do anything inside FreeNAS, or will the data automatically fix itself based on the RAID-Z2 configuration?
 

Chris Moore

jay-zzz said: "I have 6 drives do I count from the lowest drive and count up or do I start the count from the top and count down?"
The ada# number has no relation to the physical order of the drives. I gave you links to all the resources you need.
Look in the "Useful Commands" link for the one called glabel status
 

Chris Moore

In your situation, you will probably need to identify the drive by serial number.
[Attached image: dsc09552.jpg]
Many people put labels on their drives to help with that.
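If it helps, here is a small sketch of pulling the serial number out of saved smartctl output so it can be compared with the label printed on each physical drive. The function names are hypothetical, and it assumes the "Serial Number:" line looks like the one in the output posted earlier in this thread.

```python
# The "Serial Number" and "Device Model" lines are copied from the smartctl
# output for /dev/ada3 posted earlier in this thread.
SAMPLE_INFO = """\
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T1031262
LU WWN Device Id: 5 0014ee 602eae4f9
"""


def serial_number(smartctl_output):
    """Return the value of the 'Serial Number:' line, or None if it is absent."""
    for line in smartctl_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Serial Number":
            return value.strip()
    return None


def device_for_serial(outputs, serial):
    """Given {device name: smartctl output}, return the device with this serial."""
    for device, text in outputs.items():
        if serial_number(text) == serial:
            return device
    return None


if __name__ == "__main__":
    outputs = {"/dev/ada3": SAMPLE_INFO}
    print(serial_number(SAMPLE_INFO))
    print(device_for_serial(outputs, "WD-WMC1T1031262"))
```

Collecting the output for every disk and writing each serial on a label makes it much harder to pull the wrong drive.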
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Apollo said: "Well, I am not sure exactly what the guy in the link you provided was doing. It seems it is just overwriting those sectors so they don't look faulty.
With the following command:
sysctl kern.geom.debugflags=16
he is literally bypassing ZFS and writing directly to the disk. If you were to mess this up, you could make things worse for yourself.

I don't know, but I think that is somewhat dubious. One sector is fine, redundancy is there to keep you safe, but you have 9 sectors which might indicate your drive is starting to go bad."

That's actually the way we force disks to rewrite bad sectors. This usually causes one of the reserved spares to be allocated and will usually cause the disk to become fully usable again.

SMART long tests will fail if you have an unreadable sector, so forcing the reallocation often has the unexpected side effect of returning a SATA drive to a healthy SMART state.

9 sectors could just be a slight defect in the surface that covers more than a single sector. Not alarming in the least. If the number keeps going up, that is bad.
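One way to watch whether the number keeps going up is to pull the raw value of attribute 197 from each saved smartctl report and compare it with the previous reading. This is a rough sketch with made-up function names; it assumes the ten-column attribute table shown earlier in the thread and a plain integer in the RAW_VALUE column.

```python
# Header plus the rows for attributes 5 and 197, copied from the smartctl
# output posted earlier in this thread.
SAMPLE_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       9
"""


def raw_value(smartctl_output, attribute_id):
    """Return the RAW_VALUE of one attribute as an int, or None if not found.

    Assumption: the raw value is a plain integer in the tenth column.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attribute_id):
            return int(fields[9])
    return None


def pending_trend(previous, current):
    """Describe how the pending sector count moved between two readings."""
    if current > previous:
        return "growing"
    if current < previous:
        return "shrinking"
    return "stable"


if __name__ == "__main__":
    current = raw_value(SAMPLE_ATTRIBUTES, 197)
    # The previous reading of 9 is a placeholder, not a value from this thread.
    print(current, pending_trend(9, current))
```

A count that stays flat or drops after the sectors are rewritten fits the surface-defect explanation; a "growing" result across several readings is the bad case.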
 