CRITICAL Device: /dev/da0 [SAT], 8 Currently unreadable (pending) sectors.

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Hi all,

Had an accident with the NAS today: a power cable got knocked out of the back of the UPS while I was fixing another device, and the NAS lost power.

Came back online straight away with no errors, but then about 2 hours later I got an alert:

CRITICAL
Device: /dev/da0 [SAT], 8 Currently unreadable (pending) sectors.

da0 is a member of my 'primary_array' pool, but zpool status is showing no errors:

Code:
Geom name: da0p2
Providers:
1. Name: gptid/6fcfd1c3-39c4-11ea-a333-0cc47aab393c


Code:
  pool: primary_array
 state: ONLINE
  scan: resilvered 1.42T in 0 days 04:16:34 with 0 errors on Sat Jan 18 17:59:23 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        primary_array                                   ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/65bed2e9-39f8-11ea-a333-0cc47aab393c  ONLINE       0     0     0
            gptid/6fcfd1c3-39c4-11ea-a333-0cc47aab393c  ONLINE       0     0     0

errors: No known data errors


SMART shows just those 8 'Pending' sectors.

Code:
root@nas[/]# smartctl -A /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   134   134   024    Pre-fail  Always       -       481 (Average 497)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       66
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2662
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       139
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       139
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 11/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0


Is it safe to continue with this disk and just keep an eye on this? I have to assume the two incidents (power loss and this alert) are in some way related.
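
For anyone who wants to track that counter over time rather than eyeball it, here is a minimal sketch that pulls the raw values out of text in the `smartctl -A` layout shown above. The names `ROW`, `parse_attributes` and `SAMPLE` are made up for illustration, and the sketch assumes the raw value starts with an integer, which holds for the rows in this post but may not for every drive.

```python
import re

# Matches one row of the attribute table printed by `smartctl -A`.
# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every table row found."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Only the leading integer of the raw column is kept (assumption).
            attrs[int(m.group(1))] = (m.group(2), int(m.group(6)))
    return attrs


# A few rows copied from the output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2662
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 11/53)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

attrs = parse_attributes(SAMPLE)
print(attrs[197])  # ('Current_Pending_Sector', 8)
```

Attribute 197 is the pending count and attribute 5 the reallocated count, so those two are the ones worth logging after each scrub or self-test.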
 

hervon

Patron
Joined
Apr 23, 2012
Messages
353
What did your search of this forum and the internet reveal?
Got this after a few seconds' search:
 

zorak950

Dabbler
Joined
Jul 6, 2019
Messages
16
They may or may not be related, but I wouldn't dismiss the alarm yet. If the unreadable count increases, definitely replace it. If it stays the same, I'd consider replacement optional; it depends how paranoid you are (folks around here certainly run the gamut). If it goes away, forget about it: it was probably a temporary hiccup caused by the power loss.
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Quote:
What did your search of this forum and the internet reveal?
Got this after a few seconds' search:

My search revealed a lot of conflicting opinions and different advice, from "REPLACE IMMEDIATELY THE DRIVE IS ABOUT TO DESTROY ITSELF" to "it's probably nothing, just keep an eye on it".

As I understand it, a 'pending' sector is one that the drive failed to read at least once. It can't be reallocated yet because the drive doesn't know what the contents should be, since it can't read them. The next time the sector is written to, it will either be reallocated or, on some drives, rewritten in place and, if it then reads back successfully, marked as 'good'.
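
That description can be written down as a tiny state model. This is only a sketch of the behaviour as described in this post, not of any particular firmware: `SectorState`, `on_read` and `on_write` are hypothetical names, and what happens when an already-pending sector is read again is deliberately left unmodelled.

```python
from enum import Enum


class SectorState(Enum):
    GOOD = "good"
    PENDING = "pending"          # a read failed, so the contents are unknown
    REALLOCATED = "reallocated"  # remapped to a spare sector


def on_read(state, read_ok):
    """A failed read of a good sector marks it pending.

    Re-reads of a sector that is already pending are not modelled here,
    so every other case returns the state unchanged.
    """
    if state is SectorState.GOOD and not read_ok:
        return SectorState.PENDING
    return state


def on_write(state, verify_ok):
    """A write supplies fresh data, so a pending sector can be resolved.

    verify_ok stands for "written and then successfully read again";
    if that fails, the sketch assumes the drive reallocates the sector.
    """
    if state is not SectorState.PENDING:
        return state
    return SectorState.GOOD if verify_ok else SectorState.REALLOCATED


state = on_read(SectorState.GOOD, read_ok=False)
print(state.name)                              # PENDING
print(on_write(state, verify_ok=True).name)    # GOOD
print(on_write(state, verify_ok=False).name)   # REALLOCATED
```

The point of the model is that only a write moves a sector out of the pending state, which is why a read-only check is not expected to change the counter by itself.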

I've triggered a scrub on the pool. If the pending count goes down, or those exact 8 sectors get reallocated, then I'll just keep an eye on it and replace the drive if there is any further increase. If the scrub triggers more pending sectors, I'll replace it straight away.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
There's no particular reason to expect that you'd see pool errors when you have reported bad sectors--they could be sectors without data, so the pool wouldn't know anything about them. Eight is about on the edge of what I'd be willing to wait and see about, though--bad sectors in the low single digits, IMO, are "wait and see." Double digits are definitely "replace ASAP."
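
As a rough illustration of that rule of thumb, here is a small sketch. The function name `triage` and the labels are invented, and the cutoff of 4 for "low single digits" is an assumption, since the post only says that eight is on the edge and double digits mean replacement.

```python
def triage(pending):
    """Map a pending-sector count to the rule of thumb described above."""
    if pending < 0:
        raise ValueError("a sector count cannot be negative")
    if pending == 0:
        return "ok"
    if pending >= 10:          # "double digits"
        return "replace ASAP"
    if pending <= 4:           # "low single digits"; the exact cutoff is assumed
        return "wait and see"
    return "borderline"        # e.g. the 8 sectors reported in this thread


for count in (0, 2, 8, 12):
    print(count, triage(count))
```

This is one poster's opinion turned into code, so treat the thresholds as adjustable rather than as anything the drive vendor specifies.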
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
And don't truncate the output of smart commands when posting. Post all the data, it's relevant.
 

hervon

Patron
Joined
Apr 23, 2012
Messages
353
If it's still under warranty, ask for a replacement.

Edit: And when you post an issue, please tell us what you tried or looked for as solutions.
 

ChiknNutz

Patron
Joined
Nov 6, 2015
Messages
217
I have had this exact same thing going on for a couple months now. The count hasn't increased, but I ordered new drives anyway as a precaution (I have two disks giving some errors, but just one with this particular issue). So far, everything has been working fine, so I've not yet replaced the offending drive.
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Code:
  pool: primary_array
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:48:49 with 0 errors on Fri Feb  7 01:30:09 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        primary_array                                   ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/65bed2e9-39f8-11ea-a333-0cc47aab393c  ONLINE       0     0     0
            gptid/6fcfd1c3-39c4-11ea-a333-0cc47aab393c  ONLINE       0     0     0

errors: No known data errors

root@nas[/]# smartctl -A /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   134   134   024    Pre-fail  Always       -       481 (Average 497)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       66
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2671
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       139
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       139
194 Temperature_Celsius     0x0002   200   200   000    Old_age   Always       -       30 (Min/Max 11/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0


Scrub completed with no errors. However, there are still 8 pending sectors. I guess I will just wait and see!
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Quote:
And don't truncate the output of smart commands when posting. Post all the data, it's relevant.

I didn't truncate anything; I used the -A option.

Would it be useful to see the complete -a option?
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Quote:
There's no particular reason to expect that you'd see pool errors when you have reported bad sectors--they could be sectors without data, so the pool wouldn't know anything about them. Eight is about on the edge of what I'd be willing to wait and see about, though--bad sectors in the low single digits, IMO, are "wait and see." Double digits are definitely "replace ASAP."

If they are sectors without ZPOOL data, why did they get flagged up at all? Is it part of the SMART routine to check random parts of the disk?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Quote:
Is it part of the SMART routine to check random parts of the disk?

It should be part of the SMART routine (if you're doing long SMART tests, which you should be) to test the entire disk.

Quote:
Would it be useful to see the complete -a option?

Yes, the -a or -x flag would provide more information that would be helpful.
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Code:
root@nas[/]# smartctl -a /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar 7K6000
Device Model:     HGST HUS726060ALE610
Serial Number:    NCC5K5XS
LU WWN Device Id: 5 000cca 24aa2fac2
Firmware Version: APGNT7J0
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb  7 16:59:13 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  113) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 741) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   134   134   024    Pre-fail  Always       -       481 (Average 497)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       66
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2681
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       140
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       140
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Min/Max 11/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
You need to set up SMART tests on your hard drives. I'd run a long test on that drive now to check its integrity.
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
I have completed the Long SMART test on the drive in question. No errors were reported, but the Pending Sector counter has not changed either. I don't know what this means, since the Long test claims to read every sector. So either:

1) The exact same 8 pending sectors could still not be read during the Long test and the test result is misleading.
or
2) The 8 pending sectors could now be read during the Long test but the counter is sticky.

I assume it's option 2; otherwise it would have been reported as an error in the test result, right?

Code:
root@nas[/]# smartctl -a /dev/da0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar 7K6000
Device Model:     HGST HUS726060ALE610
Serial Number:    NCC5K5XS
LU WWN Device Id: 5 000cca 24aa2fac2
Firmware Version: APGNT7J0
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb  8 09:25:25 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  113) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 741) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   138   138   054    Pre-fail  Offline      -       100
  3 Spin_Up_Time            0x0007   134   134   024    Pre-fail  Always       -       481 (Average 497)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       66
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2697
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       141
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       141
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 11/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2696         -
# 2  Short offline       Completed without error       00%      2684         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
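
The self-test log in the output above is the part that bears on the option 1 versus option 2 question, and it can be checked mechanically. Below is a minimal sketch that parses rows in that layout; `LOG_ROW`, `parse_selftest_log` and `extended_test_clean` are hypothetical names. It only reports what the log says, so whether the firmware would actually flag a still-unreadable pending sector as a test error remains an assumption.

```python
import re

# One row of the "SMART Self-test log" table, for example:
# "# 1  Extended offline    Completed without error       00%      2696         -"
# Columns are separated by runs of two or more spaces.
LOG_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in smartctl output."""
    rows = []
    for line in text.splitlines():
        m = LOG_ROW.match(line)
        if m:
            num, desc, status, remaining, hours, lba = m.groups()
            rows.append({
                "num": int(num),
                "test": desc,
                "status": status,
                "remaining_pct": int(remaining),
                "lifetime_hours": int(hours),
                # "-" means the test recorded no failing LBA.
                "first_error_lba": None if lba == "-" else int(lba),
            })
    return rows


def extended_test_clean(rows):
    """True if some extended test finished without error and without an LBA."""
    return any(
        r["test"].startswith("Extended")
        and r["status"] == "Completed without error"
        and r["first_error_lba"] is None
        for r in rows
    )


# Rows copied from the output above.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2696         -
# 2  Short offline       Completed without error       00%      2684         -
"""

rows = parse_selftest_log(SAMPLE_LOG)
print(extended_test_clean(rows))  # True
```

Here the extended test finished at 2696 power-on hours with no first-error LBA, while attribute 197 in the same report still reads 8.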
 
Joined
May 10, 2017
Messages
838
Quote:
2) The 8 pending sectors could now be read during the Long test but the counter is sticky.

Yes, this sometimes happens, especially with WD and HGST drives. They are "false positives"; a full drive write will usually correct that, or you can just ignore it.
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Quote:
Yes, this sometimes happens, especially with WD and HGST drives, they are "false positives", a full drive write will usually correct that, or just ignore.

I figured. Is there any way to have FreeNAS ignore this too? It's generating log entries every half hour and alerts on a schedule that I've yet to fathom. I need it to 'acknowledge' that there are 8 and only alarm if it goes higher.
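
To make the behaviour I'm after concrete, here is a sketch of "acknowledge the current count, alert only on growth". It is not a FreeNAS feature or API; `PendingSectorWatch` and its methods are invented for illustration. For what it's worth, the smartd.conf man page describes a `-C ID+` form of the pending-sector directive (for example `-C 197+`) that reports only when the count has increased between checks, but whether FreeNAS lets you set that for a disk is not something this thread establishes.

```python
class PendingSectorWatch:
    """Alert only when the pending count rises above an acknowledged baseline."""

    def __init__(self, acknowledged=0):
        self.acknowledged = acknowledged

    def acknowledge(self, count):
        """Accept the given count as the new baseline."""
        self.acknowledged = count

    def check(self, count):
        """Return an alert message if count exceeds the baseline, else None."""
        if count > self.acknowledged:
            return f"pending sectors rose from {self.acknowledged} to {count}"
        return None


watch = PendingSectorWatch()
print(watch.check(8))   # alerts: 8 is above the initial baseline of 0
watch.acknowledge(8)
print(watch.check(8))   # None: nothing new since the acknowledgement
print(watch.check(9))   # alerts again: the count has grown
```

The baseline is never lowered automatically in this sketch, so a count that drops and later climbs back to 8 would stay quiet.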
 

mkarwin

Dabbler
Joined
Jun 10, 2021
Messages
40
Quote:
Yes, this sometimes happens, especially with WD and HGST drives, they are "false positives", a full drive write will usually correct that, or just ignore.

Any suggestions on how to do that safely when the drive is one of four in a RAIDZ1? One of my HC550s just started showing 8 of these after the last long test, though the short test completes without any errors...
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I take the drive out of the array / pool and then badblocks it and see what the result is. Often the pending goes back to zero and the drive can be used again (so far without issue)
 

seb101

Contributor
Joined
Jun 29, 2019
Messages
142
Quote:
I take the drive out of the array / pool and then badblocks it and see what the result is. Often the pending goes back to zero and the drive can be used again (so far without issue)

Please let me know the results of this! My HGST is still showing the same 8 sector unreadable error over 2 years later, but is otherwise in perfect working order.

I think you'll need to do the full read-write badblocks test to have a chance of solving it.
 