One or more devices has experienced an unrecoverable error. An attempt was made to correct the error...

ant0n

Dabbler
Joined
Jul 28, 2020
Messages
11
Code:
root@antnas:~ # zpool status -v
  pool: antvol
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 69.5K in 0 days 06:02:46 with 0 errors on Sun Sep 20 06:02:52 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        antvol                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a0a1d9c4-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     0
            gptid/a1a9d979-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     2
            gptid/a2c9a957-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     0
            gptid/a4109b09-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:03:25 with 0 errors on Wed Sep 23 03:48:26 2020
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p2   ONLINE       0     0     0
            da1p2   ONLINE       0     0     0

errors: No known data errors

root@antnas:~ # glabel status
                                      Name  Status  Components
gptid/a2c9a957-a3e6-11e6-b443-68b59972b9fb     N/A  ada0p2
gptid/a4109b09-a3e6-11e6-b443-68b59972b9fb     N/A  ada1p2
gptid/a0a1d9c4-a3e6-11e6-b443-68b59972b9fb     N/A  ada2p2
gptid/a1a9d979-a3e6-11e6-b443-68b59972b9fb     N/A  ada3p2
gptid/70190399-be84-11ea-a037-68b59972b9fb     N/A  da0p1
gptid/ac9db190-d097-11ea-90bd-68b59972b9fb     N/A  da1p1

root@antnas:~ # smartctl -a /dev/ada3
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD10EAVS-00D7B1
Serial Number:    WD-WCAU4A684326
LU WWN Device Id: 5 0014ee 25836582a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Thu Sep 24 05:31:53 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (23400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 268) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   163   162   021    Pre-fail  Always       -       6841
  4 Start_Stop_Count        0x0032   077   077   000    Old_age   Always       -       23092
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       89025
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       212
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   193   193   000    Old_age   Always       -       23092
194 Temperature_Celsius     0x0022   111   100   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     22061         -
# 2  Extended offline    Completed without error       00%     21968         -
# 3  Short offline       Completed without error       00%     21893         -
# 4  Short offline       Completed without error       00%     21737         -
# 5  Extended offline    Completed: read failure       90%     21631         8752669
# 6  Short offline       Completed without error       00%     21345         -
# 7  Extended offline    Completed without error       00%     21252         -
# 8  Short offline       Completed without error       00%     21177         -
# 9  Short offline       Completed without error       00%     21009         -
#10  Extended offline    Completed without error       00%     20917         -
#11  Short offline       Completed without error       00%     20841         -
#12  Short offline       Completed without error       00%     20601         -
#13  Extended offline    Completed without error       00%     20509         -
#14  Short offline       Completed without error       00%     20434         -
#15  Short offline       Completed without error       00%     20266         -
#16  Extended offline    Completed without error       00%     20174         -
#17  Short offline       Completed without error       00%     20098         -
#18  Short offline       Completed without error       00%     19882         -
#19  Extended offline    Completed without error       00%     19790         -
#20  Short offline       Completed without error       00%     19725         -
#21  Short offline       Completed without error       00%     19548         -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 2

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
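
For anyone following along: the pool lists its members by gptid, so the device with the checksum errors has to be looked up in the glabel output before running smartctl on it. A minimal sketch of that lookup; the helper name gptid_to_device is made up for illustration, and the sample lines are copied from the glabel output above. On a live system you would pipe the real `glabel status` output into it instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeNAS): print the partition that
# backs a given gptid label.
#   $1    = label UUID without the "gptid/" prefix
#   stdin = output of `glabel status` (columns: Name, Status, Components)
gptid_to_device() {
    awk -v id="gptid/$1" '$1 == id { print $3 }'
}

# Demo with two lines copied from the glabel output in this post.
gptid_to_device a1a9d979-a3e6-11e6-b443-68b59972b9fb <<'EOF'
gptid/a0a1d9c4-a3e6-11e6-b443-68b59972b9fb     N/A  ada2p2
gptid/a1a9d979-a3e6-11e6-b443-68b59972b9fb     N/A  ada3p2
EOF
```

With the sample lines this prints ada3p2, which is why the SMART report was taken from /dev/ada3.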
 

ant0n

My Question:

Is it time to take my cold spare HDD off the shelf and replace the 3rd disk (ada3) in the RAID5 (raidz1)?

root@antnas:~ # cat /etc/version
FreeNAS-11.3-U4.1 (66add776a2)

HP Microserver N36L (oldschool :) )

Thank you
 

Fredda

Guru
Joined
Jul 9, 2019
Messages
608
Well, it could be a single occurrence. Apart from the one failed self-test (which a newer extended test has since passed),
the SMART data looks quite OK. (BTW, readability is much better with [CODE][/CODE] tags.)

On the other hand, your HDD has a total runtime of over 10 years (Power_On_Hours is 89025). And that on a WD Green,
which according to the manufacturer is not meant for 24/7 use.

So it might be a good idea to think about replacing the HDD anyway.
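
The 10 years come straight from the Power_On_Hours raw value. A quick sketch of the arithmetic, assuming 8760 hours per (non-leap) year; the function name power_on_years is just for illustration:

```shell
#!/bin/sh
# Convert a SMART Power_On_Hours raw value into approximate years.
# Assumption: 8760 hours per year (24 * 365), leap days ignored.
power_on_years() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 8760 }'
}

# Raw value of attribute 9 from the smartctl output in the first post.
power_on_years 89025
```

For 89025 hours this prints 10.2.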
 

ant0n

Thank you Fredda

I am grateful for your experience.
I couldn't interpret that so well myself.
The error repeats every night in the email, so I guess I will power it down and do what you recommended.

The spare disk has been on the cold shelf for 2 years and it will be happy to be used.
I will have to order a new spare ;)

Have a good day
 

Fredda

You get the email because the 2 checksum errors are still shown. They'll be cleared if you issue a zpool clear command.
After that you should not get mails anymore, unless the error shows up again.

That said, I think it's still a good idea to replace an HDD after 10 years.
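
If you want to check which device still carries non-zero counters before and after clearing, something like the following could work. It is only a rough sketch: devices_with_errors is a made-up helper name, and the sample input is trimmed from the zpool status output in the first post. On the real system you would feed it with `zpool status antvol` instead of the sample.

```shell
#!/bin/sh
# Hypothetical helper: list pool members whose READ, WRITE or CKSUM
# counter is not "0". Reads `zpool status` output on stdin.
devices_with_errors() {
    awk '
        # The device table starts after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The "errors:" summary line ends the table.
        /^errors:/ { in_table = 0 }
        # Compare as strings, since large counts can be printed like "1.2K".
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Demo with lines trimmed from the status output earlier in the thread.
devices_with_errors <<'EOF'
        NAME                                            STATE     READ WRITE CKSUM
        antvol                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a0a1d9c4-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     0
            gptid/a1a9d979-a3e6-11e6-b443-68b59972b9fb  ONLINE       0     0     2
errors: No known data errors
EOF
```

With the sample it reports only the a1a9d979 member with cksum=2. After a `zpool clear antvol` the same check on the live pool should print nothing, unless the error shows up again.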
 