One or more devices has experienced an unrecoverable error.

Norlig

Explorer
Joined
Jul 13, 2013
Messages
59
Hi,

Running FreeNAS-11.3-U2

I woke up to an error notification email today, and I have this alert on my FreeNAS dashboard:
Alert.png


These are my disks:
Disks.png


Checking zpool status and the smartctl attributes, I can't see where the error is or which drive it is on.

Any recommendations?

I believe the damage may be related to a large file move I did about 2 weeks ago, which took 2-3 days to complete.
 
Joined
Oct 18, 2018
Messages
969
From zpool status I see the following
Code:
  pool: Freenas-ZFS-RAID-NAS
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 176K in 0 days 07:50:57 with 0 errors on Thu Apr 16 12:50:58 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        Freenas-ZFS-RAID-NAS                             ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/8c792cbe-d261-11e4-9315-94de802b5aeb  ONLINE       0     0     0
            gptid/cc07c58f-d14a-11e4-acba-94de802b5aeb  ONLINE       0     0     0
            gptid/65b38c70-d1e9-11e4-809b-94de802b5aeb  ONLINE       0     0     0
            gptid/7673f6f3-d0a0-11e4-b2f4-94de802b5aeb  ONLINE       0     0     4

The line scan: scrub repaired 176K ... with 0 errors is telling you that your ZFS scrub worked and actually repaired your data! Good news.

The line gptid/7673f6f3-d0a0-11e4-b2f4-94de802b5aeb ONLINE 0 0 4 tells you which disk was the offending one: the 4 is in the CKSUM column, meaning four checksum errors were recorded on that device.
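If you'd rather not eyeball the counters, here is a minimal Python sketch that picks out any row of the config table with a nonzero READ, WRITE or CKSUM count. The function name `devices_with_errors` is made up for illustration, and the sample text is just the output pasted above. It assumes plain integer counters; zpool can abbreviate very large counts (e.g. with a K suffix), and such rows are skipped here.

```python
# Config section copied from the `zpool status` output quoted in this thread.
SAMPLE_STATUS = """\
config:

        NAME                                            STATE     READ WRITE CKSUM
        Freenas-ZFS-RAID-NAS                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/8c792cbe-d261-11e4-9315-94de802b5aeb  ONLINE       0     0     0
            gptid/cc07c58f-d14a-11e4-acba-94de802b5aeb  ONLINE       0     0     0
            gptid/65b38c70-d1e9-11e4-809b-94de802b5aeb  ONLINE       0     0     0
            gptid/7673f6f3-d0a0-11e4-b2f4-94de802b5aeb  ONLINE       0     0     4
"""


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for each config row with a nonzero counter."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The table starts right after the "NAME STATE READ WRITE CKSUM" header.
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config or len(fields) < 5:
            continue
        name, _state, read, write, cksum = fields[:5]
        # Skip rows whose counters are not plain integers (assumption noted above).
        if not (read.isdigit() and write.isdigit() and cksum.isdigit()):
            continue
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            rows.append((name,) + counts)
    return rows


for name, read, write, cksum in devices_with_errors(SAMPLE_STATUS):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

On a live system you would feed it the text of `zpool status` instead of the pasted sample.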

Looking at your SMART results, I didn't see an immediate indication of the error on the disks you listed. It does depend on when the SMART tests were last run, though. You'll want to map the gptid 7673f6f3-d0a0-11e4-b2f4-94de802b5aeb to the corresponding /dev/adaX or /dev/daX and check that disk after rerunning a long SMART test on it.
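On FreeBSD/FreeNAS the gptid-to-device mapping is normally read from `glabel status`. A rough Python sketch of that lookup follows. The helper name `gptid_to_device` is hypothetical, and the sample `glabel status` text is invented for illustration (the ada3p2 and ada0p2 assignments are NOT from this system), so run the real command to see your own mapping.

```python
import re

# Made-up example of `glabel status` output; the device names are assumptions.
SAMPLE_GLABEL = """\
                                      Name  Status  Components
gptid/7673f6f3-d0a0-11e4-b2f4-94de802b5aeb     N/A  ada3p2
gptid/8c792cbe-d261-11e4-9315-94de802b5aeb     N/A  ada0p2
"""


def gptid_to_device(glabel_text, gptid):
    """Return (partition, disk device path) for a gptid label, or None if not found."""
    label = gptid if gptid.startswith("gptid/") else "gptid/" + gptid
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == label:
            partition = fields[2]                    # e.g. ada3p2
            disk = re.sub(r"p\d+$", "", partition)   # drop the partition suffix
            return partition, "/dev/" + disk
    return None


print(gptid_to_device(SAMPLE_GLABEL, "7673f6f3-d0a0-11e4-b2f4-94de802b5aeb"))
```

The returned disk path is what you would hand to smartctl when checking the drive.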
 

Norlig

Explorer
Joined
Jul 13, 2013
Messages
59
Thank you for the detailed reply! :)

I run long SMART tests twice a month and short SMART tests about twice a week.

Scrubs run twice a month.

Smart tests.png


Scrub Tasks.png


From what you're saying, it seems like it would be safe to run zpool clear this time?
 
Joined
Oct 18, 2018
Messages
969
If it were me, I'd run a long test on all drives in that pool right now unless you're certain one ran after the error was detected. You're using RAIDZ1 for that vdev so if two drives go down you're outa luck.
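For anyone wanting to script that, here is a small Python sketch that kicks off `smartctl -t long` on a list of drives. The device paths in the example are placeholders, not the actual members of this pool, and the function names are made up. By default it only prints the commands (dry run) so nothing is started by accident.

```python
import subprocess


def long_test_commands(devices):
    """Build one `smartctl -t long` command per device path."""
    return [["smartctl", "-t", "long", dev] for dev in devices]


def start_long_tests(devices, dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    commands = long_test_commands(devices)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # smartctl's exit status is a bitmask, so a nonzero code is not
            # treated as fatal here; inspect the result by hand if needed.
            subprocess.run(cmd)
    return commands


# Placeholder device names; substitute the disks that belong to your pool.
start_long_tests(["/dev/ada0", "/dev/ada1", "/dev/ada2", "/dev/ada3"])
```

A long test runs in the background on the drive itself; once it finishes, `smartctl -a /dev/adaX` shows the self-test log.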
 

Norlig

Explorer
Joined
Jul 13, 2013
Messages
59
I ran a long test on all drives, no errors were identified, so I ran zpool clear.

Thanks :)
 

Gimpymoo

Dabbler
Joined
Sep 10, 2018
Messages
39
I had a similar error when my SATA controller was on the fritz. The error was intermittent.

Replaced the SATA controller over 6 months ago, no more errors.
 

Norlig

Explorer
Joined
Jul 13, 2013
Messages
59
My drives are connected to the motherboard though =/

I do have an LSI card flashed as an HBA, but when I use it, the drives seem slower to read from and write to.
 