One or more devices has experienced an unrecoverable error

Irgendjemand5 (Dabbler, joined Sep 20, 2021, 14 messages)
Pool pool_sicher state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

The CPU cooler fan tried to saw into a SAS-to-SATA cable... the error might be related to that.
The pool consists of two mirrored drives: a 2TB WD Red and a 2TB Samsung 870 EVO (because in my opinion Samsung EVO drives don't die on you).

I followed this thread:
But I'm not sure what the output means.

zpool status -v
Code:
  pool: pool_sicher
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 1.85M in 00:00:30 with 0 errors on Mon Nov 22 11:23:50 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool_sicher                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/25224f30-1e1e-11ec-97f0-e0cb4e196da1  ONLINE       0     0     0
            gptid/25898187-1e1e-11ec-97f0-e0cb4e196da1  ONLINE       0    49     0

errors: No known data errors
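A quick way to read the config table: READ, WRITE and CKSUM are per-device error counters (failed reads, failed writes, and data that came back with a bad checksum). Only the second gptid member has a non-zero count (49 write errors); because the other half of the mirror was fine, ZFS reports "No known data errors". The sketch below just pulls out the rows with non-zero counters. It assumes the plain-text layout shown above and plain integer counters; the names `parse_config` and `rows_with_errors` are made up for illustration.

```python
# Sketch: find pool members whose READ/WRITE/CKSUM counters are non-zero.
# Assumption: the plain-text "config:" table layout of `zpool status` shown above,
# with plain integer counters (no abbreviated values).

STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        pool_sicher                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/25224f30-1e1e-11ec-97f0-e0cb4e196da1  ONLINE       0     0     0
            gptid/25898187-1e1e-11ec-97f0-e0cb4e196da1  ONLINE       0    49     0
"""


def parse_config(text):
    """Return (name, state, read, write, cksum) tuples for each row of the table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a five-column row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, state, read, write, cksum = fields
        rows.append((name, state, int(read), int(write), int(cksum)))
    return rows


def rows_with_errors(rows):
    """Keep only rows where at least one error counter is non-zero."""
    return [row for row in rows if row[2] or row[3] or row[4]]


if __name__ == "__main__":
    for name, state, rd, wr, ck in rows_with_errors(parse_config(STATUS)):
        print(f"{name}: {state}, read={rd} write={wr} cksum={ck}")
```

For the output above this prints a single line for gptid/25898187-1e1e-11ec-97f0-e0cb4e196da1 with write=49.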

I checked the drives with glabel status: the drive with the write errors is the WD Red, so I ran a long SMART test on it.

smartctl -A /dev/da5
Code:
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   169   169   021    Pre-fail  Always       -       4516
  4 Start_Stop_Count        0x0032   094   094   000    Old_age   Always       -       6066
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       43690
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   095   095   000    Old_age   Always       -       5368
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       6069
194 Temperature_Celsius     0x0022   112   104   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   123   000    Old_age   Always       -       101
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
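How to read the table, roughly: VALUE, WORST and THRESH are normalized scores the drive uses for its own pass/fail decision, while RAW_VALUE is a vendor-specific counter. An attribute counts as failed when VALUE (now) or WORST (in the past) drops to or below a non-zero THRESH; a THRESH of 000 means the attribute is informational only. The sketch below applies that rule to an excerpt of the output above. It loosely mimics smartctl's WHEN_FAILED column and is not smartmontools' actual code; the function names are hypothetical.

```python
# Sketch: parse the `smartctl -A` attribute table and apply the SMART pass/fail rule:
# an attribute fails when its normalized VALUE drops to or below a non-zero THRESH
# (WORST <= THRESH means it failed at some point in the past). THRESH 000 never fails.

SMART_A = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   112   104   000    Old_age   Always       -       35
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   123   000    Old_age   Always       -       101
"""


def parse_attributes(text):
    """Map ATTRIBUTE_NAME -> dict of id, value, worst, thresh and raw (as text)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip the header and banner lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            # Raw values are vendor-specific and may contain spaces, so keep the text.
            "raw": " ".join(fields[9:]),
        }
    return attrs


def when_failed(attr):
    """Mimic the WHEN_FAILED column: 'FAILING_NOW', 'In_the_past' or '-'."""
    if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
        return "FAILING_NOW"
    if attr["thresh"] > 0 and attr["worst"] <= attr["thresh"]:
        return "In_the_past"
    return "-"


if __name__ == "__main__":
    for name, attr in parse_attributes(SMART_A).items():
        print(f"{attr['id']:>3} {name:<24} {when_failed(attr):<12} raw={attr['raw']}")
```

On this drive nothing is near its threshold, so every attribute comes out as "-"; the interesting information is in the raw counts.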



So should I clear the error and see if it repeats itself, and replace the drive if it does?
Or should I replace the drive now?
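The "clear and watch" option as commands, in case it helps: `zpool clear` resets the counters and `zpool scrub` re-reads every block in the pool. A scrub mostly reads, so new write errors may only show up during normal writes; keep an eye on the WRITE column over the next days. This is a dry-run sketch that only prints the commands; `recheck_plan` and `run` are hypothetical helper names.

```python
# Sketch of the "clear the counters and watch whether the errors come back" option.
# Dry-run by default: it only prints the commands. Executing them needs root on the NAS.
import shlex
import subprocess

POOL = "pool_sicher"  # pool name from the zpool status output above


def recheck_plan(pool):
    """Commands that reset the error counters, scrub the pool, then show the result."""
    return [
        ["zpool", "clear", pool],          # reset READ/WRITE/CKSUM to 0
        ["zpool", "scrub", pool],          # starts a scrub in the background and returns
        ["zpool", "status", "-v", pool],   # shows scrub progress; check again once done
    ]


def run(plan, dry_run=True):
    for cmd in plan:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(recheck_plan(POOL))
```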


System:
Mainboard: S26361-D3128-A1
CPU: Intel Xeon E5-2630
RAM: 4x 8GB M393B1K70CH0-YH9 ECC RDIMM
HBA: LSI 6Gbps SAS HBA 9240-8i in IT mode
 

NugentS (MVP, joined Apr 16, 2020, 2,947 messages)
Personally I would replace the drive using a spare and then put the drive through a badblocks torture test to see if it's OK.
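Roughly what that looks like from the shell, as a dry-run sketch that only prints the commands. The spare's device node `/dev/da6` is purely hypothetical, and on TrueNAS the web UI's Replace action is usually preferable because it also partitions and labels the new disk; this just shows the underlying `zpool replace`. The badblocks `-w` mode overwrites the whole disk, so only run it on the old drive after the resilver has finished.

```python
# Sketch: replace the failing mirror member with a spare, then burn in the old disk.
# Dry-run only: prints the commands instead of running them.
import shlex

POOL = "pool_sicher"
OLD_MEMBER = "gptid/25898187-1e1e-11ec-97f0-e0cb4e196da1"  # member with 49 write errors
SPARE = "/dev/da6"      # HYPOTHETICAL device node of the spare disk
OLD_DISK = "/dev/da5"   # the WD Red, as queried with smartctl above


def replace_and_burn_in(pool, old_member, spare, old_disk):
    return [
        # Put the spare in place of the old member; ZFS resilvers the mirror onto it.
        ["zpool", "replace", pool, old_member, spare],
        # Wait until this reports the resilver as finished before touching the old disk.
        ["zpool", "status", pool],
        # badblocks write-mode test: -w overwrites the WHOLE disk with test patterns,
        # -s shows progress, -b 4096 uses 4 KiB blocks.
        ["badblocks", "-b", "4096", "-ws", old_disk],
    ]


if __name__ == "__main__":
    for cmd in replace_and_burn_in(POOL, OLD_MEMBER, SPARE, OLD_DISK):
        print(shlex.join(cmd))
```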

Wait a minute, are you mirroring an HDD and an SSD?
I would replace the drive with another SSD and then use the HDD as a paperweight / doorstop.
 

Irgendjemand5

Dabbler
Joined
Sep 20, 2021
Messages
14
NugentS said: Personally I would replace the drive using a spare and then put the drive through a badblocks torture test to see if it's OK.
Thanks for the reply, I will do that; I have a spare. But could you please explain the output of my zpool status -v and the SMART values to me a little?

I mirror an SSD and an HDD because money is an issue and I don't care about speed. Also, somebody told me Samsung EVOs were unkillable.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
The only issue I see is:
199 UDMA_CRC_Error_Count 0x0032 200 123 000 Old_age Always - 101

Try replacing the cable to this drive before running badblocks. UDMA CRC errors usually point to a bad cable or connection rather than to the disk itself.

zpool status is telling you the drive has failed a few writes (49), but the SMART attributes that usually indicate a dying disk (reallocated, pending and offline-uncorrectable sectors) all show zero.
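The reasoning above as a small rule of thumb: non-zero raw counts on the sector-health attributes (5, 197, 198) point at the disk, while UDMA CRC errors on their own point at the cable or connection. The grouping is an assumption for illustration, not a standard. On most drives the CRC counter is cumulative and cannot be reset, so note the current 101 and check whether it still grows after the cable swap; `diagnose` and `crc_still_rising` are hypothetical names.

```python
# Sketch of the rule of thumb from this reply: non-zero raw counts on the
# sector-health attributes point at the disk, while UDMA CRC errors on their own
# point at the cable/connection. The grouping is an assumption, not a standard.

MEDIA_ATTRS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
LINK_ATTRS = {199: "UDMA_CRC_Error_Count"}


def diagnose(raw):
    """raw maps SMART attribute ID -> raw value; returns (verdict, offending attributes)."""
    media = [name for attr_id, name in MEDIA_ATTRS.items() if raw.get(attr_id, 0) > 0]
    if media:
        return "suspect disk", media
    link = [name for attr_id, name in LINK_ATTRS.items() if raw.get(attr_id, 0) > 0]
    if link:
        return "suspect cable/connection", link
    return "no obvious problem", []


def crc_still_rising(before, after):
    """The CRC raw count is cumulative, so only growth after a cable swap matters."""
    return after > before


if __name__ == "__main__":
    # Raw values from the smartctl -A output earlier in the thread.
    print(diagnose({5: 0, 197: 0, 198: 0, 199: 101}))
```

With the values from this drive it prints `('suspect cable/connection', ['UDMA_CRC_Error_Count'])`.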
 