Limitedheadroom
Hi, I've just increased the size of my pool by replacing all the drives, one by one, with larger drives. No sooner had I done this than I got the dreaded alert:
Code:
One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.

I've never had to deal with this before, so I would appreciate a bit of guidance.
Checking zpool status, I get the following:
Code:
pool: ChassPool1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 12K in 0 days 04:21:29 with 0 errors on Sun May  1 06:21:37 2022
config:
    NAME                                            STATE     READ WRITE CKSUM
    ChassPool1                                      ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/e1cd7ad2-c635-11ec-b7a1-d05099c3a27c  ONLINE       0     0     0
        gptid/16ebafcb-c6fb-11ec-991d-d05099c3a27c  ONLINE       0     0     9
        gptid/e471663b-bf1c-11ec-ba23-d05099c3a27c  ONLINE       0     0     0
        gptid/5deacf17-c023-11ec-9c6a-d05099c3a27c  ONLINE       0     0     0
        gptid/54c5a68c-c79e-11ec-8765-d05099c3a27c  ONLINE       0     0    23
        gptid/c4287624-c7f4-11ec-bc5f-d05099c3a27c  ONLINE       0     0     0
errors: No known data errors
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:08:07 with 0 errors on Thu Apr 28 03:53:07 2022
config:
    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da0p2   ONLINE       0     0     0
        da1p2   ONLINE       0     0     0
errors: No known data errors

I can see one of the drives has 9 checksum errors and another has 23! None of these drives is more than 5 days old, and the one with 23 errors had literally just finished resilvering when the alert appeared.
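To make it easier to spot the affected devices in the zpool status output above, here is a minimal Python sketch that picks out rows with a non-zero READ, WRITE or CKSUM counter. The function name `devices_with_errors` and the regular expression are my own illustration, not part of any ZFS tooling, and it assumes the counters are printed as plain integers as they are here (abbreviated values such as `1.2K` would not match).

```python
import re

# A few rows copied from the zpool status output above.
ZPOOL_STATUS = """\
    NAME                                            STATE     READ WRITE CKSUM
    ChassPool1                                      ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/e1cd7ad2-c635-11ec-b7a1-d05099c3a27c  ONLINE       0     0     0
        gptid/16ebafcb-c6fb-11ec-991d-d05099c3a27c  ONLINE       0     0     9
        gptid/54c5a68c-c79e-11ec-8765-d05099c3a27c  ONLINE       0     0    23
"""

# One device row: name, state, then the READ / WRITE / CKSUM counters.
# Assumption: counters are plain integers, as in the output above.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any non-zero counter."""
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header row or any other non-device line
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            flagged[name] = counts
    return flagged


if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(ZPOOL_STATUS).items():
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

Run against the sample rows, this should list only the two gptid devices with checksum errors, which makes it easier to compare counts before and after a scrub.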
I've run a long SMART test on all drives and they all appear to pass. Here are the results for the drive with 23 errors. Again, I don't really know what most of this is telling me, but it says the drive passed.
Code:
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA HDWG480
Serial Number:    2260A0YSFA3H
LU WWN Device Id: 5 000039 b78cae12a
Firmware Version: 0601
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon May  2 12:16:36 2022 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  120) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 681) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       12318
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       74
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Min/Max 14/35)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       83886081
222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       74
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       515
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        69         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

My question is: could this just be down to a loose cable or something similar, from me fiddling about with the drives and moving things? The alert appeared just as I was beginning the resilvering of the final drive. If so, am I safe to run zpool clear and wait to see if anything pops up again?
Are there other checks I should do first?
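As one possible extra check, here is a minimal Python sketch that pulls a handful of raw values out of the smartctl attribute table. The choice of attributes (5, 197, 198 and 199) is my assumption about which ones are usually looked at first: the first three relate to bad or pending sectors, and UDMA_CRC_Error_Count is commonly associated with link or cabling problems. The names `WATCHED_IDS` and `watched_raw_values` are hypothetical, not part of smartmontools.

```python
# Attribute IDs to extract. This short list is an assumption, not an official rule.
WATCHED_IDS = {5, 197, 198, 199}

# A few rows copied from the smartctl output above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Min/Max 14/35)
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def watched_raw_values(table_text, watched=WATCHED_IDS):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    values = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
        # WHEN_FAILED, RAW_VALUE (the raw value is the tenth field).
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header row or unrelated line
        if int(fields[0]) in watched and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


if __name__ == "__main__":
    for name, raw in watched_raw_values(SMART_ROWS).items():
        print(f"{name}: {raw}")
```

For the drive above, all four of these raw values are 0 in the smartctl output, so on its own this check does not point at the drive or the cable.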
Thank you