Are my drives dying?

AndrewParsons

Dabbler
Joined
Jun 14, 2016
Messages
40
Hello TrueNAS community,

TrueNAS SCALE is telling me one of my pools is in a critical state.

Am I looking at a failing drive? Three drives show checksum errors, and one of them shows more than one.

Could someone be so kind as to look below and let me know their thoughts? Also, if you need anything else to help diagnose the issue, please let me know.

Thank you,
Andrew

Code:
root@truenas[~]# zpool status -v
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:50 with 0 errors on Sun Nov 14 03:45:52 2021
config:
 
        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0
 
errors: No known data errors
 
  pool: myJails
 state: ONLINE
  scan: scrub repaired 0B in 00:03:05 with 0 errors on Mon Nov  1 00:03:07 2021
config:
 
        NAME                                    STATE     READ WRITE CKSUM
        myJails                                 ONLINE       0     0     0
          8ea681cb-862e-4350-ae07-1b6c25eeaa21  ONLINE       0     0     0
          50be0eb6-1c40-4aea-886b-35d45fbdd185  ONLINE       0     0     0
 
errors: No known data errors
 
  pool: myVol
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 48K in 16:00:45 with 0 errors on Sat Nov 13 16:00:47 2021
config:
 
        NAME                                      STATE     READ WRITE CKSUM
        myVol                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            deb44d3f-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
            df6f0741-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     1
            df822ffd-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
            df960a47-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
            dfb40c25-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     1
            dfdda377-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
            dffe3bd7-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     2
            e01590c1-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
          raidz2-1                                ONLINE       0     0     0
            99cb0640-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9a2e6c3e-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9a43695e-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9ac62f98-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9af9c604-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9afeeb1b-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9af34a3f-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
            9b27dd96-40d2-11eb-8eea-90e2ba6f5198  ONLINE       0     0     0
 
errors: No known data errors
root@truenas[~]# blkid | grep "dffe3bd7-3fbd-11eb-9e5b-90e2ba6f5198"
/dev/sds2: LABEL="myVol" UUID="17951602597737542789" UUID_SUB="2123188659640093731" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="dffe3bd7-3fbd-11eb-9e5b-90e2ba6f5198"
root@truenas[~]# smartctl -a /dev/sds2
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.70+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST8000NM0065
Revision:             K004
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50093e1c857
Serial number:        ZA17H1V70000R737R0NA
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Nov 17 10:14:43 2021 MST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
 
Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     26 C
Drive Trip Temperature:        60 C
 
Accumulated power on time, hours:minutes 34883:10
Manufactured in week 15 of year 2017
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  141
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  1546
Elements in grown defect list: 0
 
Vendor (Seagate Cache) information
  Blocks sent to initiator = 2610990666
  Blocks received from initiator = 2596448457
  Blocks read from cache and sent to initiator = 1448362391
  Number of read and write commands whose size <= segment size = 99683809
  Number of read and write commands whose size > segment size = 148075
 
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 34883.17
  number of minutes until next internal SMART test = 35
 
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   858641983        0         0  858641983          0      98621.803           0
write:         0        0         0         0          0      10681.223           0
verify: 16551786        0         0  16551786          0         33.745           0
 
Non-medium error count:       75
 
 
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   34560                 - [-   -    -]
# 2  Background long   Completed                   -   34241                 - [-   -    -]
# 3  Background short  Completed                   -   33816                 - [-   -    -]
# 4  Background long   Completed                   -   33519                 - [-   -    -]
# 5  Background short  Completed                   -   33074                 - [-   -    -]
# 6  Background short  Completed                   -   32738                 - [-   -    -]
# 7  Background long   Completed                   -   32644                 - [-   -    -]
# 8  Background short  Completed                   -   32522                 - [-   -    -]
# 9  Background short  Completed                   -   32330                 - [-   -    -]
#10  Background short  Completed                   -   31994                 - [-   -    -]
#11  Background long   Completed                   -   31893                 - [-   -    -]
#12  Background short  Completed                   -   31778                 - [-   -    -]
#13  Background short  Completed                   -   31586                 - [-   -    -]
#14  Background short  Completed                   -   31274                 - [-   -    -]
#15  Background long   Completed                   -   31167                 - [-   -    -]
#16  Background short  Completed                   -   31058                 - [-   -    -]
#17  Background short  Completed                   -   30866                 - [-   -    -]
#18  Background short  Completed                   -   30531                 - [-   -    -]
#19  Background long   Completed                   -   30467                 - [-   -    -]
#20  Background short  Completed                   -   30316                 - [-   -    -]
 
Long (extended) Self-test duration: 47220 seconds [787.0 minutes]
 
root@truenas[~]#
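
For anyone who wants to track whether the counters grow between scrubs, here is a minimal sketch that pulls the non-zero rows out of saved `zpool status` text. The function name `devices_with_errors` and the `SAMPLE` string are made up for illustration (the sample rows are copied from the output above), and the parser assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown here. It is not an official tool.

```python
import re

# One config row of `zpool status`: NAME STATE READ WRITE CKSUM.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def devices_with_errors(status_text):
    """Return (pool, device, read, write, cksum) for rows with a non-zero counter."""
    pool = None
    hits = []
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if not match or match.group(1) == "NAME":
            continue  # not a device row, or the column header
        name, _state, read, write, cksum = match.groups()
        # Counters stay as strings: zpool may abbreviate large values.
        if any(counter != "0" for counter in (read, write, cksum)):
            hits.append((pool, name, read, write, cksum))
    return hits


# Hypothetical sample: a shortened copy of the myVol section above.
SAMPLE = """\
  pool: myVol
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        myVol                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            deb44d3f-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     0
            df6f0741-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     1
            dffe3bd7-3fbd-11eb-9e5b-90e2ba6f5198  ONLINE       0     0     2

errors: No known data errors
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

Saving the result after each scrub and comparing it with the previous run shows whether the same devices keep accumulating errors or whether new ones appear.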
 

AndrewParsons

Thank you for your reply. So it's looking more like bad disks than an HBA issue?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
If it were the HBA, I would expect the issue to grow, probably quickly, and to spread across other devices on the HBA.
BTW - your signature is inaccurate: wrong number and type of disks, and no mention of the HBA, though I suppose it could be built into the motherboard.
Also (as you haven't told us), are all the disks on the same HBA, and on the same port of the HBA?
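
One rough way to answer that question is to map each PARTUUID from `zpool status` back to its parent disk and SCSI address. The sketch below assumes you feed it the output of `lsblk -J -o NAME,PARTUUID,HCTL`, and that the JSON uses lowercase keys with partitions nested under `children`, which is how I understand util-linux to print it. The function name `partuuid_to_disk` and the sample JSON (including the HCTL value and the placeholder PARTUUID for `sds1`) are made up for illustration.

```python
import json


def partuuid_to_disk(lsblk_json):
    """Map each partition's PARTUUID to (parent disk name, parent HCTL)."""
    mapping = {}
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        # Whole disks carry the HCTL; partitions are listed as children.
        for part in disk.get("children") or []:
            if part.get("partuuid"):
                mapping[part["partuuid"]] = (disk["name"], disk.get("hctl"))
    return mapping


# Hypothetical lsblk output; only the sds2 PARTUUID comes from the thread.
SAMPLE_JSON = """
{"blockdevices": [
  {"name": "sds", "partuuid": null, "hctl": "0:0:18:0",
   "children": [
     {"name": "sds1", "partuuid": "00000000-0000-0000-0000-000000000000", "hctl": null},
     {"name": "sds2", "partuuid": "dffe3bd7-3fbd-11eb-9e5b-90e2ba6f5198", "hctl": null}
   ]}
]}
"""

for partuuid, (disk, hctl) in partuuid_to_disk(SAMPLE_JSON).items():
    print(partuuid, disk, hctl)
```

The first field of HCTL is the SCSI host number, so disks that share it are normally behind the same controller. Treat that as a starting point and confirm against the actual cabling.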
 