SCSI medium errors during scrub, showing data repaired but no known data errors in zpool status

83n

Cadet
Joined
Sep 14, 2020
Messages
3
Hey, I am hoping I can get some assistance or advice on some errors I am getting while scrubbing my main pool. My setup is as follows:

Server:
Model: IBM System x3550 M4 Server (7914-32M)
CPU: 2x Intel Xeon CPU E5-2643 0 (3.30GHz 4 Core, 8 Thread)
Memory: 384 GiB (24x Hynix 16 GB 2Rx4 PC3L 10600R)
RAID: ServeRAID M5110 (2x RAID 1 volumes, 1 for OS and 1 for Jails)
NIC: 4x Intel I350 Gigabit (Onboard), 1x Emulex OneConnect 10Gb NIC (Dual port)
HBA: LSI SAS 9201-16e (Flashed IT)

Storage :
Enclosures: 2x NetApp DS4243 (1 IOM6 in each enclosure)
Disks: 48x Seagate ST33000650SS 3TB 7.2K RPM (512-byte sectors, reformatted from 528-byte sectors)

FreeNAS:
OS Version: FreeNAS-11.3-U4.1
Pool Configuration:

HEALTHY: (55%) Used / 37.09 TiB Free (8x 6 disk RAIDZ2)

Name  Type     Used       Available  Compression  Compression Ratio  Readonly  Dedup  Comments
Data  dataset  46.79 TiB  37.09 TiB  lz4          1.00x              false     off

HEALTHY: (5%) Used / 506.94 GiB Free (Simple pool of a hardware RAID 1 volume)

Name      Type     Used       Available   Compression  Compression Ratio  Readonly  Dedup  Comments
Services  dataset  31.69 GiB  506.94 GiB  lz4          1.21x              false     off

I am getting the following errors while running a scrub:

Code:
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): READ(10). CDB: 28 00 c0 60 ef c8 00 01 00 00
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): CAM status: SCSI Status Error
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): SCSI status: Check Condition
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Info: 0xc060f09f
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Field Replaceable Unit: 129
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Command Specific Info: 0xa1615189
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Actual Retry Count: 255
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Descriptor 0x80: 00 00 03 11 00 81 01 fc db 09 01 1e 00 00
Sep 14 00:24:27 freenas (da14:mps0:0:30:0): Error 5, Unretryable error
Sep 14 00:24:30 freenas ZFS: vdev state changed, pool_guid=11123641492454680720 vdev_guid=10686807130487373415

Sep 14 06:30:07 freenas (da23:mps0:0:39:0): READ(16). CDB: 88 00 00 00 00 01 06 5d c5 00 00 00 01 00 00 00
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): CAM status: SCSI Status Error
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): SCSI status: Check Condition
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): Info: 0x1065dc5c3
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): Field Replaceable Unit: 129
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): Command Specific Info: 0xa1615189
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): Actual Retry Count: 255
Sep 14 06:30:07 freenas (da23:mps0:0:39:0): Descriptor 0x80: 00 00 03 11 00 81 02 e9 5a 0(da23:mps0:0:39:0): Error 5, Unretryable error
Sep 14 06:30:09 freenas ZFS: vdev state changed, pool_guid=11123641492454680720 vdev_guid=18107924374852554239

Sep 14 18:00:46 freenas (da24:mps0:0:41:0): READ(10). CDB: 28 00 d3 0b e7 20 00 01 00 00
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): CAM status: SCSI Status Error
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): SCSI status: Check Condition
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Info: 0xd30be819
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Field Replaceable Unit: 129
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Command Specific Info: 0xa69d0701
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Actual Retry Count: 255
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Descriptor 0x80: 00 00 03 11 00 81 02 1b 5b 05 01 34 00 00
Sep 14 18:00:46 freenas (da24:mps0:0:41:0): Error 5, Unretryable error
Sep 14 18:00:52 freenas ZFS: vdev state changed, pool_guid=11123641492454680720 vdev_guid=7468315239559028326
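
One way to read these entries is to decode each CDB and compare it with the Info field. The sketch below assumes the Info field holds the LBA of the first unreadable block, which is the usual meaning for a MEDIUM ERROR on a disk; the function names `decode_read_cdb` and `locate` are made up for illustration, and only the READ(10) and READ(16) opcodes seen in the log are handled.

```python
def decode_read_cdb(cdb_hex):
    """Return (start_lba, block_count) for a READ(10) or READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    opcode = cdb[0]
    if opcode == 0x28:
        # READ(10): 4-byte LBA in bytes 2-5, 2-byte transfer length in bytes 7-8
        return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")
    if opcode == 0x88:
        # READ(16): 8-byte LBA in bytes 2-9, 4-byte transfer length in bytes 10-13
        return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")
    raise ValueError(f"unsupported opcode 0x{opcode:02x}")


def locate(cdb_hex, info):
    """Return (start_lba, block_count, offset, inside_request) for one error."""
    start, count = decode_read_cdb(cdb_hex)
    offset = info - start  # assumption: Info is the failing LBA
    return start, count, offset, 0 <= offset < count


# (device, CDB, Info) copied from the log entries above
errors = [
    ("da14", "28 00 c0 60 ef c8 00 01 00 00", 0xC060F09F),
    ("da23", "88 00 00 00 00 01 06 5d c5 00 00 00 01 00 00 00", 0x1065DC5C3),
    ("da24", "28 00 d3 0b e7 20 00 01 00 00", 0xD30BE819),
]

for dev, cdb_hex, info in errors:
    start, count, offset, inside = locate(cdb_hex, info)
    print(f"{dev}: read of {count} blocks at LBA {start}, "
          f"failing LBA {info} (block +{offset}), inside request: {inside}")
```

Under that assumption, each error is a single unreadable block inside a 256-block (128 KiB at 512 bytes per sector) read, on three different disks.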


When I run zpool status I can see that data has been repaired, but it is not showing any data errors:

Code:
zpool status
  pool: Data
 state: ONLINE
  scan: scrub in progress since Sun Sep 13 23:00:09 2020
        70.1T scanned at 918M/s, 43.5T issued at 570M/s, 70.1T total
        3M repaired, 62.14% done, 0 days 13:33:01 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/8a1c1896-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8abe167d-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8db855ce-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8d986eb0-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8d8d8520-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8e007041-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/8af2eae2-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8c272ae9-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8e61292d-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8efdc51b-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8ec09204-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8f1d7f99-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/8ae937b6-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8b961045-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8dc37f9e-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/8f5d9220-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/95e13ddb-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/98634c94-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-3                                      ONLINE       0     0     0
            gptid/986ffa00-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/98cb54af-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/99991cfd-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/99c3ca51-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9a0c41ac-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9b943f72-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-4                                      ONLINE       0     0     0
            gptid/9a9a28fd-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9b5a082f-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9c1316e1-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9d33e2ac-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9cd956b2-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9cf3e01b-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-5                                      ONLINE       0     0     0
            gptid/9d21a32e-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/9d63f987-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a1a3dbf6-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a49ae247-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a4b35860-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/96712300-eae3-11ea-ba6e-0090fa96a2c8  ONLINE       0     0     0
          raidz2-6                                      ONLINE       0     0     0
            gptid/a6910bd6-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a714fce5-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a72a55bf-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a7e34b60-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/a8ccbb39-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/aad7536b-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
          raidz2-7                                      ONLINE       0     0     0
            gptid/ab1c7045-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/ab3fb99c-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/ab609dde-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/ab557025-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/ab8b1cc1-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
            gptid/aba30cf0-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0

errors: No known data errors

  pool: Services
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:15:26 with 0 errors on Sat Sep 12 23:15:28 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        Services                                      ONLINE       0     0     0
          gptid/56bdab3e-dfdb-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:44 with 0 errors on Sun Sep 13 03:45:44 2020
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da48p2    ONLINE       0     0     0

errors: No known data errors


This is what I see from smartctl for the 3 disks that have thrown errors and one that has not:

Code:
root@freenas[~]# smartctl /dev/da14 -a
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-SSG
Product:              S7AQ3P0
Revision:             A058
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c5005716b697
Serial number:        Z298BSRN0000C4036ZRA
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Sep 14 21:16:19 2020 AEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     35 C
Drive Trip Temperature:        65 C

Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2285659877        0         0  2285659877          1       4294.049           1
write:         0        0         0         0          0      22915.078           0
verify: 3160043153        0         0  3160043153          0      12539.994           0

Non-medium error count:       41

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   33751                 - [-   -    -]
# 2  Background short  Completed                   -   33727                 - [-   -    -]
# 3  Background short  Completed                   -   33703                 - [-   -    -]
# 4  Background short  Completed                   -   33690                 - [-   -    -]
# 5  Background short  Completed                   -   33667                 - [-   -    -]
# 6  Background short  Completed                   -   33665                 - [-   -    -]
# 7  Background short  Completed                   -   33641                 - [-   -    -]
# 8  Background short  Completed                   -   33617                 - [-   -    -]
# 9  Background short  Completed                   -   33593                 - [-   -    -]
#10  Background short  Completed                   -   33568                 - [-   -    -]
#11  Background short  Completed                   -   33544                 - [-   -    -]
#12  Background short  Completed                   -   33520                 - [-   -    -]
#13  Background short  Completed                   -   33496                 - [-   -    -]
#14  Background short  Completed                   -   33472                 - [-   -    -]
#15  Background short  Completed                   -   33447                 - [-   -    -]
#16  Background short  Completed                   -   33423                 - [-   -    -]
#17  Background short  Completed                   -   33400                 - [-   -    -]
#18  Background short  Completed                   -   33376                 - [-   -    -]
#19  Background short  Completed                   -   33352                 - [-   -    -]
#20  Background short  Completed                   -   33327                 - [-   -    -]

Long (extended) Self-test duration: 27600 seconds [460.0 minutes]

root@freenas[~]# smartctl /dev/da23 -a
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-SSG
Product:              S7AQ3P0
Revision:             A058
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50040f7dc8f
Serial number:        Z291R4PN00009227YSFB
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Sep 14 21:16:29 2020 AEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     38 C
Drive Trip Temperature:        65 C

Elements in grown defect list: 20

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2809016694        0         0  2809016694          1    1334324.714           1
write:         0        0         0         0          0     124564.651           0
verify: 2369234760        0         0  2369234760          0    5534693.455           0

Non-medium error count:       79

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   65535                 - [-   -    -]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Completed                   -   65535                 - [-   -    -]
# 4  Background short  Completed                   -   65535                 - [-   -    -]
# 5  Background short  Completed                   -   65535                 - [-   -    -]
# 6  Background short  Completed                   -   65535                 - [-   -    -]
# 7  Background short  Completed                   -   65535                 - [-   -    -]
# 8  Background short  Completed                   -   65535                 - [-   -    -]
# 9  Background short  Completed                   -   65535                 - [-   -    -]
#10  Background short  Completed                   -   65535                 - [-   -    -]
#11  Background short  Completed                   -   65535                 - [-   -    -]
#12  Background short  Completed                   -   65535                 - [-   -    -]
#13  Background short  Completed                   -   65535                 - [-   -    -]
#14  Background short  Completed                   -   65535                 - [-   -    -]
#15  Background short  Completed                   -   65535                 - [-   -    -]
#16  Background short  Completed                   -   65535                 - [-   -    -]
#17  Background short  Completed                   -   65535                 - [-   -    -]
#18  Background short  Completed                   -   65535                 - [-   -    -]
#19  Background short  Completed                   -   65535                 - [-   -    -]
#20  Background short  Completed                   -   65535                 - [-   -    -]

Long (extended) Self-test duration: 27600 seconds [460.0 minutes]

root@freenas[~]# smartctl /dev/da24 -a
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-SSG
Product:              S7AQ3P0
Revision:             A058
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50040f7f11b
Serial number:        Z291QJJ00000921939NB
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Sep 14 21:16:35 2020 AEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3824800365        0         0  3824800365          1    1256600.361           1
write:         0        0         0         0          0     117323.045           0
verify: 331869279        0         0  331869279          0    6185375.410           0

Non-medium error count:       81

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   65535                 - [-   -    -]
# 2  Background short  Completed                   -   65535                 - [-   -    -]
# 3  Background short  Completed                   -   65535                 - [-   -    -]
# 4  Background short  Completed                   -   65535                 - [-   -    -]
# 5  Background short  Completed                   -   65535                 - [-   -    -]
# 6  Background short  Completed                   -   65535                 - [-   -    -]
# 7  Background short  Completed                   -   65535                 - [-   -    -]
# 8  Background short  Completed                   -   65535                 - [-   -    -]
# 9  Background short  Completed                   -   65535                 - [-   -    -]
#10  Background short  Completed                   -   65535                 - [-   -    -]
#11  Background short  Completed                   -   65535                 - [-   -    -]
#12  Background short  Completed                   -   65535                 - [-   -    -]
#13  Background short  Completed                   -   65535                 - [-   -    -]
#14  Background short  Completed                   -   65535                 - [-   -    -]
#15  Background short  Completed                   -   65535                 - [-   -    -]
#16  Background short  Completed                   -   65535                 - [-   -    -]
#17  Background short  Completed                   -   65535                 - [-   -    -]
#18  Background short  Completed                   -   65535                 - [-   -    -]
#19  Background short  Completed                   -   65535                 - [-   -    -]
#20  Background short  Completed                   -   65535                 - [-   -    -]

Long (extended) Self-test duration: 27600 seconds [460.0 minutes]

root@freenas[~]# smartctl /dev/da40 -a
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-SSG
Product:              S7AQ3P0
Revision:             A058
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50056669933
Serial number:        Z297A1VP00009339Z4S2
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Sep 14 21:16:48 2020 AEST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     39 C
Drive Trip Temperature:        65 C

Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   188487665        0         0  188487665          0     759299.408           0
write:         0        0         0         0          0      92571.794           0
verify: 3513884870        0         0  3513884870          2    2819465.820           1

Non-medium error count:       95

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   35608                 - [-   -    -]
# 2  Background short  Completed                   -   35584                 - [-   -    -]
# 3  Background short  Completed                   -   35559                 - [-   -    -]
# 4  Background short  Completed                   -   35546                 - [-   -    -]
# 5  Background short  Completed                   -   35523                 - [-   -    -]
# 6  Background short  Completed                   -   35522                 - [-   -    -]
# 7  Background short  Completed                   -   35498                 - [-   -    -]
# 8  Background short  Completed                   -   35474                 - [-   -    -]
# 9  Background short  Completed                   -   35449                 - [-   -    -]
#10  Background short  Completed                   -   35425                 - [-   -    -]
#11  Background short  Completed                   -   35401                 - [-   -    -]
#12  Background short  Completed                   -   35377                 - [-   -    -]
#13  Background short  Completed                   -   35353                 - [-   -    -]
#14  Background short  Completed                   -   35328                 - [-   -    -]
#15  Background short  Completed                   -   35304                 - [-   -    -]
#16  Background short  Completed                   -   35280                 - [-   -    -]
#17  Background short  Completed                   -   35257                 - [-   -    -]
#18  Background short  Completed                   -   35233                 - [-   -    -]
#19  Background short  Completed                   -   35209                 - [-   -    -]
#20  Background short  Completed                   -   35184                 - [-   -    -]

Long (extended) Self-test duration: 27600 seconds [460.0 minutes]


Does anyone have a suggestion on what might be causing the errors and/or what should be done about them?


I have also noticed the scrub seems to vary in speed quite a lot. It was running at:

scan: scrub in progress since Sun Sep 13 23:00:09 2020 39.0T scanned at 1.40G/s, 35.6T issued at 1.28G/s, 70.1T total 2M repaired, 50.81% done, 0 days 07:40:12 to go

But slowed down substantially:

scan: scrub in progress since Sun Sep 13 23:00:09 2020 39.0T scanned at 601M/s, 37.8T issued at 583M/s, 70.1T total 2M repaired, 53.95% done, 0 days 16:07:10 to go

And then it seemed to be speeding up again:

scan: scrub in progress since Sun Sep 13 23:00:09 2020 68.1T scanned at 919M/s, 39.4T issued at 531M/s, 70.1T total 3M repaired, 56.17% done, 0 days 16:50:09 to go

And now it seems to be showing it has scanned all 70.1T but is only 64.78% complete:

scan: scrub in progress since Sun Sep 13 23:00:09 2020 70.1T scanned at 906M/s, 45.4T issued at 587M/s, 70.1T total 3M repaired, 64.78% done, 0 days 12:14:51 to go

I assume the slowing down and speeding up is due to the sizes of the files being scrubbed and the load on the system, but the numbers just feel a bit strange.


The only other issue I have had that may be related is some intermittent system hangs. When this occurs, the console locks up, the WebUI crashes and Plex transcoding freezes (but can be restarted), while SSH and Samba continue to work.

I get the following error in the message log:

Code:
Sep 13 02:44:46 freenas collectd[2072]: Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 60, in read
    with Client() as c:
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 250, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 93, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
Sep 13 02:49:46 freenas collectd[2072]: Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 60, in read
    with Client() as c:
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 250, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 93, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.7/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out


I had a disk that was logging increasing SMART errors, which I thought was the cause, so I replaced it. The hangs seemed to stop for a while but have now happened again:

Uptime.PNG

(Red=Hang Blue=Replace disk Green=Manual reboot)

Any help would be greatly appreciated!
 

83n

After doing some more research and watching a great video, I now understand why it shows the pool as fully scanned but still only ~60% scrubbed. "Scanned" is the phase that finds the data and orders it ready to be read and checksummed, and "Issued" is how much has actually been read and checksummed.
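
As a rough check of that reading, the percentage and time remaining in the status line can be reproduced from the "issued" figures alone. This is only a sketch: it assumes zpool prints binary units (T = 2^40 bytes, M = 2^20 bytes) and that the estimate is simply the remaining bytes divided by the issue rate. The helper names `to_bytes` and `scrub_progress` are hypothetical.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(text):
    """Convert a zpool size such as '45.4T' or '587M' to bytes (binary units assumed)."""
    return float(text[:-1]) * UNITS[text[-1]]


def scrub_progress(issued, total, issue_rate):
    """Return (percent_done, seconds_remaining) based on the issued figures only."""
    issued_b = to_bytes(issued)
    total_b = to_bytes(total)
    percent = 100 * issued_b / total_b
    seconds = (total_b - issued_b) / to_bytes(issue_rate)
    return percent, seconds


# Figures from the status line that reported 64.78% done with 12:14:51 to go
percent, seconds = scrub_progress("45.4T", "70.1T", "587M")
print(f"{percent:.1f}% done, about {seconds / 3600:.1f} hours to go")
```

The result lands close to what zpool status reported for that line (small differences are expected because the printed sizes are rounded), which fits the idea that progress tracks "issued" rather than "scanned".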

As for the time taken, the estimate DrKK gives (1-2 hours per TB of used space) is longer than my scrub is actually taking, so nothing to complain about there...

scrub in progress since Sun Sep 13 23:00:09 2020 70.1T scanned at 802M/s, 64.2T issued at 734M/s, 70.1T total 3M repaired, 91.57% done, 0 days 02:20:38 to go

So the only remaining issues are the read errors, why they are not showing up in zpool status, and the system hangs.
 

83n

So after the errors above I had a good look at the stats on the disks and found a few things...
Disk stats spreadsheet
  • The SMART self-test log stops counting lifetime hours at 65535 (0xFFFF, the largest value a 16-bit field can hold)... Odd but not really an issue.
  • When I run smartctl -t short -C /dev/dax I get the error "unsupported field in scsi command":
    Code:
    smartctl -t short -C -r ioctl /dev/da17
    smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
    
    [inquiry: 12 00 00 00 24 00 ]
    CAM status=0x1, SCSI status=0x0, resid=0x0
    status=0x0
    [inquiry: 12 01 00 00 fc 00 ]
    CAM status=0x1, SCSI status=0x0, resid=0xe3
    status=0x0
    [inquiry: 12 00 00 00 24 00 ]
    CAM status=0x1, SCSI status=0x0, resid=0x0
    status=0x0
    [send diagnostic: 1d a0 00 00 00 00 ]
    CAM status=0x8c, SCSI status=0x2, resid=0x0
    sense_len=0x20, sense_resid=0x0
    status=0x2: [desc] sense_key=0x5 asc=0x24 ascq=0x0
    Short foreground self test failed [unsupported field in scsi command]
    

    But if I run it in the background, the test runs and reports its status and errors correctly in smartctl -l selftest /dev/dax.
  • A lot of my disks have... non ideal.... stats.
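
The sense data from the failed foreground self-test, and from the scrub errors earlier in the thread, can be decoded with a small lookup. This is a minimal sketch: the two tables hold only the standard SCSI sense key and ASC/ASCQ values that appear in this thread, and `describe_sense` is a hypothetical helper, not part of smartctl.

```python
# Only the codes seen in this thread; the full SCSI tables are much larger.
SENSE_KEYS = {
    0x3: "MEDIUM ERROR",
    0x5: "ILLEGAL REQUEST",
}
ASC_ASCQ = {
    (0x11, 0x00): "Unrecovered read error",
    (0x24, 0x00): "Invalid field in CDB",
}


def describe_sense(sense_key, asc, ascq):
    """Return a readable description, falling back to raw hex for unknown codes."""
    key = SENSE_KEYS.get(sense_key, f"sense key 0x{sense_key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"asc 0x{asc:02x} ascq 0x{ascq:02x}")
    return f"{key}: {detail}"


# Foreground self-test rejected by the drive (send diagnostic, asc=0x24)
print(describe_sense(0x5, 0x24, 0x00))
# Read failures logged during the scrub (MEDIUM ERROR asc:11,0)
print(describe_sense(0x3, 0x11, 0x00))
```

Read this way, the self-test failure is the drive rejecting the command as sent (an illegal request), which is a different kind of error from the medium errors seen during the scrub.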
So I am going to replace a heap of disks (~16) and have kicked off the first 4:
Code:
zpool status
  pool: Data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Sep 19 00:39:23 2020
        59.5T scanned at 1.49G/s, 31.0T issued at 1.46G/s, 70.3T total
        203G resilvered, 44.16% done, 0 days 07:40:00 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        Data                                              ONLINE       0     0     0
          raidz2-0                                        ONLINE       0     0     0
            gptid/8a1c1896-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8abe167d-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8db855ce-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8d986eb0-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8d8d8520-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8e007041-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
          raidz2-1                                        ONLINE       0     0     0
            gptid/8af2eae2-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8c272ae9-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8e61292d-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8efdc51b-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8ec09204-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8f1d7f99-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
          raidz2-2                                        ONLINE       0     0     0
            gptid/8ae937b6-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8b961045-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8dc37f9e-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/8f5d9220-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/95e13ddb-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/98634c94-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
          raidz2-3                                        ONLINE       0     0     0
            gptid/986ffa00-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/98cb54af-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/99991cfd-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/99c3ca51-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/9a0c41ac-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            replacing-5                                   ONLINE       0     0     0
              gptid/9b943f72-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
              gptid/ac587acf-f9bc-11ea-8564-0090fa96a2c8  ONLINE       0     0     0
          raidz2-4                                        ONLINE       0     0     0
            gptid/9a9a28fd-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/9b5a082f-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            replacing-2                                   ONLINE       0     0     0
              gptid/9c1316e1-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
              gptid/e8dfab76-f9bb-11ea-8564-0090fa96a2c8  ONLINE       0     0     0
            gptid/9d33e2ac-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/9cd956b2-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/9cf3e01b-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
          raidz2-5                                        ONLINE       0     0     0
            gptid/9d21a32e-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/9d63f987-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/a1a3dbf6-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            replacing-3                                   ONLINE       0     0     0
              gptid/a49ae247-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
              gptid/3e8fae4e-f9ba-11ea-8564-0090fa96a2c8  ONLINE       0     0     0
            gptid/a4b35860-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/96712300-eae3-11ea-ba6e-0090fa96a2c8    ONLINE       0     0     0
          raidz2-6                                        ONLINE       0     0     0
            gptid/a6910bd6-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/a714fce5-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            replacing-2                                   ONLINE       0     0     0
              gptid/a72a55bf-dfda-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0
              gptid/68727592-f9b9-11ea-8564-0090fa96a2c8  ONLINE       0     0     0
            gptid/a7e34b60-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/a8ccbb39-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/aad7536b-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
          raidz2-7                                        ONLINE       0     0     0
            gptid/ab1c7045-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/ab3fb99c-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/ab609dde-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/ab557025-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/ab8b1cc1-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0
            gptid/aba30cf0-dfda-11ea-a498-40f2e90ac8fc    ONLINE       0     0     0

errors: No known data errors

  pool: Services
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:15:26 with 0 errors on Sat Sep 12 23:15:28 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        Services                                      ONLINE       0     0     0
          gptid/56bdab3e-dfdb-11ea-a498-40f2e90ac8fc  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:44 with 0 errors on Sun Sep 13 03:45:44 2020
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da60p2    ONLINE       0     0     0

errors: No known data errors


I still don't understand a couple of things: why the error at the top of my first post did not show up in the pool stats or generate any alerts, and what is causing the freeze. The server has not been up long enough to see whether the hang/freeze happens again, because of the reboots for formatting etc. Any help with either of those issues would be great!
 