Erroneous (?) "Read SMART Self-Test Log Failed" Error

TrueNAS (13.0) has been sending me the following critical alert every day for several days now: "Device: /dev/da2, Read SMART Self-Test Log Failed. [date and time from that morning]". However, both the TrueNAS GUI and smartctl -x /dev/da2 appear to read the self-test log just fine, and the disk appears to be OK. Does anyone know what this is about?
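One thing that may be worth checking is smartctl's exit status when it reads only the self-test log, since the printed output can look fine while the exit code still reports a problem. The following is a minimal sketch, not a diagnosis: the bit meanings are paraphrased from the EXIT STATUS section of the smartctl(8) man page, and the helper names `decode_smartctl_status` and `check_selftest_log` are hypothetical. Whether smartd's alert corresponds to any of these bits on this system is an assumption.

```python
import subprocess

# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify itself",
    2: "a SMART or other command to the disk failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return a description for every bit set in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check_selftest_log(device="/dev/da2"):
    """Run 'smartctl -l selftest' (hypothetical helper) and decode its exit status."""
    result = subprocess.run(
        ["smartctl", "-l", "selftest", device],
        capture_output=True,
        text=True,
    )
    return result.returncode, decode_smartctl_status(result.returncode)
```

Calling `check_selftest_log()` as root around the time the alert usually fires would show whether smartctl reports a non-zero status for the same request; an empty list means no bits were set.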

System: 4x16TB + 4x8TB raidz1 SAS vdevs + 2 2x2TB special SATA vdevs | TrueNAS Core 13 | Xeon D-1541 | Supermicro X10SDV-TLN4F | 128GB DDR4 2300 ECC RAM | HDDs connected to LSI 9400-16i, SSDs on onboard SATA | 450W PSU. The affected disk, "/dev/da2", is an HGST enterprise model HUH728080AL4200.

smartctl -x results:
Code:
root@fileserver[~]# smartctl -x /dev/da2
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH728080AL4200
Revision:             A7D8
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23bc8a7c0
Serial number:        2EKKAYKV
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Apr 30 14:40:42 2023 EDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     46 C
Drive Trip Temperature:        85 C

Manufactured in week 03 of year 2016
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  3720
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  6117
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 20380896632766464

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        2         0         2    1979769     179502.542  0
write:         0        0         0         0    6196136      88111.822  0
verify:        0        0         0         0     253876          0.000  0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   51403                 - [-   -    -]
# 2  Background short  Completed                   -   51377                 - [-   -    -]
# 3  Background long   Completed                   -   51323                 - [-   -    -]
# 4  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 5  Background short  Self test in progress ...   -     NOW                 - [-   -    -]
# 6  Background long   Completed                   -   50919                 - [-   -    -]
# 7  Background short  Completed                   -   50805                 - [-   -    -]
# 8  Background short  Completed                   -   50469                 - [-   -    -]
# 9  Background short  Completed                   -   50134                 - [-   -    -]
#10  Background short  Completed                   -   49798                 - [-   -    -]
#11  Background short  Completed                   -   49416                 - [-   -    -]
#12  Background short  Completed                   -   49080                 - [-   -    -]
#13  Background long   Completed                   -   48786                 - [-   -    -]
#14  Background short  Completed                   -   48674                 - [-   -    -]
#15  Background short  Completed                   -   48337                 - [-   -    -]
#16  Background short  Completed                   -   47953                 - [-   -    -]
#17  Background short  Completed                   -   47617                 - [-   -    -]
#18  Background short  Completed                   -   47222                 - [-   -    -]
#19  Background short  Completed                   -   46886                 - [-   -    -]
#20  Background long   Completed                   -   46592                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 51409:46 [3084586 minutes]
    Number of background scans performed: 323,  scan progress: 0.00%
    Number of background medium scans performed: 323

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 3
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: power on
    reason: unknown
    negotiated logical link rate: phy enabled; 12 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca23bc8a7c1
    attached SAS address = 0x500605b00fdeac65
    attached phy identifier = 5
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
relative target port id = 2
  generation code = 3
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: power on
    negotiated logical link rate: phy enabled; unknown
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000cca23bc8a7c2
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0

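The self-test log above contains two entries (#4 and #5) still marked "Self test in progress" that sit below newer completed entries. Whether that is related to the alert is an assumption, not something the output confirms. A minimal sketch for picking such entries out of pasted smartctl output follows; the regular expression is written against the SCSI log layout shown above and may need adjusting for other layouts, and the function names are hypothetical.

```python
import re

# Matches rows such as:
# "# 4  Background long   Self test in progress ...   -     NOW   - [-   -    -]"
ENTRY_RE = re.compile(
    r"^#\s*(\d+)\s+(Background \w+)\s+(.+?)\s+(-|\d+)\s+(\d+|NOW)\s"
)

SAMPLE = """\
# 1  Background short  Completed                   -   51403                 - [-   -    -]
# 3  Background long   Completed                   -   51323                 - [-   -    -]
# 4  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 5  Background short  Self test in progress ...   -     NOW                 - [-   -    -]
# 6  Background long   Completed                   -   50919                 - [-   -    -]
"""


def parse_selftest_log(text):
    """Turn self-test log rows into dicts; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue
        num, test, status, _segment, lifetime = m.groups()
        entries.append({
            "num": int(num),
            "test": test,
            "status": status,
            # "NOW" means the drive reports the test as still running.
            "hours": None if lifetime == "NOW" else int(lifetime),
        })
    return entries


def stale_in_progress(entries):
    """Numbers of 'in progress' entries listed after at least one finished entry."""
    seen_finished = False
    stale = []
    for entry in entries:
        if "in progress" in entry["status"]:
            if seen_finished:
                stale.append(entry["num"])
        else:
            seen_finished = True
    return stale


print(stale_in_progress(parse_selftest_log(SAMPLE)))  # prints [4, 5]
```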

Note for context/history: I was plagued by read/write and data errors affecting various disks for a while, but I *may* have recently fixed them by changing out my HBA a week or so ago. I have scrubbed the pool and SMART tested all the disks with successful results.

The pool, which had faulted under the old controller, appeared to import automatically and cleanly with the new controller, with no new file corruption. The pool does have a few permanent errors from an old fault, but they appear to be just performance log files:

Code:
errors: Permanent errors have been detected in the following files:

        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da4/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da5/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/ctl-tpc/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221107.0400-3m:/localhost/memory/memory-inactive.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat/geom_latency-ada1.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat
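Every path in the list above is qualified with a snapshot name (the part containing `@auto-...`), so grouping the entries by snapshot may make it easier to see which snapshots hold the damaged files. The sketch below assumes the `dataset@snapshot:/path` format shown in the output; `group_errors_by_snapshot` is a hypothetical helper, and the sample paths are copied as posted, including the last one, which is cut off in the original.

```python
from collections import defaultdict

SAMPLE = """\
errors: Permanent errors have been detected in the following files:

        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da4/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da5/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/ctl-tpc/disk_time.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221107.0400-3m:/localhost/memory/memory-inactive.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat/geom_latency-ada1.rrd
        Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat
"""


def group_errors_by_snapshot(lines):
    """Map each 'dataset@snapshot' to the list of affected paths inside it."""
    groups = defaultdict(list)
    for raw in lines:
        entry = raw.strip()
        # Skip the header, blank lines, and anything not snapshot-qualified.
        if "@" not in entry or ":" not in entry:
            continue
        snapshot, _, path = entry.partition(":")
        groups[snapshot].append(path)
    return dict(groups)


for snapshot, paths in group_errors_by_snapshot(SAMPLE.splitlines()).items():
    print(f"{snapshot}: {len(paths)} file(s)")
```

For the output posted here, this reports three snapshots, all under `Pool1/.system`.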


Thanks in advance.
 

jenksdrummer

I have had the same issue with WD Red SSDs whenever I use the onboard Intel chipset SATA connectors. I haven't noticed any actual problems: SMART tests pass with flying colors, and scrubs never show any errors. I figure it's a bug. I'm not sure when it first showed up, but I recall it happening pre-13U4 as well.
 