markshelbyperry (Cadet; joined Nov 27, 2020; 3 messages)
TrueNAS (13.0) sends me the following critical error every day for several days now: "Device: /dev/da2, Read SMART Self-Test Log Failed. [date and time from that morning]". HOWEVER, the TrueNAS GUI and
smartctl -x /dev/da2 appear to read the self-test log just fine, and the disk appears to be OK. Does anyone know what this is about?
System: 4x16TB + 4x8TB raidz1 SAS devs + 2 2x2TB special SATA vdevs | TrueNAS Core 13 | Xeon D-1541 | Supermicro X10SDV-TLN4F | 128GB DDR4 2300 ECC RAM | HDs connected to LSI 9400 16i and SSDs use onboard SATA | 450W PSU. The affected disk "/dev/da2" is an HGST enterprise model HUH728080AL4200.
Smartctl -x results:
Code:
root@fileserver[~]# smartctl -x /dev/da2
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: HUH728080AL4200
Revision: A7D8
Compliance: SPC-4
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca23bc8a7c0
Serial number: 2EKKAYKV
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sun Apr 30 14:40:42 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 46 C
Drive Trip Temperature: 85 C
Manufactured in week 03 of year 2016
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 3720
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 6117
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 20380896632766464
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        2         0         2    1979769     179502.542           0
write:         0        0         0         0    6196136      88111.822           0
verify:        0        0         0         0     253876          0.000           0
Non-medium error count: 0
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   51403                 - [-   -    -]
# 2  Background short  Completed                   -   51377                 - [-   -    -]
# 3  Background long   Completed                   -   51323                 - [-   -    -]
# 4  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 5  Background short  Self test in progress ...   -     NOW                 - [-   -    -]
# 6  Background long   Completed                   -   50919                 - [-   -    -]
# 7  Background short  Completed                   -   50805                 - [-   -    -]
# 8  Background short  Completed                   -   50469                 - [-   -    -]
# 9  Background short  Completed                   -   50134                 - [-   -    -]
#10  Background short  Completed                   -   49798                 - [-   -    -]
#11  Background short  Completed                   -   49416                 - [-   -    -]
#12  Background short  Completed                   -   49080                 - [-   -    -]
#13  Background long   Completed                   -   48786                 - [-   -    -]
#14  Background short  Completed                   -   48674                 - [-   -    -]
#15  Background short  Completed                   -   48337                 - [-   -    -]
#16  Background short  Completed                   -   47953                 - [-   -    -]
#17  Background short  Completed                   -   47617                 - [-   -    -]
#18  Background short  Completed                   -   47222                 - [-   -    -]
#19  Background short  Completed                   -   46886                 - [-   -    -]
#20  Background long   Completed                   -   46592                 - [-   -    -]
Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]
Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 51409:46 [3084586 minutes]
Number of background scans performed: 323, scan progress: 0.00%
Number of background medium scans performed: 323
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 3
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: power on
reason: unknown
negotiated logical link rate: phy enabled; 12 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23bc8a7c1
attached SAS address = 0x500605b00fdeac65
attached phy identifier = 5
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
relative target port id = 2
generation code = 3
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: power on
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23bc8a7c2
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
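In case it helps anyone reproduce the check: the self-test log can also be read on its own with `smartctl -l selftest /dev/da2` instead of the full `-x` dump. Below is a minimal sketch that tallies the log entries by status. The function name `selftest_summary` is made up for illustration, and it assumes the SCSI/SAS log layout shown above, where each entry line starts with `#` followed by its number.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): tally self-test log
# entries by status. Reads smartctl self-test log text on stdin and
# assumes entry lines begin with "#" and a number, as in the output above.
selftest_summary() {
    awk '
        /^# *[0-9]+/ {
            if ($0 ~ /Self test in progress/) in_progress++
            else if ($0 ~ /Completed/)        completed++
            else                              other++
        }
        END {
            # Unset awk counters print as 0 with %d.
            printf "completed=%d in_progress=%d other=%d\n",
                   completed, in_progress, other
        }
    '
}

# Usage on a live system (needs root; the device name is the one from this post):
#   smartctl -l selftest /dev/da2 | selftest_summary
```

In the log pasted above, entries #4 and #5 are the two that show "Self test in progress", and the other 18 show "Completed".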
Note for context/history: I was plagued by read/write and data errors affecting various disks for a while, but I *may* have recently fixed them by changing out my HBA a week or so ago. I have scrubbed the pool and SMART tested all the disks with successful results.
The pool, which had faulted under the old controller, appeared to import automatically with the new controller and showed no new file corruption. The pool does have a few permanent errors from an old fault, but they appear to be just performance log files:
Code:
errors: Permanent errors have been detected in the following files:
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da4/disk_time.rrd
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/disk-da5/disk_time.rrd
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1800-3m:/localhost/ctl-tpc/disk_time.rrd
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221107.0400-3m:/localhost/memory/memory-inactive.rrd
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat/geom_latency-ada1.rrd
Pool1/.system/rrd-1e9984bcf13340bcb68bc263ecb0a902@auto-20221226.1830-3m:/localhost/geom_stat
Thanks in advance.