Read SMART Self-Test Log Failed (many such errors)

Rabinovitch

Dabbler
Joined
Apr 3, 2021
Messages
43
After upgrading to TrueNAS-12.0-U8, errors of this type began to appear:

[Attached screenshot: 1658743665793.png]


The disks themselves seem to be OK (or at least they respond to polling):

Code:
root@lenstor2[~]# smartctl -a /dev/da143
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WUH721818AL5204
Revision:             C232
Compliance:           SPC-5
User Capacity:        18,000,207,937,536 bytes [18.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2c20b0384
Serial number:        3FG61SYT
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Jul 25 13:09:31 2022 MSK
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled


=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK


Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     37 C
Drive Trip Temperature:        85 C


Accumulated power on time, hours:minutes 8129:08
Manufactured in week 23 of year 2021
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  26
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  360
Elements in grown defect list: 0


Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     4789         0      4789     488513     141804.586           0
write:         0        0         0         0      63998      58388.294           0
verify:        0     7964         0      7964     234339          0.000           0


Non-medium error count:        0


SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -    5834                 - [-   -    -]


Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]


root@lenstor2[~]# smartctl -a /dev/da61
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WUH721818AL5204
Revision:             C232
Compliance:           SPC-5
User Capacity:        18,000,207,937,536 bytes [18.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2c27bd190
Serial number:        3FJ62Z4T
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Jul 25 13:11:21 2022 MSK
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled


=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK


Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     42 C
Drive Trip Temperature:        85 C


Accumulated power on time, hours:minutes 4125:12
Manufactured in week 35 of year 2021
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  12
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  182
Elements in grown defect list: 0


Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0     100815      70968.100           0
write:         0        0         0         0      11282      18034.306           0
verify:        0        0         0         0        302          0.000           0


Non-medium error count:        0


SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -    1830                 - [-   -    -]


Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]


Could you please tell me what this error means and whether I need to do anything about it?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Based on all the read errors needing correction on da143 (4789 on read and 7964 on verify, all corrected by delayed ECC), it may be time to check your warranty period and return the disk if possible. I would no longer trust that disk.

You can see that da61 isn't suffering from the same issue: its corrected-error counters are all zero.
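
The comparison between the two disks can be made mechanical. A minimal sketch, assuming the "Error counter log" layout that smartctl 7.2 printed for these SAS drives. The names `parse_error_counters` and `needs_attention` are hypothetical, and the rule "any non-zero error total is worth a look" is an illustrative assumption, not a threshold defined by smartmontools or TrueNAS:

```python
import re

# Column names for the SCSI "Error counter log" rows, in the order
# smartctl 7.2 printed them in this thread.
FIELDS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites",
    "total_corrected", "algorithm_invocations",
    "gigabytes_processed", "total_uncorrected",
)

ROW = re.compile(r"^(read|write|verify):\s+(.*)$")


def parse_error_counters(smartctl_output):
    """Return {operation: {field: number}} for the read/write/verify rows."""
    counters = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # unexpected layout: skip the row rather than guess
        row = {}
        for name, value in zip(FIELDS, values):
            row[name] = float(value) if "." in value else int(value)
        counters[match.group(1)] = row
    return counters


def needs_attention(counters):
    """Illustrative rule (an assumption): flag any non-zero error total."""
    return any(
        row["total_corrected"] > 0 or row["total_uncorrected"] > 0
        for row in counters.values()
    )


# Rows copied from the da143 and da61 outputs earlier in the thread.
DA143 = """\
read:          0     4789         0      4789     488513     141804.586           0
write:         0        0         0         0      63998      58388.294           0
verify:        0     7964         0      7964     234339          0.000           0
"""
DA61 = """\
read:          0        0         0         0     100815      70968.100           0
write:         0        0         0         0      11282      18034.306           0
verify:        0        0         0         0        302          0.000           0
"""

if __name__ == "__main__":
    for name, text in (("da143", DA143), ("da61", DA61)):
        counters = parse_error_counters(text)
        print(name, "flagged" if needs_attention(counters) else "clean")
```

With the rows from this thread, da143 is flagged and da61 is not. In practice the text would come from running smartctl -a on each disk rather than from pasted strings.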
 

Rabinovitch

Yes, thanks, but why do we get "Read SMART Self-Test Log Failed" if we can read the SMART data without problems at any time?
 

sretalla

why do we get "Read SMART Self-Test Log Failed" if we can read the SMART data without problems at any time?
I don't know, but my guess is that either the disk settings don't specify that those are SAS disks, or the output (possibly because of the errors it contains) isn't in the format that the script assessing it expects, or both.
 

Rabinovitch

Could you please tell me how I should indicate that this is a SAS disk? Like in the screenshot, or with -d scsi?
[Attached screenshot: 1674569547312.png]
 

sretalla

If you're getting output from smartctl without any switches, there's no need to have anything in the extra options field.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Try smartctl -x /dev/xxx

Also note that the drives have not had a SMART self-test run on them in thousands of power-on hours. Run a SMART long test and check the results again after that.
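
To put a number on "thousands of hours", here is a rough sketch that subtracts the LifeTime of the latest self-test entry from the accumulated power-on hours. It assumes the SCSI-style smartctl 7.2 output posted earlier in the thread; `hours_since_last_selftest` is a hypothetical helper, not part of smartmontools:

```python
import re

POWER_ON = re.compile(r"Accumulated power on time, hours:minutes\s+(\d+):(\d+)")
# A self-test row ends with "<LifeTime> <LBA_first_err> [SK ASC ASQ]",
# so capture the numeric token that sits two columns before the "[".
SELFTEST_ROW = re.compile(r"^#\s*\d+\s+.*\s(\d+)\s+\S+\s+\[")


def hours_since_last_selftest(smartctl_output):
    """Return power-on hours elapsed since the latest logged self-test.

    Returns None if the power-on time or the self-test log is missing.
    Rows whose LifeTime column is not a plain number are ignored.
    """
    power_on_hours = None
    lifetimes = []
    for line in smartctl_output.splitlines():
        found = POWER_ON.search(line)
        if found:
            power_on_hours = int(found.group(1))
            continue
        row = SELFTEST_ROW.match(line.strip())
        if row:
            lifetimes.append(int(row.group(1)))
    if power_on_hours is None or not lifetimes:
        return None
    # Use the largest LifeTime so the order of the rows does not matter.
    return power_on_hours - max(lifetimes)


# Lines copied from the da143 and da61 outputs in the first post.
DA143 = """\
Accumulated power on time, hours:minutes 8129:08
# 1  Background long   Completed                   -    5834                 - [-   -    -]
"""
DA61 = """\
Accumulated power on time, hours:minutes 4125:12
# 1  Background long   Completed                   -    1830                 - [-   -    -]
"""

if __name__ == "__main__":
    for name, text in (("da143", DA143), ("da61", DA61)):
        print(name, hours_since_last_selftest(text))
```

For both drives in this thread the result is 2295 hours. A long test can be started with smartctl -t long /dev/daXY, and its result read afterwards with smartctl -l selftest /dev/daXY.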

After upgrading to TrueNAS-12.0-U8, errors of this type began to appear:
If you did not upgrade your ZFS feature flags, you should be able to roll back to the previous version. BTW, what was the previous version? I ask because you may be running an HBA with old firmware that is not compatible with version 12 of TrueNAS. That's just a guess based on practically no information provided.

Now I need to say it... Please read the forum rules (located at the top of each page in red font) and post your system specs. We are only as good as the data provided and we actually do like to help out.
 

Rabinovitch

post your system specs
  • Motherboard make and model
    OEM motherboard from a Lenovo P620 workstation (Version: SDK0T08861)
  • CPU make and model
    AMD Ryzen Threadripper PRO 3945WX 12-Cores
  • RAM quantity
    128 GB DDR4-3200
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives
    WDC PC SN730 SDBQ as boot drive
    raidz and raidz2 vdevs consisting of TOSHIBA MG08ACA16TE (SATA), WDC WUH721818ALE6L4 (SATA) and WUH721818AL5204 (SAS) HDDs. Each zpool consists, of course, of disks of the same type.
    The HDDs run in AIC XJ1-41081-02 and AIC XJ1-40781-02 JBODs.
  • Hard disk controllers
    HBA 9405W-16e, FW version 19.00.00.00
  • Network cards
    Intel Corporation Ethernet Controller X550T, 2 x 10G
BTW, the system is now running TrueNAS-13.0-U3.1.
 

Rabinovitch

For every disk named in one of these error messages, smartctl -a prints its output without problems. Furthermore, smartctl -d test /dev/daXY correctly identifies the device type as scsi:

root@lenstor2[~]# smartctl -d test /dev/da307
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da307: Device of type 'scsi' [SCSI] detected
/dev/da307: Device of type 'scsi' [SCSI] opened
root@lenstor2[~]#
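
With hundreds of disks, the same check is easier to read as a table. A small sketch, assuming the "Device of type ... detected" line format shown in the transcript above; `detected_types` is a hypothetical helper, and the sample text is pasted from this post rather than collected by running smartctl:

```python
import re

# Matches e.g. "/dev/da307: Device of type 'scsi' [SCSI] detected"
DETECTED = re.compile(r"^(\S+): Device of type '([^']+)' \[[^\]]*\] detected$")


def detected_types(smartctl_output):
    """Map each device path to the type reported by `smartctl -d test`."""
    types = {}
    for line in smartctl_output.splitlines():
        match = DETECTED.match(line.strip())
        if match:
            types[match.group(1)] = match.group(2)
    return types


SAMPLE = """\
/dev/da307: Device of type 'scsi' [SCSI] detected
/dev/da307: Device of type 'scsi' [SCSI] opened
"""

if __name__ == "__main__":
    for device, kind in detected_types(SAMPLE).items():
        print(device, kind)
```

Any device whose reported type is not scsi would then stand out immediately.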
 