Hi everyone,
I've got a Dell R730xd running TrueNAS-12.0-U8, and I'm seeing some really weird SMART errors that I can't make sense of.
TrueNAS is installed on a boot volume made up of 2x 600 GB 10K SAS drives in RAID1 on the built-in PERC H730 Mini.
There are an additional 15x 1.8 TB 10K SAS drives and 3x 200 GB SSDs, which are configured in "passthrough" mode so that the disks are presented directly in the TrueNAS GUI under Storage -> Disks.
I then created a pool of 3 vdevs with 5 drives each, every vdev configured as RAIDZ2. Two of the SSDs are configured as a mirrored SLOG and the third as an L2ARC. So far so good.
After a couple of days of running, the machine started sending alerts about SMART failures with the code "ascq=0x9c". I tried googling it but didn't find anything useful, so I went ahead and upgraded the firmware (latest version, PER730XD_BOOTABLE_21.12.00.24). A few days later I'm getting alerts about even more disks failing:
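For reference, a rough sketch of what this layout provides and how many drive failures it can absorb. Assumptions: the per-drive size is the "User Capacity" smartctl reports for da17, and ZFS metadata, RAIDZ padding, slop space and the SLOG/L2ARC devices (which add no pool capacity) are ignored, so the real usable figure will be somewhat lower. The function name is illustrative only.

```python
# Rough capacity/fault-tolerance estimate for 3 x 5-wide RAIDZ2 vdevs.
# Assumption: ignores ZFS metadata, RAIDZ padding and slop space;
# SLOG and L2ARC devices are not counted because they add no capacity.

DRIVE_BYTES = 1_800_360_124_416  # "User Capacity" reported by smartctl for da17
VDEVS = 3
DRIVES_PER_VDEV = 5
PARITY_PER_VDEV = 2  # RAIDZ2 = two parity drives per vdev


def pool_estimate(vdevs: int, width: int, parity: int, drive_bytes: int) -> dict:
    """Return raw and data capacity in bytes plus per-vdev failure tolerance."""
    data_drives = vdevs * (width - parity)
    return {
        "raw_bytes": vdevs * width * drive_bytes,
        "data_bytes": data_drives * drive_bytes,
        "failures_tolerated_per_vdev": parity,
    }


if __name__ == "__main__":
    est = pool_estimate(VDEVS, DRIVES_PER_VDEV, PARITY_PER_VDEV, DRIVE_BYTES)
    print(f"raw:  {est['raw_bytes'] / 1e12:.2f} TB")
    print(f"data: {est['data_bytes'] / 1e12:.2f} TB (~{est['data_bytes'] / 2**40:.2f} TiB)")
    print(f"each vdev survives {est['failures_tolerated_per_vdev']} failed drives")
```

Because redundancy is per vdev, the pool is lost if any single vdev loses more than two drives, so which vdev each flagged disk sits in matters more than the total count of flagged disks.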
TrueNAS @ hby-san3
New alerts:
* Device: /dev/da17, Self-Test Log error count increased from 0 to 1.
Current alerts:
* Device: /dev/da17, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da12, SMART Failure: FIRMWARE IMPENDING FAILURE DATA ERROR RATE TOO HIGH.
* Device: /dev/da11, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da10, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da9, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da5, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da4, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da17, Self-Test Log error count increased from 0 to 1.
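With this many devices in one email, a small script can group the alerts by message and list which disks raised each one. A minimal sketch, assuming the alert lines keep the "* Device: /dev/daN, <message>." format shown above; the helper names are hypothetical. It only builds `smartctl -t long <device>` commands (the smartmontools option for starting an extended self-test) and does not run them.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Alert lines pasted from the TrueNAS email above.
ALERTS = """\
* Device: /dev/da17, Self-Test Log error count increased from 0 to 1.
* Device: /dev/da17, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da12, SMART Failure: FIRMWARE IMPENDING FAILURE DATA ERROR RATE TOO HIGH.
* Device: /dev/da11, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da10, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da9, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da5, SMART Failure: WARNING: ascq=0x9c.
* Device: /dev/da4, SMART Failure: WARNING: ascq=0x9c.
"""

# "* Device: /dev/daN, <message>." -> (device, message without trailing dot)
ALERT_RE = re.compile(r"^\*\s*Device:\s*(/dev/\S+?),\s*(.+?)\.?$")


def group_alerts(text: str) -> dict[str, set[str]]:
    """Map each distinct alert message to the set of devices that reported it."""
    groups: defaultdict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        match = ALERT_RE.match(line.strip())
        if match:
            device, message = match.groups()
            groups[message].add(device)
    return dict(groups)


def _natural_key(dev: str) -> tuple[str, int]:
    """Sort key so /dev/da4 comes before /dev/da10."""
    m = re.search(r"\d+$", dev)
    return (dev[: m.start()], int(m.group())) if m else (dev, -1)


def long_test_commands(devices) -> list[list[str]]:
    """Build (but do not run) one `smartctl -t long` command per device."""
    return [["smartctl", "-t", "long", dev] for dev in sorted(devices, key=_natural_key)]


if __name__ == "__main__":
    flagged: set[str] = set()
    for message, devices in group_alerts(ALERTS).items():
        print(f"{message}: {', '.join(sorted(devices, key=_natural_key))}")
        flagged |= devices
    for cmd in long_test_commands(flagged):
        print(" ".join(cmd))
```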
I've checked the iDRAC, and it only complains about one of the disks (da11); according to iDRAC, the rest are fine.
I then ran a SMART long test on da17, and it failed. This is the output of smartctl -a:
root@hby-san3[/]# smartctl -a /dev/da17
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: HUC101818CS4204
Revision: FK45
Compliance: SPC-4
User Capacity: 1,800,360,124,416 bytes [1.80 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Formatted with type 2 protection
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate: 10000 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x5000cca02c17a900
Serial number: 08GE0BTA
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Apr 11 10:27:00 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 39 C
Drive Trip Temperature: 50 C
Accumulated power on time, hours:minutes 52904:48
Manufactured in week 50 of year 2015
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 2022
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 4216
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 143340003655680000
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0     3797         0     40473   35022226     224017.218           2
write:         0        0         0         0    3211588     382554.413           0
verify:        0        0         0         0    3563076          0.684           0
Non-medium error count: 1
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->     100     52904     4294967295 [-   -    -]
# 2  Background short  Completed                  80         4              - [-   -    -]
# 3  Reserved(7)       Completed                  64         4              - [-   -    -]
Long (extended) Self-test duration: 11697 seconds [194.9 minutes]
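To compare drives, the "Error counter log" rows can be pulled out of each drive's smartctl -a output and read by column name. A minimal sketch, assuming the seven-column SAS layout shown above (ECC fast, ECC delayed, rereads/rewrites, total corrected, correction invocations, gigabytes processed, total uncorrected); function names are hypothetical. It also checks that the self-test LBA_first_err of 4294967295 is exactly 0xFFFFFFFF (2^32 - 1). Reading that all-ones value as a "no LBA recorded" placeholder rather than a real sector address is an assumption, not something the output states.

```python
from __future__ import annotations

# Column order of the smartctl "Error counter log" table for SAS drives.
COLUMNS = (
    "ecc_fast",
    "ecc_delayed",
    "rereads_rewrites",
    "total_corrected",
    "correction_invocations",
    "gigabytes_processed",  # units of 10^9 bytes
    "total_uncorrected",
)

# Rows copied from the da17 output above.
SAMPLE = """\
read:          0     3797         0     40473   35022226     224017.218           2
write:         0        0         0         0    3211588     382554.413           0
verify:        0        0         0         0    3563076          0.684           0
"""


def _number(token: str) -> int | float:
    """Counters stay int; the gigabytes column becomes float."""
    return float(token) if "." in token else int(token)


def parse_error_counters(text: str) -> dict[str, dict[str, int | float]]:
    """Return {"read": {...}, "write": {...}, "verify": {...}} from smartctl -a text."""
    rows: dict[str, dict[str, int | float]] = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        values = rest.split()
        if sep and name in ("read", "write", "verify") and len(values) == len(COLUMNS):
            rows[name] = {col: _number(v) for col, v in zip(COLUMNS, values)}
    return rows


def is_all_ones_32bit(value: int) -> bool:
    """True for 4294967295 (0xFFFFFFFF). Assumption: such an LBA_first_err
    is a placeholder rather than a real sector address."""
    return value == 0xFFFF_FFFF


if __name__ == "__main__":
    for op, row in parse_error_counters(SAMPLE).items():
        print(f"{op:6} corrected={row['total_corrected']} "
              f"uncorrected={row['total_uncorrected']} "
              f"processed={row['gigabytes_processed']} GB")
    print("LBA 4294967295 is all ones:", is_all_ones_32bit(4294967295))
```

Running the parser over each flagged disk's output makes it easy to see whether the two uncorrected read errors on da17 are unusual for this batch of drives.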
I can't really tell whether these errors are real or not, so I'm hesitant to put any data on the pool for now.
Any pointers or suggestions would be really appreciated.
Many thanks!