dirtycamacho
Cadet
- Joined: Jun 8, 2023
- Messages: 8
Hi all,
First, I apologize if there is an existing thread documenting this issue, but I could not find anything quite related. Full system specs are included at the end of this post, but some high-level information to set the stage:
My TrueNAS system consists of three pools:
- freenas-boot - System Dataset Pool, comprised of 2x PNY CS900 120GB SATA SSDs
- user-ssd-pool - Used for hosting network shares, comprised of 8x HGST Ultrastar HSCAC2DA2SUN1.6T 1.6TB SAS SSDs (these drives have DIF enabled, which I know I need to correct, but that only surfaced after the upgrade from Core to Scale and is not the focus of this post; they're all functioning fine; see the note just after this list)
- vm-ssd-pool - Used for iSCSI shared storage for a virtualization cluster, comprised of 6x Intel SSDSC2BB600G4 600GB SATA SSDs
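Regarding the DIF note above: this is roughly how I understand the check and eventual fix works with sg3_utils. /dev/sdX is a placeholder, and my understanding is that the low-level format wipes the drive and can take hours, so treat this as a sketch of my plan rather than something I've already run:

# Check whether protection information (DIF) is enabled on a drive (look for prot_en=1)
sg_readcap --long /dev/sdX

# Reformat the drive with protection information disabled (DESTROYS ALL DATA on that drive)
sg_format --format --fmtpinfo=0 /dev/sdX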
After using FreeNAS and TrueNAS Core for the past 4 years, I finally made the switch to Scale about 2 weeks ago in hopes of taking advantage of Linux containers. The upgrade seemed to go well; I tested everything and was back up and running in under 30 minutes.
The following morning I woke up to an alert that one of the Intel SSDs in the vm-ssd-pool had faulted due to too many r/w errors. Sure enough, there were a combined 15 r/w errors on that drive. I have a spare, but I performed all the testing and troubleshooting I could first so as not to replace a perfectly fine drive. I ran SMART short and long tests on the faulted drive and, although I am not the best at reading the results, nothing stood out to me as indicative of failure or even pre-failure. I then ran a scrub on the pool, and it returned no errors whatsoever. So I made a note to keep an eye on that drive, performed a zpool clear, and went on with my day.
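For reference, these are roughly the commands I used for the testing above (substitute the actual device for /dev/sdX):

# SMART short and long self-tests, then review the results
smartctl -t short /dev/sdX
smartctl -t long /dev/sdX
smartctl -a /dev/sdX

# Scrub the pool, check the outcome, then clear the error counters
zpool scrub vm-ssd-pool
zpool status -v vm-ssd-pool
zpool clear vm-ssd-pool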
Later that same day I received an alert that another of the Intel SSDs in the vm-ssd-pool had been marked degraded due to r/w errors. I followed the same process with the same result.
Over the next few days I continued to receive these errors from each drive in the vm-ssd-pool, but after the first day I stopped clearing the pool. On Wednesday (yesterday) morning I shut down the server, removed the drives, dusted out the system, cleaned contact points, and reseated the drives. At first all seemed well, but this morning I woke up to the same thing. Now all drives have r/w errors.
I'm trying to figure out what could be going on here. The SSDs in the user-ssd-pool, attached to the same backplane and HBA, are working just fine and producing no errors, and the same can be said of the SSDs in the freenas-boot pool (though those are connected directly to the motherboard, so not a direct comparison). And all of the errored drives had been in use in this system for over 2 years without a single error until the upgrade from Core to Scale. I could see a drive or two starting to fail, but I find it very hard to believe that all 6 drives are failing at the same time when they were humming along happily before the upgrade.
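If it would help to confirm that all of these drives really share the same HBA/expander path, or to see what the kernel is actually logging when the errors hit (timeouts and resets vs. actual media errors), I can pull something like the following; device names here are just examples:

# Map each disk to its model, serial, and SCSI address
lsblk -o NAME,MODEL,SERIAL,HCTL

# Show which controller/expander path each disk is attached through
ls -l /dev/disk/by-path/

# Look at the underlying kernel errors
journalctl -k | grep -iE 'sd[a-z]|ata[0-9]|sas|reset|timeout'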
Any and all assistance provided is greatly appreciated. I just need to get to the bottom of this before it becomes a larger issue.
I'd hit the character limit if I posted the smartctl output for each drive here, so those outputs will be in the first reply to this post.
Case - Supermicro C216
Motherboard - Supermicro X11SS
CPU - Intel Xeon E3-1260L v5 @ 2.90GHz
RAM - 2x Crucial 16GB DDR4 2666MHz ECC Memory CT16G4XFD8266
Hard drives - 2x PNY CS900 120GB SATA SSDs (ZFS Mirror), 8x HGST Ultrastar HSCAC2DA2SUN1.6T 1.6TB SAS SSDs (4x Mirrored vdevs creating 'RAID10'), 6x Intel SSDSC2BB600G4 600GB SATA SSDs (3x Mirrored vdevs creating 'RAID10')
Controller - Onboard SATA for boot pool, AVAGO/LSI SAS9300-8i HBA (IT Mode) connected to 24-port backplane with SAS expander via 2x SFF-8643 cables for all other drives in the system
Network Card - Chelsio T520-SO 2-Port 10G SFP+ (connected via DAC cable)
zpool status vm-ssd-pool
  pool: vm-ssd-pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:19:34 with 0 errors on Thu Jun 8 21:27:43 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        vm-ssd-pool                               ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            507ef7fa-7c25-11ea-87c8-a0369f202eac  ONLINE       0     3     0
            50a0fced-7c25-11ea-87c8-a0369f202eac  ONLINE       4     5     0
          mirror-1                                ONLINE       0     0     0
            5085030a-7c25-11ea-87c8-a0369f202eac  ONLINE       6     2     0
            509c0543-7c25-11ea-87c8-a0369f202eac  ONLINE       2     1     0
          mirror-2                                ONLINE       0     0     0
            41490268-70ad-11eb-9419-00074339f020  ONLINE       0     7     0
            414e033c-70ad-11eb-9419-00074339f020  ONLINE       1     8     0

errors: No known data errors