I bought a set of four brand-new SSDs and created a pool, and it was degraded immediately. Every type of SMART test comes back green.
Here's my zpool status:
Code:
  pool: SSD
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
config:

        NAME                                      STATE     READ WRITE CKSUM
        SSD                                       DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            9636c9db-39f1-466b-b238-973bfee2a415  ONLINE       0     0     0
            57ba6532-4e94-42c2-a513-14e6dc8127ab  ONLINE       0     0     0
            05df70c4-4f3f-4939-a597-f0a31d67a635  ONLINE       0     0     0
            cdc73920-be07-4cc3-a996-0511ae1df527  FAULTED      0    28     0  too many errors

errors: No known data errors
And smartctl of the offending drive:
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.131+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS500G1R0A-68A4W0
Serial Number: 214703A00055
LU WWN Device Id: 5 001b44 8bc38af2b
Firmware Version: 411000WR
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Mon Sep 26 17:50:30 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       10
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       8
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       65536
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       0
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       130
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       0
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       283
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       0
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   075   030   ---    Old_age   Always       -       25 (Min/Max 21/30)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       11
230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age   Always       -       0x000000000000
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       0
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       3
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       3
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       6
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed without error 00% 0 -
# 2 Extended offline Completed without error 00% 9 -
Selective Self-tests/Logging not supported
It looks to me like the reserved space is a little bit too low. I'm not sure whether this is a real problem, or whether it's correct for TrueNAS to "trip over" this pre-fault. Isn't reserved space supposed to vary from drive to drive anyway?
So the question is: is it safe to use this drive, and if so, how do I tell TrueNAS to go and use it as normal?
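For anyone who wants to check the threshold question mechanically, here is a minimal Python sketch. It relies on the usual smartctl convention that an attribute counts as failing when its normalized VALUE is at or below THRESH. The function names `parse_attributes` and `below_threshold` are made up for illustration, the sample rows are copied from the attribute table in this post, and treating `---` as "no threshold reported" is an assumption. It only compares columns; it says nothing about where the 28 write errors in zpool status come from.

```python
# Sketch: flag SMART attributes whose normalized VALUE is at or below THRESH.
# Sample rows are copied from the smartctl output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032 100 100 --- Old_age  Always - 0
199 UDMA_CRC_Error_Count    0x0032 100 100 --- Old_age  Always - 11
232 Available_Reservd_Space 0x0033 100 100 004 Pre-fail Always - 100
"""


def parse_attributes(text):
    """Parse smartctl attribute rows into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Ten columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE (the raw value may contain spaces).
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        ident, name, _flag, value, worst, thresh, kind, _upd, _failed, raw = parts
        rows.append({
            "id": int(ident),
            "name": name,
            "value": int(value),
            "worst": int(worst),
            # Assumption: '---' means the drive reports no threshold.
            "thresh": None if thresh == "---" else int(thresh),
            "type": kind,
            "raw": raw,
        })
    return rows


def below_threshold(rows):
    """Return the attributes whose normalized value is at or below the threshold."""
    return [r for r in rows if r["thresh"] is not None and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    for row in below_threshold(parse_attributes(SAMPLE)):
        print(row["id"], row["name"], row["value"], row["thresh"])
```

With the three sample rows the script prints nothing: attribute 232 has VALUE 100 against THRESH 004, and the other two rows have no threshold to compare against.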