Stromkompressor
Hi!
After a week of downtime (I had simply shut my TrueNAS Scale system off), I turned it on again today and manually started a long SMART test on all disks (4 HDDs and 1 SSD, the boot drive). After some time I received this alert:
Code:
TrueNAS @ truenas
New alerts:
Boot pool status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected..
I checked via SSH what's going on:
Code:
admin@truenas[~]$ sudo zpool status -v
[sudo] password for admin:
pool: boot-pool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:00:21 with 0 errors on Sat Jun 10 03:45:22 2023
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
sdd3 ONLINE 0 12 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub repaired 0B in 00:41:34 with 0 errors on Sun Jun 4 00:41:35 2023
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
e73ded72-01b5-488c-9b07-5db6305c3d1f ONLINE 0 0 0
2ad2fa7f-6b38-4d95-acd6-df02afdafc0e ONLINE 0 0 0
e70e9be2-8007-4bbb-a822-5cf81c50ebfd ONLINE 0 0 0
e0886fa9-0284-4117-8cc0-8e682467a884 ONLINE 0 0 0
errors: No known data errors
admin@truenas[~]$ sudo smartctl -a /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTENSO SSD
Serial Number: AA000000000000014880
Firmware Version: V0718B0
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Jun 18 18:04:58 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x03) Offline data collection activity
is in progress.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SCT capabilities: (0x0001) SCT Status supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 995
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 42
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 100
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 14
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 163
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 1
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 5050
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 100
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 27
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 40
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 100
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 3858
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 6394
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 2640
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Interrupted (host reset) 90% 992 -
# 2 Extended offline Interrupted (host reset) 80% 991 -
# 3 Short offline Completed without error 00% 975 -
# 4 Short offline Completed without error 00% 951 -
# 5 Short offline Completed without error 00% 927 -
# 6 Short offline Completed without error 00% 903 -
# 7 Short offline Completed without error 00% 879 -
# 8 Short offline Completed without error 00% 855 -
# 9 Short offline Completed without error 00% 832 -
#10 Short offline Completed without error 00% 808 -
#11 Short offline Completed without error 00% 784 -
#12 Extended offline Completed without error 00% 768 -
#13 Short offline Completed without error 00% 760 -
#14 Short offline Completed without error 00% 736 -
#15 Short offline Completed without error 00% 712 -
#16 Extended offline Completed without error 00% 706 -
#17 Short offline Completed without error 00% 688 -
#18 Short offline Completed without error 00% 677 -
#19 Short offline Completed without error 00% 675 -
#20 Short offline Completed without error 00% 655 -
#21 Short offline Completed without error 00% 631 -
Selective Self-tests/Logging not supported
I am not aware of having interrupted the test, but smartctl says:
Code:
The self-test routine was interrupted by the host with a hard or soft reset.
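To make the interrupted entries easier to spot, here is a minimal sketch that filters the self-test log from the smartctl output above. The function name `interrupted_tests` and the regular expression are my own assumptions about the log layout shown in this post, not part of smartmontools, and the sample rows are copied from the output above.

```python
import re

# Three rows copied from the self-test log in this post.
SAMPLE_LOG = """\
# 1 Short offline Interrupted (host reset) 90% 992 -
# 2 Extended offline Interrupted (host reset) 80% 991 -
# 3 Short offline Completed without error 00% 975 -
"""

# Assumed row layout: "#<num> <description> <status> <remaining>% <hours> <lba>".
ROW = re.compile(
    r"^#\s*(\d+)\s+(Short offline|Extended offline)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def interrupted_tests(log_text):
    """Return (num, description, remaining_percent, lifetime_hours) for interrupted tests."""
    found = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # not a self-test log row
        num, description, status, remaining, hours, _lba = match.groups()
        if status.startswith("Interrupted"):
            found.append((int(num), description, int(remaining), int(hours)))
    return found


if __name__ == "__main__":
    for num, description, remaining, hours in interrupted_tests(SAMPLE_LOG):
        print(f"#{num} {description}: {remaining}% remaining at {hours} power-on hours")
```

On the rows above this reports the extended test at 991 hours and the short test at 992 hours as interrupted.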
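I am confused as to why zpool status shows errors for the SSD: sdd3 in boot-pool has 12 in the WRITE column, while the SMART attributes and error log show nothing.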
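For anyone who wants to check this on their own system, here is a minimal sketch that lists devices with non-zero READ/WRITE/CKSUM counters from `zpool status` text. The function name `find_nonzero_counters` is hypothetical, the sample is an abridged copy of the output in this post, and the parser assumes plain integer counters (it would skip abbreviated values such as `1.2K`).

```python
import re

# Abridged copy of the zpool status output from this post.
SAMPLE_STATUS = """\
pool: boot-pool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
sdd3 ONLINE 0 12 0
errors: No known data errors
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
e73ded72-01b5-488c-9b07-5db6305c3d1f ONLINE 0 0 0
errors: No known data errors
"""

# A config row: name, state, then three integer counters.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def find_nonzero_counters(status_text):
    """Return (pool, device, read, write, cksum) for rows with any non-zero counter."""
    results = []
    pool = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match is None:
            continue  # header row or a non-table line
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            results.append((pool, name, *counts))
    return results


if __name__ == "__main__":
    for pool, dev, read, write, cksum in find_nonzero_counters(SAMPLE_STATUS):
        print(f"{pool}/{dev}: read={read} write={write} cksum={cksum}")
```

On the sample it flags only sdd3 in boot-pool, with 12 write errors, which matches what the alert is about.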