So I've been trying to figure this out: sometimes it stays away for a week or so, sometimes it's instantly back. Basically, it's an alert I get:
Boot pool status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
So I went looking in other topics, where I found someone recommending to run zpool status -v, which gives me the following result:
Code:
state: ONLINE
scan: scrub repaired 0 in 0 days 08:00:16 with 0 errors on Sun Jun 28 08:00:16 2020
config:
NAME                                            STATE   READ WRITE CKSUM
PoolA                                           ONLINE     0     0     0
  raidz2-0                                      ONLINE     0     0     0
    gptid/8917829c-7dc0-11ea-a619-7824af43cfa9  ONLINE     0     0     0
    gptid/716454bd-9787-11ea-a381-7824af43cfa9  ONLINE     0     0     0
    gptid/898dbed0-7dc0-11ea-a619-7824af43cfa9  ONLINE     0     0     0
    gptid/897ceb83-7dc0-11ea-a619-7824af43cfa9  ONLINE     0     0     0
    gptid/899c4e6c-7dc0-11ea-a619-7824af43cfa9  ONLINE     0     0     0
    gptid/8fa6b8ea-9787-11ea-a381-7824af43cfa9  ONLINE     0     0     0
errors: No known data errors
pool: freenas-boot
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 4K in 0 days 00:05:15 with 0 errors on Fri Jun 26 03:50:15 2020
config:
NAME            STATE   READ WRITE CKSUM
freenas-boot    ONLINE     0     0     0
  mirror-0      ONLINE     0     0     0
    ada0p2      ONLINE     0     0     0  block size: 512B configured, 4096B native
    ada1p2      ONLINE     0     0     4
errors: No known data errors
pool: ssd
state: ONLINE
scan: scrub repaired 144K in 0 days 00:32:09 with 0 errors on Sun Jun 21 00:32:09 2020
config:
NAME            STATE   READ WRITE CKSUM
ssd             ONLINE     0     0     0
  mirror-0      ONLINE     0     0     0
Now I can see that one of my SSDs in the freenas-boot mirror has a 4 under CKSUM, and according to other topics the cause may show up in the drives' SMART data. One of the SSDs is brand new and the other one is the first SSD I ever owned, but since it was only 120 GB I didn't use it anymore (it still worked fine for me).
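For anyone who would rather not read the counters by eye, here is a rough Python sketch of the same check. It scans pasted zpool status text for rows where READ, WRITE or CKSUM is non-zero and turns a partition name like ada1p2 into the whole-disk path. The names ZPOOL_SAMPLE, devices_with_errors and whole_disk are made up for illustration, and the parsing assumes plain integer counters and FreeBSD-style pN partition suffixes, so it would not handle abbreviated counts or gptid labels.

```python
import re

# Sample rows copied from the freenas-boot section of the zpool status output.
ZPOOL_SAMPLE = """\
NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada0p2 ONLINE 0 0 0 block size: 512B configured, 4096B native
ada1p2 ONLINE 0 0 4
"""

# A row is: name, state, then three integer counters. Trailing notes are ignored.
# Assumption: counters are plain integers (no abbreviated values such as "1.2K").
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    hits = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or anything that is not a device row
        name = match.group(1)
        read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
        if read or write or cksum:
            hits.append((name, read, write, cksum))
    return hits


def whole_disk(partition):
    """Strip a trailing FreeBSD-style partition suffix: 'ada1p2' -> '/dev/ada1'."""
    return "/dev/" + re.sub(r"p\d+$", "", partition)


if __name__ == "__main__":
    for name, read, write, cksum in devices_with_errors(ZPOOL_SAMPLE):
        print(name, "READ", read, "WRITE", write, "CKSUM", cksum,
              "-> check", whole_disk(name))
```

On the sample above this reports only ada1p2, with a CKSUM of 4, and suggests /dev/ada1 as the device to look at.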
Next, someone said you need to find the device and run smartctl -a on it. In my case the device is ada1, so I ran smartctl -a /dev/ada1, which gave me the following:
Code:
/dev/ada1p2: Unable to detect device type
Please specify device type with the -d option.
Use smartctl -h to get a usage summary
root@Brisingr[~]# smartctl -a /dev/ada1
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS120G2G0A-00JH30
Serial Number: 2003BQ461911
LU WWN Device Id: 5 001b44 4a830af06
Firmware Version: UE510000
User Capacity: 120,040,980,480 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Jun 30 21:56:14 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 21) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1929
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
165 Block_Erase_Count       0x0032   100   100   000    Old_age   Always       -       379
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       2
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       0
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       5
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       225
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   000    Old_age   Always       -       2
174 Unexpected_Power_Loss   0x0032   100   100   000    Old_age   Always       -       10
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   051   000    Old_age   Always       -       35 (Min/Max 18/51)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0x010800280108
232 Available_Reservd_Space 0x0033   100   100   005    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       262
234 NAND_GB_Written_SLC     0x0032   100   100   000    Old_age   Always       -       2115
241 Host_Writes_GiB         0x0030   100   100   000    Old_age   Offline      -       777
242 Host_Reads_GiB          0x0030   100   100   000    Old_age   Offline      -       37
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1913 -
# 2 Short offline Completed without error 00% 1889 -
# 3 Extended offline Completed without error 00% 1860 -
# 4 Short offline Completed without error 00% 1841 -
# 5 Short offline Completed without error 00% 1817 -
# 6 Short offline Completed without error 00% 1794 -
# 7 Short offline Completed without error 00% 1770 -
# 8 Short offline Completed without error 00% 1745 -
# 9 Short offline Completed without error 00% 1721 -
#10 Short offline Completed without error 00% 1673 -
#11 Short offline Completed without error 00% 1649 -
#12 Short offline Completed without error 00% 1625 -
#13 Short offline Completed without error 00% 1601 -
#14 Short offline Completed without error 00% 1577 -
#15 Extended offline Completed without error 00% 1524 -
#16 Extended offline Self-test routine in progress 40% 1524 -
Selective Self-tests/Logging not supported
And that points me to my brand-new WD Green 120 GB SSD. I hope I provided enough information so that you can help me figure out why I keep getting that notification about the unrecoverable error.
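In case it helps with comparing drives, this is a small Python sketch that pulls the RAW_VALUE column out of pasted smartctl -a attribute rows and lists any watched counters that are non-zero. The names SMART_SAMPLE, WATCHED, raw_values and nonzero_watched are hypothetical, and which attributes go in the watch list is my own assumption, not something smartctl or FreeNAS defines.

```python
# A few attribute rows copied from the smartctl -a /dev/ada1 output.
SMART_SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
174 Unexpected_Power_Loss 0x0032 100 100 000 Old_age Always - 10
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
"""

# Assumption: these are the counters worth flagging; adjust to taste.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Program_Fail_Count",
    "Erase_Fail_Count",
    "Reported_Uncorrect",
    "UDMA_CRC_Error_Count",
}


def raw_values(text):
    """Map attribute name -> first token of RAW_VALUE for each attribute row."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
        # WHEN_FAILED, RAW_VALUE. Only rows starting with a numeric ID count.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = fields[9]
    return values


def nonzero_watched(text, watched=WATCHED):
    """Return {name: raw} for watched attributes whose raw value is a non-zero integer."""
    found = {}
    for name, raw in raw_values(text).items():
        if name in watched and raw.isdigit() and int(raw) != 0:
            found[name] = int(raw)
    return found


if __name__ == "__main__":
    print(nonzero_watched(SMART_SAMPLE))
```

For the rows above none of the watched counters is non-zero, so the result is empty; Unexpected_Power_Loss reads 10 but is not in the watch list I picked.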