Kevin Horton
Overnight, I received an Alert email from one of my FreeNAS servers:
FreeNAS @ big_bertha.local
New alerts:
* Device: /dev/ada1, Self-Test Log error count increased from 0 to 1
=============
I logged in via SSH, checked the SMART status, and found that the most recent short SMART self-test had logged a read failure:
Code:
smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E1HY83CJ
LU WWN Device Id: 5 0014ee 20d3e05cc
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Sep 25 05:36:38 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (52560) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 526) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   183   175   021    -    7833
  4 Start_Stop_Count        -O--CK   099   099   000    -    1200
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   068   068   000    -    23413
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    87
192 Power-Off_Retract_Count -O--CK   200   200   000    -    75
193 Load_Cycle_Count        -O--CK   200   200   000    -    1273
194 Temperature_Celsius     -O---K   115   111   000    -    37
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 39 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       50%     23404         3178608
# 2  Short offline       Completed without error       00%     23380         -
# 3  Short offline       Completed without error       00%     23356         -
# 4  Short offline       Completed without error       00%     23332         -
# 5  Extended offline    Completed without error       00%     23318         -
# 6  Short offline       Completed without error       00%     23308         -
# 7  Short offline       Completed without error       00%     23284         -
# 8  Short offline       Completed without error       00%     23260         -
# 9  Short offline       Completed without error       00%     23236         -
#10  Short offline       Completed without error       00%     23212         -
#11  Short offline       Completed without error       00%     23188         -
#12  Short offline       Completed without error       00%     23164         -
#13  Extended offline    Completed without error       00%     23150         -
#14  Short offline       Completed without error       00%     23140         -
#15  Short offline       Completed without error       00%     23116         -
#16  Short offline       Completed without error       00%     23092         -
#17  Short offline       Completed without error       00%     23068         -
#18  Short offline       Completed without error       00%     23044         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 37 Celsius
Power Cycle Min/Max Temperature: 29/39 Celsius
Lifetime Min/Max Temperature: 3/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (217)
Index Estimated Time Temperature Celsius
218 2019-09-24 21:39 36 *****************
... ..( 8 skipped). .. *****************
227 2019-09-24 21:48 36 *****************
228 2019-09-24 21:49 37 ******************
... ..(420 skipped). .. ******************
171 2019-09-25 04:50 37 ******************
172 2019-09-25 04:51 36 *****************
... ..( 20 skipped). .. *****************
193 2019-09-25 05:12 36 *****************
194 2019-09-25 05:13 37 ******************
... ..( 19 skipped). .. ******************
214 2019-09-25 05:33 37 ******************
215 2019-09-25 05:34 36 *****************
216 2019-09-25 05:35 36 *****************
217 2019-09-25 05:36 36 *****************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 7 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 7 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 4659470 Vendor specific
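The only failure in the output above is entry # 1 of the self-test log. A small filter along these lines can pull failed entries out of a long `smartctl -x` dump and translate LBA_of_first_error into a byte offset and a 4 KiB physical sector number. It is a sketch, not a smartmontools feature: the function name `selftest_failures` is made up, and it assumes the 512-byte logical / 4096-byte physical sector sizes this drive reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): read `smartctl -x` or
# `smartctl -l selftest` output on stdin and print one line per failed
# self-test entry:
#   <entry number> <LBA_of_first_error> <byte offset> <4 KiB physical sector>
# Assumes 512-byte logical and 4096-byte physical sectors (8 LBAs per sector).
selftest_failures() {
  awk '
    # Self-test log rows begin with "#", e.g. "# 1  Short offline ...".
    /^#/ && /failure|error/ && !/without error/ {
      # Entry number: drop the leading "#" and everything after the number.
      num = $0
      sub(/^# */, "", num)
      sub(/ .*/, "", num)
      lba = $NF
      if (lba == "-") { print num, "-"; next }   # failure with no LBA logged
      # %.0f avoids 32-bit integer limits in some older awks.
      printf "%s %s %.0f %.0f\n", num, lba, lba * 512, int(lba / 8)
    }
  '
}

# Example usage (needs root and a real device):
#   smartctl -x /dev/ada1 | selftest_failures
```

Fed the output above, it should print `1 3178608 1627447296 397326`, i.e. the failed read was roughly 1.5 GiB into the disk, in physical sector 397326.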
I ran a long SMART test, and it passed:
Code:
smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E1HY83CJ
LU WWN Device Id: 5 0014ee 20d3e05cc
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Sep 25 19:35:14 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (52560) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 526) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   183   175   021    -    7833
  4 Start_Stop_Count        -O--CK   099   099   000    -    1200
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   068   068   000    -    23427
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    87
192 Power-Off_Retract_Count -O--CK   200   200   000    -    75
193 Load_Cycle_Count        -O--CK   200   200   000    -    1273
194 Temperature_Celsius     -O---K   115   111   000    -    37
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    9
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 39 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     23423         -
# 2  Short offline       Completed: read failure       50%     23404         3178608
# 3  Short offline       Completed without error       00%     23380         -
# 4  Short offline       Completed without error       00%     23356         -
# 5  Short offline       Completed without error       00%     23332         -
# 6  Extended offline    Completed without error       00%     23318         -
# 7  Short offline       Completed without error       00%     23308         -
# 8  Short offline       Completed without error       00%     23284         -
# 9  Short offline       Completed without error       00%     23260         -
#10  Short offline       Completed without error       00%     23236         -
#11  Short offline       Completed without error       00%     23212         -
#12  Short offline       Completed without error       00%     23188         -
#13  Short offline       Completed without error       00%     23164         -
#14  Extended offline    Completed without error       00%     23150         -
#15  Short offline       Completed without error       00%     23140         -
#16  Short offline       Completed without error       00%     23116         -
#17  Short offline       Completed without error       00%     23092         -
#18  Short offline       Completed without error       00%     23068         -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 37 Celsius
Power Cycle Min/Max Temperature: 29/39 Celsius
Lifetime Min/Max Temperature: 3/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (98)
Index Estimated Time Temperature Celsius
99 2019-09-25 11:38 37 ******************
... ..( 92 skipped). .. ******************
192 2019-09-25 13:11 37 ******************
193 2019-09-25 13:12 39 ********************
... ..( 13 skipped). .. ********************
207 2019-09-25 13:26 39 ********************
208 2019-09-25 13:27 38 *******************
... ..( 34 skipped). .. *******************
243 2019-09-25 14:02 38 *******************
244 2019-09-25 14:03 39 ********************
... ..( 12 skipped). .. ********************
257 2019-09-25 14:16 39 ********************
258 2019-09-25 14:17 38 *******************
... ..( 20 skipped). .. *******************
279 2019-09-25 14:38 38 *******************
280 2019-09-25 14:39 39 ********************
... ..( 6 skipped). .. ********************
287 2019-09-25 14:46 39 ********************
288 2019-09-25 14:47 38 *******************
... ..( 20 skipped). .. *******************
309 2019-09-25 15:08 38 *******************
310 2019-09-25 15:09 39 ********************
... ..( 7 skipped). .. ********************
318 2019-09-25 15:17 39 ********************
319 2019-09-25 15:18 38 *******************
... ..( 25 skipped). .. *******************
345 2019-09-25 15:44 38 *******************
346 2019-09-25 15:45 39 ********************
... ..( 5 skipped). .. ********************
352 2019-09-25 15:51 39 ********************
353 2019-09-25 15:52 38 *******************
... ..( 24 skipped). .. *******************
378 2019-09-25 16:17 38 *******************
379 2019-09-25 16:18 39 ********************
... ..( 5 skipped). .. ********************
385 2019-09-25 16:24 39 ********************
386 2019-09-25 16:25 38 *******************
... ..( 24 skipped). .. *******************
411 2019-09-25 16:50 38 *******************
412 2019-09-25 16:51 39 ********************
... ..( 3 skipped). .. ********************
416 2019-09-25 16:55 39 ********************
417 2019-09-25 16:56 38 *******************
... ..( 5 skipped). .. *******************
423 2019-09-25 17:02 38 *******************
424 2019-09-25 17:03 39 ********************
425 2019-09-25 17:04 39 ********************
426 2019-09-25 17:05 39 ********************
427 2019-09-25 17:06 38 *******************
... ..( 27 skipped). .. *******************
455 2019-09-25 17:34 38 *******************
456 2019-09-25 17:35 37 ******************
... ..(119 skipped). .. ******************
98 2019-09-25 19:35 37 ******************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 7 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 7 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 4709736 Vendor specific
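smartctl's note in this second dump ("1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1") means the failed short test is older than an extended test that completed cleanly. The sketch below approximates that bookkeeping so it can be checked in scripts; it is an illustration rather than smartmontools' actual code, and `selftest_outdated` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read a self-test log (newest entry first, as smartctl
# prints it) on stdin and print "<failed> <outdated>", where <outdated> counts
# failed entries that are older than the newest extended test that completed
# without error. An approximation of smartctl's summary line, not its code.
selftest_outdated() {
  awk '
    /^#/ {
      if ($0 ~ /Extended/ && $0 ~ /without error/) {
        ext_ok = 1                 # every row below this one is older
      } else if ($0 ~ /failure|error/ && $0 !~ /without error/) {
        failed++
        if (ext_ok) outdated++     # failure predates a clean extended test
      }
    }
    END { printf "%d %d\n", failed, outdated }
  '
}

# Example usage (needs root and a real device):
#   smartctl -l selftest /dev/ada1 | selftest_outdated
```

On the first dump in this post it would print `1 0`; on the second, `1 1`.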
I'm not seeing anything of concern in the latest SMART output, other than the read error from the failed short test. Am I missing something? Is there anything else I should check?
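One way to check for missed changes more systematically is to diff the raw attribute values between two saved dumps instead of comparing them by eye. A rough sketch, assuming each `smartctl -x` output was saved to a file (the file names and the `smart_raw_diff` function are hypothetical) and that RAW_VALUE is a single token in the last column, as it is for this drive:

```shell
#!/bin/sh
# Hypothetical helper: compare the SMART attribute tables in two saved
# `smartctl -x` outputs and print every attribute whose RAW_VALUE changed:
#   <ID> <ATTRIBUTE_NAME> <old raw> -> <new raw>
# Assumes RAW_VALUE is the last whitespace-separated field of each row.
smart_raw_diff() {
  awk '
    FNR == 1 { file++ }            # 1 while reading the first file, 2 after
    # Attribute rows: numeric ID, a name starting with a letter, flags,
    # then numeric VALUE/WORST/THRESH columns.
    $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ && $4 ~ /^[0-9]+$/ && NF >= 8 {
      if (file == 1) {
        old[$1] = $NF
      } else if (($1 in old) && old[$1] != $NF) {
        print $1, $2, old[$1], "->", $NF
      }
    }
  ' "$1" "$2"
}

# Example usage:
#   smartctl -x /dev/ada1 > before.txt
#   # ... later ...
#   smartctl -x /dev/ada1 > after.txt
#   smart_raw_diff before.txt after.txt
```

Run on the two dumps in this post, it would flag two attributes: Power_On_Hours (23413 -> 23427, as expected) and Multi_Zone_Error_Rate (0 -> 9).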
The pool in question is an 8-disk RAIDZ2 pool that serves as a backup of my main pool. I have another backup on a second local server, and an offsite rsync backup on a two-disk stripe. I have one badblocks-tested spare drive on the shelf. Of course, this happens right before I head out on the road for seven to ten days. I'm considering adding that drive to the pool as a hot spare, in case the disk fails while I'm away (I know I must be careful not to add it to the pool as a stripe).
If I add the disk as a spare to the backup pool, can I remove it later if the main pool turns out to be the first to suffer an actual disk failure?
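For reference, the two zpool operations involved look roughly like this. The sketch only prints the commands unless `RUN=1` is set, and the pool and disk names are placeholders, not taken from the post. `zpool add` also accepts `-n`, which shows the resulting layout without changing anything, a cheap way to confirm the disk lands under `spares` rather than as a new top-level vdev. On FreeNAS, adding and removing spares through the web UI is generally the safer route, since the UI partitions disks and references them by gptid.

```shell
#!/bin/sh
# Sketch of the hot-spare workflow. Pool/disk arguments are placeholders;
# by default the commands are only printed (dry run). Set RUN=1 to execute.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

add_spare() {      # usage: add_spare <pool> <disk>
  # The "spare" keyword keeps the disk out of the data vdevs. Without it,
  # "zpool add <pool> <disk>" would try to stripe the disk into the pool as
  # a new single-disk top-level vdev (zpool usually refuses that mismatched
  # layout unless forced with -f).
  run zpool add "$1" spare "$2"
}

remove_spare() {   # usage: remove_spare <pool> <disk>
  # An idle hot spare can be removed again; a spare that has already been
  # activated is detached with "zpool detach" instead.
  run zpool remove "$1" "$2"
}

# Example (prints the commands only; the names are hypothetical):
add_spare backup ada8
remove_spare backup ada8
```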