Hi all,
I am hoping someone can help me understand this problem. This is the second time I have gotten this error, and this time it is on a new vdev. So the problem has happened twice, each time on a different drive.
I suspect this is just a failing drive or some bad sectors, since this system ran for about a year on Unraid before I started migrating to FreeNAS a week ago. (The HDDs were all stress tested at one point or another before being added to the system, and memtests all pass.)
So this morning I got this message:
Code:
kernel log messages:
> mps2: SAS Address for SATA device = 4874463cffdcbe94
> mps2: SAS Address from SATA device = 4874463cffdcbe94
> da20 at mps2 bus 0 scbus2 target 11 lun 0
> da20: <ATA WDC WD50EFRX-68M 0A82> Fixed Direct Access SPC-4 SCSI device
> da20: Serial Number WD-xxxxxxxxx
> da20: 600.000MB/s transfers
> da20: Command Queueing enabled
> da20: 4769307MB (9767541168 512 byte sectors)
> da20: quirks=0x8<4K>
> ahcich1: Timeout on slot 8 port 0
> ahcich1: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000 cmd 0004c817
> (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> (ada1:ahcich1:0:0:0): CAM status: Command timeout
> (ada1:ahcich1:0:0:0): Retrying command
> ahcich0: Timeout on slot 7 port 0
> ahcich0: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd c0 serr 00000000 cmd 0004c717
> (ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Retrying command
> (da18:mps0:0:3:0): WRITE(10). CDB: 2a 00 7b b7 df e8 00 00 08 00
> (da18:mps0:0:3:0): CAM status: SCSI Status Error
> (da18:mps0:0:3:0): SCSI status: Check Condition
> (da18:mps0:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
> (da18:mps0:0:3:0): Info: 0x7bb7dfe8
> (da18:mps0:0:3:0): Error 22, Unretryable error
-- End of security output --
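As a sanity check on the "Logical block address out of range" sense data, here is a minimal Python sketch that decodes the WRITE(10) CDB from the da18 line and compares the requested blocks with the drive size. The function name `decode_rw10` is made up for illustration, and the sector count is the one the kernel printed for da20, on the assumption that da18 (same WD50EFRX model) has the same size.

```python
def decode_rw10(cdb_hex):
    """Return (opcode name, starting LBA, block count) for a READ(10)/WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if len(cdb) != 10 or cdb[0] not in names:
        raise ValueError("not a 10-byte READ/WRITE CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return names[cdb[0]], lba, count


# CDB copied from the da18 WRITE(10) log line.
op, lba, count = decode_rw10("2a 00 7b b7 df e8 00 00 08 00")

# Assumption: da18 has the same 9767541168 512-byte sectors reported for da20.
total_sectors = 9767541168
last = lba + count - 1

print(op, hex(lba), count)
print("in range" if last < total_sectors else "out of range")
```

The decoded LBA matches the `Info: 0x7bb7dfe8` line, and under the stated size assumption the request ends well inside the drive, so the "out of range" wording by itself does not show that the write actually targeted a block past the end of the disk.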
Status of the pool
Code:
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  14.9G  1.09G  13.8G         -      -     7%  1.00x  ONLINE  -
zfast          460G   195G   265G         -    27%    42%  1.00x  ONLINE  /mnt
zroot         81.8T  38.3T  43.5T         -    24%    46%  1.00x  ONLINE  /mnt

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 68K in 0h0m with 0 errors on Sat Jul 2 02:53:31 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        zroot                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/775bcae9-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/7861ae25-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/79076331-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/79b53996-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/7ac4a958-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/7bdf71d9-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/7cf57b75-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/7e037123-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/7f1e9fd8-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/80358d0f-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/814a11ec-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
            gptid/81fe2a3b-38d8-11e6-8455-0cc47a6b6816  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/063d5bbf-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     0     0
            gptid/0751dcd6-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     0     0
            gptid/080d0dc1-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     0     0
            gptid/08b9ac7e-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     0     0
            gptid/09630467-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     1     0
            gptid/0a0c005f-3ebe-11e6-8f50-0cc47a6b6816  ONLINE       0     0     0

errors: No known data errors
-- End of daily output --
I got a similar error on another vdev a day earlier, on a different drive, but I dismissed it as a bad sector and ran zpool clear after running a SMART test.
This is today's status
The second highlighted rectangle had the same error as the first a day ago.
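Since the highlighted rectangles only exist in a screenshot, here is a minimal Python sketch that picks the affected member out of the pool status text instead. It assumes the config rows have exactly five whitespace-separated columns (NAME STATE READ WRITE CKSUM) with plain integer counters; the function name `devices_with_errors` is hypothetical, and the sample rows are copied from the raidz2-2 section of the status output in this post.

```python
def devices_with_errors(status_text):
    """Yield (name, read, write, cksum) for config rows with a non-zero counter."""
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM
        if len(fields) != 5 or not all(f.isdigit() for f in fields[2:]):
            continue
        name, _state, read, write, cksum = fields
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            yield (name, *counts)


# Rows copied from the raidz2-2 section of the pool status in this post.
sample = """\
raidz2-2 ONLINE 0 0 0
gptid/08b9ac7e-3ebe-11e6-8f50-0cc47a6b6816 ONLINE 0 0 0
gptid/09630467-3ebe-11e6-8f50-0cc47a6b6816 ONLINE 0 1 0
gptid/0a0c005f-3ebe-11e6-8f50-0cc47a6b6816 ONLINE 0 0 0
"""

flagged = list(devices_with_errors(sample))
for row in flagged:
    print(row)
```

A limitation of this sketch: rows whose counters are abbreviated (for example with a K suffix) are skipped by the `isdigit` check, so it is only suitable for small counts like the single write error here.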
smartctl -a /dev/da18
Code:
smartctl -a /dev/da18
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD50EFRX-68MYMN1
Serial Number: WD-xxxx
LU WWN Device Id: 5 0014ee 260b5e9b4
Firmware Version: 82.00A82
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jul 2 08:59:33 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 248) Self-test routine in progress...
80% of test remaining.
Total time to complete Offline
data collection: (57960) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 579) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 204 202 021 Pre-fail Always - 8791
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 310
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 10207
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 9
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 381
194 Temperature_Celsius 0x0022 118 112 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5569 -
# 2 Short offline Completed without error 00% 5554 -
# 3 Extended offline Completed without error 00% 5542 -
# 4 Short offline Completed without error 00% 5530 -
# 5 Short offline Completed without error 00% 5506 -
# 6 Short offline Completed without error 00% 5472 -
# 7 Short offline Completed without error 00% 5448 -
# 8 Short offline Completed without error 00% 5424 -
# 9 Short offline Completed without error 00% 5400 -
#10 Short offline Completed without error 00% 5376 -
#11 Short offline Completed without error 00% 5353 -
#12 Short offline Completed without error 00% 5329 -
#13 Short offline Completed without error 00% 5305 -
#14 Short offline Completed without error 00% 5281 -
#15 Short offline Completed without error 00% 5257 -
#16 Extended offline Completed without error 00% 5245 -
#17 Short offline Completed without error 00% 5233 -
#18 Short offline Completed without error 00% 5232 -
#19 Short offline Completed without error 00% 5208 -
#20 Short offline Completed without error 00% 5184 -
#21 Short offline Completed without error 00% 5160 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
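To make the SMART table easier to compare between drives, here is a minimal Python sketch that pulls out the raw values of a few attributes. The choice of IDs (5, 197, 198, 199) is an assumption about which counters are usually watched for media and cabling trouble, not something smartctl prescribes, and `watched_raw_values` is a hypothetical helper name. The sample rows are copied from the da18 output above.

```python
# Assumption: these IDs are the ones worth watching
# (reallocated, pending, offline-uncorrectable, UDMA CRC errors).
WATCHED = {5, 197, 198, 199}


def watched_raw_values(smart_text):
    """Map attribute name -> raw value for the watched IDs in smartctl -a output."""
    result = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with the numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED:
            result[fields[1]] = int(fields[9])
    return result


# Rows copied from the da18 attribute table.
sample = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 10207
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

values = watched_raw_values(sample)
print(values)
print("all zero:", all(v == 0 for v in values.values()))
```

All four counters are zero for da18, so at least by this measure the SMART data does not back up the bad-sector theory.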
This is the error that happened the other day with da11. I also added a new vdev that day, so I am assuming the corrupt GPT messages were OK.
Code:
anomaly.m3ki.net kernel log messages:
> mps2: SAS Address for SATA device = 4873463ef7c2c595
> mps2: SAS Address from SATA device = 4873463ef7c2c595
> da19 at mps2 bus 0 scbus2 target 2 lun 0
> da19: <ATA WDC WD50EFRX-68M 0A82> Fixed Direct Access SPC-4 SCSI device
> da19: Serial Number WD----------
> da19: 600.000MB/s transfers
> da19: Command Queueing enabled
> da19: 4769307MB (9767541168 512 byte sectors)
> da19: quirks=0x8<4K>
> GEOM_ELI: Device da13p1.eli destroyed.
> GEOM_ELI: Detached da13p1.eli on last close.
> GEOM_ELI: Device da14p1.eli destroyed.
> GEOM_ELI: Detached da14p1.eli on last close.
> GEOM_ELI: Device da0p1.eli destroyed.
> GEOM_ELI: Detached da0p1.eli on last close.
> GEOM_ELI: Device da1p1.eli destroyed.
> GEOM_ELI: Detached da1p1.eli on last close.
> GEOM_ELI: Device da2p1.eli destroyed.
> GEOM_ELI: Detached da2p1.eli on last close.
> GEOM_ELI: Device da3p1.eli destroyed.
> GEOM_ELI: Detached da3p1.eli on last close.
> GEOM_ELI: Device da4p1.eli destroyed.
> GEOM_ELI: Detached da4p1.eli on last close.
> GEOM_ELI: Device da5p1.eli destroyed.
> GEOM_ELI: Detached da5p1.eli on last close.
> GEOM_ELI: Device da7p1.eli destroyed.
> GEOM_ELI: Detached da7p1.eli on last close.
> GEOM_ELI: Device da8p1.eli destroyed.
> GEOM_ELI: Detached da8p1.eli on last close.
> GEOM_ELI: Device da9p1.eli destroyed.
> GEOM_ELI: Detached da9p1.eli on last close.
> GEOM_ELI: Device da11p1.eli destroyed.
> GEOM_ELI: Detached da11p1.eli on last close.
> GEOM_ELI: Device ada0p1.eli destroyed.
> GEOM_ELI: Detached ada0p1.eli on last close.
> GEOM_ELI: Device ada1p1.eli destroyed.
> GEOM_ELI: Detached ada1p1.eli on last close.
> GEOM: da6: the primary GPT table is corrupt or invalid.
> GEOM: da6: using the secondary instead -- recovery strongly advised.
> GEOM: da10: the primary GPT table is corrupt or invalid.
> GEOM: da10: using the secondary instead -- recovery strongly advised.
> GEOM: da12: the primary GPT table is corrupt or invalid.
> GEOM: da12: using the secondary instead -- recovery strongly advised.
> GEOM: da15: the primary GPT table is corrupt or invalid.
> GEOM: da15: using the secondary instead -- recovery strongly advised.
> GEOM: da18: the primary GPT table is corrupt or invalid.
> GEOM: da18: using the secondary instead -- recovery strongly advised.
> GEOM: da19: the primary GPT table is corrupt or invalid.
> GEOM: da19: using the secondary instead -- recovery strongly advised.
> GEOM_ELI: Device da13p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da14p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da0p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da1p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da2p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da3p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da4p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da5p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da7p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da8p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da9p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da11p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da6p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da10p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da12p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da15p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da18p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device da19p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device ada0p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device ada1p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: hardware
> (da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 04 58 00 00 40 00 length 32768 SMID 965 terminated ioc 804b scsi 0 state 0 xfer 0
> (da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 05 18 00 00 40 00 length 32768 SMID 744 terminated ioc 804b scsi 0 state 0 xfer(da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 04 58 00 00 40 00
> 0
> (da11:mps2:0:6:0): CAM status: CCB request completed with an error
> (da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 04 98 00 00 40 00 length 32768 SMID 973 terminated ioc 804b scsi 0 state 0 xfer(da11: 0
> mps2:0:6:0): Retrying command
> (da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 05 18 00 00 40 00
> (da11:mps2:0:6:0): CAM status: CCB request completed with an error
> (da11:mps2:0:6:0): Retrying command
> (da11:mps2:0:6:0): READ(10). CDB: 28 00 40 56 04 98 00 00 40 00
> (da11:mps2:0:6:0): CAM status: CCB request completed with an error
> (da11:mps2:0:6:0): Retrying command
> (da11:mps2:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 2c 93 cd 60 00 00 00 40 00 00
> (da11:mps2:0:6:0): CAM status: SCSI Status Error
> (da11:mps2:0:6:0): SCSI status: Check Condition
> (da11:mps2:0:6:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
> (da11:mps2:0:6:0): Info: 0x12c93cd60
> (da11:mps2:0:6:0): Error 22, Unretryable error
-- End of security output --
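To check the assumption that the corrupt GPT messages belong to the new vdev, here is a minimal Python sketch that lists which devices logged the warning. The helper name `gpt_warning_devices` is hypothetical, and the sample lines are copied from the log above.

```python
import re

GPT_WARNING = re.compile(r"GEOM: (\w+): the primary GPT table is corrupt or invalid")


def gpt_warning_devices(log_text):
    """Return device names that logged a corrupt primary GPT warning, in numeric order."""
    names = set(GPT_WARNING.findall(log_text))
    return sorted(names, key=lambda name: int(re.sub(r"\D", "", name)))


# Lines copied from the kernel log above.
log = """\
> GEOM: da6: the primary GPT table is corrupt or invalid.
> GEOM: da6: using the secondary instead -- recovery strongly advised.
> GEOM: da10: the primary GPT table is corrupt or invalid.
> GEOM: da12: the primary GPT table is corrupt or invalid.
> GEOM: da15: the primary GPT table is corrupt or invalid.
> GEOM: da18: the primary GPT table is corrupt or invalid.
> GEOM: da19: the primary GPT table is corrupt or invalid.
> GEOM_ELI: Device da13p1.eli created.
"""

devices = gpt_warning_devices(log)
print(devices, len(devices))
```

Six devices logged the warning, which is the same number of disks as in one raidz2 vdev of this pool. That is consistent with the warnings coming from the disks of the newly added vdev, but it does not prove it. Note that da11, the drive with the I/O errors in this log, is not among them.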
smartctl -a /dev/da11
Code:
smartctl -a /dev/da11
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD50EFRX-68MYMN1
Serial Number: WD-xxx
LU WWN Device Id: 5 0014ee 260b5abef
Firmware Version: 82.00A82
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jul 2 08:52:53 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (57660) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 576) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 203 202 021 Pre-fail Always - 8816
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 205
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7325
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 25
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 324
194 Temperature_Celsius 0x0022 116 109 000 Old_age Always - 36
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7303 -
# 2 Short offline Completed without error 00% 7291 -
# 3 Short offline Completed without error 00% 7255 -
# 4 Short offline Completed without error 00% 7159 -
# 5 Short offline Completed without error 00% 6475 -
# 6 Short offline Completed without error 00% 6451 -
# 7 Short offline Completed without error 00% 6427 -
# 8 Extended offline Completed without error 00% 6404 -
# 9 Short offline Completed without error 00% 6390 -
#10 Short offline Completed without error 00% 6366 -
#11 Short offline Completed without error 00% 6343 -
#12 Short offline Completed without error 00% 6324 -
#13 Short offline Completed without error 00% 6300 -
#14 Short offline Completed without error 00% 6267 -
#15 Short offline Completed without error 00% 6243 -
#16 Short offline Completed without error 00% 6219 -
#17 Short offline Completed without error 00% 6195 -
#18 Short offline Completed without error 00% 6171 -
#19 Short offline Completed without error 00% 6146 -
#20 Short offline Completed without error 00% 6122 -
#21 Short offline Completed without error 00% 6098 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Here are my firmware versions. I am assuming all is OK there as well:
Code:
> mps0: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
> mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> pcib2: <ACPI PCI-PCI bridge> irq 16 at device 1.1 on pci0
> pci2: <ACPI PCI bus> on pcib2
> mps1: <Avago Technologies (LSI) SAS2308> port 0xd000-0xd0ff mem 0xf7240000-0xf724ffff,0xf7200000-0xf723ffff irq 17 at device 0.0 on pci2
> mps1: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
> mps1: IOCCapabilities: 5285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> xhci0: <Intel Lynx Point USB 3.0 controller> mem 0xf7700000-0xf770ffff irq 16 at device 20.0 on pci0
> xhci0: 32 bytes context size, 64-bit DMA
> xhci0: Port routing mask set to 0xffffffff
> usbus0 on xhci0
> ehci0: <Intel Lynx Point USB 2.0 controller USB-B> mem 0xf7714000-0xf77143ff irq 16 at device 26.0 on pci0
> usbus1: EHCI version 1.0
> usbus1 on ehci0
> pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
> pci3: <ACPI PCI bus> on pcib3
> pcib4: <ACPI PCI-PCI bridge> at device 0.0 on pci3
> pci4: <ACPI PCI bus> on pcib4
> vgapci0: <VGA-compatible display> port 0xc000-0xc07f mem 0xf6000000-0xf6ffffff,0xf7000000-0xf701ffff irq 16 at device 0.0 on pci4
> vgapci0: Boot video device
> pcib5: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
> pci5: <ACPI PCI bus> on pcib5
> igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xb000-0xb01f mem 0xf7500000-0xf757ffff,0xf7580000-0xf7583fff irq 18 at device 0.0 on pci5
> igb0: Using MSIX interrupts with 5 vectors
> igb0: Ethernet address: 0c:c4:7a:6b:68:16
> igb0: Bound queue 0 to cpu 0
> igb0: Bound queue 1 to cpu 1
> igb0: Bound queue 2 to cpu 2
> igb0: Bound queue 3 to cpu 3
> pcib6: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
> pci6: <ACPI PCI bus> on pcib6
> igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xa000-0xa01f mem 0xf7400000-0xf747ffff,0xf7480000-0xf7483fff irq 19 at device 0.0 on pci6
> igb1: Using MSIX interrupts with 5 vectors
> igb1: Ethernet address: 0c:c4:7a:6b:68:17
> igb1: Bound queue 0 to cpu 0
> igb1: Bound queue 1 to cpu 1
> igb1: Bound queue 2 to cpu 2
> igb1: Bound queue 3 to cpu 3
> pcib7: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
> pci7: <ACPI PCI bus> on pcib7
> mps2: <Avago Technologies (LSI) SAS2008> port 0x9000-0x90ff mem 0xf73c0000-0xf73c3fff,0xf7380000-0xf73bffff irq 16 at device 0.0 on pci7
> mps2: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
> mps2: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> ehci1: <Intel Lynx Point USB 2.0 controller USB-A> mem 0xf7713000-0xf77133ff irq 22 at device 29.0 on pci0
> usbus2: EHCI version 1.0
> usbus2 on ehci1
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
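To make the firmware check less of an eyeball exercise, here is a minimal Python sketch that extracts the firmware and driver versions for each mps controller and compares their leading number. Treating a matching leading number (20 and 20 here) as "OK" is my assumption about what matters, not something the log states; `mps_versions` and `major_phase` are hypothetical helper names, and the sample lines are copied from the boot messages above.

```python
import re

MPS_LINE = re.compile(r"(mps\d+): Firmware: ([\d.]+), Driver: ([\d.]+)")


def mps_versions(dmesg_text):
    """Return {controller: (firmware, driver)} from mps probe lines."""
    return {ctrl: (fw, drv) for ctrl, fw, drv in MPS_LINE.findall(dmesg_text)}


def major_phase(version):
    """Leading component of a dotted version string, e.g. '20' for '20.00.04.00'."""
    return version.split(".")[0]


# Lines copied from the boot messages above.
dmesg = """\
> mps0: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
> mps1: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
> mps2: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd
"""

versions = mps_versions(dmesg)
mismatched = [
    ctrl for ctrl, (fw, drv) in versions.items()
    if major_phase(fw) != major_phase(drv)
]
print(versions)
print("mismatched:", mismatched)
```

All three controllers report the same firmware and the same driver, and none is flagged under the assumed rule.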
I guess my questions are:
- Am I right to assume the issue is with HDDs that are about to fail? Is there anything else I can do to test?
- Is the normal procedure to just ignore these errors for now, since it is only one error each, and run zpool clear?
- And obviously, if the errors persist, replace the drive?
Edit: corrected the SMART test.