[Edit] Sorry, this should probably be in the Storage sub-forum. I mis-posted.
Hey all, so last week I was asking about preventive drive replacement, and after getting all of the information together I triggered a replacement of da0 with da8. After resilvering da8 and then scrubbing the volume, a drive in the good array (da7) went crazy, throwing all sorts of errors:
Code:
May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 length 65536 SMID 312 terminated ioc 804b scsi 0 state c xfer 0 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 88 00 00 80 00 length 65536 SMID 785 terminated ioc 804b scsi 0 state c xfer 0 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 88 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). 
CDB: 28 00 54 3f 82 88 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 88 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). 
CDB: 28 00 54 3f 82 88 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 00 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Error 5, Retries exhausted May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 82 88 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Error 5, Retries exhausted May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 83 08 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). 
CDB: 28 00 00 40 02 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 74 70 68 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 74 70 6a 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 83 90 00 01 00 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). 
CDB: 28 00 54 3f 83 08 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 74 70 68 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 74 70 6a 90 00 00 10 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). 
CDB: 28 00 54 3f 83 90 00 01 00 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:45 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 54 3f 83 08 00 00 80 00 May 23 01:16:45 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:45 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) <there are a lot of these lines> May 23 01:16:46 library kernel: (da7:mps1:0:3:0): Retrying command (per sense data) May 23 01:16:46 library kernel: (da7:mps1:0:3:0): READ(10). CDB: 28 00 74 70 6d 87 00 00 01 00 May 23 01:16:46 library kernel: (da7:mps1:0:3:0): CAM status: SCSI Status Error May 23 01:16:46 library kernel: (da7:mps1:0:3:0): SCSI status: Check Condition May 23 01:16:46 library kernel: (da7:mps1:0:3:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable) May 23 01:16:46 library kernel: (da7:mps1:0:3:0): Error 5, Retries exhausted
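For anyone trying to read the CDB bytes in those log lines: a READ(10) command block is 10 bytes, with the opcode 0x28 first, a big-endian logical block address in bytes 2-5, and a big-endian transfer length (in blocks) in bytes 7-8. Below is a minimal Python sketch that decodes one; the function name `decode_read10` is just something I made up for illustration, and it only handles READ(10), nothing else.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration; it rejects anything that is
    not a 10-byte command starting with opcode 0x28.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks to read
    return {"lba": lba, "blocks": blocks}


# First CDB from the log above.
info = decode_read10("28 00 54 3f 82 00 00 00 80 00")
print(info["lba"], info["blocks"], info["blocks"] * 512)
```

With 512-byte logical sectors, the 128-block request works out to 65536 bytes, which agrees with the "length 65536" the mps driver printed for that command.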
zpool status showed a bunch of read/write errors and a couple of checksum errors. I replaced the "failed" da7 disk with da9, resilvered, and everything was happy. da7 seems to still be working fine:
Code:
[root@library] ~# smartctl -x /dev/da7
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p15 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Blue (SATA 6Gb/s)
Device Model: WDC WD10EZEX-00KUWA0
Serial Number: WD-WCC1S7329586
LU WWN Device Id: 5 0014ee 2b421c8ad
Firmware Version: 15.01H15
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 26 11:51:31 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (10500) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x30b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 180 175 021 - 1966
4 Start_Stop_Count -O--CK 100 100 000 - 16
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 086 086 000 - 10911
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 16
192 Power-Off_Retract_Count -O--CK 200 200 000 - 15
193 Load_Cycle_Count -O--CK 200 200 000 - 0
194 Temperature_Celsius -O---K 115 111 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 200 200 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 10876 -
# 2 Extended offline Completed without error 00% 10807 -
# 3 Short offline Completed without error 00% 10781 -
# 4 Short offline Completed without error 00% 10733 -
# 5 Short offline Completed without error 00% 10685 -
# 6 Short offline Completed without error 00% 10637 -
# 7 Short offline Completed without error 00% 10589 -
# 8 Short offline Completed without error 00% 10541 -
# 9 Short offline Completed without error 00% 10493 -
#10 Extended offline Completed without error 00% 10472 -
#11 Short offline Completed without error 00% 10445 -
#12 Short offline Completed without error 00% 10397 -
#13 Short offline Completed without error 00% 10349 -
#14 Short offline Completed without error 00% 10301 -
#15 Short offline Completed without error 00% 10253 -
#16 Short offline Completed without error 00% 10205 -
#17 Short offline Completed without error 00% 10157 -
#18 Short offline Completed without error 00% 10109 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 28 Celsius
Power Cycle Min/Max Temperature: 28/31 Celsius
Lifetime Min/Max Temperature: 22/32 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (334)
Index Estimated Time Temperature Celsius
335 2015-05-26 03:54 31 ************
... ..(171 skipped). .. ************
29 2015-05-26 06:46 31 ************
30 2015-05-26 06:47 30 ***********
... ..( 11 skipped). .. ***********
42 2015-05-26 06:59 30 ***********
43 2015-05-26 07:00 29 **********
... ..( 55 skipped). .. **********
99 2015-05-26 07:56 29 **********
100 2015-05-26 07:57 28 *********
... ..(205 skipped). .. *********
306 2015-05-26 11:23 28 *********
307 2015-05-26 11:24 31 ************
... ..( 26 skipped). .. ************
334 2015-05-26 11:51 31 ************
SCT Error Recovery Control command not supported
Device Statistics (GP Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 1 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 296933 Vendor specific
[root@library] ~#
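Since the usual suspects for a cable or media problem are buried in that wall of output, here is a rough Python sketch that pulls the raw values out of the attribute table so the interesting ones (5, 197, 199) can be checked at a glance. The regex and the function name `smart_raw_values` are my own assumptions about the `smartctl -x` text layout shown above, not anything smartmontools provides, and the sample rows are copied from the da7 output.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table as printed above:
# ID, name, flags, value, worst, thresh, fail, raw value.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)"
)


def smart_raw_values(smartctl_text):
    """Return {attribute_name: raw_value} for each attribute row found."""
    values = {}
    for line in smartctl_text.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(3))
    return values


# Rows copied from the da7 output above.
sample = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
"""
attrs = smart_raw_values(sample)
for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count"):
    print(name, attrs[name])
```

All three come back as 0 for da7. As far as I understand it, that only says the drive itself did not record reallocations, pending sectors, or link CRC errors; it does not by itself rule out a cable, backplane, or power issue.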
Searching the forums for this set of errors is tricky, but it seems that it might be a cable problem? Or potentially something else, like an actual drive failure? Thanks for any insight!