Seemingly endless resilver

Status
Not open for further replies.

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
My server has been resilvering for going on 4 days, and it seems to have restarted the process multiple times. When I run zpool status, everything appears normal. Do I just need to keep waiting, or am I missing something?
Thank you for any help.

[root@freenas ~]# zpool status -v Volume1
  pool: Volume1
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Feb 23 01:59:04 2016
        285G scanned out of 4.51T at 20.5M/s, 60h3m to go
        51.4G resilvered, 6.16% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume1                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/5b7041aa-75f3-11e4-a204-bc5ff4e7a6c6  ONLINE       0     0     0
            gptid/5c0f2bc4-75f3-11e4-a204-bc5ff4e7a6c6  ONLINE       0     0     0
            gptid/5cb4bcbb-75f3-11e4-a204-bc5ff4e7a6c6  ONLINE       0     0     0
            gptid/7aaf0f06-d763-11e5-9cd6-bc5ff4e7a6c6  ONLINE       0     0     0  (resilvering)
            gptid/5e048309-75f3-11e4-a204-bc5ff4e7a6c6  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#
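
For anyone reading along, the "to go" figure in that output can be roughly reproduced by hand. This is only a sketch: it assumes the G/T/M suffixes printed by zpool are binary units (powers of 1024), and the helper names are made up for illustration.

```python
# Binary unit multipliers (assumption: zpool prints powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(text):
    """Convert a zpool-style size such as '4.51T' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def resilver_eta_hours(scanned, total, rate_per_s):
    """Hours left if the scan keeps running at the reported rate."""
    remaining = to_bytes(total) - to_bytes(scanned)
    return remaining / to_bytes(rate_per_s) / 3600

# Values copied from the 'scan:' lines of the status output above.
eta = resilver_eta_hours("285G", "4.51T", "20.5M")
print(f"about {eta:.1f} hours remaining at the current rate")
```

The result lands close to the 60h3m that zpool reports, so the estimate itself is consistent. It only describes the current pass, though: if the "resilver in progress since" timestamp moves forward between checks, that would suggest the resilver has started over, which no ETA accounts for.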
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Please refer to the forum rules, conveniently linked at the top of every page in red, which request that you post a hardware manifest and complete description of your system. This information assists us in understanding your circumstances and is much more likely to result in some insight into your problem.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
I'm sorry. The parts are as follows:
Motherboard : ASRock E3C226D2I Mini ITX Server Motherboard
Processor: Intel Xeon E3-1220V3 Haswell 3.1GHz 8MB L3 Cache
Memory: 2 X Kingston 8GB 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1600
Hard Drive: 5 X WD Red 3TB NAS Desktop Hard Disk Drive
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sounds like you might have another failing drive.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
I think you're right. I just got this message this morning:

CRITICAL: Device: /dev/ada4, ATA error count increased from 0 to 1

It is still trying to resilver again. Should I restart the server or run any kind of diagnostic? Or should I just let it go through its paces?

Note: the drive that is reporting that error is the new one I added to replace a bad drive.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Post the output of smartctl -x /dev/adawhatever for all drives. It could be a bad cable, too, and SMART will help narrow it down.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
I'm at work now. I will run that and post the results when I'm home. Thank you for the response.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
Here is the report for the drive reporting an error
Device State: Active (0)
Current Temperature: 35 Celsius
Power Cycle Min/Max Temperature: 23/36 Celsius
Lifetime Min/Max Temperature: 23/36 Celsius
Under/Over Temperature Limit Count: 0/0

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (8)

Index Estimated Time Temperature Celsius
9 2016-02-24 07:12 ? -
... ..(468 skipped). .. -
0 2016-02-24 15:01 ? -
1 2016-02-24 15:02 23 ****
2 2016-02-24 15:03 34 ***************
3 2016-02-24 15:04 34 ***************
4 2016-02-24 15:05 31 ************
5 2016-02-24 15:06 35 ****************
6 2016-02-24 15:07 32 *************
7 2016-02-24 15:08 36 *****************
8 2016-02-24 15:09 35 ****************

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 14 Command failed due to ICRC error
0x0002 2 18688 R_ERR response for data FIS
0x0003 2 18688 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 46669 R_ERR response for non-data FIS
0x0006 2 46609 R_ERR response for device-to-host non-data FIS
0x0007 2 60 R_ERR response for host-to-device non-data FIS
0x0008 2 15758 Device-to-host non-data FIS retries
0x0009 2 38503 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 38495 Device-to-host register FISes sent due to a COMRESET
0x000b 2 55 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 55 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 429180 Vendor specific
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That's missing a lot of stuff.

If you're using the webGUI shell, stop doing that and use SSH.

If you decide to ignore the above recommendation, at the very least, pipe the output into less so that you can get the whole output:

Code:
smartctl -x /dev/adawhatever | less
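
If running the command by hand for every drive gets tedious, something along these lines could gather all of the reports in one go. Treat it as a sketch under assumptions: it needs Python 3.7 or later (which may not be available on the NAS itself), it has to run as root, and the function name `collect_reports` is purely illustrative. The only smartctl option used is the `-x` already mentioned above.

```python
import subprocess

def collect_reports(devices, run=subprocess.run):
    """Return one string holding the `smartctl -x` output of every device."""
    sections = []
    for dev in devices:
        # No check=True here: smartctl's exit status is a bit mask and is
        # often nonzero even when the report itself printed fine.
        result = run(["smartctl", "-x", dev], capture_output=True, text=True)
        # A header line per drive keeps the reports easy to tell apart.
        sections.append(f"===== {dev} =====\n{result.stdout}")
    return "\n".join(sections)
```

Calling it with the real device names (for example `/dev/ada0` through `/dev/ada4`, which is a guess for a five-disk system) and writing the returned string to a file gives a single attachment with the complete, untruncated output.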
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
OK, here is the report copied from PuTTY:

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N0438887
LU WWN Device Id: 5 0014ee 058fbf15b
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Feb 25 06:32:11 2016 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (40500) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 406) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 100 253 021 - 0
4 Start_Stop_Count -O--CK 100 100 000 - 1
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 134
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 1
192 Power-Off_Retract_Count -O--CK 200 200 000 - 0
193 Load_Cycle_Count -O--CK 200 200 000 - 2
194 Temperature_Celsius -O---K 119 113 000 - 31
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 14
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

ATA_READ_LOG_EXT (addr=0x03:0x00, page=0, n=1) failed: Input/output error
Read SMART Extended Comprehensive Error Log failed

SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 103 hours (4 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 00 18 ab 3f 46 Error: IDNF at LBA = 0x063fab18 = 104835864

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 00 18 ab 3f 46 08 4d+07:13:21.167 WRITE DMA
ef 02 00 00 00 00 40 00 4d+07:13:21.167 SET FEATURES [Enable write cache]
ef aa 00 00 00 00 40 00 4d+07:13:21.166 SET FEATURES [Enable read look-ahead]

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: 23/36 Celsius
Lifetime Min/Max Temperature: 23/36 Celsius
Under/Over Temperature Limit Count: 0/0

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (9)

Index Estimated Time Temperature Celsius
10 2016-02-24 22:35 ? -
... ..(467 skipped). .. -
0 2016-02-25 06:23 ? -
1 2016-02-25 06:24 23 ****
2 2016-02-25 06:25 34 ***************
3 2016-02-25 06:26 34 ***************
4 2016-02-25 06:27 31 ************
5 2016-02-25 06:28 35 ****************
6 2016-02-25 06:29 32 *************
7 2016-02-25 06:30 36 *****************
8 2016-02-25 06:31 35 ****************
9 2016-02-25 06:32 34 ***************

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 14 Command failed due to ICRC error
0x0002 2 18718 R_ERR response for data FIS
0x0003 2 18718 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 56675 R_ERR response for non-data FIS
0x0006 2 56598 R_ERR response for device-to-host non-data FIS
0x0007 2 77 R_ERR response for host-to-device non-data FIS
0x0008 2 8393 Device-to-host non-data FIS retries
0x0009 2 41163 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 41154 Device-to-host register FISes sent due to a COMRESET
0x000b 2 72 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 72 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 484480 Vendor specific
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
Here is the report for all drives as a .txt file. It exceeded the character limit for a post.
 

Attachments

  • AdaALL.txt
    0 bytes · Views: 202

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
TeKEffect said: "here is the report for all in txt. It exceeded the char limit for a post"
Heads up, the file is empty. Also, when posting CLI outputs, please use CODE tags and not QUOTE tags, to preserve formatting.

In any case,...
TeKEffect said: "Ok here is the report copied from PuTTY"

14 UDMA CRC errors in 134 hours is definitely indicative of interface issues. Mostly bad cabling (including backplanes, if applicable), but it could be something nastier. Definitely start with the simple troubleshooting steps and replace the cable.
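
To pull those two numbers out of a saved report without scrolling, a small parser is enough. The sketch below is illustrative only: the function name is made up, and it assumes attribute rows in the `smartctl -x` layout (ID, name, flags, value, worst, threshold, fail, raw value). The sample rows are the ones from the report quoted above.

```python
def smart_raw_values(report):
    """Map SMART attribute name -> raw value for rows of the attribute table."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 8 columns.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        try:
            values[fields[1]] = int(fields[7])
        except ValueError:
            # Some other line that happens to start with a number; skip it.
            continue
    return values

SAMPLE = """\
  9 Power_On_Hours          -O--CK   100   100   000    -    134
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    14
"""

raw = smart_raw_values(SAMPLE)
crc = raw["UDMA_CRC_Error_Count"]
hours = raw["Power_On_Hours"]
print(f"{crc} CRC errors in {hours} power-on hours")
```

Running the same function on a report taken after swapping the cable gives a simple before/after comparison: a raw count that stops climbing points at the old cable rather than the drive.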
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
Sorry, I had a family emergency yesterday. I just replaced the cable, and after a restart most of the alerts have gone away. I'm going to watch the resilver and see if it improves. Here is the report on the drive with the new cable.

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0EAJRL9
LU WWN Device Id: 5 0014ee 604faa554
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 26 16:36:51 2016 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (40860) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 410) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   243   182   021    -    2841
  4 Start_Stop_Count        -O--CK   100   100   000    -    22
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   086   086   000    -    10844
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    22
192 Power-Off_Retract_Count -O--CK   200   200   000    -    16
193 Load_Cycle_Count        -O--CK   200   200   000    -    179
194 Temperature_Celsius     -O---K   122   097   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    28 Celsius
Power Cycle Min/Max Temperature:     25/28 Celsius
Lifetime    Min/Max Temperature:      2/53 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (109)

Index    Estimated Time   Temperature Celsius
 110    2016-02-26 08:39    35  ****************
 ...    ..( 52 skipped).    ..  ****************
 163    2016-02-26 09:32    35  ****************
 164    2016-02-26 09:33    34  ***************
 ...    ..( 39 skipped).    ..  ***************
 204    2016-02-26 10:13    34  ***************
 205    2016-02-26 10:14    33  **************
 ...    ..( 46 skipped).    ..  **************
 252    2016-02-26 11:01    33  **************
 253    2016-02-26 11:02    32  *************
 ...    ..(297 skipped).    ..  *************
  73    2016-02-26 16:00    32  *************
  74    2016-02-26 16:01     ?  -
  75    2016-02-26 16:02    25  ******
  76    2016-02-26 16:03    25  ******
  77    2016-02-26 16:04    26  *******
  78    2016-02-26 16:05    27  ********
  79    2016-02-26 16:06    27  ********
  80    2016-02-26 16:07    27  ********
  81    2016-02-26 16:08    28  *********
  82    2016-02-26 16:09    35  ****************
 ...    ..(  7 skipped).    ..  ****************
  90    2016-02-26 16:17    35  ****************
  91    2016-02-26 16:18    34  ***************
 ...    ..( 13 skipped).    ..  ***************
 105    2016-02-26 16:32    34  ***************
 106    2016-02-26 16:33    35  ****************
 ...    ..(  2 skipped).    ..  ****************
 109    2016-02-26 16:36    35  ****************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4          442  Vendor specific

[root@freenas] ~#
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
Resilver completed!

I am getting this error, though:

CRITICAL: The volume Volume1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
To have a chance of helping, we need the smartctl output for all your drives. The file you posted before was empty.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I don't much like the look of ada1, but I think the error report was probably caused by ada4 (the new drive). Whether the source is the drive, or something else, is not obvious to me. Are all drives connected to the same controller?

EDIT: your drives are getting cooked at 53C.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
They are all connected to the same controller.
I have them in a Lian Li PC-Q25B Mini ITX case. I'll try to figure out a creative cooling solution.
 

TeKEffect

Dabbler
Joined
Apr 9, 2015
Messages
17
Just to update anyone watching the thread: after replacing the cables and a couple of restarts, everything is working perfectly. Thanks for the help. The last thing I would have suspected was a cable.
 