Hard drive Failure? Help please.

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
OK, I switched the drive to a new SATA cable and a new power cable as well, and the error message is no longer showing in the GUI. Should I consider this resolved if I don't see the error return in a couple of days?
Below is the SMART output of the drive now that it's on the new SATA cable.

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p3 amd64] (local build)                                                        
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF INFORMATION SECTION ===                                                                                               
Model Family:     Seagate NAS HDD                                                                                                  
Device Model:     ST4000VN000-1H4168                                                                                               
Serial Number:    Z303A1YA                                                                                                         
LU WWN Device Id: 5 000c50 07a985576                                                                                               
Firmware Version: SC46                                                                                                             
User Capacity:    4,000,787,030,016 bytes [4.00 TB]                                                                                
Sector Sizes:     512 bytes logical, 4096 bytes physical                                                                           
Rotation Rate:    5900 rpm                                                                                                         
Form Factor:      3.5 inches                                                                                                       
Device is:        In smartctl database [for details use: -P show]                                                                  
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b                                                                              
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)                                                                           
Local Time is:    Wed Jun  8 21:11:21 2016 CDT                                                                                     
SMART support is: Available - device has SMART capability.                                                                         
SMART support is: Enabled                                                                                                          
AAM feature is:   Unavailable                                                                                                      
APM feature is:   Disabled                                                                                                         
Rd look-ahead is: Enabled                                                                                                          
Write cache is:   Enabled                                                                                                          
ATA Security is:  Disabled, frozen [SEC2]                                                                                          
Wt Cache Reorder: Enabled                                                                                                          
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART overall-health self-assessment test result: PASSED                                                                           
                                                                                                                                   
General SMART Values:                                                                                                              
Offline data collection status:  (0x82) Offline data collection activity                                                           
                                        was completed without error.                                                               
                                        Auto Offline Data Collection: Enabled.                                                     
Self-test execution status:      (   0) The previous self-test routine completed                                                   
                                        without error or no self-test has ever                                                     
                                        been run.                                                                                  
Total time to complete Offline                                                                                                     
data collection:                (  117) seconds.                                                                                   
Offline data collection                                                                                                            
capabilities:                    (0x7b) SMART execute Offline immediate.                                                           
                                        Auto Offline data collection on/off support.                                               
                                        Suspend Offline collection upon new                                                        
                                        command.                                                                                   
                                        Offline surface scan supported.                                                            
                                        Self-test supported.                                                                       
                                        Conveyance Self-test supported.                                                            
                                        Selective Self-test supported.                                                             
SMART capabilities:            (0x0003) Saves SMART data before entering                                                           
                                        power-saving mode.                     
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.                                                                   
                                        General Purpose Logging supported.                                                         
Short self-test routine                                                                                                            
recommended polling time:        (   1) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 509) minutes.                                                                                   
Conveyance self-test routine                                                                                                       
recommended polling time:        (   2) minutes.                                                                                   
SCT capabilities:              (0x10bd) SCT Status supported.                                                                      
                                        SCT Error Recovery Control supported.                                                      
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                  
                                                                                                                                   
SMART Attributes Data Structure revision number: 10                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE                                                             
  1 Raw_Read_Error_Rate     POSR--   114   100   006    -    75921096                                                              
  3 Spin_Up_Time            PO----   091   091   000    -    0                                                                     
  4 Start_Stop_Count        -O--CK   100   100   020    -    27                                                                    
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0                                                                     
  7 Seek_Error_Rate         POSR--   076   060   030    -    46006881                                                              
  9 Power_On_Hours          -O--CK   090   090   000    -    9544                                                                  
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    21
184 End-to-End_Error        -O--CK   100   100   099    -    0                                                                     
187 Reported_Uncorrect      -O--CK   100   100   000    -    0                                                                     
188 Command_Timeout         -O--CK   100   001   000    -    339                                                                   
189 High_Fly_Writes         -O-RCK   100   100   000    -    0                                                                     
190 Airflow_Temperature_Cel -O---K   075   066   045    -    25 (Min/Max 21/25)                                                    
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0                                                                     
192 Power-Off_Retract_Count -O--CK   100   100   000    -    19                                                                    
193 Load_Cycle_Count        -O--CK   100   100   000    -    27                                                                    
194 Temperature_Celsius     -O---K   025   040   000    -    25 (0 17 0 0 0)                                                       
197 Current_Pending_Sector  -O--C-   100   100   000    -    0                                                                     
198 Offline_Uncorrectable   ----C-   100   100   000    -    0                                                                     
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0                                                                     
                            ||||||_ K auto-keep                                                                                    
                            |||||__ C event count                                                                                  
                            ||||___ R error rate                                                                                   
                            |||____ S speed/performance                                                                            
                            ||_____ O updated online                                                                               
                            |______ P prefailure warning                                                                           
                                                                                                                                   
General Purpose Log Directory Version 1                                                                                            
SMART           Log Directory Version 1 [multi-sector log support]                                                                 
Address    Access  R/W   Size  Description                          
0x00       GPL,SL  R/O      1  Log Directory                                                                                       
0x01           SL  R/O      1  Summary SMART error log                                                                             
0x02           SL  R/O      5  Comprehensive SMART error log                                                                       
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log                                                                  
0x06           SL  R/O      1  SMART self-test log                                                                                 
0x07       GPL     R/O      1  Extended self-test log                                                                              
0x09           SL  R/W      1  Selective self-test log                                                                             
0x10       GPL     R/O      1  SATA NCQ Queued Error log                                                                           
0x11       GPL     R/O      1  SATA Phy Event Counters log                                                                         
0x15       GPL     R/W      1  SATA Rebuild Assist log                                                                             
0x21       GPL     R/O      1  Write stream error log                                                                              
0x22       GPL     R/O      1  Read stream error log                                                                               
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log                                                                            
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log                                                                            
0xa1       GPL,SL  VS      20  Device vendor specific log                                           
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
BTW, this is the second bad cable connection I've had in a year of operation. Is that typical? I've rarely had to replace a SATA cable on the desktops I've built over the years. Am I just buying crappy cables? Is there a brand or grade I should be buying that lasts longer?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
For good measure, I would recommend you run at least a SMART Short (maybe even a Long) test on that drive and then compare the output to the one you just posted. At the very least you do not want to see "188 Command_Timeout" go any higher than the raw value it currently shows (339). I don't think that it will get reset, so you may want to note that value somewhere. That way, if you are ever troubleshooting this drive again, you won't need to worry about it as long as the value is still the same.

*** Maybe someone else can clarify if it may get reset (doubting it, but not 100% sure)?
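
The before/after comparison suggested above can be sketched in Python. This is only a minimal sketch and not part of smartmontools: `parse_attributes` and `changed_attributes` are hypothetical helper names, and the regular expression assumes the brief attribute table layout shown in the smartctl output in this thread (ID, name, six flag characters, VALUE, WORST, THRESH, FAIL, RAW_VALUE).

```python
import re

# One row of the brief-format attribute table:
# ID, name, 6 flag characters, VALUE, WORST, THRESH, FAIL, leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(8))
    return attrs


def changed_attributes(before, after, watch):
    """Return {name: (old, new)} for watched attributes whose raw value differs."""
    old = parse_attributes(before)
    new = parse_attributes(after)
    return {
        name: (old[name], new[name])
        for name in watch
        if name in old and name in new and old[name] != new[name]
    }
```

Given the saved text of two smartctl runs, calling `changed_attributes(before, after, ["Command_Timeout", "UDMA_CRC_Error_Count"])` should return an empty dict if neither raw value has moved between the runs.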

BTW, this is the second bad cable connection I've had in a year of operation. Is that typical?
Not that typical, but a whole lot better than having to replace an entire drive... Makes you wonder how many people may have replaced drives when it was just the cabling....
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
While I have you all here, I got this email two nights ago. I'm not sure what it means.

freenas.local kernel log messages:
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #4 Launched!
> Timecounter "TSC-low" frequency 1750039400 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x405 offMax=0x122f
> epair0a: promiscuous mode enabled

-- End of security output --
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Arrgh. OK, this morning I logged into the GUI and now have this error.
"CRITICAL:June 9, 2016, 3:02 a.m. - The volume Ravenrock (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

And overnight I received these emails.
Code:
freenas.local kernel log messages:
> uhub0: 2 ports with 2 removable, self powered
> da4: Serial Number      WD-WMC5D0D287EP
> da4: 600.000MB/s transfers
> da4: Command Queueing enabled
> da4: 3815447MB (7814037168 512 byte sectors)
> da0 at mpr0 bus 0 scbus0 target 0 lun 0
> da5 at umass-sim0 bus 0 scbus12 target 0 lun 0
> da5: <MUSHKIN MKNUFDVS16GB PMAP> Removable Direct Access SPC-4 SCSI device
> da5: Serial Number 070B52306EA99E15
> da5: 400.000MB/s transfers
> da5: 15120MB (30965760 512 byte sectors)
> da5: quirks=0x2<NO_6_BYTE>
> ada4 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada4: <ST4000VN000-1H4168 SC46> ACS-2 ATA SATA 3.x device
> ada4: Serial Number Z303A1YA
> ada4: Previously was known as ad12
> ada5 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada5: Serial Number WD-WMC5D0D8LU59
> ada5: Previously was known as ad14
> ada6 at ahcich7 bus 0 scbus8 target 0 lun 0
> ada6: <WDC WD4003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
> ada6: Serial Number WD-WMC5D0D41JZ8
> ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada6: Command Queueing enabled
> ada6: 3815447MB (7814037168 512 byte sectors)
> ada6: quirks=0x1<4K>
> ada6: Previously was known as ad18
> Timecounter "TSC-low" frequency 1750038856 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x383 offMax=0x12f2
> GEOM_ELI: Device ada6p1.eli created.
> GEOM_ELI: Device ada5p1.eli created.
> GEOM_ELI: Device ada4p1.eli created.
> igb0: link state changed to DOWN
> arp: 192.168.0.198 moved from 02:ff:20:00:07:0a to 00:25:90:fc:be:18 on epair2b
> igb0: link state changed to DOWN
> igb0: link state changed to UP

-- End of security output --

AND

"The volume Ravenrock (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

Is my problem not resolved?
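
The alert text uses the same wording that `zpool status` prints in its status field, so the per-device READ, WRITE, and CKSUM counters in that command's output are a reasonable next place to look. Below is a rough Python sketch for picking out rows with nonzero counters from saved `zpool status` text. `devices_with_errors` is a hypothetical helper, and it assumes the counters are printed as plain integers (large counts that ZFS abbreviates, such as 1.2K, would be skipped).

```python
def devices_with_errors(zpool_status_text):
    """Return {device_name: (read, write, cksum)} for rows with any nonzero counter."""
    bad = {}
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [optional notes]
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counts = tuple(int(f) for f in fields[2:5])
            if any(counts):
                bad[fields[0]] = counts
    return bad
```

If the only device it reports is the drive that was on the bad cable, and the counters stop climbing after the cable swap, the alert may simply reflect errors recorded before the swap.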
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Ran a new long SMART test on that drive last night as well. Here are the results.
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p3 amd64] (local build)    
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org    
                                                                               
=== START OF INFORMATION SECTION ===                                           
Model Family:     Seagate NAS HDD                                              
Device Model:     ST4000VN000-1H4168                                           
Serial Number:    Z303A1YA                                                     
LU WWN Device Id: 5 000c50 07a985576                                           
Firmware Version: SC46                                                         
User Capacity:    4,000,787,030,016 bytes [4.00 TB]                            
Sector Sizes:     512 bytes logical, 4096 bytes physical                       
Rotation Rate:    5900 rpm                                                     
Form Factor:      3.5 inches                                                   
Device is:        In smartctl database [for details use: -P show]              
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b                          
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)                       
Local Time is:    Thu Jun  9 06:52:40 2016 CDT                                 
SMART support is: Available - device has SMART capability.                     
SMART support is: Enabled                                                      
AAM feature is:   Unavailable                                                  
APM feature is:   Disabled                                                     
Rd look-ahead is: Enabled                                                      
Write cache is:   Enabled          
ATA Security is:  Disabled, frozen [SEC2]                                      
Wt Cache Reorder: Enabled                                                      
                                                                               
=== START OF READ SMART DATA SECTION ===                                       
SMART overall-health self-assessment test result: PASSED                       
                                                                               
General SMART Values:                                                          
Offline data collection status:  (0x82) Offline data collection activity       
                                        was completed without error.           
                                        Auto Offline Data Collection: Enabled. 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.                              
Total time to complete Offline                                                 
data collection:                (  117) seconds.                               
Offline data collection                                                        
capabilities:                    (0x7b) SMART execute Offline immediate.       
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.        
                                        Selective Self-test supported.         
SMART capabilities:            (0x0003) Saves SMART data before entering       
                                        power-saving mode.                     
                                        Supports SMART auto save timer.        
Error logging capability:        (0x01) Error logging supported.               
                                        General Purpose Logging supported.     
Short self-test routine                                                        
recommended polling time:        (   1) minutes.                               
Extended self-test routine                                                     
recommended polling time:        ( 509) minutes.                               
Conveyance self-test routine                                                   
recommended polling time:        (   2) minutes.                               
SCT capabilities:              (0x10bd) SCT Status supported.                  
                                        SCT Error Recovery Control supported.  
                                        SCT Feature Control supported.         
                                        SCT Data Table supported.              
                                                                               
SMART Attributes Data Structure revision number: 10                            
Vendor Specific SMART Attributes with Thresholds:                              
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE    
  1 Raw_Read_Error_Rate     POSR--   114   100   006    -    77070384
  3 Spin_Up_Time            PO----   091   091   000    -    0                 
  4 Start_Stop_Count        -O--CK   100   100   020    -    27                
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0                 
  7 Seek_Error_Rate         POSR--   076   060   030    -    46052651          
  9 Power_On_Hours          -O--CK   090   090   000    -    9554              
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    21
184 End-to-End_Error        -O--CK   100   100   099    -    0                 
187 Reported_Uncorrect      -O--CK   100   100   000    -    0                 
188 Command_Timeout         -O--CK   100   001   000    -    339               
189 High_Fly_Writes         -O-RCK   100   100   000    -    0                 
190 Airflow_Temperature_Cel -O---K   073   066   045    -    27 (Min/Max 21/28)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0                 
192 Power-Off_Retract_Count -O--CK   100   100   000    -    19                
193 Load_Cycle_Count        -O--CK   100   100   000    -    27                
194 Temperature_Celsius     -O---K   027   040   000    -    27 (0 17 0 0 0)   
197 Current_Pending_Sector  -O--C-   100   100   000    -    0                 
198 Offline_Uncorrectable   ----C-   100   100   000    -    0                 
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0                 
                            ||||||_ K auto-keep                                
                            |||||__ C event count                              
                            ||||___ R error rate                      
                            |||____ S speed/performance
                            ||_____ O updated online                           
                            |______ P prefailure warning                       
                                                                               
General Purpose Log Directory Version 1                                        
SMART           Log Directory Version 1 [multi-sector log support]             
Address    Access  R/W   Size  Description                                     
0x00       GPL,SL  R/O      1  Log Directory                                   
0x01           SL  R/O      1  Summary SMART error log                         
0x02           SL  R/O      5  Comprehensive SMART error log                   
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log              
0x06           SL  R/O      1  SMART self-test log                             
0x07       GPL     R/O      1  Extended self-test log                          
0x09           SL  R/W      1  Selective self-test log                         
0x10       GPL     R/O      1  SATA NCQ Queued Error log                       
0x11       GPL     R/O      1  SATA Phy Event Counters log                     
0x15       GPL     R/W      1  SATA Rebuild Assist log                         
0x21       GPL     R/O      1  Write stream error log                          
0x22       GPL     R/O      1  Read stream error log                           
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log                        
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log       
0xa1       GPL,SL  VS      20  Device vendor specific log                      
0xa2       GPL     VS    4496  Device vendor specific log                      
0xa8       GPL,SL  VS     129  Device vendor specific log                      
0xa9       GPL,SL  VS       1  Device vendor specific log                      
0xab       GPL     VS       1  Device vendor specific log                      
0xb0       GPL     VS    5176  Device vendor specific log                      
0xbe-0xbf  GPL     VS   65535  Device vendor specific log                      
0xc0       GPL,SL  VS       1  Device vendor specific log                      
0xc1       GPL,SL  VS      10  Device vendor specific log                      
0xc3       GPL,SL  VS       8  Device vendor specific log                      
0xc4       GPL,SL  VS       5  Device vendor specific log                      
0xd1       GPL,SL  VS       8  Device vendor specific log                      
0xe0       GPL,SL  R/W      1  SCT Command/Status                              
0xe1       GPL,SL  R/W      1  SCT Data Transfer                               
                                                                               
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)                  
No Errors Logged                                                               
                                                                               
SMART Extended Self-test Log Version: 1 (1 sectors)                            
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9553         - 
# 2  Extended offline    Interrupted (host reset)      00%      9522         - 
# 3  Extended offline    Completed without error       00%      9472         - 
# 4  Short offline       Completed without error       00%      9429         - 
# 5  Short offline       Completed without error       00%      9237         - 
# 6  Short offline       Completed without error       00%      9045         - 
# 7  Short offline       Completed without error       00%      8853         - 
# 8  Extended offline    Completed without error       00%      8788         - 
# 9  Short offline       Completed without error       00%      8685         - 
#10  Short offline       Completed without error       00%      8517         - 
#11  Short offline       Completed without error       00%      8325         - 
#12  Short offline       Completed without error       00%      8133         - 
#13  Extended offline    Completed without error       00%      8068         - 
#14  Short offline       Completed without error       00%      7965         - 
#15  Short offline       Completed without error       00%      7773         - 
#16  Short offline       Completed without error       00%      7606         - 
#17  Short offline       Completed without error       00%      7415         - 
#18  Extended offline    Completed without error       00%      7327         - 
#19  Short offline       Completed without error       00%      7223         - 
                                                                               
SMART Selective self-test log data structure revision number 1                 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                    
    1        0        0  Not_testing
    2        0        0  Not_testing                                           
    3        0        0  Not_testing                                           
    4        0        0  Not_testing                                           
    5        0        0  Not_testing                                           
Selective self-test flags (0x0):                                               
  After scanning selected spans, do NOT read-scan remainder of disk.           
If Selective self-test is pending on power-up, resume after 0 minute delay.    
                                                                               
SCT Status Version:                  3                                         
SCT Version (vendor specific):       522 (0x020a)                              
SCT Support Level:                   1                                         
Device State:                        Active (0)                                
Current Temperature:                    28 Celsius                             
Power Cycle Min/Max Temperature:     21/28 Celsius                             
Lifetime    Min/Max Temperature:     17/49 Celsius                             
Under/Over Temperature Limit Count:   0/0                                      
                                                                               
SCT Temperature History Version:     2                                         
Temperature Sampling Period:         1 minute                                  
Temperature Logging Interval:        94 minutes                                
Min/Max recommended Temperature:      1/61 Celsius           
Min/Max Temperature Limit:            2/60 Celsius                             
Temperature History Size (Index):    128 (81)                                  
                                                                               
Index    Estimated Time   Temperature Celsius                                  
  82    2016-05-31 22:52    30  ***********                                    
 ...    ..(  4 skipped).    ..  ***********
  87    2016-06-01 06:42    30  ***********                                    
  88    2016-06-01 08:16    31  ************                                   
  89    2016-06-01 09:50    28  *********                                      
  90    2016-06-01 11:24    27  ********                                       
  91    2016-06-01 12:58    28  *********                                      
  92    2016-06-01 14:32    28  *********                                      
  93    2016-06-01 16:06    29  **********                                     
 ...    ..(  5 skipped).    ..  **********
  99    2016-06-02 01:30    29  **********
 100    2016-06-02 03:04    30  ***********
 101    2016-06-02 04:38    29  **********
 102    2016-06-02 06:12    28  *********
 103    2016-06-02 07:46    28  *********
 104    2016-06-02 09:20    29  **********
 105    2016-06-02 10:54    28  *********
 106    2016-06-02 12:28    29  **********
 ...    ..( 95 skipped).    ..  **********
  74    2016-06-08 18:52     ?  -
  75    2016-06-08 20:26    21  **                                             
  76    2016-06-08 22:00    26  *******                                        
  77    2016-06-08 23:34    27  ********                                       
  78    2016-06-09 01:08    28  *********                                      
 ...    ..(  2 skipped).    ..  *********
  81    2016-06-09 05:50    28  *********                                      
                                                                               
SCT Error Recovery Control:                                                    
           Read: Disabled                                                      
          Write: Disabled                                                      
                                                                               
Device Statistics (GP/SMART Log 0x04) not supported                            
                                                                               
SATA Phy Event Counters (GP Log 0x11)                                          
ID      Size     Value  Description                                            
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET   
0x0001  2            0  Command failed due to ICRC error                       
0x0003  2            0  R_ERR response for device-to-host data FIS             
0x0004  2            0  R_ERR response for host-to-device data FIS             
0x0006  2            0  R_ERR response for device-to-host non-data FIS         
0x0007  2            0  R_ERR response for host-to-device non-data FIS      
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Damn just got the email.
"The volume Ravenrock (ZFS) state is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

Now what? Replace the drive?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Let's take a step back for a second... How do you know the drive you are running the SMART test on is the drive causing the pool error? I know this thread started with an obvious drive error message, but since you have relocated the drive, and the errors have not increased according to the SMART test, I'm not 100% positive it's the drive you are looking at. You could purchase another drive, replace the suspect drive, and cross your fingers that the problem goes away.

What email message are you getting? (Please cut and paste it.) Also, list your zpool status again.
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
Ravenrock     43.5T  32.0T  11.5T         -    25%    73%  1.00x  DEGRADED  /mnt
freenas-boot  14.8G  3.74G  11.0G         -      -    25%  1.00x  ONLINE    -

  pool: Ravenrock
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 14h9m with 0 errors on Thu Jun 2 17:09:07 2016
config:

NAME                                            STATE     READ WRITE CKSUM
Ravenrock                                       DEGRADED     0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/deb649ef-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/df087aaf-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/df5c5984-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/dfba7974-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/e00cf41a-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/e0704d7c-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
  raidz2-1                                      DEGRADED     0     0     0
    gptid/3990af5f-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/39e8d931-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3a4548fa-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3a9e01b9-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18  DEGRADED     0     0 2.20K  too many errors

errors: No known data errors

-- End of daily output --
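
For anyone who wants to pick the problem member out of a long status listing without reading every row, here is a rough Python sketch. It is only an illustration: the function name `devices_with_errors` and the `SAMPLE` text are made up for this example (the sample rows are copied from the output above), and the counters are kept as strings so that values such as 2.20K are reported exactly as zpool printed them.

```python
import re

# Sample rows copied from the zpool status output above (trimmed).
SAMPLE = """\
NAME STATE READ WRITE CKSUM
Ravenrock DEGRADED 0 0 0
raidz2-1 DEGRADED 0 0 0
gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18 ONLINE 0 0 0
gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18 DEGRADED 0 0 2.20K too many errors
"""

# A config row is: name, state, then the READ, WRITE and CKSUM counters.
# The header row is skipped because "STATE" is not one of the listed states.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def devices_with_errors(text):
    """Return (name, state, read, write, cksum) for rows with a non-zero counter."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match and any(count != "0" for count in match.group(3, 4, 5)):
            hits.append(match.groups())
    return hits


if __name__ == "__main__":
    for name, state, read, write, cksum in devices_with_errors(SAMPLE):
        print(name, state, "READ", read, "WRITE", write, "CKSUM", cksum)
```

On the sample rows this reports only the gptid/3b5e54e4 member, whose errors are all in the CKSUM column.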
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
And this message came a couple of days before that, right after I changed the cables around.

freenas.local kernel log messages:
> uhub0: 2 ports with 2 removable, self powered
> da4: Serial Number WD-WMC5D0D287EP
> da4: 600.000MB/s transfers
> da4: Command Queueing enabled
> da4: 3815447MB (7814037168 512 byte sectors)
> da0 at mpr0 bus 0 scbus0 target 0 lun 0
> da5 at umass-sim0 bus 0 scbus12 target 0 lun 0
> da5: <MUSHKIN MKNUFDVS16GB PMAP> Removable Direct Access SPC-4 SCSI device
> da5: Serial Number 070B52306EA99E15
> da5: 400.000MB/s transfers
> da5: 15120MB (30965760 512 byte sectors)
> da5: quirks=0x2<NO_6_BYTE>
> ada4 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada4: <ST4000VN000-1H4168 SC46> ACS-2 ATA SATA 3.x device
> ada4: Serial Number Z303A1YA
> ada4: Previously was known as ad12
> ada5 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada5: Serial Number WD-WMC5D0D8LU59
> ada5: Previously was known as ad14
> ada6 at ahcich7 bus 0 scbus8 target 0 lun 0
> ada6: <WDC WD4003FZEX-00Z4SA0 01.01A01> ACS-2 ATA SATA 3.x device
> ada6: Serial Number WD-WMC5D0D41JZ8
> ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada6: Command Queueing enabled
> ada6: 3815447MB (7814037168 512 byte sectors)
> ada6: quirks=0x1<4K>
> ada6: Previously was known as ad18
> Timecounter "TSC-low" frequency 1750038856 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x383 offMax=0x12f2
> GEOM_ELI: Device ada6p1.eli created.
> GEOM_ELI: Device ada5p1.eli created.
> GEOM_ELI: Device ada4p1.eli created.
> igb0: link state changed to DOWN
> arp: 192.168.0.198 moved from 02:ff:20:00:07:0a to 00:25:90:fc:be:18 on epair2b
> igb0: link state changed to DOWN
> igb0: link state changed to UP

-- End of security output --
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
And this a couple days before that.
freenas.local kernel log messages:
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #4 Launched!
> Timecounter "TSC-low" frequency 1750039400 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x405 offMax=0x122f
> epair0a: promiscuous mode enabled

-- End of security output --
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Please post the results of
# glabel status
(post in code tags to keep the formatting)

You need to find the label and serial number for your drive shown above as
gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18

The glabel command will show this...
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Sorry. How do I do that?
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
From the CLI (command line interface)
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Sure, but what is the command? Just "glabel status"?
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Code:
[root@freenas ~]# glabel status                                                
                                      Name  Status  Components                 
gptid/3990af5f-ed67-11e4-9656-002590fcbe18     N/A  da0p2                      
gptid/39e8d931-ed67-11e4-9656-002590fcbe18     N/A  da1p2                      
gptid/3a4548fa-ed67-11e4-9656-002590fcbe18     N/A  da2p2                      
gptid/3a9e01b9-ed67-11e4-9656-002590fcbe18     N/A  da3p2                      
gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18     N/A  da4p2                      
gptid/74cc0087-daf8-11e4-b341-002590fcbe18     N/A  da5p1                      
gptid/74d85c83-daf8-11e4-b341-002590fcbe18     N/A  da5p2                      
gptid/dfba7974-db4a-11e4-9f43-002590fcbe18     N/A  ada0p2                     
gptid/df5c5984-db4a-11e4-9f43-002590fcbe18     N/A  ada1p2                     
gptid/e0704d7c-db4a-11e4-9f43-002590fcbe18     N/A  ada2p2                     
gptid/e00cf41a-db4a-11e4-9f43-002590fcbe18     N/A  ada3p2                     
gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18     N/A  ada4p2                     
gptid/df087aaf-db4a-11e4-9f43-002590fcbe18     N/A  ada5p2                     
gptid/deb649ef-db4a-11e4-9f43-002590fcbe18     N/A  ada6p2                     
[root@freenas ~]#                                             
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
NAME                                            STATE     READ WRITE CKSUM
Ravenrock                                       DEGRADED     0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/deb649ef-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/df087aaf-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/df5c5984-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/dfba7974-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/e00cf41a-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
    gptid/e0704d7c-db4a-11e4-9f43-002590fcbe18  ONLINE       0     0     0
  raidz2-1                                      DEGRADED     0     0     0
    gptid/3990af5f-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/39e8d931-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3a4548fa-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3a9e01b9-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18  ONLINE       0     0     0
    gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18  DEGRADED     0     0 2.20K  too many errors

errors: No known data errors

-- End of daily output --
Your drive has now been identified as:
[root@freenas ~]# glabel status
                                      Name  Status  Components
gptid/3990af5f-ed67-11e4-9656-002590fcbe18     N/A  da0p2
gptid/39e8d931-ed67-11e4-9656-002590fcbe18     N/A  da1p2
gptid/3a4548fa-ed67-11e4-9656-002590fcbe18     N/A  da2p2
gptid/3a9e01b9-ed67-11e4-9656-002590fcbe18     N/A  da3p2
gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18     N/A  da4p2
gptid/74cc0087-daf8-11e4-b341-002590fcbe18     N/A  da5p1
gptid/74d85c83-daf8-11e4-b341-002590fcbe18     N/A  da5p2
gptid/dfba7974-db4a-11e4-9f43-002590fcbe18     N/A  ada0p2
gptid/df5c5984-db4a-11e4-9f43-002590fcbe18     N/A  ada1p2
gptid/e0704d7c-db4a-11e4-9f43-002590fcbe18     N/A  ada2p2
gptid/e00cf41a-db4a-11e4-9f43-002590fcbe18     N/A  ada3p2
gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18     N/A  ada4p2
gptid/df087aaf-db4a-11e4-9f43-002590fcbe18     N/A  ada5p2
gptid/deb649ef-db4a-11e4-9f43-002590fcbe18     N/A  ada6p2
[root@freenas ~]#
From the GUI, go to Storage > View Disks.
The drive labeled ada4 is the one with the errors, and its serial number is shown there.
Device names can change when you move cables or ports, but the serial number does not, so identifying the drive by serial number lets you be confident you are testing the correct drive.
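
The lookup above (gptid label from zpool status, then partition from glabel status, then disk name) can also be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names `label_to_partition` and `disk_for` are made up here, the `GLABEL` text is a trimmed copy of the output posted earlier, and the code assumes partition names always end in pN as they do in that output.

```python
import re

# Trimmed copy of the "glabel status" output posted earlier in the thread.
GLABEL = """\
                                      Name  Status  Components
gptid/3af2ef9e-ed67-11e4-9656-002590fcbe18     N/A  da4p2
gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18     N/A  ada4p2
gptid/df087aaf-db4a-11e4-9f43-002590fcbe18     N/A  ada5p2
"""


def label_to_partition(text):
    """Map each gptid label to the partition listed in the Components column."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: label, status, component.
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            table[fields[0]] = fields[2]
    return table


def disk_for(label, text):
    """Return the disk name for a label by stripping the trailing pN suffix."""
    partition = label_to_partition(text)[label]
    return re.sub(r"p\d+$", "", partition)


if __name__ == "__main__":
    degraded = "gptid/3b5e54e4-ed67-11e4-9656-002590fcbe18"
    print(degraded, "is on", disk_for(degraded, GLABEL))
```

With the disk name in hand, the serial number can be confirmed either in the GUI as described above or from the information section that `smartctl -i /dev/ada4` prints.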
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
OK, thanks. Checking the GUI as you suggested, ada4 is the same drive I've been looking at (Z303A1YA).

The replacement drive came today. However, when I booted up the server and checked the alerts in the GUI, I no longer get an error saying the pool is degraded, just a notice that an update is available. Might it just take a while to detect the problem again, or is the drive maybe crapping out intermittently?
 

eddie200112

Contributor
Joined
Mar 17, 2015
Messages
190
Also, is there a good "how to" on replacing a drive? In a short search I only found this comment from cyberjock.

Do a disk replacement like this:

1. Offline the failed disk per the GUI.
2. Physically remove the disk and install the new disk.
3. Run the command "service ix-multipath start"
4. Replace the disk per the GUI.

If you can't do a hotswap (or don't want to) you can skip step 3 as long as the new disk is in the system on bootup. The ix-multipath service runs on bootup and will create the proper multipath connections for you. Otherwise you must invoke it yourself manually as I described above.

But I would need more step-by-step help, like how to offline the disk, what to enter in the GUI, etc.
Thanks.
 