SOLVED Mixed messages in resilver

Status
Not open for further replies.

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I had received some small SMART test errors over the last two weeks, so just before my warranty expired I RMA'd the drive and got a new one. I used WDidle3 to set the new drive to 300 seconds and even put the returning drive back to 8 seconds. Then I carefully followed the instructions on resilvering.

Now the resilver is complete:
Code:
[root@freenas ~]# zpool status
  pool: NAS
 state: DEGRADED
  scan: resilvered 169G in 10h0m with 0 errors on Thu Nov  6 02:25:52 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/00780523-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
            gptid/0153dbc6-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
            gptid/023861ca-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
            gptid/02f20184-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
            replacing-4                                 UNAVAIL      0     0     0
              8687065241295420711                       OFFLINE      0     0     0  was /dev/gptid/03d9e2a5-12d1-11e4-94f0-0030483594b4
              11143691746629528015                      REMOVED      0     0     0  was /dev/gptid/00244d65-6543-11e4-9b22-0030483594b4
            gptid/04a1784e-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#


But you can see things are not happy yet.
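
For anyone who would rather script this check than eyeball it, here is a minimal Python sketch that pulls every row that is not ONLINE out of a `zpool status` listing. It assumes the column layout shown above (name, state, three numeric counters, optional "was" path). The `unhealthy` helper and the `ROW` pattern are made up for this illustration and are not part of FreeNAS or ZFS, and counters that are not plain integers would not match.

```python
import re

# Abbreviated copy of the `zpool status` config table posted above.
STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/00780523-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
            replacing-4                                 UNAVAIL      0     0     0
              8687065241295420711                       OFFLINE      0     0     0  was /dev/gptid/03d9e2a5-12d1-11e4-94f0-0030483594b4
              11143691746629528015                      REMOVED      0     0     0  was /dev/gptid/00244d65-6543-11e4-9b22-0030483594b4
            gptid/04a1784e-12d1-11e4-94f0-0030483594b4  ONLINE       0     0     0
"""

# Hypothetical pattern: name, state, three integer counters, optional "was <path>".
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+\d+\s+\d+\s+\d+(?:\s+was\s+(\S+))?\s*$")


def unhealthy(status_text):
    """Return (name, state, former_path) for every row whose state is not ONLINE."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and m.group(2) != "ONLINE":
            rows.append((m.group(1), m.group(2), m.group(3)))
    return rows


for name, state, was in unhealthy(STATUS):
    print(name, state, was or "")
```

The header row is skipped because its third column is not a number, so only the pool, the vdev, the `replacing-4` group and its two children are reported.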

And in the GUI I get mixed messages too...


Mixed messages.jpg

If this is normal after a resilver the guide does not talk about it.

I feel like I just realized I'm standing on a frozen lake because I heard ice cracking under my feet.
What direction do I step next?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No. It looks like one disk was replaced with another, but the "new" disk has also been kicked from the pool.

There's a chance that if you run the command "camcontrol devlist" you might notice a disk is "missing". Anyway, you should take a look at your system, but it looks like the drive that is/was labeled as "ada1" is missing.

In any case, you are still down 1 disk in the pool, and you didn't finish the resilvering steps as the manual dictates. If you had, the "removed" disk entry wouldn't be there.
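
A rough Python sketch of the "is a disk missing?" check, for anyone who wants to script it. It assumes `camcontrol devlist` lines in the usual FreeBSD form `<model> at scbusN target N lun N (passN,adaN)`. The sample text, the `ata_disks` and `missing_disks` helpers, and the list of expected names are all assumptions made for this illustration.

```python
import re

# Sample lines in the style of `camcontrol devlist` output (illustrative only).
SAMPLE = """\
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus8 target 0 lun 0 (pass0,ada0)
<WDC WD30EZRX-00D8PB0 80.00A80>    at scbus9 target 0 lun 0 (pass1,ada1)
<Generic USB  SD Reader 1.00>      at scbus15 target 0 lun 0 (pass6,da0)
"""

# Model string in angle brackets, then an adaN name inside the parentheses.
LINE = re.compile(r"<(.+?)>\s+at .*\((?:pass\d+,)?(ada\d+)(?:,pass\d+)?\)")


def ata_disks(devlist_text):
    """Map each adaN device name to its model string."""
    disks = {}
    for line in devlist_text.splitlines():
        m = LINE.match(line)
        if m:
            disks[m.group(2)] = m.group(1)
    return disks


def missing_disks(expected_names, devlist_text):
    """Return the expected device names that do not appear in the listing."""
    return sorted(set(expected_names) - set(ata_disks(devlist_text)))


# Hypothetical expectation: three ATA disks, of which the sample shows two.
print(missing_disks(["ada0", "ada1", "ada2"], SAMPLE))
```

The USB reader (`da0`) is ignored on purpose, since only the `adaN` disks are pool members in this sketch.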
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I want to verify before I do anything further.

So my next step is to click the entry of the one showing as removed and use the “Detach” button to remove the disk from the list? I want to be sure which one to "Detach". One is marked REMOVED, the other OFFLINE, and the entries under "Name" are long numbers that do not relate to serial numbers, so I can't tell which one is which.
camcontrol devlist shows all six disks as present, including one listed as ada1:

Code:
[root@freenas ~]# camcontrol devlist                                                                                               
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus8 target 0 lun 0 (pass0,ada0)                                                           
<WDC WD30EZRX-00D8PB0 80.00A80>    at scbus9 target 0 lun 0 (pass1,ada1)                                                           
<WDC WD30EZRX-00D8PB0 80.00A80>    at scbus10 target 0 lun 0 (pass2,ada2)                                                          
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus11 target 0 lun 0 (pass3,ada3)                                                          
<WDC WD30EZRX-00D8PB0 80.00A80>    at scbus12 target 0 lun 0 (pass4,ada4)                                                          
<WDC WD30EZRX-00D8PB0 80.00A80>    at scbus13 target 0 lun 0 (pass5,ada5)                                                          
<Generic USB  SD Reader 1.00>      at scbus15 target 0 lun 0 (pass6,da0)                                                           
[root@freenas ~]# 


The ice cracked again but I'm not moving until I'm sure what step to take.
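
One way to tell the two numeric entries apart is the "was /dev/gptid/..." path that `zpool status` prints next to each. The sketch below assumes those gptids are version-1 (time-based) UUIDs, which the `11e4` field suggests, and decodes the creation time embedded in each one. The `gptid_created` helper is made up for this illustration. Treat the result as a hint, not proof: it only shows when each partition label was generated.

```python
import uuid
from datetime import datetime, timedelta

# Version-1 UUID timestamps count 100 ns ticks from the Gregorian epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def gptid_created(gptid):
    """Return the creation time (UTC) embedded in a version-1 UUID, else None."""
    u = uuid.UUID(gptid)
    if u.version != 1:
        return None  # not time-based, so there is no timestamp to decode
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# The two "was /dev/gptid/..." values from the zpool status output in the first post.
offline = gptid_created("03d9e2a5-12d1-11e4-94f0-0030483594b4")
removed = gptid_created("00244d65-6543-11e4-9b22-0030483594b4")

print("OFFLINE entry label created:", offline)
print("REMOVED entry label created:", removed)
print("REMOVED entry is the newer label:", removed > offline)
```

The OFFLINE entry shares its `12d1-11e4` fields with the five healthy pool members, which is consistent with it being the original disk, while the REMOVED entry carries a later timestamp, which is consistent with it being the replacement.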
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You are correct. One will have a detach button and the other a remove (I think that's what it says). You need to detach the one to replace it and then use the replace button to replace the disk yet again. Then click the remove button to permanently remove the disk from the pool.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Confusing. They both have the "Detach" and "Replace" buttons. Yesterday as I detached the old drive it created the long number name before I shut down and installed the new drive. I should have written down the Name number and I would not be confused as to which is which.

I followed the steps exactly in the guide, and clearly I am at step "4. If the replaced disk continues to be listed after resilvering is complete, click its entry and use the “Detach” button to remove the disk from the list."
My best guess is that it is the one listed as "removed", but I hate guessing with my data.

I went through steps 1 through 3 yesterday.
"3. Once the disk is showing as OFFLINE, click the disk again and then click its “Replace” button. Select the replacement disk from the drop-down menu and click the “Replace Disk” button. If the disk is a member of an encrypted ZFS pool, you will be prompted to input the passphrase for the pool. Once you click the “Replace Disk” button, the ZFS pool will start to resilver. You can use the zpool status command in Shell to monitor the status of the resilvering."

It all went as described. (It is not encrypted). And I even used the zpool status to monitor. Again I should have noted the Name number that my system created so that I would be sure to know which drive to apply step 4 to.

Can you confirm that it is the one listed as "Removed" that I "Detach" yet again?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, if detach is the only option then it is the appropriate button. I had a long night last night so I might have the detach and remove buttons backwards in my head. Just FYI, you'll *never* be detaching AND removing in the same action in the WebGUI. ;)
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Both detach and replace options are available on both the "removed" and the "offline" drives.

I appreciate your being here to help after having a long night.

I want to do this right so I don't have a long month re-ripping all of my data.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not sure what you want/need answered. You should just replace the disks as the manual says.

Am I missing something?
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I want to confirm that the next step is on the drive showing Removed, and that the action is Detach.

And it seems that the new drive is showing Offline. Will the detach command make the one showing offline appear online, or do I need to take some further steps to bring it online?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That depends on the current situation. Can you provide a screenshot with each of the two questionable drives selected, their status (removed/etc.), and the buttons at the bottom of the WebGUI?
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Gladly.

screen for removed.jpg


And:

screen for offline.jpg


I really appreciate your time and help with this.

Again I think I should have noted the Name number that was created as I went through the steps of the resilver.

The system only has 6 drives currently installed.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, so you should detach one of them (I don't care which). After the detach is done then you can do the "replace" of the other. After the resilvering is done you'll still have one weird entry. At that point you should detach the weird entry. ;)
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
OK. Strange, because the resilvering was already done. It took 10 hours (no big deal), was done by the book, was monitored throughout, and led to where I am now. But if resilvering needs to be done again, that is no problem.

It just seems that I ended up in a place that did not fit the guide exactly. So at that point I stopped and came here to be sure what my best course of action was.

I am going to detach the one listed as Removed and then replace the one listed as Offline. I'll also track the Name numbers so I know which one is the "weird" entry.

I'll start at 8:00 PM mountain time tonight. So you'll have 45 minutes to respond "Oh god don't do that". :eek:
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Have you checked SMART on the disks? It seems like this would only happen with a failed/intermittently failing disk.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Just had an "Oh S***" moment, since this resilver was all the result of a WD RMA drive arriving with a SMART error right out of the box two months ago. It is the one I'm sending back, just days before its 60-day warranty was up. And with your suggestion I just thought, "I wonder if they sent me another bad one AGAIN."

No SMART errors on any of the 6 drives.

T minus 2 minutes.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Care to pastebin the output of smartctl -a /dev/XXX for your disks?
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Sure.

Again I appreciate your time and concern.

Code:
[root@freenas ~]# smartctl -A /dev/ada0                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   153   143   021    Pre-fail  Always       -       9350                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       387                                         
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   199   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5466                                        
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       209                                         
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       105                                         
193 Load_Cycle_Count        0x0032   166   166   000    Old_age   Always       -       102334                                      
194 Temperature_Celsius     0x0022   132   106   000    Old_age   Always       -       20                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0                                           
                                                                                       


Code:
[root@freenas ~]# smartctl -A /dev/ada1                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   204   189   021    Pre-fail  Always       -       4766                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       7                                           
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       51                                          
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7                                           
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7                                           
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       14                                          
194 Temperature_Celsius     0x0022   132   120   000    Old_age   Always       -       18                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0                                           
                                                                         


Code:
[root@freenas ~]# smartctl -A /dev/ada2                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   173   168   021    Pre-fail  Always       -       6333                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       259                                         
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4689                                        
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93                                          
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       49                                          
193 Load_Cycle_Count        0x0032   168   168   000    Old_age   Always       -       98598                                       
194 Temperature_Celsius     0x0022   133   121   000    Old_age   Always       -       17                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0                                           
                                                                               


Code:
[root@freenas ~]# smartctl -A /dev/ada3                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   185   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   152   152   021    Pre-fail  Always       -       9358                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       167                                         
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1958                                        
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       27                                          
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       13                                          
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5572                                        
194 Temperature_Celsius     0x0022   134   122   000    Old_age   Always       -       18                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0                                           
                                                                                                 


Code:
[root@freenas ~]# smartctl -A /dev/ada4                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   178   177   021    Pre-fail  Always       -       6091                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       260                                         
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4690                                        
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       94                                          
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       48                                          
193 Load_Cycle_Count        0x0032   168   168   000    Old_age   Always       -       98736                                       
194 Temperature_Celsius     0x0022   132   113   000    Old_age   Always       -       18                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0                                           
                                                                


Code:
[root@freenas ~]# smartctl -A /dev/ada5                                                                                            
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)                                                        
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org                                                        
                                                                                                                                   
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   165   162   021    Pre-fail  Always       -       6725                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       261                                         
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4689                                        
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0                                           
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0                                           
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       95                                          
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       49                                          
193 Load_Cycle_Count        0x0032   168   168   000    Old_age   Always       -       98202                                       
194 Temperature_Celsius     0x0022   133   115   000    Old_age   Always       -       17                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0                                           
                                                                                
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hmm.. those look okay. ;)
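
For reference, a minimal Python sketch of the kind of check being done by eye on those tables. It assumes the ten-column `smartctl -A` layout shown above. The `WATCHED` set (attributes 5, 197 and 198) is a choice made for this sketch, not an official threshold, and `parse_attributes` and `smart_warnings` are made-up helper names.

```python
# Attribute IDs watched for sector trouble (an assumption made for this sketch).
WATCHED = {5, 197, 198}

# A few rows copied from the ada1 table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       51
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""


def parse_attributes(text):
    """Return {id: {"name", "when_failed", "raw"}} for each attribute row."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; the header starts with "ID#".
        if len(parts) >= 10 and parts[0].isdigit():
            rows[int(parts[0])] = {
                "name": parts[1],
                "when_failed": parts[8],
                "raw": parts[9],
            }
    return rows


def smart_warnings(text):
    """List attributes that have failed, or watched attributes with a nonzero raw value."""
    found = []
    for attr_id, row in parse_attributes(text).items():
        if row["when_failed"] != "-":
            found.append(f"{attr_id} {row['name']}: failed ({row['when_failed']})")
        elif attr_id in WATCHED and row["raw"] != "0":
            found.append(f"{attr_id} {row['name']}: raw value {row['raw']}")
    return found


print(smart_warnings(SAMPLE) or "nothing flagged")
```

An empty result only means these particular counters are clean; it says nothing about cabling, power, or controller problems that can also drop a disk from a pool.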
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
That's what I thought. Should I go ahead with the steps discussed? I'm in no hurry.
 