Yet another person with drive replacement issues (ZPOOL DEGRADED)

Status
Not open for further replies.

sinnr

Cadet
Joined
Sep 20, 2012
Messages
1
Greetings All.

I am running FreeNAS 8.2.0 x64 with 8GB of RAM and an 11 x 2TB RAIDZ1 pool. One drive (ada5) went south, so I went through the GUI replacement method, reusing the same drive location:

1. Take drive "Offline".
2. Shut down FreeNAS. Replace bad drive with known good one.
3. Bring FreeNAS back up. Select "Replace" and pick ada5 in the pulldown.
4. Zpool begins to resilver. Everything is looking good.
5. Resilver completed, but still showing permanent errors. No prob, I'll get to that since it seems like all the data is still good.
6. Try to "Detach" the bad drive, and get told it was successfully detached.

However, the pool still shows as degraded, and the GUI still shows the bad drive as UNAVAIL. Odd. Thinking it might be a GUI issue, I went to a shell and ran ye ol' zpool status (latest output shown, but it's been like this for a good week or so despite multiple scrubs and such):

Code:
[root@freenas] ~# zpool status -v | more
  pool: MEDIA
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 22h19m with 448836 errors on Thu Sep 20 23:20:05 2012
config:

        NAME                                              STATE     READ WRITE CKSUM
        MEDIA                                             DEGRADED     0     0  438K
          raidz1                                          DEGRADED     0     0  877K
            gptid/f40f30a3-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f4a0911a-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f55231c6-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f6048bd8-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f6d35c5c-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            replacing                                     DEGRADED     0     0     0
              2601964793004239810                         UNAVAIL      0     0     0  was /dev/gptid/f758f793-e1bd-11e1-a00e-001a4d4cb59b
              gptid/79561e7f-ff9e-11e1-9a9a-001a4d4cb59b  ONLINE       0     0     0  1.00T resilvered
            gptid/f826f656-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f8c889bc-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/f95bd681-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0  7.96M resilvered
            gptid/fa070a8e-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0
            gptid/fa93a8d3-e1bd-11e1-a00e-001a4d4cb59b    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        MEDIA:<0x6114>
        MEDIA:<0x6125>
        MEDIA:<0x61ac>
        MEDIA:<0x61bd>
        MEDIA:<0x60e0>
        MEDIA:<0x60ec>
        MEDIA:<0x60fb>
        MEDIA:<0x60fd>


Okay, so let's go ahead and detach from here, assuming it's a GUI issue:

[root@freenas] ~# zpool detach MEDIA 2601964793004239810
cannot detach 2601964793004239810: no valid replicas
[root@freenas] ~# zpool detach MEDIA /dev/gptid/f758f793-e1bd-11e1-a00e-001a4d4cb59b
cannot detach /dev/gptid/f758f793-e1bd-11e1-a00e-001a4d4cb59b: no valid replicas
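
For anyone scripting this check: the stale half of a replacement is the UNAVAIL entry nested under the "replacing" vdev. Below is a minimal sketch that pulls that name out of zpool status text. It assumes the indentation-based layout shown in the output above, and stale_replacing_members is a hypothetical helper name, not a ZFS command.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` text on stdin
# and prints the name of every UNAVAIL member nested under a "replacing" vdev.
# Assumption: nesting is expressed purely by leading whitespace, as in the
# status output quoted in this thread.
stale_replacing_members() {
    awk '
        {
            # Depth = amount of leading whitespace before the vdev name.
            match($0, /^[ \t]*/)
            depth = RLENGTH
        }
        # Entering a "replacing" vdev: remember how deep it sits.
        $1 == "replacing" { in_repl = 1; repl_depth = depth; next }
        # Any line at the same depth or shallower ends the "replacing" group.
        in_repl && depth <= repl_depth { in_repl = 0 }
        # Inside the group, report members whose STATE column is UNAVAIL.
        in_repl && $2 == "UNAVAIL" { print $1 }
    '
}
```

Usage would look like `zpool status MEDIA | stale_replacing_members`, which prints the numeric GUID to hand to zpool detach. It only identifies the device; it does not make the detach succeed.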

So, this has continued for over a week, with various attempts to get things resolved. It appears that I cannot detach the old drive regardless of what I try, although by all accounts the new drive is online and working, and I have no data corruption issues that I can see. Based upon recommendations in the forums, all of the following have been attempted, plus several other miscellaneous attempts to make some progress:

1. Exporting, then importing the zpool.
2. Booting with the alpha FreeNAS 8.3 and doing an import, scrub and then detach. Still getting no valid replicas.
3. Replacing ada5 with yet another new drive and allowing it to resilver. This only results in me having two drives that are UNAVAIL, and neither can be detached.
4. Putting the old drive back in and resilvering again.

Here is all the usual additional info that is asked for on here:

Code:
[root@freenas] ~# gpart show
=>     63  7864227  da0  MBR  (3.8G)
       63  1930257    1  freebsd  [active]  (943M)
  1930320       63       - free -  (32K)
  1930383  1930257    2  freebsd  (943M)
  3860640     3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  3959298       - free -  (1.9G)

=>      0  1930257  da0s1  BSD  (943M)
        0       16         - free -  (8.0K)
       16  1930241      1  !0  (943M)

=>        34  3907029101  ada0  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada1  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada2  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada3  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada4  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada5  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada6  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada7  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada8  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada9  GPT  (1.8T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834703     2  freebsd-zfs  (1.8T)

=>        34  3907029101  ada10  GPT  (1.8T)
          34          94         - free -  (47K)
         128     4194304      1  freebsd-swap  (2.0G)
     4194432  3902834703      2  freebsd-zfs  (1.8T)


Code:
[root@freenas] ~# glabel status
                                      Name  Status  Components
                             ufs/FreeNASs3     N/A  da0s3
                             ufs/FreeNASs4     N/A  da0s4
                            ufs/FreeNASs1a     N/A  da0s1a
gptid/f40f30a3-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada0p2
gptid/f4a0911a-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada1p2
gptid/f55231c6-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada2p2
gptid/f6048bd8-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada3p2
gptid/f6d35c5c-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada4p2
gptid/79561e7f-ff9e-11e1-9a9a-001a4d4cb59b     N/A  ada5p2
gptid/f826f656-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada6p2
gptid/f8c889bc-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada7p2
gptid/f95bd681-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada8p2
gptid/fa070a8e-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada9p2
gptid/fa93a8d3-e1bd-11e1-a00e-001a4d4cb59b     N/A  ada10p2
gptid/79308105-ff9e-11e1-9a9a-001a4d4cb59b     N/A  ada5p1
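
Since zpool status names members by gptid while the GUI and camcontrol talk about adaN, it can help to map one to the other. Here is a minimal sketch that does the lookup against glabel status text. It assumes the three-column Name / Status / Components layout shown above, and device_for_label is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `glabel status` text on stdin and prints the
# component (e.g. ada5p2) behind the label given as the first argument.
# A leading /dev/ on the label is stripped so names copied from
# `zpool status` ("was /dev/gptid/...") also work.
device_for_label() {
    label=${1#/dev/}
    awk -v label="$label" '$1 == label { print $3 }'
}
```

For example, `glabel status | device_for_label gptid/79561e7f-ff9e-11e1-9a9a-001a4d4cb59b` would print the partition carrying that label. A label that no longer appears in glabel status prints nothing, which is what you would expect for a drive that has been pulled.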


Code:
[root@freenas] ~# camcontrol devlist
<SAMSUNG HD204UI 1AQ10001>         at scbus0 target 0 lun 0 (pass0,ada0)
<Hitachi HDS5C3020ALA632 ML6OA180>  at scbus0 target 1 lun 0 (pass1,ada1)
<Hitachi HDS5C3020ALA632 ML6OA580>  at scbus0 target 2 lun 0 (pass2,ada2)
<SAMSUNG HD204UI 1AQ10001>         at scbus0 target 3 lun 0 (pass3,ada3)
<WDC WD20EARX-00PASB0 51.0AB51>    at scbus0 target 4 lun 0 (pass4,ada4)
<Port Multiplier 37261095 1706>    at scbus0 target 15 lun 0 (pass5,pmp0)
<WDC WD20EARS-00J2GB0 80.00A80>    at scbus1 target 0 lun 0 (pass6,ada5)
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus3 target 0 lun 0 (pass7,ada6)
<SAMSUNG HD204UI 1AQ10001>         at scbus4 target 0 lun 0 (pass8,ada7)
<Hitachi HDS5C3020ALA632 ML6OA580>  at scbus6 target 0 lun 0 (pass9,ada8)
<SAMSUNG HD204UI 1AQ10001>         at scbus7 target 0 lun 0 (pass10,ada9)
<Hitachi HDS5C3020ALA632 ML6OA5C0>  at scbus8 target 0 lun 0 (pass11,ada10)
<Generic Card-Reader 1.03>         at scbus9 target 0 lun 0 (pass12,da0)


Any thoughts on what approach I can take to get this resolved, short of toasting the entire pool and reloading 10+ TB of data?
 

ikillcopiers

Cadet
Joined
Oct 9, 2012
Messages
1
I have a similar problem: a zpool with 2 "ghost" devices from previous drive replacements.
I followed a fairly similar "solution" path and am now on FreeNAS 8.3 RC1.

There are some data errors also showing up under zpool status.

I've tried zpool clear to clear the errors, no dice.

zpool offline and zpool detach also do nothing; both give the "no valid replicas" response.

Have you had any luck with this problem, or did you just make a new pool?

Is there any way to forcibly remove the disks from the pool? Is there
maybe a configuration file that can be edited?
 

joshg678

Explorer
Joined
Sep 27, 2012
Messages
52
You may have more than one failed drive.

Seems strange that the other drive had 7.96M resilvered as well.
 

astuckey

Cadet
Joined
Feb 17, 2013
Messages
1
I just ran into the same problem.
FreeNAS-8.3.0-RELEASE-p1-x64
Replacing with bigger drives (2T -> 3T). Offlined the 2T drive OK, replaced it with a 3T OK, resilvered OK, but I cannot detach the 2T drive; same error, "no valid replicas".
The GUI says the detach worked OK, but the command line shows the error.
During resilvering, two other drives also decided to resilver a small amount, which is very odd.
I am 2 drives into a 6-drive replacement, and as you can imagine, I don't want to continue with the upgrade until this is sorted out.

sinnr / ikillcopiers - did you manage to work out a way forward?
 

dmt0

Dabbler
Joined
Oct 28, 2011
Messages
47
So has anyone been able to fix this? I've got the same issue...
 

dmt0

Dabbler
Joined
Oct 28, 2011
Messages
47
Also, I'm getting
"Permanent errors have been detected in the following files"
followed by a list of hexadecimal IDs instead of filenames.
I presume those might be the reason why I can't detach, but how do I fix them? If I at least had filenames, I'd delete them: no data, no problem :). But what to do now?

Update: I ran a scrub from the GUI and left it running overnight. The permanent errors disappeared, and I could then detach with a CLI command without issues. Now everything is online. Done.
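
For anyone who wants to script the same order of operations (scrub first, detach only once the pool reports no errors), here is a minimal sketch. The function names are hypothetical, and the matched wording ("with 0 errors" on the scrub:/scan: line, "No known data errors" on the errors: line) is an assumption based on the status output quoted in this thread and may differ between ZFS versions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the `zpool status` text on stdin
# shows (a) a finished scrub/resilver with 0 errors and (b) no data errors.
# Assumption: the summary line starts with "scrub:" (older pools) or
# "scan:" (newer pools) and contains "with 0 errors".
pool_looks_clean() {
    awk '
        /^[ \t]*(scrub|scan):.*with 0 errors/ { clean = 1 }
        /^errors: No known data errors/       { noerr = 1 }
        END { exit !(clean && noerr) }
    '
}

# Hypothetical wrapper: detach the stale device only when the pool looks
# clean; otherwise suggest a scrub. $1 = pool name, $2 = device name or GUID.
detach_if_clean() {
    pool=$1
    stale=$2
    if zpool status "$pool" | pool_looks_clean; then
        zpool detach "$pool" "$stale"
    else
        echo "$pool still reports errors; try: zpool scrub $pool" >&2
        return 1
    fi
}
```

With the first post's pool this would be `detach_if_clean MEDIA 2601964793004239810`, run after `zpool scrub MEDIA` has finished. It refuses to detach while the scrub is still running or while permanent errors are listed.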
 