How to replace a RaidZ disk with itself

Status
Not open for further replies.

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
I am running FreeNAS-8.3.0-RELEASE-p1-x64 (r12825)

I have three hard disk drives in a RAID-Z1 volume. All three disks are new and have no defects.

Tonight, through the FreeNAS web interface, I went to the Volume Status tab and clicked 'Offline' on the first disk in the volume.

My intention was to put the volume into degraded mode, and then re-attach the same disk back to get it to re-silver. However, I am unable to successfully re-attach the disk.

This is the error message:
Dec 22 22:29:57 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/a44bcc94-6b9b-11e3-b72e-e4115b12c0d0 is part of active pool 'nas_vol01', "]

This is my zpool status output:
Code:
[root@freenas] ~# zpool status
  pool: nas_vol01
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
  see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 4h45m with 0 errors on Sat Dec 14 18:49:52 2013
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    nas_vol01                                      DEGRADED    0    0    0
      raidz1-0                                      DEGRADED    0    0    0
        14951475343142292485                        UNAVAIL      0    0    3  was /dev/gptid/29cdff86-6b97-11e3-a9fe-e4115b12c0d0
        gptid/25030dad-546a-11e3-9477-e4115b12c0d0  ONLINE      0    0    0
        gptid/aa288cf8-6482-11e3-a9fe-e4115b12c0d0  ONLINE      0    0    0
 
errors: No known data errors


I tried several ways to wipe the detached disk (/dev/ada0) and re-attach it, but to no avail. I have replaced disks in the past with a different disk, and that has always worked. But this is the first time I am trying to detach and re-attach the same disk.

As you can see, I am at a loss as to what to do, and my disk array is running in degraded mode.

I would appreciate it if someone could guide me through re-attaching this disk. Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I believe what you need to do is a reboot, then a quick wipe, then perhaps another reboot. Then you can add it back to the pool.

Alternatively you can pull the disk from the system and wipe the drive.
 

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
Thanks for your response. I just tried what you suggested - reboot, Quick Wipe, reboot, then tried to replace the disk - all from the web interface. But I still get the same error when trying to Replace:

Dec 22 23:49:49 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/cc710661-6ba6-11e3-9d45-e4115b12c0d0 is part of active pool 'nas_vol01', "]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Did the quick wipe successfully complete? This seems odd to me.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I've had this problem before: replacing a disk with a disk that has previously been used in the same pool.

If I remember right, you have to wipe the first few megabytes of the actual ZFS partition. This was after I had done a quick wipe, so that may be required too; I'm not sure.

Something like this worked for me (assuming the default 2 GiB swap size):

Code:
# seek=2048 with bs=1m skips the first 2 GiB of the disk (the default swap partition), so count=2
# zeroes about 2 MiB at the start of the ZFS partition that follows; adjust the seek if your swap size differs
dd if=/dev/zero of=/dev/adaX bs=1m count=2 seek=2048


Of course, a full surface wipe of the disk would work too, but will take a few hours.
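
For reference, a full wipe with dd would just be something like this (device name is illustrative; this destroys everything on the disk):

Code:
dd if=/dev/zero of=/dev/adaX bs=1m
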
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
Use the following command in a shell:

zpool labelclear /dev/whatever

This will remove all information about the zpool from the disk, and you can then reuse it in that pool.
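
For example, something like this (hedged; the device name is illustrative, and the -f flag, which forces the clear if ZFS still thinks the label belongs to an active pool, is an addition here):

Code:
zpool labelclear -f /dev/ada0


Note that on a disk partitioned by FreeNAS the ZFS data lives on the second partition (e.g. ada0p2) rather than on the raw device, so pointing labelclear at that partition may be what is actually needed.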
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Ahh, didn't know about the zpool labelclear command.

That's much better than manually dd'ing over the drive.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
If you do intend to use dd, be aware that the zpool metadata isn't stored only at the beginning of the drive, but also at the end of the disk (or partition). So your dd command wouldn't have been enough to clear the whole disk for reuse in the zpool.

So in the end, nothing short of clearing the whole disk is sufficient with dd, and as you said, it will take a while (unless of course you use dd to specifically write to the right spots).
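
If you do want to hit just the right spot at the end with dd, a rough sketch might look like this (illustrative; assumes /dev/adaX, FreeBSD's diskinfo, and that the ZFS partition runs to the end of the disk):

Code:
size=$(diskinfo /dev/adaX | awk '{print $3}')        # media size in bytes
# zero the last 4 MiB of the device, which covers the trailing pair of ZFS labels;
# dd will stop with an "end of device" message here, which is expected
dd if=/dev/zero of=/dev/adaX bs=1m seek=$(( size / 1048576 - 4 ))
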
 

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
Thanks for all the help!

After reading the above, here is the sequence of what I tried:

1. First I did 'zpool labelclear /dev/ada0', then rebooted, then tried to replace the disk. I still got the same error!

2. So, I rebooted again, then ran 'dd if=/dev/zero of=/dev/ada0 bs=1m count=2 seek=2048'. For good measure, I rebooted, and ran zpool labelclear again, then rebooted again.

3. Then I tried to replace the disk, and success!

It is resilvering now:

Code:
[root@freenas] ~# zpool status
  pool: nas_vol01
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 23 08:39:07 2013
        9.35G scanned out of 1.29T at 31.0M/s, 12h1m to go
        3.11G resilvered, 0.71% done
config:
 
    NAME                                              STATE    READ WRITE CKSUM
    nas_vol01                                        DEGRADED    0    0    0
      raidz1-0                                        DEGRADED    0    0    0
        replacing-0                                  OFFLINE      0    0    0
          14951475343142292485                        OFFLINE      0    0    0  was /dev/gptid/a44bcc94-6b9b-11e3-b72e-e4115b12c0d0
          gptid/bcd3ebb2-6bf0-11e3-a153-e4115b12c0d0  ONLINE      0    0    0  (resilvering)
        gptid/25030dad-546a-11e3-9477-e4115b12c0d0    ONLINE      0    0    0
        gptid/aa288cf8-6482-11e3-a9fe-e4115b12c0d0    ONLINE      0    0    0
 
errors: No known data errors


So it looks like the dd had some effect.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
Did you offline the disk first?

The steps should simply be:

Offline the disk
zpool labelclear
Then online the disk.

No need to reboot or anything like that.
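
For what it's worth, a rough sketch of those first two steps from a shell, using pool and device names that appear in this thread (in FreeNAS the offline and the eventual re-attach are normally done from the GUI's Volume Status page):

Code:
# take the member offline (vdev name as shown by zpool status)
zpool offline nas_vol01 gptid/25030dad-546a-11e3-9477-e4115b12c0d0
# clear the old ZFS labels from the detached disk before reusing it
zpool labelclear -f /dev/ada1
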
 

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
It was offlined; the zpool status output in my first post above shows that.

Things got a little messy last night though with all the commands I tried on ada0.

When ada0 finishes resilvering I will try again with ada1 and post my definitive results of what works.

Thanks!



 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
zpool clear won't solve the problem. FreeNAS by design protects disks that have zpool data or a partition table. In essence, if you aren't doing something from the GUI, you can expect it to be blocked. Also, FreeNAS maintains its protected-media list on reboot and refreshes that list on some GUI commands. It doesn't refresh it on CLI commands, since you generally shouldn't be doing CLI stuff. As you probably realize, if you had done a disk replacement with a different disk, you wouldn't have had this problem.

I'm a bit confused about why the quick wipe didn't work. I could have sworn that the quick wipe did a dd wipe of the partition table, the first 5 GB, and the end of the disk (which negates the intended function of zpool clear), or something similar to that. The reboot is supposed to reset the list of protected media (since it's generated on bootup) in FreeNAS, thereby allowing you to resilver properly.

Bit confusing, but at least you got it figured out.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
As I understand it, a 'quickwipe' clears the start and end of the *disk*. If the swap size is set to anything except 0, this won't clear the start of the ZFS partition, which leaves ZFS labels behind. I think that is why the replacement fails. I think it was wg on IRC who explained it to me that way.

To be clear, we're talking about zpool labelclear and not zpool clear, which does a completely different thing. A zpool labelclear should be all that's needed on top of a quick erase, which clears the GPT partition information.
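
A couple of hedged diagnostic commands that should show whether stale labels are what is blocking the replacement (device names are illustrative and assume the default FreeNAS layout, with swap on p1 and the ZFS partition on p2):

Code:
# show the partition layout; p1 is the 2 GiB swap and p2 the ZFS partition on a default install
gpart show ada1
# dump any ZFS labels still present on the old ZFS partition; if label contents are printed,
# the partition still looks like a pool member
zdb -l /dev/ada1p2
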
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
As I understand it, a 'quickwipe' clears the start and end of the *disk*. If the swap size is set to anything except 0, this won't clear the start of the ZFS partition, which leaves ZFS labels behind.
The GUI quickwipe is actually quite thorough: it first wipes the first megabyte and last 4 megabytes of every partition, then destroys the partition table, and then wipes the first megabyte and last 4 megabytes of the device.
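
A rough shell sketch of that order of operations for a disk with a single data partition (purely illustrative, not the actual FreeNAS code; assumes /dev/ada1 with ZFS on ada1p2 and FreeBSD's diskinfo/gpart):

Code:
wipe_ends() {   # zero the first 1 MiB and last 4 MiB of a device or partition
    size=$(diskinfo "$1" | awk '{print $3}')                       # media size in bytes
    dd if=/dev/zero of="$1" bs=1m count=1
    dd if=/dev/zero of="$1" bs=1m seek=$(( size / 1048576 - 4 ))   # dd stops at "end of device", as expected
}
wipe_ends /dev/ada1p2    # every partition first
gpart destroy -F ada1    # then the partition table
wipe_ends /dev/ada1      # then the whole device
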
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
The GUI quickwipe is actually quite thorough: it first wipes the first megabyte and last 4 megabytes of every partition, then destroys the partition table, and then wipes the first megabyte and last 4 megabytes of the device.

Hmm, I wonder if this has changed at some point. I do recall being told on IRC that a quick erase in the GUI simply dd-zeroed the start and end of the disk, and didn't do anything with partitions.

If that's the case, then the 'zpool labelclear' is clearly needed to completely get rid of the old ZFS labels.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If that's the case, then the 'zpool labelclear' is clearly needed to completely get rid of the old ZFS labels.

Which is why I'm puzzled as to why the quick wipe didn't "just work". :P
 

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
I have performed more experimentation tonight and confirmed that only the ‘dd’ command works, at least for the version of FreeNAS I’m running, FreeNAS-8.3.0-RELEASE-p1-x64 (r12825).

Here is what I did tonight:

To start off, I have disks ada0, ada1 and ada2. ada0 was successfully re-attached this morning, and my RAID-Z1 volume was back to HEALTHY status. Tonight I experimented with ada1.

1. In the FreeNAS web interface, I went to the “Volume Status” tab and “Offline”ed ada1p2:

Code:
[root@freenas] ~# zpool status
  pool: nas_vol01
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: resilvered 439G in 5h46m with 0 errors on Mon Dec 23 14:25:11 2013
config:
 
    NAME                                            STATE    READ WRITE CKSUM
    nas_vol01                                      DEGRADED    0    0    0
      raidz1-0                                      DEGRADED    0    0    0
        gptid/bcd3ebb2-6bf0-11e3-a153-e4115b12c0d0  ONLINE      0    0    0
        2478308224592308645                        OFFLINE      0    0    0  was /dev/gptid/25030dad-546a-11e3-9477-e4115b12c0d0
        gptid/aa288cf8-6482-11e3-a9fe-e4115b12c0d0  ONLINE      0    0    0
 
errors: No known data errors


2. I went to the “View Disks” tab, clicked on “Wipe”, and then selected “Quick” for a quick wipe of ada1:

Code:
Dec 23 20:36:21 freenas notifier: ada1 destroyed
Dec 23 20:36:21 freenas notifier: ada1 created
Dec 23 20:36:21 freenas notifier: ada1 destroyed


3. Went back to the “Volume Status” tab and tried to Replace the missing ada1. FAILED.

Code:
Dec 23 20:37:22 freenas notifier: 1+0 records in
Dec 23 20:37:22 freenas notifier: 1+0 records out
Dec 23 20:37:22 freenas notifier: 1048576 bytes transferred in 0.006459 secs (162343454 bytes/sec)
Dec 23 20:37:22 freenas notifier: dd: /dev/ada1: short write on character device
Dec 23 20:37:22 freenas notifier: dd: /dev/ada1: end of device
Dec 23 20:37:22 freenas notifier: 5+0 records in
Dec 23 20:37:22 freenas notifier: 4+1 records out
Dec 23 20:37:22 freenas notifier: 4284416 bytes transferred in 0.068658 secs (62402397 bytes/sec)
Dec 23 20:37:22 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/1497fc1c-6c55-11e3-a153-e4115b12c0d0 is part of active pool 'nas_vol01', "]


4. Rebooted, and then tried again to Replace ada1. FAILED.

5. In the shell, I ran ‘zpool labelclear /dev/ada1’. Then tried again to Replace ada1. FAILED.

6. Rebooted, tried again to Replace ada1. FAILED.

7. Tried one more time to Quick Wipe ada1, reboot, and Replace ada1. FAILED.

8. Tried one more time to ‘zpool labelclear /dev/ada1’, reboot, and Replace ada1. FAILED.

9. Finally, I ran the command ‘dd if=/dev/zero of=/dev/ada1 bs=1m count=2 seek=2048’, rebooted, and Replaced ada1. SUCCESS!

Code:
Dec 23 21:14:37 freenas notifier: 1+0 records in
Dec 23 21:14:37 freenas notifier: 1+0 records out
Dec 23 21:14:37 freenas notifier: 1048576 bytes transferred in 0.006457 secs (162391408 bytes/sec)
Dec 23 21:14:37 freenas notifier: dd: /dev/ada1: short write on character device
Dec 23 21:14:37 freenas notifier: dd: /dev/ada1: end of device
Dec 23 21:14:37 freenas notifier: 5+0 records in
Dec 23 21:14:37 freenas notifier: 4+1 records out
Dec 23 21:14:37 freenas notifier: 4284416 bytes transferred in 0.072893 secs (58776667 bytes/sec)
Dec 23 21:14:45 freenas notifier: swapon: /dev/ada1p1: device already in use
 
[root@freenas] ~# zpool status
  pool: nas_vol01
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 23 21:14:39 2013
        27.6M scanned out of 1.29T at 523K/s, (scan is slow, no estimated time)
        8.45M resilvered, 0.00% done
config:
 
    NAME                                              STATE    READ WRITE CKSUM
    nas_vol01                                        DEGRADED    0    0    0
      raidz1-0                                        DEGRADED    0    0    0
        gptid/bcd3ebb2-6bf0-11e3-a153-e4115b12c0d0    ONLINE      0    0    0
        replacing-1                                  OFFLINE      0    0    0
          2478308224592308645                        OFFLINE      0    0    0  was /dev/dsk/gptid/53c24fad-6c56-11e3-99f4-e4115b12c0d0
          gptid/490e26ae-6c5a-11e3-bfb1-e4115b12c0d0  ONLINE      0    0    0  (resilvering)
        gptid/aa288cf8-6482-11e3-a9fe-e4115b12c0d0    ONLINE      0    0    0
 
errors: No known data errors
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Hmm, I wonder if this has changed at some point. I do recall being told on IRC that a quick erase in the GUI simply dd-zeroed the start and end of the disk, and didn't do anything with partitions.
Checked the history and the wipe-every-partition code was added for 9.1.0: https://github.com/freenas/freenas/commit/ef4a353fe6f2dccc18d21a564aec3003f6b252cf
I have performed more experimentation tonight and confirmed that only the ‘dd’ command works, at least for the version of FreeNAS I’m running, FreeNAS-8.3.0-RELEASE-p1-x64 (r12825).
8.3.0 is too old; it does not include the "advanced" quick wipe algorithm.
 
  • Like
Reactions: tzd

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Ahh, that makes sense. The one time I had this problem, it was still an 8.3 world.

Nice to know it should be unnecessary with 9.1+.
 
  • Like
Reactions: tzd

tzd

Dabbler
Joined
Dec 22, 2013
Messages
12
Thanks, all! And by the way, no reboot is needed: just Offline the disk, run 'dd', and then Replace the disk, and it works.
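
For anyone finding this later, a minimal sketch of the working recipe, assuming the default 2 GiB swap partition at the start of the disk and that the detached disk is ada1 (the Offline and Replace steps are done in the web interface):

Code:
# 1. Volume Status -> Offline the member in the GUI
# 2. zero the first 2 MiB of the old ZFS partition (seek past the 2 GiB swap that precedes it)
dd if=/dev/zero of=/dev/ada1 bs=1m count=2 seek=2048
# 3. Volume Status -> Replace the member in the GUI; resilvering starts automatically
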
 