Error replacing disk in a ZFS mirror

Status
Not open for further replies.

whistlepigger

Dabbler
Joined
Jun 9, 2011
Messages
15
Running FreeNAS-8.0.4-RELEASE-x64 (10351). I have a ZFS mirror, and one of the disks went bad. Following the instructions on this forum, I powered off, replaced the failed drive, powered on, found the drive in the "view disks" section of the GUI, and clicked replace.

Sadly, I got "An Error Occurred". The log shows:

Jun 5 16:57:38 edenserver freenas[1966]: Popen()ing: /sbin/sysctl -n kern.disks
Jun 5 16:57:38 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada2
Jun 5 16:57:38 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada1
Jun 5 16:57:38 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada0
Jun 5 16:57:38 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo da0
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: /sbin/sysctl -n kern.disks
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada2
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada1
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo ada0
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: /usr/sbin/diskinfo da0
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: zpool status vol1
Jun 5 16:57:56 edenserver freenas[1966]: Popen()ing: zpool status vol1
Jun 5 16:57:56 edenserver freenas[1966]: Executing: dd if=/dev/zero of=/dev/ada0 bs=1m count=1
Jun 5 16:57:57 edenserver freenas: 1+0 records in
Jun 5 16:57:57 edenserver freenas: 1+0 records out
Jun 5 16:57:57 edenserver freenas: 1048576 bytes transferred in 0.010975 secs (95543242 bytes/sec)
Jun 5 16:57:57 edenserver freenas[1966]: Executing: dd if=/dev/zero of=/dev/ada0 bs=1m oseek=`diskinfo ada0 | awk '{print int($3 / (1024*1024)) - 4;}'`
Jun 5 16:57:57 edenserver freenas: dd: /dev/ada0: short write on character device
Jun 5 16:57:57 edenserver freenas: dd: /dev/ada0: end of device
Jun 5 16:57:57 edenserver freenas: 5+0 records in
Jun 5 16:57:57 edenserver freenas: 4+1 records out
Jun 5 16:57:57 edenserver freenas: 4218880 bytes transferred in 0.042101 secs (100208769 bytes/sec)
Jun 5 16:57:57 edenserver freenas[1966]: Popen()ing: gpart create -s gpt /dev/ada0
Jun 5 16:57:57 edenserver freenas[1966]: Popen()ing: gpart add -b 128 -t freebsd-swap -s 4194304 ada0
Jun 5 16:57:57 edenserver freenas[1966]: Popen()ing: gpart add -t freebsd-zfs ada0
Jun 5 16:57:57 edenserver freenas[1966]: Popen()ing: gpart bootcode -b /boot/pmbr-datadisk /dev/ada0
Jun 5 16:57:57 edenserver freenas[1966]: Executing: /sbin/swapon /dev/ada0p1
Jun 5 16:57:57 edenserver freenas[1966]: Popen()ing: /sbin/zpool replace vol1 577894466352 ada0p2

If I repeat the replace, I get a bunch of "Operation not permitted" errors.

If I remove the new disk and wipe it clean, I see the same error.

After reading this forum, I see that many people have errors trying to do this, and I'm puzzled as to why it's so difficult to replace a failed disk on a mirror.

Any help would be appreciated.
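A note on the log above: the "short write on character device" and "end of device" lines from the second dd are probably not the failure. That dd has no count, so it writes zeros from the computed offset until the disk runs out. The sketch below redoes the arithmetic from the logged awk expression. The sector count is an assumption, derived from the gpart show output posted further down (34 + 976773101 + 33 sectors of 512 bytes).

```shell
#!/bin/sh
# Reproduce the GUI's tail-wipe arithmetic from the logged dd command.
# Assumption: ada0 has 976773168 sectors of 512 bytes each.

mediasize=$((976773168 * 512))   # disk size in bytes
mib=$((1024 * 1024))

# Same formula as the awk expression: int(mediasize / 1 MiB) - 4
oseek=$((mediasize / mib - 4))

# dd starts at block 'oseek' and only stops when the device ends,
# so it writes everything from that offset to the last byte.
tail_bytes=$((mediasize - oseek * mib))

echo "oseek=${oseek} tail_bytes=${tail_bytes}"
```

Under that assumption the tail comes out at 4218880 bytes, the same figure the log reports as transferred, which suggests the dd step behaved as designed and the real problem is elsewhere.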
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
From an SSH session, as root, what is the output of:
Code:
zpool status -v

camcontrol devlist

gpart show
Do me a favor and throw [code] [/code] tags around the output. It will keep the formatting.
See above.

How many drive bays do you have anyway?
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
See above.

How many drive bays do you have anyway?

He has a mirror, so probably 2 ;)


It would be interesting/helpful to see any other messages following the ones you posted. You're the first person I've seen who has had trouble replacing a drive in a mirror. You said repeated attempts gave you an error. Did you do those from the GUI or the command line?

You could try
/sbin/zpool replace vol1 577894466352 ada0p2
from the command line if you haven't already and post the output.


(And yes paleoN, I am away on my vacation, but can't tear myself away from the forums at the moment ;) )
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
He has a mirror, so probably 2 ;)
Likely, but the ada0-2 is throwing me off. Not to mention, I have a mirror and have more than 2 ;)

You could try
/sbin/zpool replace vol1 577894466352 ada0p2
from the command line if you haven't already and post the output.
Ah, so in this case ada0 is the new replacement drive. I couldn't tell on first reading through.
 

whistlepigger

Dabbler
Joined
Jun 9, 2011
Messages
15
Thanks for the replies.

Yes, vol1 is a mirror of 500 GB drives ada0 and ada1, and ada0 went bad. vol2 is a single 640 GB drive, ada2.
When I first got this error, I thought maybe I needed to be sure the replacement drive was clean (no partitions), so I ran it through gparted and removed any. But I still get the errors.

I've provided the zpool status, camcontrol and gpart outputs, as well as the attempt at doing the "zpool replace" manually.

---

[root@edenserver] ~# zpool status -v
pool: vol1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol1                                            DEGRADED     0     0     0
          mirror                                        DEGRADED     0     0     0
            577894466352                                UNAVAIL      0     0     0  was /dev/gptid/0d14b876-9088-11e0-9cca-001cc0fc7be0
            gptid/0d69d39f-9088-11e0-9cca-001cc0fc7be0  ONLINE       0     0     0

errors: No known data errors

pool: vol2
state: ONLINE
scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol2        ONLINE       0     0     0
          ada2p2    ONLINE       0     0     0

errors: No known data errors

---

[root@edenserver] ~# camcontrol devlist
<ST3500630AS 3.AAK> at scbus1 target 0 lun 0 (ada0,pass0)
<ST3500630AS 3.AAK> at scbus2 target 0 lun 0 (ada1,pass1)
<WDC WD6401AALS-00E3A0 05.01D05> at scbus3 target 0 lun 0 (ada2,pass2)
<CENTON DS Pro 1100> at scbus8 target 0 lun 0 (da0,pass3)

---

[root@edenserver] ~# gpart show
=>         63     3913623  da0  MBR  (1.9G)
           63     1930257    1  freebsd  (943M)
      1930320          63       - free -  (32K)
      1930383     1930257    2  freebsd  [active]  (943M)
      3860640        3024    3  freebsd  (1.5M)
      3863664       41328    4  freebsd  (20M)
      3904992        8694       - free -  (4.2M)

=>          0     1930257  da0s1  BSD  (943M)
            0          16         - free -  (8.0K)
           16     1930241      1  !0  (943M)

=>          0     1930257  da0s2  BSD  (943M)
            0          16         - free -  (8.0K)
           16     1930241      1  !0  (943M)

=>         34   976773101  ada1  GPT  (466G)
           34          94        - free -  (47K)
          128     4194304     1  freebsd-swap  (2.0G)
      4194432   972578703     2  freebsd-zfs  (464G)

=>         34  1250263661  ada2  GPT  (596G)
           34          94        - free -  (47K)
          128     4194304     1  freebsd-swap  (2.0G)
      4194432  1246069263     2  freebsd-zfs  (594G)

=>         34   976773101  ada0  GPT  (466G)
           34          94        - free -  (47K)
          128     4194304     1  freebsd-swap  (2.0G)
      4194432   972578703     2  freebsd-zfs  (464G)

---

[root@edenserver] ~# /sbin/zpool replace vol1 577894466352 ada0p2
invalid vdev specification
use '-f' to override the following errors:
/dev/ada0p2 is part of potentially active pool 's500'
 

whistlepigger

Dabbler
Joined
Jun 9, 2011
Messages
15
Forgot to answer this:
>>did you do those from the GUI or the command line?

I wanted to do everything from the GUI - which is where I got the "An Error Occurred" message.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
[root@edenserver] ~# /sbin/zpool replace vol1 577894466352 ada0p2
invalid vdev specification
use '-f' to override the following errors:
/dev/ada0p2 is part of potentially active pool 's500'
Where did drive ada0 come from? What exactly is pool 's500'?
 

whistlepigger

Dabbler
Joined
Jun 9, 2011
Messages
15
Where did drive ada0 come from? What exactly is pool 's500'?

THANK YOU. Thank You. thank you.

The replacement drive ada0 came from a test system with FreeNAS installed! It seems that I did not zero out the drive successfully the first time. I removed the drive, zeroed it out successfully this time, and then everything worked.

So the take-away is that either 1. the documentation should specify that the drive must be zeroed out, or 2. FreeNAS should be able to zero it out for me. It could prompt me with something like "It appears that this drive has been previously formatted - do you really want to ..."
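A possible explanation for why the GUI's own wipe did not clear the old pool: ZFS keeps four 256 KiB labels per vdev, two at the start and two at the end, while the logged dd commands only zero the first 1 MiB and roughly the last 4 MiB of the whole disk. The sketch below checks which labels fall inside those zeroed regions. It assumes 512-byte sectors and that the old 's500' pool sat on a partition with the same start and size as the new ada0p2 (plausible if the test system was also FreeNAS, but not confirmed in this thread).

```shell
#!/bin/sh
# Which of the four ZFS vdev labels on the old data partition lie inside
# the two regions the GUI zeroed on the whole disk?
# Assumptions: 512-byte sectors; old pool partition == new ada0p2 layout.

sector=512
label=$((256 * 1024))                  # one vdev label is 256 KiB
mib=$((1024 * 1024))

disk_bytes=$((976773168 * sector))     # whole disk (34 + 976773101 + 33 sectors)
part_start=$((4194432 * sector))       # ada0p2 start, from gpart show
part_bytes=$((972578703 * sector))     # ada0p2 size, from gpart show

head_end=$mib                                    # first dd zeroed [0, 1 MiB)
tail_start=$(( (disk_bytes / mib - 4) * mib ))   # second dd zeroed from here to the end

# Trailing labels are placed relative to the size rounded down to 256 KiB.
aligned=$((part_bytes / label * label))

# Print "wiped" if the label at partition offset $1 lies entirely inside
# one of the zeroed regions, otherwise "survives".
label_state() {
    start=$((part_start + $1))
    end=$((start + label))
    if [ "$end" -le "$head_end" ] || [ "$start" -ge "$tail_start" ]; then
        echo wiped
    else
        echo survives
    fi
}

echo "L0 $(label_state 0)"
echo "L1 $(label_state "$label")"
echo "L2 $(label_state $((aligned - 2 * label)))"
echo "L3 $(label_state $((aligned - label)))"
```

Under these assumptions the two trailing labels are zeroed but the two leading ones, which start about 2 GiB into the disk behind the swap partition, are not. That would be consistent with zpool replace still seeing /dev/ada0p2 as part of 's500' after the GUI had repartitioned the drive.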
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
[root@edenserver] ~# /sbin/zpool replace vol1 577894466352 ada0p2
invalid vdev specification
use '-f' to override the following errors:
/dev/ada0p2 is part of potentially active pool 's500'

It looks to me like the disk you are using as a replacement was part of another pool before? s500?

If that's the case, you probably need to wipe it before trying to use it as a replacement. I think 8.2 is supposed to have a feature for wiping disks, but IF it was part of another pool before and you want to wipe it, take a look at the FAQ.
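For reference, one way such a wipe could look, modelled on the dd pattern in the original log but aimed at the old data partition rather than the whole disk. This is only a sketch and is not the FAQ's procedure, which is not reproduced in this thread. The helper name wipe_plan is hypothetical, the device and size are examples, and the function only prints the commands so they can be checked against the right disk before anything is run.

```shell
#!/bin/sh
# Dry run: print (do not execute) dd commands that would zero the first and
# last few MiB of a partition, where old ZFS labels would sit.
# wipe_plan is a hypothetical helper, not part of FreeNAS.
wipe_plan() {
    dev=$1      # target partition, e.g. ada0p2 (double-check before use)
    bytes=$2    # partition size in bytes (third field of 'diskinfo <dev>')
    chunk=$3    # number of MiB to clear at each end
    total=$((bytes / 1048576))   # whole MiB in the partition
    echo "dd if=/dev/zero of=/dev/${dev} bs=1m count=${chunk}"
    # No count on the second dd: it runs to the end of the device, so a
    # "short write" / "end of device" message at the end is expected.
    echo "dd if=/dev/zero of=/dev/${dev} bs=1m oseek=$((total - chunk))"
}

# Example with the ada0p2 size from this thread (972578703 * 512 bytes).
wipe_plan ada0p2 497960295936 4
```

Printing first and running by hand is a deliberate choice here: pointing dd at the surviving mirror member by mistake would destroy the only good copy of the pool.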
 