Unable to replace disk for ZFS Mirror

Status
Not open for further replies.

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
Noob here, hoping for some help and willing to try and learn whatever I can.

I guess I'll start with my setup:

- 2x 3 TB Western Digital Reds (connected to SATA0 & 1 on the mobo)
- 4 GB RAM
- Intel G630t

I had everything up and working today, but I wanted to see how removing a drive from a ZFS mirror would fare. Ever since I offlined ada1 (the mirror is ada0 & ada1), the volume has been in a degraded state with one drive "Online" and the other "Offline." I then detached the volume, wiped the data from "ada1," and Auto Imported the volume back in to try to have it resilver the data from "ada0." I'm not worried about losing data from any of this, but I am trying to do this as if the data were important, since I will be migrating all of my data to this setup soon.

When I go back to View Volume>Volume Status, the following shows:

Mirror1 - Degraded
mirror-0 - Degraded
ada0p2 - Online (Edit, Replace, Offline)
6626756291593342731 - Offline (Replace)

I try to hit the "replace" button via webgui on "6626756291593342731," and for some reason on the "Disk Replacement" window, it lists "ada1" as an option, but when I select it, it still gives me a "Replacing disk NONE." I hit "replace disk" and it doesn't do anything to the volume at all. I checked "zpool status" in the command line, and it lists the same info as the webgui.

I'm scratching my head over where to even start. I've been reading around, googling, and reading the documentation, and with the limited info out there, I still couldn't fix the issue. I am highly doubtful that the drive that was working moments before is all of a sudden a bad hard drive after "offlining" the disk (ada1), although, working in IT, I know it's not impossible either.

If anyone could give me a place to start, I'd be willing to give it a try. Thanks in advance.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
What version of FreeNAS are you using?
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
This may be fixed with a reboot.

Usually when you do replace disk and it "doesn't do anything," there's a quick error that appears at the top of the GUI that may provide a clue. Did you see anything there?

I doubt if the drive is bad. My guess is some sort of inconsistency in the name of the drive in the database and in the zpool -- this happened to me a couple of times when I created an array with 4K blocks.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
I'm using 8.3.0, latest release.

I've also tried rebooting a few times. The name inconsistency sounds like it would make sense. Would the name have changed on its own without asking me first? It still shows up as ada1 on the list though. You're right, there was an error message, and I'll check it tomorrow.

That's what I love about this community, such quick responses. Thanks for taking the time so far.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
FreeNAS references the drives by gptid, so if you do a zpool status, you may see something like this instead of adax:
Code:
	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/244462df-4490-11e2-831c-00259095ca38  ONLINE       0     0     0
	    gptid/24bb7e7e-4490-11e2-831c-00259095ca38  ONLINE       0     0     0
	    gptid/2535906a-4490-11e2-831c-00259095ca38  ONLINE       0     0     0
	    gptid/258adce8-4490-11e2-831c-00259095ca38  ONLINE       0     0     0


I had two instances that I could not reproduce where at pool creation, the gptid got out of sync with the database. I don't know if this is related to what you're seeing or not, but the error message the GUI displays could be a clue.
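
To see which physical device a gptid in the pool actually points at, the label table from `glabel status` can be searched. The following is only a minimal sketch: the function name `gptid_to_dev` and the sample table are made up for illustration (the gptid-to-ada pairing in particular is hypothetical), and on a live system the input would come from `glabel status` itself.

```shell
#!/bin/sh
# Sketch: look up which device node a gptid label points at.
# Reads `glabel status`-style text (Name / Status / Components) on stdin.

gptid_to_dev() {
    # $1 = label as shown by zpool status, e.g. gptid/244462df-...
    awk -v want="$1" '$1 == want { print $3 }'
}

# Hypothetical sample table; the device assignments are invented.
sample='                                      Name  Status  Components
gptid/244462df-4490-11e2-831c-00259095ca38     N/A  ada0p2
gptid/24bb7e7e-4490-11e2-831c-00259095ca38     N/A  ada1p2'

printf '%s\n' "$sample" | gptid_to_dev gptid/24bb7e7e-4490-11e2-831c-00259095ca38
# On a live system: glabel status | gptid_to_dev gptid/...
```

If the lookup prints nothing, the label is not currently present on any attached disk, which would be one possible reason for the GUI and the pool disagreeing about names.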
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
That's actually quite similar to the issue I'm having. The WebGUI is not consistent with the command line zpool status.

zpool status lists a long gptid, like the one you showed above, whereas the WebGUI shows what I listed earlier:

Code:
Mirror1 - Degraded
mirror-0 - Degraded
ada0p2 - Online (Edit, Replace, Offline)
6626756291593342731 - Offline (Replace)



I'm at work right now, so I'm trying to gather as much info as I can, so I can shotgun it at home. I have limited time this week because of the whole Christmas thing. I will get you the error message shown on the GUI when I hit the "replace" button tonight. Thanks for the follow-up.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
This is the error that's given when I try to use the "replace" button on the WebGUI.

Code:
Error: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/de4d8fed-4596-11e2-88a6-bc5ff467211f is part of active pool 'Mirror1', "
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496

bollar

Patron
Joined
Oct 28, 2012
Messages
411

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
Exactly what bollar said. That was one of the documents I used originally.

Did the info I provided help in any way?
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
This does look similar to the problem I saw, but I don't know the solution. Ticket 1717 changes the way names are handled in the database code; those changes will be in 8.3.1 (coming soon) and may help.

It's not a satisfactory answer, but since my array was new, I wound up wiping it and trying again. I didn't have the problem a second time.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
Yikes. That worries me because I'm going to be migrating all my data as soon as I am comfortable that it'll all be intact. This is more of a learning experience than anything, and I hate to come out of a situation by just shrugging it off.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
I understand. I think with a bit more research, I could have resolved the issue from the CLI, but since I wasn't in production, this was an easier solution.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
This just makes me hesitant to use FreeNAS as primary storage until I know that a bad drive that gets pulled out, correctly, doesn't have issues. I might have to wait for 8.3.1 as you suggested. Until then, I'll just have my data where it currently is, without redundancy.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,402
This just makes me hesitant to use FreeNAS as primary storage until I know that a bad drive that gets pulled out, correctly, doesn't have issues.
Your "test" is hardly a bad drive or a simulation of one. FreeNAS likely could handle this a bit better, e.g. let you online the drive via the GUI.

I might have to wait for 8.3.1 as you suggested. Until then, I'll just have my data where it currently is, without redundancy.
8.3.1 won't fix your "problem". Being non-redundant is rather dumb.

You administratively offlined the drive. It will not come back by itself unless you explicitly online it, which currently you need to do via the CLI:
Code:
zpool status -v

Online whatever the above says is offline, e.g. (zpool online takes the pool name first, then the device):

zpool online Mirror1 ada1p2


Alternatively, to simulate replacing a failed disk with a replacement disk, wipe ada1 first and then try hitting replace.
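
The "online whatever is offline" step can be sketched as a small filter over the `zpool status` text. This is a rough illustration, not a tested admin tool: the function name `offline_to_online_cmds` is made up, and the sample status text is rebuilt from the GUI listing earlier in the thread rather than captured from a real system. It only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: turn `zpool status` text into the matching `zpool online` commands.
# It only prints the commands; nothing is changed on the pool.

offline_to_online_cmds() {
    awk '
        $1 == "pool:"                     { pool = $2 }   # remember the current pool
        $1 != "state:" && $2 == "OFFLINE" { printf "zpool online %s %s\n", pool, $1 }
    '
}

# Illustrative status text, rebuilt from the GUI listing in this thread.
sample_status='  pool: Mirror1
 state: DEGRADED
config:

    NAME                     STATE     READ WRITE CKSUM
    Mirror1                  DEGRADED     0     0     0
      mirror-0               DEGRADED     0     0     0
        ada0p2               ONLINE       0     0     0
        6626756291593342731  OFFLINE      0     0     0'

printf '%s\n' "$sample_status" | offline_to_online_cmds
# On a live system: zpool status | offline_to_online_cmds
```

Printing rather than executing is deliberate here, since an offlined device may have been taken out on purpose.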
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
I have it where it is only because I know it's ok there. I'm afraid of having it in an unreliable place since I changed something in the configuration.

I see what you mean. I didn't know taking a disk offline was different from wiping it. I thought FreeNAS saw it as the same. That being said, by administratively "offlining" my drive, what exactly did I do, just for the sake of understanding?

I'll have to start from scratch again and give it a whirl. Thanks for the FYI paleon.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,402
That being said, by administratively "offlining" my drive, what exactly did I do, just for the sake of understanding?
You, as the administrator, told ZFS not to use this drive at all unless/until you tell it to in the future. Among other things, this is useful for partially failing drives. Currently, it still retains all the ZFS information & data as of the point you offlined it, including the fact that it is offline.

I'll have to start from scratch again and give it a whirl. Thanks for the FYI paleon.
If by start from scratch you mean to online the drive and let it resilver any changes then sure. No need to redo the array.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
I'll give that a try. Is there no way to online a drive via the WebGUI? Thanks for taking the time.
 

MyLittlePWNie5

Dabbler
Joined
Dec 12, 2012
Messages
21
I got everything working again with the suggestion that was given by paleoN. I had to use the CLI and online the drive that was in the offline status with:

Code:
zpool online Mirror1 6626756291593342731


and

Code:
zpool replace Mirror1 6626756291593342731 ada1


What I find baffling is that you can offline a disk via the WebGUI but not online the same disk with it. You can only online a disk via the CLI. After that was done, it automatically resilvered the 2nd disk and the volume was healthy again. Also, when I offlined the disk, it changed the name from "ada1" to "6626756291593342731" without my consent. I'm unsure why it did that at all, but after the learning experience, I'm very pleased by the performance and I'm in the process of migrating all my data over.

Thank you very much to every one of you who took the time to help by suggesting anything, even if it didn't solve the issue. Much appreciated.
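
For anyone wanting to confirm that a pool is healthy again after a resilver like the one described above, the `state:` line of `zpool status` is the thing to look at. A minimal sketch follows; the helper names `pool_state` and `is_healthy` are made up, and the sample text is illustrative rather than copied from the poster's system.

```shell
#!/bin/sh
# Sketch: report whether a pool is back to ONLINE after a resilver.

pool_state() {
    # Print the value of the first "state:" line in zpool status text.
    awk '$1 == "state:" { print $2; exit }'
}

is_healthy() {
    # Succeeds only when the pool state read from stdin is ONLINE.
    [ "$(pool_state)" = "ONLINE" ]
}

# Illustrative text only, not captured from a real system.
sample_status='  pool: Mirror1
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    Mirror1     ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p2  ONLINE       0     0     0
        ada1p2  ONLINE       0     0     0'

if printf '%s\n' "$sample_status" | is_healthy; then
    echo "pool is healthy"
else
    echo "pool still needs attention"
fi
# On a live system: zpool status Mirror1 | is_healthy
```

`zpool status -x` gives a similar quick answer by only reporting pools that have problems.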
 

herby

Dabbler
Joined
Oct 12, 2011
Messages
10
Also, when I offlined the disk, it changed the name from "ada1" to "6626756291593342731" without my consent. I'm unsure why it did that at all, but after the learning experience, I'm very pleased by the performance and I'm in the process of migrating all my data over.

I would say it's a gptid but that doesn't look right.

As I understand it, since 8.2.0-RC1 FreeNAS uses gptids instead of ada0, ada1, etc. to label drives. Since the ada assignments can change for various reasons, it is apparently better to identify drives by gptid, which does not change.

That said, a gptid should look like:
Code:
gptid/615fd428-3abc-11e2-befe-003067329074


/noob speaking, correct me if I'm wrong
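
The three kinds of names seen in this thread can be told apart mechanically. The sketch below is an illustration with a made-up function name, `classify_vdev`. It rests on an assumption nobody in the thread confirmed: that a bare number like 6626756291593342731 is the vdev GUID that zpool status falls back to when it cannot find the device under its recorded label, which would fit here since ada1 had been wiped.

```shell
#!/bin/sh
# Sketch: classify a vdev name as it appears in zpool status or the GUI.

classify_vdev() {
    case "$1" in
        gptid/*)   echo "gptid label" ;;    # e.g. gptid/615fd428-...
        "")        echo "empty" ;;
        *[!0-9]*)  echo "device name" ;;    # anything with a non-digit, e.g. ada0p2
        *)         echo "numeric guid" ;;   # digits only (assumed to be a vdev GUID)
    esac
}

for name in ada0p2 6626756291593342731 gptid/615fd428-3abc-11e2-befe-003067329074; do
    printf '%s -> %s\n' "$name" "$(classify_vdev "$name")"
done
```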
 