SOLVED Mixed messages in resilver

Status
Not open for further replies.

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
What happens if you unplug that drive (ada1) and reboot? It should show up as UNAVAIL instead of REMOVED. I'd pull it and set it aside, reboot, and make sure everything shows up as expected. Then shut down and reinstall your spare. You should be able to wipe the spare from the GUI and use it to replace the UNAVAIL disk, also from the GUI.

Apologies if I'm missing something, but I don't see where you ever dealt with disk 587~; I only see a replace of disk 868~. It's a shame the resilver takes so long. It makes it hard to follow through and try things quickly.
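The UNAVAIL/REMOVED states and the long numeric names (like disk 587~ above) come from the config section of zpool status. As a hedged sketch, assuming the usual column layout of NAME, STATE, READ, WRITE, CKSUM (which may differ between ZFS versions), the Python below pulls out every entry in that section that is not ONLINE and flags names made only of digits, since a device ZFS can no longer open is typically listed by its numeric GUID. The function name unhealthy_devices is hypothetical, not something FreeNAS provides.

```python
from __future__ import annotations

import re

# Non-healthy states zpool status can print for a pool, vdev, or device.
UNHEALTHY = {"DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}

# One config row: indent, NAME, STATE, then the READ/WRITE/CKSUM counters.
# Anything after the counters (e.g. "was /dev/gptid/...") is ignored.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def unhealthy_devices(status_text: str) -> list[tuple[str, str, bool]]:
    """Return (name, state, looks_like_guid) for each non-ONLINE config row."""
    results = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True          # rows of interest start after this line
            continue
        if stripped.startswith("errors:"):
            break                     # the config section is over
        if not in_config:
            continue
        match = ROW.match(line)
        if match and match.group(2) in UNHEALTHY:
            name, state = match.group(1), match.group(2)
            # Assumption: a device ZFS cannot open is typically shown by its GUID.
            results.append((name, state, name.isdigit()))
    return results
```

On the NAS itself you could feed it subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout. Parent entries (the pool and the raidz vdev) show up as DEGRADED as well, because their state reflects the missing child.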
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I followed your steps, but as before, the UNAVAIL entry only offers a Replace option, and when I try to replace it there is nothing to replace it with. So I shut down and installed the hard drive, and when I rebooted it automatically started the resilver with no input from me. At that point everything looks perfect: all drives are ONLINE under the Volume Status tab, the pool is HEALTHY under the Storage tab, and all the data fields are filled in and normal under the View Disks tab. zpool status shows a normal, happy resilver underway. But when it finishes, the drive that was added is back to REMOVED.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
It starts the resilver with the spare drive? It seems like the spare didn't get wiped properly.

So one approach would be to pull the 5 drives you know are good, install your 2 non-contributing drives, and then zero them out. We want to make sure that no automatic resilver of any kind can happen and that the drive is recognized as clean. Then, when you reinstall the good drives and remount your pool, you will have devices available to replace the unavailable device. It may even be as simple as swapping a SATA cable: ZFS recognizes your pool drives by gptid, so the OS should see a different ada(x) device available to use as a replacement.

You could also detach the pool (be careful not to erase it when you do that), then auto-import it and see if the funkiness goes away. Sometimes an export from the CLI followed by an auto-import works.

I tend to be paranoid with data, so I'd pull my 5 good drives so they aren't exposed to pointless resilvering or a potential user error. That leaves you with 2 drives. Wipe them, put them in a temporary pool, then detach that pool and wipe them again. At that point it should be impossible for the original pool to recognize them. Worst case, zero them out with DBAN or on another system, whichever is easiest.

There's also nothing stopping you from running a fresh install of FreeNAS, importing the pool, and then using one of your wiped and/or zeroed drives to replace the listed unavailable device.

Unfortunately, you have no choice but to deal with your pool. But since it is still available, you can back things up at your leisure. None of this stuff is ever pleasant. I don't think anything is really broken; the web GUI just needs a different trigger to let you use the device you're adding as the replacement.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
OK. I'm working on your recommendations.

I'm starting by trying the drive that was going back to WD under the RMA. It is currently resilvering, so I'll know in the AM whether that worked.

As far as wiping the drives goes, I'm having no luck. I have tried DBAN (free) and about 8 other HDD and erase tools, including WD Data Lifeguard, and they all say "no device found". So unless FreeNAS has a way to wipe drives, I'm at a loss here.

So in the AM I'll see if I can figure out how to wipe drives with FreeNAS, pull the 5 good ones, and then move on to your other suggestions. I have no idea how to detach a pool, but I'll see what I can learn from the guide.

Backing up my data isn't an option, since I have nowhere to put 7 TB of it. It is backed up on hundreds of optical discs that I have spent the past 3 months ripping to my system, so I'm really, really hoping to find a solution without losing all those months of work.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
You can wipe and zero the drives from the FreeNAS GUI: click Storage on the top menu, then View Disks, and highlight the disk. Click the Wipe button at the bottom and select Quick. If that doesn't work, you can fill it with zeros instead.
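For reference, ZFS keeps four copies of its vdev label on each device it uses: two in the first 512 KiB and two in the last 512 KiB. Clearing those regions should be enough for a pool to stop recognizing the drive. I haven't checked exactly which regions the FreeNAS Quick option clears, so treat this as an illustration of the idea, not a description of that button. The Python sketch below zeroes the first and last few MiB of a file or device node. On a disk FreeNAS has partitioned, the ZFS labels sit at the ends of the ZFS partition rather than the ends of the whole disk, so you would point it at that partition (and optionally at the whole disk to clear the GPT tables). zero_ends is a hypothetical helper, and it is destructive.

```python
import os
from typing import Optional

MIB = 1024 * 1024


def _write_all(fd: int, data: bytes, offset: int) -> None:
    """Keep calling os.pwrite until all of `data` is written at `offset`."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        if written == 0:
            raise OSError("pwrite wrote 0 bytes")
        view = view[written:]
        offset += written


def zero_ends(path: str, span: int = 4 * MIB, size: Optional[int] = None) -> int:
    """Overwrite the first and last `span` bytes of `path` with zeros.

    DESTRUCTIVE. If `size` is not given it is taken by seeking to the end,
    which works for regular files and Linux block devices; pass it
    explicitly if your device node does not report its size that way.
    Returns the size that was used.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        if size is None:
            size = os.lseek(fd, 0, os.SEEK_END)
        span = min(span, size)
        zeros = bytes(span)
        _write_all(fd, zeros, 0)            # front: ZFS labels L0/L1 (or primary GPT)
        _write_all(fd, zeros, size - span)  # back: ZFS labels L2/L3 (or backup GPT)
        os.fsync(fd)
    finally:
        os.close(fd)
    return size
```

For example, zero_ends("/dev/ada1p2"), where ada1p2 is only a placeholder. Check the serial number of the target before running anything like this, because it cannot be undone.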

To detach: click Storage, highlight the pool, and click Detach Volume. DO NOT MARK AS NEW.

I'm not sure what to make of "no device found". DBAN should be able to kill anything, so that is weird. Maybe you have another issue with the device? It could just be a cable or something; there's not enough info to go on.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Progress!

I was able to return my pool to HEALTHY by reinstalling the old drive and resilvering it. It is the drive with one bad sector and will be returned to WD in a day or two.

I have also pulled all the drives and installed the new drive, which is currently being wiped. Once that is done, I'm hoping I can try again to replace the one with the bad sector.

If anyone has any "This time, be sure to..." tips, I would appreciate the education.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
The "be sure to" tip: make sure you have WIPED the replacement drive you are going to install. Offline the old drive FIRST in the GUI. Shut down, pull it, and install the replacement. Reboot. Select the UNAVAIL drive in the Storage GUI, click Replace, and choose your new, clean drive. Done.

Edit: Wait 12 more hours for a resilver ;) The good news is you are burning in those drives nicely. :)
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Replacement is still being wiped.
There is no Offline button, so I take it you mean "Detach" the drive first, which will show it as Offline. Then shut down, pull it, install the replacement, reboot, etc.
It is also important to note the Name number created by the system. Last time I could not tell the new drive from the old one, because both were listed under these long Name numbers, and since I had not noted the numbers as the process went along, it was confusing which listing was which.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Nope, I meant Offline. You should be able to use the serial number to identify the drive you want to remove.

If you see something different, post a quick screenshot.
[Attached screenshot: Offline disk.PNG]
 

JackShine

Dabbler
Joined
Nov 13, 2014
Messages
27
I'd physically remove the HDD that “maybe” has some issues.
Shut down the NAS.
Then chuck in a good HDD.
Start up the NAS; it will automatically resilver.
All this “detach” and “replace” malarkey is rubbish.
What’s the point in messing about with a bad drive?
And if it turns out the replaced drive is good, then you have a spare.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Sure. WTF do we know. Just throw the drive in and it will all work out. Like it did before.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
JackShine said:
I'd physically remove the HDD that “maybe” has some issues.
Shut down the NAS.
Then chuck in a good HDD.
Start up the NAS; it will automatically resilver.
All this “detach” and “replace” malarkey is rubbish.
What’s the point in messing about with a bad drive?
And if it turns out the replaced drive is good, then you have a spare.

Not sure if trolling...

The manual includes a drive replacement section for a reason.
 

JackShine

Dabbler
Joined
Nov 13, 2014
Messages
27
OK, say one HDD has a head crash, a complete fail.
Are you going to mess about with it? NO, 'cause you can't.
So pretend a slight fail is a massive fail… remove the HDD and shove a good one in.
Why overcomplicate things?
If the one you replaced turns out to be good, that’s great.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
My system is returned to Healthy with the replacement drive!!

Some key things: after the first resilver failed, for whatever reason, trying again and again without wiping the drive seemed futile. Until mjws00 pointed me there, I had not seen the Wipe button, but it turned out to be a key step after a failed resilver. Thanks, mjws00! And whether it was needed or not, I did the FULL wipe, not just the directories and MBR.

I'm not sure why the first attempt failed (I'm pretty sure I followed the steps in the guide exactly), but if you look at the first photo I posted, the failed attempt left the drives identified by long Name numbers, with no serial number displayed in any of the view tabs. That made it confusing to work out which entry needed which action. That was my fault for not tracking which Name number was created at each step.

I was getting worried that I would have to return an RMA drive that had just arrived and try to justify that to WD. As my finances improve, I plan on keeping a spare drive on the shelf, ready to go. That should speed up recovery whether or not things hit a snag.

Thanks again to cyberjock and mjws00, who put in so much time trying to help me with this problem. Months of data transfer have been saved.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
JackShine said:
OK, say one HDD has a head crash, a complete fail.
Are you going to mess about with it? NO, 'cause you can't.
So pretend a slight fail is a massive fail… remove the HDD and shove a good one in.
Why overcomplicate things?
If the one you replaced turns out to be good, that’s great.

Lol. If you are a troll, you're a good one. Of course he was RMA'ing the bad drive. There is no such thing as "pull it and shove in a good one". This isn't a fkn Drobo. Things went sideways on him.

Maybe read the thread, and perhaps learn something about the product before spewing nonsense.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah, FreeNAS's unease about drives with data on them strikes again.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The thing is, the fact that it started resilvering was my hint that wiping the drive shouldn't have affected anything. I'm still a bit confused about what was wrong. Nonetheless, it's fixed, which is all that matters.

Normally, if FreeNAS is angry because a disk appears to have data on it, it doesn't resilver and instead immediately throws an error. That didn't happen in this case, which... interests me!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
cyberjock said:
The thing is, the fact that it started resilvering was my hint that wiping the drive shouldn't have affected anything. I'm still a bit confused about what was wrong. Nonetheless, it's fixed, which is all that matters.

Normally, if FreeNAS is angry because a disk appears to have data on it, it doesn't resilver and instead immediately throws an error. That didn't happen in this case, which... interests me!

I've lost track of the details, but my theory is that early on the resilver started and then failed in a way that let ZFS treat the drive as part of the pool without ever actually resilvering it properly.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
cyberjock said:
The thing is, the fact that it started resilvering was my hint that wiping the drive shouldn't have affected anything. I'm still a bit confused about what was wrong. Nonetheless, it's fixed, which is all that matters.

Normally, if FreeNAS is angry because a disk appears to have data on it, it doesn't resilver and instead immediately throws an error. That didn't happen in this case, which... interests me!

Ericloewe said:
I've lost track of the details, but my theory is that early on the resilver started and then failed in a way that let ZFS treat the drive as part of the pool without ever actually resilvering it properly.

One note that may apply to both of your responses: I did the FULL wipe, not the quick index/directory wipe. I don't know much about this, but it could be that the resilver started and then failed because the problem was further "out", in some later part of the drive.

Just a thought.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
It sounds like the replacement drive was dropping out: either a bad drive or a bad port. If the original drive dropped out the same way, it is probably the port, the cable for that port, or the power supply.

I think that is a key thing to keep in mind in any case where a replacement drive drops out during or after the resilver.
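rs225's rule of thumb can be written down as a small check. As a sketch, assuming you keep your own log of which physical port or bay each dropout happened on (record the bay rather than the adaX name, since those names can change when cables are moved), the Python below reports every port where more than one distinct drive has dropped out. That pattern points at the port, cable, or power supply rather than at a single bad drive. The function name suspect_ports and the bay labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def suspect_ports(dropouts: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each port to the sorted serials that dropped out on it, keeping
    only ports where two or more *different* drives have dropped out.

    One drive failing in one port suggests the drive; several different
    drives failing in the same port suggests the port, cable, or power.
    """
    serials_by_port: Dict[str, Set[str]] = defaultdict(set)
    for port, serial in dropouts:
        serials_by_port[port].add(serial)
    return {
        port: sorted(serials)
        for port, serials in serials_by_port.items()
        if len(serials) > 1
    }
```

For example, logging both the original drive and its replacement dropping out of the same bay would flag that bay, while a single drive dropping out repeatedly would not.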

I also agree with cyberjock's skepticism about a data wipe being needed once a resilver is able to start. At that point, only failure will stop it.
 