Lost a disk, can't online new disk

Status
Not open for further replies.

PierceIt

Dabbler
Joined
Dec 25, 2013
Messages
28
I had a disk go bad.
I followed the instructions in the guide (detached the bad disk, shut down, unplugged it, added the new disk, booted, resilvered the new disk, removed the old one from the list, etc.), but the new drive won't go online after running the zpool online command.

This is a zfs raidz1 pool with 5 disks in volume1

Here's a zpool status:

[root@freenas ~]# zpool status -v
pool: volume1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: resilvered 9.50G in 5h58m with 0 errors on Fri Oct 10 22:23:59 2014
config:

NAME STATE READ WRITE CKSUM
volume1 DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
gptid/f2253c27-69e1-11e3-b48b-001ec94b7aa1 ONLINE 0 0 0
gptid/f29c6d25-69e1-11e3-b48b-001ec94b7aa1 ONLINE 0 0 0
gptid/f3231c1b-69e1-11e3-b48b-001ec94b7aa1 ONLINE 0 0 0
gptid/f3a51865-69e1-11e3-b48b-001ec94b7aa1 ONLINE 0 0 0
9205217750355284360 UNAVAIL 0 0 0 was /dev/gptid/585851c6-4cb9-11e4-a926-001ec94b7aa1

errors: No known data errors
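
For anyone skimming status output like the above, the members that are not healthy can be picked out mechanically. This is only a rough sketch: unhealthy_members is a made-up helper name, it assumes the usual NAME/STATE column layout between the "config:" and "errors:" lines, and the sample input is a trimmed copy of the output in this post.

```shell
#!/bin/sh
# Sketch: print every entry in the device table whose STATE is not ONLINE.
# Reads `zpool status` text on stdin. unhealthy_members is a hypothetical helper.
unhealthy_members() {
    awk '
        $1 == "config:" { in_config = 1; next }   # device table starts after this line
        $1 == "errors:" { in_config = 0 }          # and ends at the errors line
        in_config && NF >= 5 && $1 != "NAME" && $2 != "ONLINE" { print $1, $2 }
    '
}

# Sample input: trimmed copy of the status output from this thread.
unhealthy_members <<'EOF'
pool: volume1
state: DEGRADED
config:

NAME STATE READ WRITE CKSUM
volume1 DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
gptid/f2253c27-69e1-11e3-b48b-001ec94b7aa1 ONLINE 0 0 0
9205217750355284360 UNAVAIL 0 0 0 was /dev/gptid/585851c6-4cb9-11e4-a926-001ec94b7aa1

errors: No known data errors
EOF
```

With this sample, the pool and the raidz1-0 vdev are reported as DEGRADED and the missing member as UNAVAIL. On a live system the same function could presumably be fed from `zpool status volume1` instead of the here-document.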

Here is a camcontrol devlist:

[root@freenas ~]# camcontrol devlist
<ST31000528AS CC3E> at scbus1 target 0 lun 0 (ada0,pass0)
<WDC WD10EZEX-00RKKA0 80.00A80> at scbus2 target 0 lun 0 (ada1,pass1)
<WDC WD10EACS-00ZJB0 01.01B01> at scbus3 target 0 lun 0 (ada2,pass2)
<WDC WD10EACS-00ZJB0 01.01B01> at scbus4 target 0 lun 0 (ada3,pass3)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus6 target 0 lun 0 (ada4,pass4)
<SanDisk Cruzer Fit 1.26> at scbus8 target 0 lun 0 (da0,pass5)

The first 4 disks listed above are in the raidz1 volume1 pool. My new disk is the 5th disk in the volume1 pool; the disk that went bad, which I am trying to replace with this new one, used to be ada4.

However, since the new drive is not coming online as ada4, you can see that the UFS Hitachi backup disk (an external eSATA drive) is currently ada4. The SanDisk on da0 is my boot USB stick.

Running the zpool online command does this:
[root@freenas ~]# zpool online volume1 9205217750355284360
warning: device '9205217750355284360' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

That leaves me with the new drive still unavailable.
(I've rebooted, tried again, re-run the replace command, and resilvered, and I still can't get it to join the pool.)

When trying to use the GUI to add the disk into the pool with the REPLACE button, I'm unable to do so because the member disk drop-down list is empty (see attached image).

Any ideas at this point are appreciated.
 

Attachments

  • Screen Shot 2014-10-10 at 11.39.37 PM.png

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't know, but onlining a disk from the command line is a good way to end up in some weird condition that FreeNAS isn't designed to be in. To be honest, it makes me wonder what other commands you may have done that may be making this problem worse. 99% of the time people don't make a CLI command mistake just once. You'll notice that nowhere in the FreeNAS documentation does it tell you to do a zpool online. ;) For the record, replacing a failed disk should require a total of *zero* commands from the command line. So if you used *any* commands from the command line you are already in a bad way with your pool and FreeNAS' config file.

After resilvering is complete the new disk will already be online (the only way a disk will be active in a pool is in its online status). So maybe you should be looking into why the disk is going offline.

Your comments are a little confusing. You mention UFS for some reason as well as esata. It's entirely possible that your esata is causing communications problems resulting in ZFS failing the disk due to an excessive number of errors. Esata is something that in theory works fine, but in practice is not something many will actively recommend.

Anyway, your confusing comments may invalidate one or more parts of my comments as I'm not 100% sure I even understand what you are saying. To boot, your zpool status output is borked because you didn't put it in CODE tags.
 

PierceIt

I guess I'm too confused to know what part is confusing ;)

I didn't enter any CLI commands during the process whatsoever, other than the zpool online command, which I ran in an attempt to bring the new disk online when I saw that it was still UNAVAILABLE even after the resilvering. I mean, the zpool status message even said "Attach the missing device and online it using 'zpool online'." So that's what I did. But it did nothing.

Any idea why the GUI's drop-down list for member disks is blank? (see attached pic) Or any idea how to add a 5th disk back into my 5-disk raidz1 pool?

(The external eSATA disk is just a 2TB drive I use to make a backup of the important stuff on my raidz1 volume. The backup drive is just a UFS volume. I do this because, contrary to popular belief, RAID is not a backup strategy. I want a backup of my content, that's all, so I run a cron job with an rsync command each night to copy stuff over to the eSATA drive.)
 

cyberjock

PierceIt said:
I guess I'm too confused to know what part is confusing ;)

I didn't enter any CLI commands during the process whatsoever, other than the zpool online command, which I ran in an attempt to bring the new disk online when I saw that it was still UNAVAILABLE even after the resilvering. I mean, the zpool status message even said "Attach the missing device and online it using 'zpool online'." So that's what I did. But it did nothing.

Ok, that makes sense. Of course, the zpool online command shouldn't have been necessary. And even if it had been necessary, you shouldn't have run it from the CLI. But it sounds like there aren't any lasting problems from what you did.

PierceIt said:
Any idea why the GUI's drop-down list for member disks is blank? (see attached pic) Or any idea how to add a 5th disk back into my 5-disk raidz1 pool?

There's a bunch of possible reasons: FreeNAS thinks the disk is already part of a pool, the disk isn't detected, FreeNAS thinks the disk is not appropriate for the pool, or the disk already has a valid partition table and partitions. Those aren't all of the reasons either.

PierceIt said:
(The external eSATA disk is just a 2TB drive I use to make a backup of the important stuff on my raidz1 volume. The backup drive is just a UFS volume. I do this because, contrary to popular belief, RAID is not a backup strategy. I want a backup of my content, that's all, so I run a cron job with an rsync command each night to copy stuff over to the eSATA drive.)

So if I'm getting you straight, you have 6 disks attached to your server: 5 internal and the 2TB external (plus the USB stick, but I don't consider that a "disk"). If so, then I would say that the 5th disk is missing because it's not being detected by the system. The command "camcontrol devlist" lists all currently available disks. Seeing as you have a Seagate, 3 WDs, a Hitachi, and your USB stick, I don't see a 6th disk. So either you have another disk failing/failed or something else is wrong. I don't know if your hardware supports hotswap, as you didn't list your hardware or FreeNAS version.
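
To make that count concrete, here is a rough sketch of the comparison. It is not a FreeNAS tool: list_ada_disks is a made-up helper, the expected count of 6 comes from the description in this thread, and the sample text is the devlist output posted earlier.

```shell
#!/bin/sh
# Sketch: pull the adaN device names out of `camcontrol devlist` text on stdin.
# list_ada_disks is a hypothetical helper name.
list_ada_disks() {
    awk '{
        dev = $NF                      # last field looks like (ada0,pass0)
        gsub(/[()]/, "", dev)          # drop the parentheses
        n = split(dev, parts, ",")
        for (i = 1; i <= n; i++)
            if (parts[i] ~ /^ada[0-9]+$/) print parts[i]
    }'
}

# Sample: the devlist output from earlier in this thread.
sample='<ST31000528AS CC3E> at scbus1 target 0 lun 0 (ada0,pass0)
<WDC WD10EZEX-00RKKA0 80.00A80> at scbus2 target 0 lun 0 (ada1,pass1)
<WDC WD10EACS-00ZJB0 01.01B01> at scbus3 target 0 lun 0 (ada2,pass2)
<WDC WD10EACS-00ZJB0 01.01B01> at scbus4 target 0 lun 0 (ada3,pass3)
<Hitachi HDS722020ALA330 JKAOA28A> at scbus6 target 0 lun 0 (ada4,pass4)
<SanDisk Cruzer Fit 1.26> at scbus8 target 0 lun 0 (da0,pass5)'

expected=6   # assumption from the thread: 5 pool disks plus the eSATA backup disk
found=$(printf '%s\n' "$sample" | list_ada_disks | wc -l | tr -d ' ')

if [ "$found" -lt "$expected" ]; then
    echo "only $found of $expected SATA disks detected"
fi
```

The USB stick shows up as da0, so it is not counted. On the server itself the sample could be swapped for the live output, e.g. `camcontrol devlist | list_ada_disks`.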

Can you attach the debug file from your FreeNAS server?

Right now, it just looks like a disk is not being detected by your system. Now whether it was detected and later disconnected is one of several questions I have. The debug will tell me what your server saw at bootup and what has changed. So that should give some clues.
 

cyberjock

Oh, and a word of advice. Do *not* just run CLI commands like you did because something said to. That can end very badly. Quite a few commands can't be undone once they've been run, and many people have permanently lost their data. ;)
 

PierceIt

Yep, you've got it right about the 6 disks, and the missing disk is the one I am trying to bring online. Just to be clear, there are two volumes: one called "volume1" with 5 disks (Raidz1) that are all SATA-connected to the motherboard, and another called "backup" with 1 disk (UFS) that is eSATA-connected.

I'm running:
FreeNAS-9.2.1.7-RELEASE-x64
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
8105MB (it's not ECC - i wish)
It's a Dell XPS 420 desktop

Thanks for the help. The debug file is attached here.
 

Attachments

  • debug-freenas-20141011004816.tar.gz

cyberjock

So what I'd do is figure out which disk isn't being detected in your FreeNAS box and test it on another box. Maybe check the power and data cables and make sure they are snug and stuff.
 

PierceIt

So I've heard stories of lightning striking twice before, with two drives going out at once or a new batch of drives being bad on delivery, but I never really believed the odds were such that half those stories could be real. As a career IT guy, I'm always suspicious that the real root cause is that someone didn't know their shit somewhere and made a bad change, and that they eventually got lucky by replacing stuff until the problem was gone.

Well, lightning struck twice here, and I'm now a believer that it can happen.

The NEW drive I was using to replace my old one was bad from the get-go.

I discovered this after trying everything I could think of (taking the drive out and putting it back in, replacing my SATA cable, and a million other things). I finally took the NEW drive out of the machine, hooked it up to another PC in the house, and ran the open-source tool "TestDisk" and a few other tools against it. They showed that MANY sectors were unusable, leaving it with far less disk space than the 1TB advertised: a little over 400 GB usable on a brand-new 1TB drive. That had to be what was screwing up the resilvering process, even though FreeNAS completed the resilvering on it every time without error. It just wouldn't online the drive afterwards.

I took the drive back to Best Buy and exchanged it. After bringing home the 2nd NEW drive, plugging it into my FreeNAS box, and letting it resilver, it came right online, and my volume is healthy and happy again with zero data loss.

So, the disk in the FreeNAS box went bad, the new disk I bought to replace the bad one was dead on arrival, and the 3rd disk was fine and FreeNAS worked exactly as advertised (although it would have been nice to get a message telling me that the amount of usable space on the drive was less than required for that pool, but I suspect the drive was telling FreeNAS it was 1TB even when it wasn't).
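
The kind of size check I had in mind could be sketched roughly like this. It is only an illustration of the rule that a replacement must be at least as large as the smallest existing member of the vdev: fits_vdev is a made-up helper, the byte counts are illustrative rather than measured from my disks, and it would not have caught a drive that misreports its own size.

```shell
#!/bin/sh
# Sketch: succeed if a candidate disk is at least as large as the smallest
# existing member. First argument is the candidate size in bytes, the rest
# are the sizes of the existing members. fits_vdev is a hypothetical helper.
fits_vdev() {
    candidate=$1
    shift
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    [ "$candidate" -ge "$smallest" ]
}

# Illustrative numbers only: a disk with ~400 GB usable against ~1 TB members.
if fits_vdev 400000000000 1000204886016 1000204886016; then
    echo "large enough"
else
    echo "too small for this vdev"
fi
```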

Again, I'm on Raidz1 (I know, I know), and I lost a drive and was running one drive down on my volume for several days, and in the end had zero data loss, so I'm feeling lucky about that.

(Next up: buy bigger drives and figure out how to move my content to a Raidz2 volume instead of Raidz1. This all made me way too nervous.)

Cyberjock - thanks again
 

cyberjock

Now you see why I publicly admonish anyone that goes with RAIDZ1? People always want to play the "but I'm better than the odds" card, but you aren't. The universe doesn't favor one person over another "just because". It's dumb luck, and all too often dumb luck bites people in the booty. ;)
 