Replaced disk, pool still says 'replacing' and is degraded

StephenFry · Apr 10, 2012

In my testing of a three drive RAIDZ, I had big trouble using a hdd that was previously in a zpool as replacement drive for a (simulated) failed disk.

So I went out and bought a brand-new, totally empty drive.

This time things went much more smoothly, but only up to a certain point.

Here is what I did:
I offlined the disk that was to be replaced. I did this in the CLI
Then I shut down, and physically replaced the old with the new drive.
Booted up and in the GUI clicked Replace next to the old drive and selected the new drive.
Again in the GUI, I detached the old drive.
This all went very smoothly, and I didn't get the weird responses like before.

Using the zpool status on the CLI, I monitored the process of resilvering, which took 3m on my small amount of testdata.

However, the pool is still now still DEGRADED:

NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
- raidz1 DEGRADED 0 0 0
-- replacing ONLINE 0 0 0
--- ada1p2 ONLINE 0 0 0
--- ada0p2 ONLINE 0 0 0
-- ada2p2 ONLINE 0 0 0
-- ada1p2 OFFLINE 0 0 0

errors: No known data errors
[root@freenas] ~#

(edited to add hyphens to show the right indents)

And when I go in the GUI and click on zpool status, I get "Sorry, an error occured".

I don't even know how to interpret the status data from the CLI, because it seems to me there are *TWO* ada1p2 drives. So when I online ada1p2, even though I can hear the drive being accessed, nothing actualy happens.

Please please please what do I do to get this pool out of degraded state? ZFS is doing my head in :(

StephenFry · Apr 10, 2012

Anyone have an idea? I have no clue how to get this pool in a healthy state.

ProtoSD · Apr 10, 2012

Can you post the output of "gpart show" ?

StephenFry · Apr 11, 2012

[root@freenas] ~# zpool status
pool: tank
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
- raidz1 DEGRADED 0 0 0
-- replacing ONLINE 0 0 0
--- ada1p2 ONLINE 0 0 0
--- ada0p2 ONLINE 0 0 0
-- ada2p2 ONLINE 0 0 0
-- ada1p2 OFFLINE 0 0 0

errors: No known data errors
[root@freenas] ~# gpart show
=> 34 3907029101 ada0 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada1 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada2 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 63 31277169 ada3 MBR (15G)
63 1930257 1 freebsd [active] (943M)
1930320 63 - free - (32K)
1930383 1930257 2 freebsd (943M)
3860640 3024 3 freebsd (1.5M)
3863664 41328 4 freebsd (20M)
3904992 27372240 - free - (13G)

=> 0 1930257 ada3s1 BSD (943M)
0 16 - free - (8.0K)
16 1930241 1 !0 (943M)

[root@freenas] ~#

Note: As you can see, I currently boot FreeNAS from a 16GB SSD, so that plus the three 2TB drives are now in the machine.

StephenFry · Apr 11, 2012

Nothing I've tried fixes this issue, which is worrying me in case something like this should happen on a production machine.

Ultfris101 · Apr 11, 2012

Have you tried "zpool export tank" , "zpool import tank" from CLI

Or export from the GUI without destroying data and then do Auto import. Does that change?

I've had some similar types of issues when replacing drives and some combination of these things has worked.

For what it's worth, when I proactively replace a drive, I've been physically replacing it BEFORE removing the disk in the GUI. Seems to work better.

ProtoSD · Apr 11, 2012

Which was the disk that originally had trouble and you replaced?

I can understand your frustration, if it were my pool full of data I'd be a bit concerned. Maybe @louisk or @williamg will see this and have some ideas. You could try PCBSD or FreeBSD 9 which has a newer version of ZFS and import it there and see what it comes up with (export your pool first).

StephenFry · Apr 11, 2012

Ultfris101 said:
Have you tried "zpool export tank" , "zpool import tank" from CLI

Or export from the GUI without destroying data and then do Auto import. Does that change?

I've had some similar types of issues when replacing drives and some combination of these things has worked.

For what it's worth, when I proactively replace a drive, I've been physically replacing it BEFORE removing the disk in the GUI. Seems to work better.

Thank you for these ideas. I've tried all these things and without success. I think I will stop by the computer store tomorrow and buy (yet :) ) another drive and replace ada1p2, as it *seems* that is the drive behaving oddly. See what happens.
(This business of having (some) trouble identifying disks can be mitigated by glabel, I recently learned - which is something I will def use in a production NAS.)

Replacing without informing the GUI is a scenario I tried, (and will try again) but because I replaced the disk with "itself" (I erased the drive), things went horribly wrong due to there being 512b of data being left at the end of the disk that ZFS picked up and 'recognized' as belonging to the pool I was trying to degrade. But that's a weird situation and things worked fine when I tried the same with a brand new drive; the current situation worries me more.

@protosd I just downloaded an ISO for ZFSGuru, is that something worth trying or is that a dirty word here? ;)
It is a shame that FreeNAS isn't very robust when it comes to handling problems. It feels to me as a Windows admin like trying to fight the blue screens of early, barely networkable Windows versions versus a solid Server 2003 or 2008 system.

ProtoSD · Apr 11, 2012

StephenFry said:
@protosd I just downloaded an ISO for ZFSGuru, is that something worth trying or is that a dirty word here? ;)
It is a shame that FreeNAS isn't very robust when it comes to handling problems. It feels to me as a Windows admin like trying to fight the blue screens of early, barely networkable Windows versions versus a solid Server 2003 or 2008 system.

Nothing is a dirty word if it helps you solve your problem. I've heard mixed reviews of ZFSGuru also, but it does have ZFS v28 I think, which might help in sorting out your problem. I wouldn't do a "zpool upgrade", but you should be able to do an import/scrub and then if things look better, export it and go back to FreeNAS. Once 8.2 is ready and has ZFS v28, hopefully things will work a little better with FreeNAS. I think these kinds of issues should have had priority over Plugins, but so many people were screaming "I want plugins" or just trying to install packages from the command line and screwing things up, that I think the devs went with "oiling the squeaky wheel"....

StephenFry · Apr 11, 2012

Frustrating or not, overall I'm still having fun and with a forum like this to bounce ideas off, I have full confidence in having a rock-solid system soon as well as a good understanding of the basics. My prediction is that in two or three years, there will be ZFS implementations *everywhere*, so easy and comfortable that they can be used by our grannies.

It's about midnight here, so my NAS adventures for today are over, but tomorrow's a new day and I'll report back with my results.

StephenFry · Apr 12, 2012

Hm. This is odd.

I had already exported/imported the pool, but did it again, just because I have no idea what else to do and now this is the result:

[root@freenas] ~# zpool export tank
[root@freenas] ~# zpool status
no pools available
[root@freenas] ~# zpool import tank
cannot mount '/tank': failed to create mountpoint
cannot mount '/tank/JR': failed to create mountpoint
cannot mount '/tank/SR': failed to create mountpoint

If I go to the GUI and click on the now yellow Alert light, it says "WARNING: The volume tank (ZFS) is" - which is quite a useless message.

The Volumes are, as expected after the failure to create mountpoints, not available:

tank -- /mnt/tank -- None (Error) -- Error getting available space -- Error getting total space -- DEGRADED

What might be of interest though, is that in the GUI's View Disk screen, I get this numbering (in this order):
ID1 ada1
ID2 ada2
ID4 ada0

Is skipping ID3 something that should happen?

StephenFry · Apr 12, 2012

CANCEL ;)

I've decided to give up, destroy this pool and restart my testing. My other drives came in and now I will have two or three pools going at the same time. This should help me to quickly learn the difference between various actions.

Most important lesson learned: use the CLI for anything directly related to ZFS pools, the RAID layout and the drive geometry.

I'll be reporting back soon.

sivabalan · Dec 10, 2012

Hello StephenFry,

Here is my story.

http://forums.freenas.org/showthrea...in-Freenas-8-0-4&highlight=replace+hard+disks

after resilvering i am in a same situation as you.
I would like to get your inputs in this situation.

Pranam
Siva Balan

Important Announcement for the TrueNAS Community.

Replaced disk, pool still says 'replacing' and is degraded

StephenFry

Contributor

StephenFry

Contributor

ProtoSD

MVP

StephenFry

Contributor

StephenFry

Contributor

Ultfris101

Cadet

ProtoSD

MVP

StephenFry

Contributor

ProtoSD

MVP

StephenFry

Contributor

StephenFry

Contributor

StephenFry

Contributor

sivabalan

Dabbler

Similar threads

Important Announcement for the TrueNAS Community.

Replaced disk, pool still says 'replacing' and is degraded

Contributor

Contributor

MVP

Contributor

Contributor

Cadet

MVP

Contributor

MVP

Contributor

Contributor

Contributor

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Replaced disk, pool still says 'replacing' and is degraded"

Similar threads