Degraded Status


tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I have a pool with a single RAIDZ1 vdev (raidz1-0) that is showing DEGRADED. In Volume Status I can see that one drive is UNAVAIL.

I have numerous open drive slots, so I populated them with similar drives. When I select the UNAVAIL drive and then select Replace, no other drives show up as available.
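A quick sanity check from the shell that the OS actually sees the hot-inserted disks might look like this (da5 is just a placeholder; the real device names will vary):

[root@FreeNAS ~]# camcontrol devlist    # every disk the controller has reported to the OS
[root@FreeNAS ~]# gpart show da5        # da5 is a placeholder; shows whether the disk already carries a partition table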

I am running FreeNAS 9.1.0 (I know... I know... I am going to upgrade soon).

I am trying to avoid rebooting the FreeNAS host, as I have numerous VMs attached via iSCSI.

Am I missing a step?

T
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Update:

I started a scrub while the UNAVAIL drive was still in. It is slated to run for 30+ more hours.

I have physically replaced the bad drive with a new one and I want to do a replace. Is it safe to stop the scrub or should I wait for it to finish before doing the replace?
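If stopping it turns out to be the right call, the command side is minimal (ZFS1 being the pool name, as in the status output further down):

[root@FreeNAS ~]# zpool scrub -s ZFS1    # -s stops the scrub in progress; it simply ends early rather than failing anything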
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
* Saw that it was UNAVAIL
* Started scrub
* Physically pulled drive and physically replaced with new drive (same slot)
* DID NOT run 'Replace Drive'
* Volume Status takes FOREVER to load and sometimes does not respond at all (i.e., a constant 'Loading...')

Scrub is still running.
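Since the Volume Status page keeps hanging, the same information can be pulled from the shell (assuming the pool is named ZFS1, as in the output posted below):

[root@FreeNAS ~]# zpool status ZFS1    # pool/vdev state plus scrub progress, without waiting on the GUI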
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Can I get a zpool status?

I'm a bit concerned: since this is a RAIDZ1 vdev, you effectively have no fault tolerance any longer.

(I'm also concerned that you might be running VMs on RAIDZ1. That's not good for performance even when the pool is healthy.)
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I know that I do not have any redundancy now. I am looking to rectify that quickly...

[root@FreeNAS ~]# zpool status
pool: ZFS1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub in progress since Thu Jul 5 19:28:17 2018
2.25T scanned out of 5.26T at 32.5M/s, 27h0m to go
0 repaired, 42.71% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	ZFS1                                            DEGRADED     0     0     0
	  raidz1-0                                      DEGRADED     0     0     0
	    gptid/f1b710f8-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f3e25521-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f5557a55-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f726437d-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    319796514190350543                          UNAVAIL      0     0     0  was /dev/gptid/f8fb335e-9fe8-11e3-a3f8-0022192c137f

errors: No known data errors
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I am ready to 'Replace Drive' but I am pretty sure the scrub is taking all my resources, hence my desire to cancel the scrub and do the replace.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I am ready to 'Replace Drive' but I am pretty sure the scrub is taking all my resources, hence my desire to cancel the scrub and do the replace.
My vote is to cancel the scrub and do the REPLACE now. Even if the scrub finds bad data, it won't be able to repair it as it doesn't have the necessary parity data.
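If you'd rather do it from the shell, the rough sequence is below; da5 is only a placeholder for your new disk, and going through the GUI Replace is still the cleaner path on FreeNAS since it sets up the swap partition and gptid label for you:

[root@FreeNAS ~]# zpool scrub -s ZFS1                          # cancel the running scrub first
[root@FreeNAS ~]# zpool replace ZFS1 319796514190350543 da5   # guid of the UNAVAIL member from zpool status; da5 is a placeholder and skips the partition layout the GUI would create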
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Stopped the scrub and replaced the drive. It's doing its thing. I think it will take forever...

Is it really going to take 300 hours??

pool: ZFS1
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jul 6 15:56:27 2018
1.44G scanned out of 5.27T at 4.94M/s, 310h42m to go
262M resilvered, 0.03% done
config:

	NAME                                              STATE     READ WRITE CKSUM
	ZFS1                                              DEGRADED     0     0     0
	  raidz1-0                                        DEGRADED     0     0     0
	    gptid/f1b710f8-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f3e25521-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f5557a55-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f726437d-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    replacing-4                                   UNAVAIL      0     0     0
	      319796514190350543                          UNAVAIL      0     0     0  was /dev/gptid/f8fb335e-9fe8-11e3-a3f8-0022192c137f
	      gptid/2068c46a-8135-11e8-bff6-00074306cd37  ONLINE       0     0     0  (resilvering)

errors: No known data errors
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Is it really going to take 300 hours??
Depending on how full your pool is and the speed of your drives, yes. RAIDZ vdevs take significantly longer to resilver than mirrors, and you have other concurrent I/O (numerous VMs attached via iSCSI) competing for disk time.
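That estimate is just the remaining data divided by the current throughput, and the throughput is still ramping up:

5.27T x 1024 x 1024 ≈ 5,526,000 MiB
5,526,000 MiB / 4.94 MiB/s ≈ 1,118,000 s ≈ 310 hours

Once the resilver gets past its slow start, and if the iSCSI load eases off, the M/s figure (and with it the estimate) usually improves considerably.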

Keep a close eye on your zpool status as well as your SMART reports for the drive temperatures during the resilver process, then kick off a scrub afterwards to make sure things are good.
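A couple of shell commands cover both; /dev/da5 is a placeholder for whatever device node the new drive actually got:

[root@FreeNAS ~]# zpool status ZFS1 | grep -A 2 'scan:'        # the scan: line plus the two progress lines under it
[root@FreeNAS ~]# smartctl -a /dev/da5 | grep -i temperature   # current drive temperature from SMART; da5 is a placeholder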

Once that's done, maybe we can talk about getting your VMs off of RAIDZ. ;)
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
OK. Thanks.

Can I also now detach the old drive? It still shows up in Volume Status.
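I'm guessing the CLI equivalent would be something like this once the resilver finishes (guid taken from the zpool status above), but I'd like to confirm first:

[root@FreeNAS ~]# zpool detach ZFS1 319796514190350543    # detach the stale member of replacing-4; only after the resilver completes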
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Thanks. And I keep my server room around 58 - 60 degrees so I think the drive temps should be fine.

:smile:
 