Degraded Status


tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I have a pool with a single RAIDZ1 vdev (raidz1-0) that is showing DEGRADED. In Volume Status I can see that one drive is UNAVAIL.

I have numerous open drive slots, so I populated them with similar drives. When I select the UNAVAIL drive and then select Replace, no other drives show up as available.
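A quick sanity check from the shell that the OS actually sees the hot-inserted disks might look like this (da5 is just a placeholder; the real device names will vary):

[root@FreeNAS ~]# camcontrol devlist    # every disk the controller has reported to the OS
[root@FreeNAS ~]# gpart show da5        # da5 is a placeholder; shows whether the disk already carries a partition table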

I am running FreeNAS 9.1.0 (I know... I know... I am going to upgrade soon).

I am trying to avoid rebooting the FreeNAS host, as I have numerous VMs attached via iSCSI.

Am I missing a step?

T
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Update:

I started a scrub while the UNAVAIL drive was still in. It is slated to run for 30+ more hours.

I have physically replaced the bad drive with a new one and I want to do a replace. Is it safe to stop the scrub or should I wait for it to finish before doing the replace?
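If stopping it turns out to be the right call, the command side is minimal (ZFS1 being the pool name, as in the status output further down):

[root@FreeNAS ~]# zpool scrub -s ZFS1    # -s stops the scrub in progress; it simply ends early rather than failing anything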
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
* Saw that it was UNAVAIL
* Started scrub
* Physically pulled drive and physically replaced with new drive (same slot)
* DID NOT run 'Replace Drive'
* Volume Status takes FOREVER to load and sometimes does not respond at all (i.e., a constant 'Loading...')

Scrub is still running.
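Since the Volume Status page keeps hanging, the same information can be pulled from the shell (assuming the pool is named ZFS1, as in the output posted below):

[root@FreeNAS ~]# zpool status ZFS1    # pool/vdev state plus scrub progress, without waiting on the GUI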
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Can I get a zpool status?

I'm a bit concerned: since this is a RAIDZ1 vdev, you effectively have no fault tolerance any longer.

(I'm also concerned that you might be running VMs on RAIDZ1. That's not good for performance even when the pool is healthy.)
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I know that I do not have any redundancy now. I am looking to rectify that quickly...

[root@FreeNAS ~]# zpool status
pool: ZFS1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub in progress since Thu Jul 5 19:28:17 2018
2.25T scanned out of 5.26T at 32.5M/s, 27h0m to go
0 repaired, 42.71% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	ZFS1                                            DEGRADED     0     0     0
	  raidz1-0                                      DEGRADED     0     0     0
	    gptid/f1b710f8-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f3e25521-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f5557a55-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    gptid/f726437d-9fe8-11e3-a3f8-0022192c137f  ONLINE       0     0     0
	    319796514190350543                          UNAVAIL      0     0     0  was /dev/gptid/f8fb335e-9fe8-11e3-a3f8-0022192c137f

errors: No known data errors
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I am ready to 'Replace Drive' but I am pretty sure the scrub is taking all my resources, hence my desire to cancel the scrub and do the replace.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I am ready to 'Replace Drive' but I am pretty sure the scrub is taking all my resources, hence my desire to cancel the scrub and do the replace.
My vote is to cancel the scrub and do the REPLACE now. Even if the scrub finds bad data, it won't be able to repair it as it doesn't have the necessary parity data.
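If you'd rather do it from the shell, the rough sequence is below; da5 is only a placeholder for your new disk, and going through the GUI Replace is still the cleaner path on FreeNAS since it sets up the swap partition and gptid label for you:

[root@FreeNAS ~]# zpool scrub -s ZFS1                          # cancel the running scrub first
[root@FreeNAS ~]# zpool replace ZFS1 319796514190350543 da5   # guid of the UNAVAIL member from zpool status; da5 is a placeholder and skips the partition layout the GUI would create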
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Stopped the scrub and replaced the drive. It's doing its thing. I think it will take forever...

Is it really going to take 300 hours??

pool: ZFS1
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jul 6 15:56:27 2018
1.44G scanned out of 5.27T at 4.94M/s, 310h42m to go
262M resilvered, 0.03% done
config:

	NAME                                              STATE     READ WRITE CKSUM
	ZFS1                                              DEGRADED     0     0     0
	  raidz1-0                                        DEGRADED     0     0     0
	    gptid/f1b710f8-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f3e25521-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f5557a55-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    gptid/f726437d-9fe8-11e3-a3f8-0022192c137f    ONLINE       0     0     0
	    replacing-4                                   UNAVAIL      0     0     0
	      319796514190350543                          UNAVAIL      0     0     0  was /dev/gptid/f8fb335e-9fe8-11e3-a3f8-0022192c137f
	      gptid/2068c46a-8135-11e8-bff6-00074306cd37  ONLINE       0     0     0  (resilvering)

errors: No known data errors
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Is it really going to take 300 hours??
Depending on how full your pool is and the speed of your drives, yes. RAIDZ vdevs take significantly longer to resilver than mirrors, and you have other concurrent I/O (numerous VMs attached via iSCSI) competing for disk time.
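That estimate is just the remaining data divided by the current throughput, and the throughput is still ramping up:

5.27T x 1024 x 1024 ≈ 5,526,000 MiB
5,526,000 MiB / 4.94 MiB/s ≈ 1,118,000 s ≈ 310 hours

Once the resilver gets past its slow start, and if the iSCSI load eases off, the M/s figure (and with it the estimate) usually improves considerably.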

Keep a close eye on your zpool status as well as your SMART reports for the drive temperatures during the resilver process, then kick off a scrub afterwards to make sure things are good.
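A couple of shell commands cover both; /dev/da5 is a placeholder for whatever device node the new drive actually got:

[root@FreeNAS ~]# zpool status ZFS1 | grep -A 2 'scan:'        # the scan: line plus the two progress lines under it
[root@FreeNAS ~]# smartctl -a /dev/da5 | grep -i temperature   # current drive temperature from SMART; da5 is a placeholder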

Once that's done, maybe we can talk about getting your VMs off of RAIDZ. ;)
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
OK. Thanks.

Can I also now detach the old drive? It still shows up in Volume Status.
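I'm guessing the CLI equivalent would be something like this once the resilver finishes (guid taken from the zpool status above), but I'd like to confirm first:

[root@FreeNAS ~]# zpool detach ZFS1 319796514190350543    # detach the stale member of replacing-4; only after the resilver completes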
 

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
Thanks. And I keep my server room around 58 - 60 degrees so I think the drive temps should be fine.

:smile:
 