volume (ZFS) state is DEGRADED - Failed Drive?

chasbaci · Jul 1, 2016

Hello Everyone,

This morning I received a few emails from my server and I need guidance. I'm not sure whether the drive has actually failed, or whether it is just the volume that is degraded and asking to be rebuilt. My assumption would be that the disk is failing. Any help would be greatly appreciated. The following is a snapshot of the emails; the second email shows the drive that I am assuming needs to be replaced.

Thanks for looking, and for your support.

Charles.

First Email:

Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
TANK1         16.2T  4.85T  11.4T         -    11%    29%  1.00x  DEGRADED  /mnt
freenas-boot  55.5G  5.24G  50.3G         -      -     9%  1.00x  ONLINE    -

pool: TANK1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 3h20m with 0 errors on Sun Jun 5 03:20:31 2016
config:

NAME                                            STATE     READ WRITE CKSUM
TANK1                                           DEGRADED     0     0     0
  raidz1-0                                      DEGRADED     0     0     0
    gptid/a761b713-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    14402093455005627706                        UNAVAIL      0     0     0  was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82
    gptid/a869d614-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a8e28a35-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a9604c17-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a9dee6b5-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0

errors: No known data errors

-- End of daily output --

Second Email:

freenas.workgroup changes in mounted filesystems:
0a1
> /mnt/TANK1/Data /mnt/TANK1/jails/WebServer/media nullfs rw 0 0
1a3
> /mnt/TANK1/VMStorage/WebServer/UbuntuServer /mnt/TANK1/jails/WebServer/mnt nullfs rw 0 0
24a27
> devfs /mnt/TANK1/jails/WebServer/dev devfs rw,multilabel 0 0
29a33
> procfs /mnt/TANK1/jails/WebServer/proc procfs rw 0 0

freenas.workgroup kernel log messages:
> (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): CAM status: ATA Status Error
> (aprobe1:ahcich1:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04 (ABRT )
> (aprobe1:ahcich1:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): Retrying command
> (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): CAM status: ATA Status Error
> (aprobe1:ahcich1:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04 (ABRT )
> (aprobe1:ahcich1:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): Error 5, Retries exhausted
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada1: Serial Number Z500EMAY
> ada1: Previously was known as ad8 <------- THIS IS THE BAD DRIVE ????
> ada2 at ahcich4 bus 0 scbus4 target 0 lun 0
> ada2: Serial Number Z500GQ1B
> ada2: Previously was known as ad12
> ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
> ada3: Serial Number Z500ENP1
> ada3: Previously was known as ad14
> ada4 at ahcich6 bus 0 scbus6 target 0 lun 0
> ada4: Serial Number Z500EMZM
> ada4: Previously was known as ad16
> ada5 at ahcich7 bus 0 scbus7 target 0 lun 0
> ada5: <Corsair Force LS SSD S9FM02.0> ACS-3 ATA SATA 3.x device
> ada5: Serial Number 15058168000101673023
> ada5: 57241MB (117231408 512 byte sectors)
> ada5: Previously was known as ad18
> SMP: AP CPU #11 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #10 Launched!
> SMP: AP CPU #6 Launched!
> Timecounter "TSC-low" frequency 1750034232 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x372 offMax=0x1288

-- End of security output --
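The kernel log can be read mechanically to test the "THIS IS THE BAD DRIVE" guess: the probe that gave up ("Error 5, Retries exhausted") is aprobe1 on channel ahcich1, while ada1 attached normally on ahcich2. Below is a minimal Python sketch, assuming the log is available as plain text; the function name suspect_channels is hypothetical, and it only recognizes the message patterns visible in this excerpt.

```python
import re

# Patterns taken from the FreeBSD kernel messages quoted above.
PROBE_GAVE_UP = re.compile(r"\(aprobe\d+:(ahcich\d+):[^)]*\): Error \d+, Retries exhausted")
DISK_ATTACHED = re.compile(r"\b(ada\d+) at (ahcich\d+) ")


def suspect_channels(log_text):
    """Return (channels whose probe gave up and never attached a disk, channel -> disk map)."""
    gave_up = set()
    attached = {}
    for line in log_text.splitlines():
        failure = PROBE_GAVE_UP.search(line)
        if failure:
            gave_up.add(failure.group(1))
        attach = DISK_ATTACHED.search(line)
        if attach:
            attached[attach.group(2)] = attach.group(1)
    # A channel that failed to identify a drive and never attached one is the suspect.
    return sorted(gave_up - set(attached)), attached


log = """\
> (aprobe1:ahcich1:0:0:0): Error 5, Retries exhausted
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada2 at ahcich4 bus 0 scbus4 target 0 lun 0
"""
suspects, attached = suspect_channels(log)
print(suspects)             # ['ahcich1']
print(attached["ahcich2"])  # ada1
```

On this log the suspect is whatever drive is cabled to ahcich1, not ada1; which adaN name that drive carried before is not visible in this excerpt.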

Nick2253 · Jul 1, 2016

What makes you think ada1 is the bad drive?

chasbaci said:
14402093455005627706 UNAVAIL 0 0 0 was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82

Your problem is that this drive is no longer detected by the system (unavailable). You are still running (but in a degraded state), and need to either reattach that drive (assuming it's still good), or replace it with a suitable spare.
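The state that matters is the one in the config section of zpool status, where the missing member is marked UNAVAIL. A minimal Python sketch that lists every entry that is not ONLINE, assuming the column layout shown in the first email (the function name not_online is hypothetical):

```python
def not_online(status_text):
    """Return (name, state, note) for each config entry whose STATE is not ONLINE."""
    entries = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True
        elif fields[0] == "errors:":
            break
        elif in_config and fields[0] != "NAME" and len(fields) >= 5:
            name, state = fields[0], fields[1]
            if state != "ONLINE":
                # Anything after READ/WRITE/CKSUM is a free-form note, e.g. "was /dev/...".
                entries.append((name, state, " ".join(fields[5:])))
    return entries


status = """\
config:

NAME STATE READ WRITE CKSUM
TANK1 DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
gptid/a761b713-c428-11e4-b25c-002590fcbe82 ONLINE 0 0 0
14402093455005627706 UNAVAIL 0 0 0 was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82

errors: No known data errors
"""
for entry in not_online(status):
    print(entry)
```

The pool and the raidz1-0 vdev are reported as DEGRADED only because of the single UNAVAIL member underneath them.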

If that drive failed, the biggest concern is that another drive is right behind it. Have you been running periodic SMART tests on your drives? Have they been passing?
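Until periodic SMART tests are scheduled, a quick look at a few raw attribute values is a common first check on the surviving drives. A minimal Python sketch, assuming the text of `smartctl -A /dev/adaN` as input; watching attributes 5, 197 and 198 is a common rule of thumb, not a complete health test, the function names are hypothetical, and the sample rows are made up for illustration.

```python
import re

# Attributes commonly watched for early signs of media failure (a heuristic, not a verdict).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def raw_values(smartctl_text):
    """Map attribute ID -> leading integer of RAW_VALUE from `smartctl -A` rows."""
    values = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                values[int(fields[0])] = int(match.group())
    return values


def warning_signs(values):
    """Return the watched attributes whose raw value is above zero."""
    return {name: values[attr_id] for attr_id, name in WATCHED.items() if values.get(attr_id, 0) > 0}


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""
print(warning_signs(raw_values(sample)))
# {'Current_Pending_Sector': 8, 'Offline_Uncorrectable': 8}
```

The self-tests themselves (`smartctl -t short` and `smartctl -t long`) are what should be scheduled; FreeNAS can schedule them from its GUI.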

BigDave · Jul 1, 2016

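First out of the gate is the fact that you are using RAIDZ1: with one drive already gone, the pool has no redundancy left, so another drive failure (or an unreadable sector) during the resilver could cost you data.

You need to be cautious and make sure you have a backup of your data before attempting to replace a failed drive.

You do have a backup, right?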

chasbaci · Jul 1, 2016

No, I have not set up SMART testing. Also, it's most likely a drive issue: when I built this machine I used cheap Seagate drives so that I could afford a better-quality motherboard and memory. So now I plan on purchasing some WD Reds.

Thanks for the help.

chasbaci · Jul 1, 2016

No backup; I'm in the process of making one as we speak. I am also going to finally break down and purchase a cheap WD NAS for routine backups.

chasbaci · Jul 1, 2016

Here is my game plan: I will attempt a complete backup of the volume, then take down the server and replace the drives with the new WD drives.

BigDave · Jul 1, 2016

chasbaci said:
No backup, in the process of doing that as we speak.

Once your backup is secured, check the output of # glabel status from the CLI.

This command prints the gptid and da/ada labels for all drives, including spares. Since your drive is not available, its gptid will be absent from the list; compare the serial numbers that smartctl reports for the drives that are present against the physical drives to identify the culprit causing the degraded state. The manual has step-by-step instructions for replacing a failed drive.
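The comparison BigDave describes (pool members versus the labels actually present) can be sketched in Python. It assumes `glabel status` prints three columns (Name, Status, Components); the sample glabel lines, the adaN/partition names in them, and the function name missing_members are illustrative only, not taken from this system.

```python
def missing_members(pool_gptids, glabel_text):
    """Return pool members with no matching label, plus label -> provider for the rest."""
    present = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        # Label rows look like: gptid/<uuid>  N/A  ada0p2
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            present[fields[0]] = fields[2]
    missing = [g for g in pool_gptids if g not in present]
    found = {g: present[g] for g in pool_gptids if g in present}
    return missing, found


pool = [
    "gptid/a761b713-c428-11e4-b25c-002590fcbe82",
    "gptid/a7dec147-c428-11e4-b25c-002590fcbe82",  # listed as UNAVAIL by zpool status
    "gptid/a869d614-c428-11e4-b25c-002590fcbe82",
]
glabel = """\
                                      Name  Status  Components
gptid/a761b713-c428-11e4-b25c-002590fcbe82     N/A  ada0p2
gptid/a869d614-c428-11e4-b25c-002590fcbe82     N/A  ada2p2
"""
missing, found = missing_members(pool, glabel)
print(missing)  # ['gptid/a7dec147-c428-11e4-b25c-002590fcbe82']
print(found)
```

Each provider that is found can then be matched to a physical disk by serial number (for example with `smartctl -i /dev/ada0`); the physical disk whose serial matches none of them is the one to pull.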

chasbaci · Jul 1, 2016

Thank you

chasbaci · Jul 3, 2016

Needing some more advice. I finally finished backing up the 5 TB of data off the degraded volume, and the failed drive has now been replaced; my server is resilvering the new drive. My question: is there a way to completely copy or export the volume to another computer? I want to replace the remaining drives with WD Reds.

Again thanks for all the support

Robert Trevellyan · Jul 4, 2016

chasbaci said:
is there a way to completely copy or export the volume to another computer?

You can pipe the output of zfs send over SSH into zfs receive on another system, assuming it has enough storage available, then send it back the same way later.

EDIT: however, this is not necessary if your goal is merely to replace the drives while retaining your existing vdev layout. Just follow the directions for replacing drives to grow your pool.
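For the copy-to-another-system route, the usual pattern is a recursive snapshot, then `zfs send -R` piped over SSH into `zfs receive` on the other machine. A minimal Python sketch of that pipeline; the host name backuphost, the target dataset backup/TANK1, the snapshot name migrate and the function names are hypothetical. Note that `-F` lets the receiving side discard existing data on the target dataset, so point it only at a dataset you can afford to overwrite.

```python
import shlex
import subprocess


def replication_commands(pool, snapshot, host, target):
    """Build argv lists for: recursive snapshot, replication send, remote receive."""
    snap = f"{pool}@{snapshot}"
    take = ["zfs", "snapshot", "-r", snap]
    send = ["zfs", "send", "-R", snap]
    # ssh runs the remote side through a shell, so quote the dataset name.
    # -u: do not mount received datasets; -F: force the target to match the stream.
    receive = ["ssh", host, "zfs receive -u -F " + shlex.quote(target)]
    return take, send, receive


def replicate(pool, snapshot, host, target):
    take, send, receive = replication_commands(pool, snapshot, host, target)
    subprocess.run(take, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # so zfs send gets SIGPIPE if the receiver exits early
    receiver_rc = receiver.wait()
    sender_rc = sender.wait()
    if sender_rc != 0 or receiver_rc != 0:
        raise RuntimeError(f"send exited {sender_rc}, receive exited {receiver_rc}")


if __name__ == "__main__":
    for argv in replication_commands("TANK1", "migrate", "backuphost", "backup/TANK1"):
        print(argv)
```

Bringing the data back later is the same pipeline run in the opposite direction.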

chasbaci · Jul 6, 2016

Update: problem completely resolved. I ended up removing each old drive and resilvering a new one in its place, which took 3 days. Now I am running WD Red Pros.
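The drive-by-drive procedure can be sketched as follows. In FreeNAS the replacement itself should be done from the GUI (it partitions the new disk and adds it to the pool by gptid), so treat this Python sketch, which calls `zpool replace` directly, as an illustration of the sequencing only. It assumes `zpool status` contains the text "resilver in progress" while a resilver runs and " state: ONLINE" once the pool is healthy; the function names and poll interval are hypothetical. With autoexpand on, the extra capacity appears only after the last old drive has been replaced.

```python
import subprocess
import time


def resilver_running(status_text):
    # Assumption: while resilvering, the scan line reads "resilver in progress ...".
    return "resilver in progress" in status_text


def pool_status(pool):
    return subprocess.run(["zpool", "status", pool],
                          capture_output=True, text=True, check=True).stdout


def replace_one_at_a_time(pool, pairs, poll_seconds=600):
    """Replace (old, new) devices sequentially, waiting for each resilver to finish."""
    # Let the pool grow once every member has been swapped for a larger disk.
    subprocess.run(["zpool", "set", "autoexpand=on", pool], check=True)
    for old, new in pairs:
        subprocess.run(["zpool", "replace", pool, old, new], check=True)
        while True:
            time.sleep(poll_seconds)
            status = pool_status(pool)
            if not resilver_running(status):
                break
        # Never start the next swap unless the pool is back to full redundancy.
        if "state: ONLINE" not in status:
            raise RuntimeError(f"pool not ONLINE after replacing {old}; stopping")
```

Waiting for each resilver before touching the next drive matters most on RAIDZ1, where pulling a second member mid-resilver would leave the pool without enough replicas.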

Thanks for the Community Support.

Stux · Jul 11, 2016

Now that you have a backup, you should consider switching to RAIDZ2 so you don't have to sweat bullets when you lose a drive ;)

i.e. back up the pool, recreate it as RAIDZ2, then restore the data.

You can use the old Seagates in your backup system if you have sufficient redundancy.
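To see what the switch costs: zpool list reports 16.2T raw for the six-disk RAIDZ1 pool, about 2.7T per disk, and each extra parity level gives up one disk's worth of space in exchange for surviving one more failed disk. A minimal Python sketch of that arithmetic; it ignores RAIDZ padding, metadata and reserved space, so real usable space will be somewhat lower, and the function name raidz_usable is hypothetical.

```python
def raidz_usable(disks, per_disk, parity):
    """Approximate usable space of one RAIDZ vdev: (disks - parity) * per-disk size."""
    if not 1 <= parity <= 3:
        raise ValueError("RAIDZ parity is 1, 2 or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * per_disk


per_disk = 16.2 / 6  # raw SIZE from zpool list, spread over six disks
for parity in (1, 2):
    usable = raidz_usable(6, per_disk, parity)
    print(f"RAIDZ{parity}: ~{usable:.1f}T usable, survives {parity} failed disk(s)")
# RAIDZ1: ~13.5T usable, survives 1 failed disk(s)
# RAIDZ2: ~10.8T usable, survives 2 failed disk(s)
```

With 4.85T allocated (parity included), the existing data would still fit comfortably in either layout.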

chasbaci · Jul 22, 2016

Now I have two pools using my old drives, but last night I received an alert that my drives are running too hot, so it's time for a new case. After a lot of debate, here is the new case that will be arriving on Friday next week:

http://www.supermicro.com/products/chassis/4U/747/SC747TG-R1400-SQ.cfm

I also added two additional case fans to the order. Now I need to plan the migration from the old case to the new one over the next week; wish me luck.

Robert Trevellyan · Jul 25, 2016

chasbaci said:
wish me luck.

Good luck ;)
