Volume (ZFS) state is DEGRADED - Failed Drive?

Status
Not open for further replies.

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
Hello Everyone,

This morning I received a few emails from my server and I need some guidance. I am not sure whether a drive has failed or whether only the volume is affected and is asking to be rebuilt. My assumption is that a disk is failing. Any help would be greatly appreciated. Below is a snapshot of the emails; the second email shows the drive that I assume needs to be replaced.

Thanks for looking and your support

Charles.

First Email:

Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH    ALTROOT
TANK1         16.2T  4.85T  11.4T         -   11%  29%  1.00x  DEGRADED  /mnt
freenas-boot  55.5G  5.24G  50.3G         -     -   9%  1.00x  ONLINE    -

pool: TANK1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 3h20m with 0 errors on Sun Jun 5 03:20:31 2016
config:

NAME                                            STATE     READ WRITE CKSUM
TANK1                                           DEGRADED     0     0     0
  raidz1-0                                      DEGRADED     0     0     0
    gptid/a761b713-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    14402093455005627706                        UNAVAIL      0     0     0  was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82
    gptid/a869d614-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a8e28a35-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a9604c17-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0
    gptid/a9dee6b5-c428-11e4-b25c-002590fcbe82  ONLINE       0     0     0

errors: No known data errors

-- End of daily output --


Second Email:

freenas.workgroup changes in mounted filesystems:
0a1
> /mnt/TANK1/Data /mnt/TANK1/jails/WebServer/media nullfs rw 0 0
1a3
> /mnt/TANK1/VMStorage/WebServer/UbuntuServer /mnt/TANK1/jails/WebServer/mnt nullfs rw 0 0
24a27
> devfs /mnt/TANK1/jails/WebServer/dev devfs rw,multilabel 0 0
29a33
> procfs /mnt/TANK1/jails/WebServer/proc procfs rw 0 0

freenas.workgroup kernel log messages:
> (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): CAM status: ATA Status Error
> (aprobe1:ahcich1:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04 (ABRT )
> (aprobe1:ahcich1:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): Retrying command
> (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): CAM status: ATA Status Error
> (aprobe1:ahcich1:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04 (ABRT )
> (aprobe1:ahcich1:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe1:ahcich1:0:0:0): Error 5, Retries exhausted
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada1: Serial Number Z500EMAY
> ada1: Previously was known as ad8 <------- THIS IS THE BAD DRIVE ????
> ada2 at ahcich4 bus 0 scbus4 target 0 lun 0
> ada2: Serial Number Z500GQ1B
> ada2: Previously was known as ad12
> ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
> ada3: Serial Number Z500ENP1
> ada3: Previously was known as ad14
> ada4 at ahcich6 bus 0 scbus6 target 0 lun 0
> ada4: Serial Number Z500EMZM
> ada4: Previously was known as ad16
> ada5 at ahcich7 bus 0 scbus7 target 0 lun 0
> ada5: <Corsair Force LS SSD S9FM02.0> ACS-3 ATA SATA 3.x device
> ada5: Serial Number 15058168000101673023
> ada5: 57241MB (117231408 512 byte sectors)
> ada5: Previously was known as ad18
> SMP: AP CPU #11 Launched!
> SMP: AP CPU #5 Launched!
> SMP: AP CPU #10 Launched!
> SMP: AP CPU #6 Launched!
> Timecounter "TSC-low" frequency 1750034232 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x372 offMax=0x1288

-- End of security output --
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
What makes you think ada1 is the bad drive?

14402093455005627706 UNAVAIL 0 0 0 was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82

Your problem is that this drive is no longer detected by the system (unavailable). You are still running (but in a degraded state), and need to either reattach that drive (assuming it's still good), or replace it with a suitable spare.
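
To make that reading of the status output concrete, here is a minimal Python sketch that picks out pool members ZFS can no longer open. It relies only on the "was <old path>" suffix visible in the line quoted above; the function name `missing_members` is made up for illustration, and the sample text is an excerpt of the output posted in this thread.

```python
# Excerpt of the `zpool status` config section posted in this thread.
STATUS_TEXT = """\
NAME STATE READ WRITE CKSUM
TANK1 DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
gptid/a761b713-c428-11e4-b25c-002590fcbe82 ONLINE 0 0 0
14402093455005627706 UNAVAIL 0 0 0 was /dev/gptid/a7dec147-c428-11e4-b25c-002590fcbe82
gptid/a869d614-c428-11e4-b25c-002590fcbe82 ONLINE 0 0 0
"""


def missing_members(status_text):
    """Return (guid, state, former_path) for every member shown with a 'was' path.

    Hypothetical helper: it assumes a missing member is printed as
    <guid> <STATE> <read> <write> <cksum> was <old path>, as in the thread.
    """
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[5] == "was":
            found.append((fields[0], fields[1], fields[6]))
    return found


for guid, state, former_path in missing_members(STATUS_TEXT):
    print(f"{guid} is {state}; it used to be {former_path}")
```

The gptid in the "was" path is the label to look for when working out which physical disk has dropped out.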

If that drive failed, the biggest concern is that another drive is right behind it. Have you been running periodic SMART tests on your drives? Have they been passing?
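
As a rough illustration of what "passing" can mean beyond the self-test result, the sketch below scans the attribute table printed by `smartctl -A` for a few counters that are commonly watched as early warning signs. The sample table is hypothetical (it is not from this server), the choice of attributes is an assumption rather than an official list, and `nonzero_watched` is a made-up helper name.

```python
# Counters often treated as early signs of a failing disk (an assumed shortlist).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Hypothetical sample in the layout printed by `smartctl -A`; not from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with a raw value above zero."""
    result = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                result[fields[1]] = int(raw)
    return result


print(nonzero_watched(SAMPLE))
```

Nonzero pending or uncorrectable sector counts do not prove a drive is about to die, but they are a reasonable cue to run a long self-test and keep the backup current.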
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
First out of the gate is the fact that you are using RAIDz1, which tolerates only a single drive failure.

You need to be cautious and make sure you have a backup
of your data before attempting to replace a failed drive.

You do have a backup, right?
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
No, I have not set up SMART testing. Also, it's most likely a drive issue: when I built this machine I used cheap Seagate drives so that I could afford a better-quality motherboard and memory. I now plan on purchasing some WD Reds.

Thanks for the help.
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
No backup; I am in the process of making one as we speak. I am also going to finally break down and purchase a cheap WD NAS for routine backups.
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
Here is my game plan: I will attempt a complete backup of the volume, then take down the server and replace the drives with the new WD drives.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
No backup, in the process of doing that as we speak.
When your backup is secured, run this from the CLI and look at the results:

# glabel status

This command prints the gptid and da/ada labels for all drives, including
spares. Since your drive is not available, it will be absent from the list.
Then use the serial numbers shown by smartctl for the drives that are still
listed to work out, by elimination, which physical disk is the culprit causing
the degraded state. The manual has step-by-step instructions for replacing
a failed drive.
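
The elimination step described above can be sketched in a few lines of Python. The pool member labels are the ones from the status email in this thread; the `glabel status` sample and its device names (ada0p2 and so on) are hypothetical stand-ins for whatever the server actually prints, and the helper names are made up.

```python
# gptid labels of the six raidz1-0 members, taken from the status email.
POOL_MEMBERS = [
    "gptid/a761b713-c428-11e4-b25c-002590fcbe82",
    "gptid/a7dec147-c428-11e4-b25c-002590fcbe82",
    "gptid/a869d614-c428-11e4-b25c-002590fcbe82",
    "gptid/a8e28a35-c428-11e4-b25c-002590fcbe82",
    "gptid/a9604c17-c428-11e4-b25c-002590fcbe82",
    "gptid/a9dee6b5-c428-11e4-b25c-002590fcbe82",
]

# Hypothetical `glabel status` output; the component names are assumptions.
GLABEL_SAMPLE = """\
                                      Name  Status  Components
gptid/a761b713-c428-11e4-b25c-002590fcbe82     N/A  ada0p2
gptid/a869d614-c428-11e4-b25c-002590fcbe82     N/A  ada1p2
gptid/a8e28a35-c428-11e4-b25c-002590fcbe82     N/A  ada2p2
gptid/a9604c17-c428-11e4-b25c-002590fcbe82     N/A  ada3p2
gptid/a9dee6b5-c428-11e4-b25c-002590fcbe82     N/A  ada4p2
"""


def parse_glabel(text):
    """Map each gptid label to the partition it currently lives on."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            mapping[fields[0]] = fields[2]
    return mapping


def absent_members(pool_members, glabel_text):
    """Return pool member labels that glabel no longer lists."""
    present = parse_glabel(glabel_text)
    return [label for label in pool_members if label not in present]


print(absent_members(POOL_MEMBERS, GLABEL_SAMPLE))
```

Every label that is still present maps to a device whose serial number smartctl can report, so the one physical disk whose serial does not turn up is the one to pull.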
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
Thank you
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
I need some more advice. I finally finished backing up the 5 TB of data from the degraded volume, the failed drive has been replaced, and my server is now resilvering the new drive. My question: is there a way to completely copy or export the volume to another computer? I want to replace the remaining drives with WD Reds.

Again thanks for all the support
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
is there a way to completely copy or export the volume to another computer?
You can send the output of zfs send to another system via SSH, assuming there is enough storage available, then use zfs receive to bring it back later.

EDIT: however, this is not necessary if your goal is merely to replace the drives while retaining your existing vdev layout. Just follow the directions for replacing drives to grow your pool.
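
For anyone who does want the copy-to-another-machine route, here is a minimal Python sketch that only assembles the command lines for a snapshot followed by zfs send piped over SSH into zfs receive. It does not run anything. The snapshot name, remote host, and target dataset are hypothetical placeholders, and `-F` on the receiving side will roll back or overwrite the target, so treat this as a starting point to check against the documentation rather than a recipe.

```python
import shlex


def replication_commands(pool, snapshot, remote, target):
    """Build (snapshot_command, send_receive_pipeline) as shell strings.

    Hypothetical helper; nothing is executed here.
    """
    snap = f"{pool}@{snapshot}"
    # Recursive snapshot so that child datasets are included.
    take_snapshot = ["zfs", "snapshot", "-r", snap]
    # -R sends a replication stream covering the snapshot's descendants.
    send = ["zfs", "send", "-R", snap]
    # -F lets the receiving side roll back or overwrite the target dataset.
    receive = ["ssh", remote, "zfs", "receive", "-F", target]
    pipeline = f"{shlex.join(send)} | {shlex.join(receive)}"
    return shlex.join(take_snapshot), pipeline


# Placeholder names: "migrate", "backuphost" and "backup/TANK1" are assumptions.
snapshot_cmd, pipeline_cmd = replication_commands("TANK1", "migrate", "backuphost", "backup/TANK1")
print(snapshot_cmd)
print(pipeline_cmd)
```

Printing the commands first makes it easy to review them before pasting anything into a root shell.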
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
Update: problem completely resolved. I ended up replacing the drives one at a time and resilvering each new drive, which took 3 days. I am now running WD Red Pros.

Thanks for the Community Support.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Now that you have a backup, you should consider switching to raidz2 so you don't have to sweat bullets when you lose a drive ;)

I.e. back up the zpool, then recreate it as raidz2, then restore the zpool.

You can use the old Seagates in your backup system if you have sufficient redundancy.
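
To put a rough number on that trade-off, the Python sketch below compares usable space for the same six disks under one and two parity drives. The disk count comes from the six members listed under raidz1-0; the 3 TB per-disk size is an assumption (the drive size is never stated in the thread, though the 16.2T pool size would be consistent with it if that figure counts raw space including parity). The formula ignores padding, metadata, and the TB versus TiB difference.

```python
def usable_tb(drives, drive_tb, parity):
    """Approximate usable capacity of a single raidz vdev, in TB.

    Rough estimate only: (drives - parity) * drive size.
    """
    if drives <= parity:
        raise ValueError("a raidz vdev needs more drives than parity disks")
    return (drives - parity) * drive_tb


# Six drives as in the thread; 3.0 TB per drive is an assumed size.
raidz1 = usable_tb(6, 3.0, 1)
raidz2 = usable_tb(6, 3.0, 2)
print(f"raidz1: about {raidz1} TB usable, survives 1 failed drive")
print(f"raidz2: about {raidz2} TB usable, survives 2 failed drives")
```

Under those assumptions the price of the second parity disk is one drive's worth of space, which still leaves plenty of room for the roughly 5 TB of data mentioned earlier.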
 

chasbaci

Dabbler
Joined
Feb 9, 2015
Messages
17
Now I have two pools using my old drives, but last night I received an alert that my drives are running too hot, so it's time for a new case. After a lot of debate, here is the new case that will be arriving on Friday next week:

http://www.supermicro.com/products/chassis/4U/747/SC747TG-R1400-SQ.cfm

I also added 2 additional case fans to the order. Now I need to plan the migration from the old case to the new one over the next week. Wish me luck.
 