URGENT! One of my disks has failed, and I'm not sure how to replace it!

Status
Not open for further replies.

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
Please help!

I'm using FreeNAS-8.0.4-RELEASE-p2-x64 (11367)

Console gave me this alert:

Code:
CRITICAL: The volume raid-5x3 (ZFS) status is DEGRADED


First, I'm wondering why I didn't get an email from the server when this condition first occurred.

I went to this FAQ: http://forums.freenas.org/faq.php?faq=general_freenas#faq_bad_drive

When I do:
Code:
zpool status


I get:
Code:
  pool: raid-5x3
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
     entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 3h25m with 7607009 errors on Sun Aug 12 06:26:44 2012
config:

     NAME                      STATE     READ WRITE CKSUM
     raid-5x3                  DEGRADED     0     0 7.29M
       raidz1                  DEGRADED     0     0 14.7M
         ada0p2                ONLINE       0     0     0
         10739480653363274060  FAULTED      0     0     0  was /dev/ada1p2
         ada2p2                ONLINE       0     0     0
         ada3p2                ONLINE       0     0     3  254M resilvered
         ada1p2                ONLINE       0     0     0

errors: 7607009 data errors, use '-v' for a list
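
For reference, a minimal sketch of how the config table in output like this could be scanned for members that are not ONLINE or that show non-zero error counters. It assumes the column layout shown above (NAME STATE READ WRITE CKSUM); `flag_vdevs` is a hypothetical helper name, and the sample text is abridged from the output in this post.

```shell
#!/bin/sh
# Sketch: list vdevs from saved `zpool status` output that are not ONLINE
# or that show non-zero READ/WRITE/CKSUM counters.
# Assumption: the config table has the columns NAME STATE READ WRITE CKSUM.
# `flag_vdevs` is a hypothetical helper name.

flag_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # table header
        in_cfg && NF == 0             { in_cfg = 0 }          # blank line ends the table
        in_cfg && NF >= 5 {
            if ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0")
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Sample input abridged from the output above.
sample='config:

     NAME                      STATE     READ WRITE CKSUM
     raid-5x3                  DEGRADED     0     0 7.29M
       raidz1                  DEGRADED     0     0 14.7M
         ada0p2                ONLINE       0     0     0
         10739480653363274060  FAULTED      0     0     0  was /dev/ada1p2
         ada2p2                ONLINE       0     0     0
         ada3p2                ONLINE       0     0     3  254M resilvered
         ada1p2                ONLINE       0     0     0

errors: 7607009 data errors, use -v for a list'

printf '%s\n' "$sample" | flag_vdevs
```

On the NAS itself the same function could be fed live data with `zpool status raid-5x3 | flag_vdevs`. For the sample it flags the pool, the raidz1 vdev, the FAULTED member and ada3p2 (3 checksum errors).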


Then when I do:
Code:
zpool offline raid-5x3 /dev/ada1p2


I get:
Code:
cannot offline /dev/ada1p2: no valid replicas


On the other hand, the FreeNAS docs at http://doc.freenas.org/index.php/Volumes
say to go to Volume Status and click OFFLINE, but that's not even an option in my console!
I only see "edit" and "replace"!

HELP!
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
The docs refer to 8.2.0 right now. Use the docs for 8.0.3 listed on the front page of doc.freenas.org; those will be the relevant ones for you.

Edit: Looks like you found the right docs. You need to plug the new disk in first and select that from the dropdown.
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
So you're saying I don't need to do anything prior to shutting down?
Just shut down, swap out the bad disk, then do the replace?

Thanks!
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
OMG, now I'm scared!

I did a zpool status -v, and after a minute, the system said
"Permanent errors have been detected in the following files:"

and went on to list 2,660 files ... looks like just about EVERYTHING.

What happened? Is everything pooched?

I thought raidz was supposed to prevent this!

Also, there are entries in there like:
raid-5x3/alpha:<0x0>
raid-5x3/alpha:<0xf5ec>
raid-5x3/alpha:<0xf5ea>
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
Er, that's not exactly what I meant. Ideally you'd have plugged the new disk into another slot if you had one. It's possible for lots of errors to exist if you never performed a scrub while the array was intact and other write problems happened, though it would not normally be EVERYTHING.
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
So what do you suggest I do now?
Add another disk?
Is our data recoverable?
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
Also, I noticed I was getting daily "daily run output" emails until about a month ago.

The last one on 7/6 said:
"Scrubbing of zfs pools:
skipping scrubbing of pool 'raid-5x3':
last scrubbing is 34 days ago, threshold is set to 35 days"

So I'm guessing the next day it did a scrub? Since then I stopped getting those emails...
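
The "skipping scrubbing" message compares the days since the last scrub with a threshold. A minimal sketch of that comparison, assuming the job skips while the day count is below the threshold and scrubs once the count reaches it; `scrub_decision` is a hypothetical helper, not part of FreeNAS.

```shell
#!/bin/sh
# Sketch of the threshold check implied by the email above.
# Assumption: a scrub is skipped while days-since-last-scrub is below the
# threshold and started once it reaches it. `scrub_decision` is a
# hypothetical helper name.

scrub_decision() {
    days_since=$1
    threshold=$2
    if [ "$days_since" -lt "$threshold" ]; then
        echo "skip"
    else
        echo "scrub"
    fi
}

scrub_decision 34 35   # the 7/6 email: 34 days, threshold 35 -> prints "skip"
scrub_decision 35 35   # one day later -> prints "scrub"
```

Under that assumption a scrub would indeed have started the day after the 7/6 email. On stock FreeBSD the threshold is, as far as I know, the `daily_scrub_zfs_default_threshold` setting in periodic.conf; whether FreeNAS 8.0.4 uses it unchanged is not confirmed in this thread.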
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
Return to the previous state if you can. If you have an additional port for the new drive, use that.
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
Put it back how it was when it was only "degraded".
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
Oh, I thought you had switched out the dead disk. If you didn't change anything... what's the zpool status now?
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
No, I haven't done anything. Waiting for guidance. All the console outputs from my original post still apply...

Code:
zpool status
  pool: raid-5x3
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 3h25m with 7607009 errors on Sun Aug 12 06:26:44 2012
config:

	NAME                      STATE     READ WRITE CKSUM
	raid-5x3                  DEGRADED     0     0 7.29M
	  raidz1                  DEGRADED     0     0 14.7M
	    ada0p2                ONLINE       0     0     0
	    10739480653363274060  FAULTED      0     0     0  was /dev/ada1p2
	    ada2p2                ONLINE       0     0     0
	    ada3p2                ONLINE       0     0     3  254M resilvered
	    ada1p2                ONLINE       0     0     0

errors: 7607009 data errors, use '-v' for a list
 

ben

FreeNAS GUI Developer
Joined
May 24, 2011
Messages
373
So, plug the new drive into a spare port, and then follow the directions for replacing a drive. That's why there was nothing in the replacement-drive field: no new disk was attached yet.
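
For orientation only, a sketch of what a command-line replace might look like. It builds and prints a `zpool replace` command without running it. The pool name and the GUID of the faulted member come from the `zpool status` output in this thread; the new device name `ada4p2` is purely a placeholder, since the thread never names the new disk, and it assumes the disk has already been partitioned. The GUI replace described in the docs remains the documented route.

```shell
#!/bin/sh
# Sketch: build (but do not run) a `zpool replace` command line.
# Assumptions: the pool name and the GUID of the faulted member are taken
# from the `zpool status` output in this thread; NEW_DEV is a placeholder,
# since the thread never names the new disk.

POOL="raid-5x3"
OLD_GUID="10739480653363274060"
NEW_DEV="${NEW_DEV:-ada4p2}"   # hypothetical device name

build_replace_cmd() {
    # $1 = pool, $2 = old member (name or GUID), $3 = new device
    printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"
}

# Print the command for review; run it by hand only after checking it.
build_replace_cmd "$POOL" "$OLD_GUID" "$NEW_DEV"
```

Printing rather than executing is deliberate: on a pool that already reports data errors, every command is worth reading twice before it is run.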
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
How many physical disks do you have in your NAS?

You have a 5-disk raidz1 array and it's DEGRADED. You can't offline ada1p2 as it's active in the array.
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
There are 5 disks in the NAS.
Sorry, why would I want to offline ada1p2?
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
I'm still not clear what happened here ... is this a disk failure? Or data corruption? What could be the cause? Have we lost everything? How do we prevent this in the future?

I'm kinda losing faith in this whole ZFS/FreeNAS thing ... I don't see what we did wrong ... have we just toasted 8TB of data??
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Sorry, why would I want to offline ada1p2?
  • Then when I do:
    Code:
    zpool offline raid-5x3 /dev/ada1p2


    I get:
    Code:
    cannot offline /dev/ada1p2: no valid replicas
First, you can't offline it, and second, I don't know why you tried to.

From an SSH session as root, paste the output of:
Code:
zpool status -v

camcontrol devlist

gpart show

glabel status
Throw the output inside of some [code][/code] tags as it will preserve the formatting and keep my eyes from crossing.
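
One possible way to gather everything in a single paste is sketched below. It runs each command it is given and wraps the combined output in [code][/code] tags. `collect_output` and the file name `diag.txt` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: run a list of commands and wrap the combined output in
# [code][/code] tags for pasting into a forum post.
# `collect_output` is a hypothetical helper name.

collect_output() {
    echo "[code]"
    for cmd in "$@"; do
        printf '# %s\n' "$cmd"
        # Run each command through sh; capture stderr too so errors are visible.
        sh -c "$cmd" 2>&1
        echo
    done
    echo "[/code]"
}

# On the NAS, as root, the requested commands would be passed like this:
# collect_output "zpool status -v" "camcontrol devlist" "gpart show" "glabel status" > diag.txt

# Harmless demonstration that runs anywhere:
collect_output "echo hello"
```

Each command's output is preceded by a `# command` line, so it stays clear which section came from which tool.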
 

danzg

Contributor
Joined
Jun 18, 2011
Messages
105
Oh, I see what you're referring to. That 'zpool offline' was from that FAQ I read earlier, and I got 'ada1p2' because the 'failed' disk said "was /dev/ada1p2" next to it.

We just turned the server off; we'll reboot and run those diagnostics.
Thanks.
 