Uh-oh - Need to replace drive...please help

Status
Not open for further replies.

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255
Hi there,

OK, so I was gone for the weekend, and when I came back tonight, I had this nasty surprise in my daily FreeNAS e-mail:

Checking status of zfs pools:
  pool: Raidz2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 8h0m with 0 errors on Wed Feb 22 03:20:22 2012
config:

        NAME        STATE     READ WRITE CKSUM
        Raidz2      ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0
            ada6p2  ONLINE       3  813K     0
        spares
          ada7p2    AVAIL

errors: No known data errors

I had the same thing the day before, with the same drive affected (ada6p2) and very slightly different numbers. Everything still works, so I'm guessing it's a faulty drive. That's fine... but the security output goes like this:

freenas.local kernel log messages:
+++ /tmp/security.dmGAPA7C 2012-03-18 03:01:10.000000000 -0600
+xptioctl: pass driver is not in the kernel
+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel
+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel
+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel

...on and on and on, for several lines. Is this also related to the (assumed) faulty drive?

So - assuming it's a faulty drive, is there a guide somewhere that explains specifically how to replace a drive? Also, is there a quick way to find out which of my 7 drives ada6p2 is?
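
For anyone who lands here with a similar e-mail: the member with errors can be picked out of `zpool status` output mechanically. This is only a minimal sketch, assuming the column layout shown above (NAME STATE READ WRITE CKSUM). The function name `failing_devices` is made up for illustration.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print every
# pool member whose READ, WRITE or CKSUM counter is not zero.
failing_devices() {
    awk '
        # Device rows have at least five columns and a known state in column 2.
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on the NAS (pool name taken from this thread):
#   zpool status Raidz2 | failing_devices
```

Counters are compared as text rather than as numbers because zpool abbreviates large values (813K above).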

Thanks! =)
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Silver,

Here's a link to the documentation for replacing a drive: http://doc.freenas.org/index.php/Volumes#Replacing_a_Failed_Drive

The screenshot for viewing your volumes shows the serial numbers next to each disk. Write down the one for your ada6 disk, then check the sticker on each disk until you find the one that matches.

It is possible that the GUI will fail to replace the disk without notifying you or giving you any clue that it has failed.

What version are you using?
What version were you using when you created your pool?
If there's not a lot of disk activity after you click "replace", then it might have failed.
In my case I created my pool on version 8.0 and the swap partition was 1GB, but when I recently replaced my disk the swap size was 2GB. So the GUI created the 2GB swap partition and then couldn't continue because there wasn't enough space left to create the partition for ZFS. I had to complete it manually from the command line. I'm not sure if this bug has been fixed yet.

Post back if you have trouble, and I or someone else can help.

-- Proto

PS - Here's a little tutorial I wrote with different commands for managing disks in FreeNAS. It doesn't specifically cover disk replacement, but it's a useful reference:

http://protosd.blogspot.com/2011/12/useful-commands-for-diagnosingmanaging.html
 

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255

Hi Protos,

Much appreciated.

However, before we go any further: slight problem with identifying the disk. The first five drives' serial numbers are identified, but the last two are not, presumably because they are plugged into a PCI-E SATA expansion card. I guess that, if worst comes to worst, I could replace them both....?

In any case - to answer your questions, I created the pool with 8.0.1, I believe, and am currently using 8.0.2. I do not have an "offline" button next to all of my disks as per the documentation - so I do not know how to take a device offline via GUI. Is this a big deal? I'm sure there is a CLI equivalent, but I am hesitant to upgrade while the pool is in a degraded state.

(BTW - I am now officially in a "Degraded" state with the pool. Tried a zpool clear Raidz2 ada6p2 command, and ada6p2 was not available.)

So - the whole thing about :

+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel

is a red herring, is it? Is it related to the drive failure?

Thanks!
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
However, before we go any further : Slight problem with identifying the disk. The first five drives' serial numbers are identified, but the last two are not - presumably because they are plugged in a PCI-E SATA expansion card.

You could try from the command line "smartctl -i /dev/ada6" and look for the serial number in the output
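
To check all the disks in one go, something like the following might help. It is only a sketch: `print_serials` is a made-up name, and it assumes `smartctl -i` prints a `Serial Number:` line for each disk it can reach.

```shell
# Hypothetical helper: print "<device> <serial>" for each device given.
# Assumes `smartctl -i` output contains a line starting "Serial Number:".
print_serials() {
    for dev in "$@"; do
        serial=$(smartctl -i "$dev" 2>/dev/null |
            awk -F': *' '/^Serial Number:/ { print $2 }')
        # Disks that smartctl cannot query are reported as "unknown".
        echo "$dev ${serial:-unknown}"
    done
}

# Usage on the NAS (whole disks only, not partitions such as ada6p2):
#   print_serials /dev/ada[0-9]
```

Any disk that comes back as "unknown" can then be narrowed down by elimination against the stickers.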

I do not have an "offline" button next to all of my disks as per the documentation - so I do not know how to take a device offline via GUI. Is this a big deal?

I would take it offline just to be safe. You can do "zpool offline Raidz2 /dev/ada6"
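
Put together, the command-line route might look roughly like the sketch below. It only prints the commands rather than running them. The member names are assumptions: it uses `ada6p2` as listed in the NAME column of `zpool status` rather than `/dev/ada6`, and it swaps in the hot spare `ada7p2` via `zpool replace`. `plan_replace` is a made-up helper name.

```shell
# Hypothetical dry-run helper: print (do not run) the zpool commands for
# swapping one pool member for another.
# Arguments: pool name, failing member, replacement member.
plan_replace() {
    echo "zpool offline $1 $2"      # stop using the failing member
    echo "zpool replace $1 $2 $3"   # resilver onto the replacement
    echo "zpool status $1"          # watch the resilver progress
}

# Names taken from the zpool status output earlier in this thread.
plan_replace Raidz2 ada6p2 ada7p2
```

Check the printed commands against your own `zpool status` output before typing any of them.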



So - the whole thing about :

+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel

is a red herring, is it? Is it related to the drive failure?

Thanks!

As for the xptioctl messages: I'd have to search on that, I'm not really sure.
 

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255
OK. So, basically, if I understand correctly, I would offline ada6, and then open up the computer, take it out and replace it, and then online it with the same command (replace "offline" with "online", I guess?). Is this right?

(I think ada6 is not available - the computer can't seem to access it. But, I guess that if I have serial numbers for the first five drives from the GUI, and then I can get the 7th using your command line, then I can get to ada6 by elimination.)

What about the degraded status of the pool then? How would the new hard drive be integrated? A simple scrub?

Also - I have a hot spare. ada7. How do I activate it? Is this not the time to use this?

Thanks!
 

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255
OK. So I just tried to "see" my drives using smartctl -i on /dev/ada6 and /dev/ada7 - but I got this:

+xptioctl: put "device pass" in your kernel config file
+xptioctl: pass driver is not in the kernel

So, looks like that has some role to play - but what?
 

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255
OK, so all is well, I believe. The drive failed entirely, easily allowing me to see it in the BIOS (SATA5, in my case). Replaced it - forgot to offline it, actually! - and re-booted. zpool status showed a drive at ada6p2 showing the full serial number (or some long 10-digit number, anyways) with the comment "was /dev/ada6". In the GUI, I simply clicked on the "Replace" button for the new disk, which seemed to add it to the pool successfully. All that was left then was to detach the old one (the 10-digit number).

It's currently re-silvering. Sweet! =)

Sorry for the rambling. Posted all this half for others who might eventually stumble upon this, and half for myself as a future placeholder. =)

Cheers!
 

lrusak

Explorer
Joined
Dec 20, 2011
Messages
56
I'm just curious how much data you have on your Z2 array and how long it took to resilver the new drive. And perhaps what CPU you are using and how much RAM?

Does it notify you when resilvering is finished?
 

SilverJS

Patron
Joined
Jun 28, 2011
Messages
255
Hey there,

The pool is 10.7 TB (that's 6 x 3 TB hard drives in a RAIDZ2 array, plus a hot spare, for a total of 7 drives), and I'm currently using a bit over 4 TB of it. Resilvering and scrubbing look like they take similar amounts of time: 8 or so hours. I'm running this on an Asus M4A88T-M motherboard with 8 GB of RAM and an Athlon X2 255 processor.

I don't know if I'll get an actual notification when it's finished - probably not, I'd assume. Also, as far as I know, there's no way to check the resilvering process through the GUI - I have to go check it at the actual computer using command line.
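
For what it's worth, the command-line check can be boiled down to one line of output. This is a rough sketch, assuming the progress shows up on the `scrub:` line of `zpool status` as in the e-mail earlier in the thread (newer ZFS versions label it `scan:`, so both are matched) and that zpool prints the phrase "resilver in progress" while it is running. `resilver_status` and `wait_for_resilver` are made-up names.

```shell
# Hypothetical helper: print only the scrub/resilver line for a pool.
resilver_status() {
    zpool status "$1" | grep -E '^[[:space:]]*(scrub|scan):'
}

# Hypothetical helper: block until the pool no longer reports a resilver
# in progress, checking every $2 seconds (default 600).
wait_for_resilver() {
    while zpool status "$1" | grep -q 'resilver in progress'; do
        sleep "${2:-600}"
    done
    echo "no resilver in progress on $1"
}

# Usage on the NAS (pool name taken from this thread):
#   resilver_status Raidz2
#   wait_for_resilver Raidz2 && resilver_status Raidz2
```

Left running in a shell session, the second function is a poor man's notification: it returns once the resilver line stops saying "in progress".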

I'm still waiting for a day when we'll be able to issue command line commands through the GUI. =)

Hope that answers your questions.
 