Spare Drive?

Status: Not open for further replies.

kspare

Guru
Joined
Feb 19, 2015
Messages
508
What is the purpose of being able to assign a spare drive?

It seems like when a drive fails, FreeNAS doesn't use this spare. I've seen some talk about this being addressed, possibly in FreeNAS 10, but if it doesn't work, why is there an option to mark a drive as a spare?
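For reference, the GUI's spare option corresponds to a plain ZFS spare vdev on the pool underneath; a rough CLI sketch (pool and device names here are placeholders, not from any system in this thread):

Code:
# attach a disk to the pool as a hot spare
zpool add tank spare da5
# verify it shows up under "spares" as AVAIL
zpool status tank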
 

dlavigne

Guest
In theory, it was added to 9.3 a few weeks ago and is in the latest STABLE. I haven't tried it yet or heard of anyone who has so it hasn't been added to the docs yet. Let us know if you give it a shot.
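A quick way to give it a shot from the shell, assuming a pool named tank and a member disk da3 (both names are placeholders):

Code:
# take a member disk offline to simulate a failure
zpool offline tank da3
# then watch whether the spare is pulled in and a resilver starts
zpool status tank

One caveat: an administrative offline may not register as a fault to zfsd, so physically pulling or failing the drive is a stronger test.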
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
I just did. I put a drive offline, and the spare didn't do anything.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
I also had a drive fail this morning, and the spare just sat there. I had to remove the drive from being a spare and use the replace function.
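The manual workaround described here maps to roughly this at the CLI (pool and device names are placeholders):

Code:
# remove the disk from the spares list
zpool remove tank da5
# then replace the failed member with it by hand
zpool replace tank da3 da5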
 

dlavigne

Guest
Are you updated to the latest SU? If so, please create a bug report at bugs.freenas.org and post the issue number here.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
I updated to the very latest. I was only a few weeks out before, so maybe I missed it. Once it's done resilvering, I'll make one drive a spare and try taking a drive offline again to see what it does.

That should be enough to make the spare kick in, shouldn't it?
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
FreeNAS-9.3-STABLE-201503071634 was the build I was running this morning.

FreeNAS had a drive fail due to problems with writes, and it never rebuilt onto the spare automatically. Nor did it kick in after I removed the drive. I had to remove the spare drive and use the replace function, which then worked.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Has someone put a bug ticket in?
 

Cymike

Staphylococcus
Joined
Jun 24, 2013
Messages
13
My test system is running FreeNAS-9.3-STABLE-201503200528

I just created a pool with 2 hot spares. I then used the dd command to get the activity light going on one of them to help identify it. I then pulled it from the chassis mid-write. The system replaced it and resilvered in one of the spares.
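The dd trick is just a sustained read to get the activity LED blinking; likely something along these lines (the device name matches the log below):

Code:
# stream reads from the target disk to light its activity LED
dd if=/dev/da19 of=/dev/null bs=1m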

Code:
Mar 25 11:31:42 philosoraptor kernel:   (da19:mps2:0:27:0): WRITE(10). CDB: 2a 00 00 74 f7 80 00 00 08 00 length 4096 SMID 660 terminated ioc 804b scsi 0 state c xfer 0
Mar 25 11:31:42 philosoraptor kernel:   (da19:mps2:0:27:0): READ(10). CDB: 28 00 02 db 4a 00 00 01 00 00 length 131072 SMID 621 terminated ioc 804b scsi 0 state c xfer 106496
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): WRITE(10). CDB: 2a 00 00 74 f7 80 00 00 08 00
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): CAM status: CCB request aborted by the host
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): Retrying command
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): READ(10). CDB: 28 00 02 db 4a 00 00 01 00 00
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): CAM status: CCB request aborted by the host
Mar 25 11:31:43 philosoraptor kernel: (da19:mps2:0:27:0): Retrying command
Mar 25 11:31:43 philosoraptor kernel: da19 at mps2 bus 0 scbus2 target 27 lun 0
Mar 25 11:31:43 philosoraptor kernel: da19: <ATA WL600GLSA32100 5G04> s/n              LP70808 detached
Mar 25 11:31:48 philosoraptor zfsd: Replacing vdev(tank/16751431492801886184) with /dev/gptid/4e73d91e-d1cb-11e4-8676-00074308547a State now REMOVED.
Mar 25 11:31:48 philosoraptor kernel: <118>Mar 25 11:31:48 philosoraptor zfsd: Replacing vdev(tank/16751431492801886184) with /dev/gptid/4e73d91e-d1cb-11e4-8676-00074308547a State now REMOVED.


Note the resilver time:

Code:
# zpool status tank
  pool: tank
 state: ONLINE
  scan: resilvered 6.09M in 0h0m with 0 errors on Wed Mar 25 11:31:48 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/4cabba92-d1cb-11e4-8676-00074308547a  ONLINE       0     0     0
            gptid/4d0e8f58-d1cb-11e4-8676-00074308547a  ONLINE       0     0     0
            gptid/4d6d94f2-d1cb-11e4-8676-00074308547a  ONLINE       0     0     0
            gptid/4e73d91e-d1cb-11e4-8676-00074308547a  ONLINE       0     0     0
        spares
          gptid/fa1157a7-d257-11e4-8676-00074308547a    AVAIL

errors: No known data errors


If you're still having an issue with this, please file a bug.
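One thing worth noting: on stock ZFS, a spare that has kicked in normally sits under a spare-N interior vdev alongside the failed disk and shows as INUSE until the failed disk is detached. In the output above the spare appears as a full raidz2 member, so the pulled disk was evidently detached automatically. The manual step, if a pool ever gets stuck in that intermediate state (names are placeholders):

Code:
# detach the failed disk to promote the spare to a permanent member
zpool detach tank da3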
 

Cymike

Staphylococcus
Joined
Jun 24, 2013
Messages
13
Heh... that's my 2U FreeNAS test box. I named my FreeNAS Mini just "raptor" but did not name the pool clevergirl.

... I'll let myself out now.
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
It's my understanding that zfsd should take care of this and it's been in for a few weeks now
What inspired the migration of this over from TrueNAS? I'm happy to see it, and slightly amazed that you have it at all, since the last I heard from mainline FreeBSD on it was in August.

The bigger question for me is: does it do anything if you don't have a spare? I'd love to get immediate alerts on drive failure, regardless of whether the pool has a spare.
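Until something like that lands, a crude shell check can approximate the alert; a sketch, with the mail address as a placeholder:

Code:
# zpool status -x prints "all pools are healthy" when nothing is wrong
zpool status -x | grep -v "all pools are healthy" \
  && echo "pool problem detected" | mail -s "zpool alert" admin@example.com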
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What inspired the migration of this over from TrueNAS? I'm happy to see it, and slightly amazed that you have it at all, since the last I heard from mainline FreeBSD on it was in August.

It was always promised to be made available for FreeNAS. It was an issue of not being ready for deployment in a production environment (read: probably buggy and/or needed more dev work).

If it appeared in TrueNAS first, that's a little odd, as FreeNAS usually seems to be the testing ground for TrueNAS.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In theory, it was added to 9.3 a few weeks ago and is in the latest STABLE. I haven't tried it yet or heard of anyone who has so it hasn't been added to the docs yet. Let us know if you give it a shot.

I've had a spare in some pools for a long time. I always wanted to be able to trigger a rebuild remotely for important pools. A pool I built out of 4TB SATA desktop drives in 2013 just experienced a failure, and zfsd swooped into action.

Code:
kernel:         (da4:mps0:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 798 command timeout cm 0xffffff8000989e70 ccb 0xfffffe02470d5000
kernel:         (noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xffffff8000989e70
kernel:         (da4:mps0:0:9:0): WRITE(10). CDB: 2a 00 f1 b5 3e 98 00 00 08 00 length 4096 SMID 84 terminated ioc 804b scsi 0 state c xfer 0
kernel: mps0: IOCStatus = 0x4b while resetting device 0xb
kernel: (da4:mps0:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
kernel: (da4:mps0:0:9:0): CAM status: Command timeout
kernel: (da4:mps0:0:9:0): Retrying command
kernel: da4 at mps0 bus 0 scbus3 target 9 lun 0
kernel: da4: <ATA ST4000DM000-1F21 CC51> s/n             Z3004T9A detached
zfsd: Replacing vdev(something/6215010294716060749) with /dev/gptid/7b94555a-28b6-11e3-a1b2-005056b349b2 State now REMOVED.
zfsd: Replacing vdev(something/6215010294716060749) with /dev/gptid/7b94555a-28b6-11e3-a1b2-005056b349b2 State now REMOVED.


So, yay, it works. Unyay, dead drive.
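For completeness, the remaining chore once a replacement disk is installed is roughly this; the pool name and vdev GUID come from the log above, while the new device name is a placeholder:

Code:
# swap the dead vdev (identified by its GUID) for the new disk
zpool replace something 6215010294716060749 da4
# when the resilver completes, the hot spare drops back to AVAIL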
 