Why doesn't my hot spare work?

Status
Not open for further replies.

Lucas Rey

Contributor
Joined
Jul 25, 2011
Messages
180
Dear community,
I have the following situation on my pool, with one missing disk. The pool also has a hot spare disk, but why doesn't the spare automatically replace the faulty disk? I tried rebooting, but FreeNAS still won't rebuild the pool using the spare. What am I missing? o_O

Code:
[root@freenas] ~# zpool status -v
  pool: tanktest
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
  see: http://illumos.org/msg/ZFS-8000-2Q
  scan: none requested
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        tanktest                                        DEGRADED    0    0    0
          raidz1-0                                      DEGRADED    0    0    0
            7037681134776256371                        UNAVAIL      0    0    0  was /dev/gptid/e1be5b90-7337-11e3-bf7e-000c294f851d
            gptid/e1ec0f48-7337-11e3-bf7e-000c294f851d  ONLINE      0    0    0
            gptid/e2163d88-7337-11e3-bf7e-000c294f851d  ONLINE      0    0    0
        spares
          gptid/e23dfd9d-7337-11e3-bf7e-000c294f851d    AVAIL 
 
errors: No known data errors
 
[root@freenas] ~# zpool online tanktest gptid/e23dfd9d-7337-11e3-bf7e-000c294f851d
cannot online gptid/e23dfd9d-7337-11e3-bf7e-000c294f851d: device is reserved as a hot spare
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi Lucas,

Hot spares don't work in FreeNAS. There's a project to develop a daemon called "zfsd" that will do all the dirty work of detecting the failed drive and replacing it with the spare, but it's not done yet.

In your case, you are running raidz with a hot spare when you could have just done a raidz2 and gotten all the additional benefits of the "extra" parity protection.
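
In the meantime you can do by hand what zfsd would eventually do for you. Something along these lines should pull the spare into the vdev (just a sketch using the standard zpool replace/detach syntax, with the pool name and gptids copied from your zpool status output, so double-check them before running anything):

Code:
# substitute the spare for the missing disk and start the resilver
zpool replace tanktest 7037681134776256371 gptid/e23dfd9d-7337-11e3-bf7e-000c294f851d

# once the resilver finishes, detach the missing disk so the spare
# becomes a permanent member of raidz1-0
zpool detach tanktest 7037681134776256371

If you'd rather keep it as a spare, install a real replacement disk first and detach the spare afterwards instead.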

-Will
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, hot spares don't work in FreeBSD yet. It's expected in 10.0, but it was promised in 9.x before, then pushed back.
 

Lucas Rey

Contributor
Joined
Jul 25, 2011
Messages
180
Thanks to you both for the reply and the clarification.
I set up the hot spare since I don't have space in my box for a 4th disk, so for safety I just added a USB hot spare. At this point I'll remove the hot spare and then replace the faulty disk. I hope this feature gets implemented soon! ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah.. USB is its own kind of "fail".
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Actually, hot spares don't work in FreeBSD yet. It's expected in 10.0, but it was promised in 9.x before, then pushed back.
Don't hold your breath :(. While it is mentioned in the unofficial What's new for FreeBSD 10 wiki page ("ZFS fault monitoring and management daemon" in the Other changes section), FreeBSD 10 is already at RC4 and it has not been merged yet. I don't expect it to be merged now, so late in the release cycle.
zfsd project codeline (zpool, zfs and zfsd): http://svnweb.freebsd.org/base/projects/zfsd/head/cddl/sbin/
stable/10 codeline (notice that zfsd is missing here): http://svnweb.freebsd.org/base/stable/10/cddl/sbin/
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, I'm not holding my breath. I've read that it was promised in 8.x, then pushed to 9.x, and now pushed to 10.x. Even in January 2013 I was telling people not to be too hopeful for it in 10.x either (and I'm still not).

Personally, I'm not a fan of hot spares. Either I want control of my server or I don't. In a home setting, I want to be in control when stuff happens automatically. In a business setting, I expect the administrators to be trained and experienced enough to choose the proper time to do a resilver. Resilvers kill pool performance, and I don't see why people would want a system to automatically kick off a resilvering task at what could be the busiest time of the day. Yes, I know it's all about uptime, blah blah blah. But that's what RAIDZ3 and triple-disk mirrors are for if you are that worried about it.
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Personally, I'm not a fan of hot spares. Either I want control of my server or I don't. In a home setting, I want to be in control when stuff happens automatically. In a business setting, I expect the administrators to be trained and experienced enough to choose the proper time to do a resilver. Resilvers kill pool performance, and I don't see why people would want a system to automatically kick off a resilvering task at what could be the busiest time of the day. Yes, I know it's all about uptime, blah blah blah. But that's what RAIDZ3 and triple-disk mirrors are for if you are that worried about it.
I host my servers in a datacenter a day's drive away. The last time I had a drive failure, the EqualLogic sent me an e-mail, then automatically rebuilt the drive from one of two hot spares in the chassis (it's RAID10, so there was no real performance impact and the rebuild finished very quickly). When I submitted my ticket to the datacenter support team, I knew I had another hot spare waiting and nothing was in a degraded state, so I told the on-site staff there was no rush and didn't pay a fee for the service of having them swap the failed drive for me.

I'd love for hot spares to work in FreeNAS. Of course, I'll be building arrays of identical drives using mirrored vdevs (RAID10-ish) so there's no complexity to my installation and the performance implications of a drive rebuild are significantly less than in parity raid configurations...

Instead I'll be using warm spares and performing failed drive replacement manually.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I host my servers in a datacenter a day's drive away. The last time I had a drive failure, the EqualLogic sent me an e-mail, then automatically rebuilt the drive from one of two hot spares in the chassis (it's RAID10, so there was no real performance impact and the rebuild finished very quickly). When I submitted my ticket to the datacenter support team, I knew I had another hot spare waiting and nothing was in a degraded state, so I told the on-site staff there was no rush and didn't pay a fee for the service of having them swap the failed drive for me.

I'd love for hot spares to work in FreeNAS. Of course, I'll be building arrays of identical drives using mirrored vdevs (RAID10-ish) so there's no complexity to my installation and the performance implications of a drive rebuild are significantly less than in parity raid configurations...

But there's no reason why a hot spare is necessary. You could have simply installed spare disks in the machine, powered up, then made a remote connection, offlined the failed disk, and started the resilver without the hot-spare function that everyone wants. To me, all that "hot spare" really means is that it takes a spare disk and adds it to the pool automatically in place of a failed disk. I just don't want that kind of automation at all times.
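
Roughly, that manual route is just something like this from the CLI (a sketch with placeholder gptids, substitute your own; the FreeNAS GUI's Volume Status -> Replace does essentially the same job and also handles partitioning the new disk):

Code:
# take the failed disk out of service
zpool offline tanktest gptid/<failed-disk-gptid>

# resilver onto the spare disk that is already installed and powered up
zpool replace tanktest gptid/<failed-disk-gptid> gptid/<spare-disk-gptid>

# watch the resilver progress
zpool status tanktest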
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
To me, all that "hot spare" really means is that it takes a spare disk and adds it to the pool automatically in place of a failed disk. I just don't want that kind of automation at all times.
Right, but if you don't want it you don't need to specify a disk as a hot spare, just like in a RAID controller. Most folks don't choose to use them, and that works fine.

But I see no need to extend my exposure to the "if a second drive fails I'm screwed" scenario. I know that RAIDZ3 totally fixes this issue, but it's also much slower than a mirrored solution of the same capacity. Sometimes hot swapping is a useful tool, and it's something that's been around for decades. I'd think it should be part of the standard toolkit in FreeNAS.

That's all I'm saying. :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And I totally agree that the "option" should be there. I'd never use it, but that's my choice.

My problem is that zfsd has to be programmed to be smart enough not to drop a failed disk over a handful of errors or a temporary situation that will resolve itself in a few seconds. I think the problem with zfsd is that it's very hard to make something smart enough not to drop disks inappropriately.

Remember, there's a fine line between keeping a failed/failing disk in a vdev and removing it to replace it. I don't like the idea of trusting software to make that decision for me. It may inappropriately kick good disks out of the pool and leave it in a worse position than an admin who can interpret the data and understand the situation would. That's all I'm saying.

If people want to trust zfsd to do the right thing, that's great. Good for them. I'm someone who likes to have some control over certain aspects of reliability, and I trust myself more than an algorithm that might make the wrong choice. I'd rather lose my data because of my own stupidity than because zfsd decided every disk I had was failing and kept dropping disks from my pool until nothing was left. ;)
 

mintra

Cadet
Joined
Nov 7, 2013
Messages
6
I sort of agree that I like to be the one to choose when to resilver. I'm reading this as I configure my system and think about spares. It has 16 3 TB drives and I have set up seven mirrors in one volume, since I expect to use iSCSI onto a single volume. I can see that I can't use hot spares at all, but what should I do with the two spare drives? Just leave them unconfigured and use them if something fails?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's what I would do.
 