One pool fails = system dies and corrupts all pools?

Status
Not open for further replies.

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
This comment from @jpaetzel in bug report <https://bugs.freenas.org/issues/6788#note-2> made my hair stand on end:
By default we set the failmode to continue. If the system is under any write load when the pool goes unavailable it will likely start a death spiral towards a deadlock.
To reinforce that, I found a similar sentiment echoed in the forums:
...the default is continue. But it doesn't do quite what it sounds like from your post. Asking around a year or so ago the only answer I've really gotten back is "it doesn't do what we had hoped and it never will without restructuring ZFS". I could find almost no useful documentation on this property nor could I find an example where it actually worked. So I have to think that it is useless. Do you have any detailed documentation on the property or a link to someone that used it successfully?
The available options for failmode are "panic", "continue", and "wait". As I understand it, "panic" triggers a kernel panic (and typically a reboot) when the pool fails, "continue" lets reads continue but returns EIO for any new writes, and "wait" blocks all I/O access.
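For reference, the property can be checked and changed per pool with zpool (a minimal sketch only; "tank" is a hypothetical pool name):

  # show the current failmode setting for one pool
  zpool get failmode tank

  # change it, e.g. to the stock ZFS default of "wait", or to "continue"
  zpool set failmode=wait tank
  zpool set failmode=continue tank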

It seems like "panic" could result in the corruption of any other pools that aren't cleanly unmounted. Likewise, if a deadlock is the likely result of "continue," it seems like you're at risk of corrupting your other pools. I'm not sure about "wait" -- does it block all I/O, or just for that pool?

Is there really no way to force unmount a failed pool so that the system can keep working with the surviving pools safely?
 

jkh

Guest
It seems like "panic" could result in the corruption of any other pools that aren't cleanly unmounted. Likewise, if a deadlock is the likely result of "continue," it seems like you're at risk of corrupting your other pools.
Where do you get the idea that a deadlock in pool A would corrupt pools B or C? Seriously, I don't see it implied by the discussion, so if there's some specific code you'd like to cite in FreeNAS's implementation of ZFS, it would be useful in refuting what otherwise just looks like hysteria. :)
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
It could well be hysteria, mostly from ignorance, and quite possibly a misunderstanding of @jpaetzel's comment.

My impression (perhaps mistaken) was that it would deadlock the entire system. And my (perhaps misinformed) impression was that a system panic/crash is bad news for ZFS. I know ZFS does some journaling, but I wasn't sure how happy it would be in the face of a kernel deadlock/panic.

But if you're telling me that a catastrophic pool failure leaves the rest of the server and the other pools happy, I'm content. :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
panics/crashes *can* be bad news for ZFS. ZFS is designed to be able to survive some pretty nasty situations, but the actual cause of the panic/crash is also a variable. For example, if the box crashes because a SAS/SATA controller malfunctions, there's nothing to stop the SAS/SATA controller from ending up in some loop where it starts writing tons of random data to the drives that are attached to it. From your perspective the box "crashed and then ZFS was dead" but the reality is much deeper.

A pool failure would only mean that one pool is offline. Of course, if it's also the pool with the .system dataset then you may have lots of other problems because the system files aren't accessible anymore. But that's not a ZFS problem.
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
So it sounds like kernel panics due to pool failure are unlikely to cause major problems for other pools, which is a relief.

And from what you're saying, it might also be a good idea to have .system on a separate pool for which failmode=panic...
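If one wanted to try that, it looks like it would just be a one-liner (sketch only; "syspool" is a hypothetical name for a pool dedicated to .system):

  zpool set failmode=panic syspool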
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
And from what you're saying, it might also be a good idea to have .system on a separate pool for which failmode=panic...

I'm not 100% certain, but failmode is initiated when a disk is lost. So if you have a RAIDZ3 and one disk fails, you obviously wouldn't want a panic. The pool would be completely capable of functioning, but if it's set to panic then the box crashes. That doesn't sound like a "good thing". The reality is that if you are being a responsible admin you shouldn't need to change the setting because you are smart enough to make sure you aren't in a condition that is stupid. ;)
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
I'm not 100% certain, but failmode is initiated when a disk is lost.
I don't think so. The documentation says "catastrophic pool failure," which sounds to me like corrupted/lost metadata, not just reduced redundancy.

And since you said that "if its also the pool with the .system dataset then you may have lots of other problems because the system files aren't accessible anymore," I was thinking that those "lots of other problems" could be worse than a panicked system, which at least puts you in a known state.
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Of course, if it's also the pool with the .system dataset then you may have lots of other problems because the system files aren't accessible anymore. But that's not a ZFS problem.
Entirely possible that I missed this in the manual, but is there guidance on where/how to configure that system dataset pool? Since it's known that losing the pool where the system dataset lives is bad, is there something that can be done to mitigate that? Is the answer simply: 'Put it on the most reliable/redundant pool you have'?

It looks like the OP was answered for the most part (as I understand it): you don't lose other pools, but you can have a serious issue/crash if you lose the system dataset pool. What is the recovery process for such an event?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Entirely possible that I missed this in the manual, but is there guidance on where/how to configure that system dataset pool? Since it's known that losing the pool where the system dataset lives is bad, is there something that can be done to mitigate that? Is the answer simply: 'Put it on the most reliable/redundant pool you have'? [...]
I have .system and jails in a separate pool that resides on separate disks on a separate controller. I was thinking along these lines:
  • If I lose the pool with .system, my data is still intact.
  • If I have problems with my data pool, .system is separate so I can troubleshoot as much as I want and still see all the logs and performance history.
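To double-check from the shell which pool actually holds the system dataset, something like this should show it (just a sketch; "syspool" is a hypothetical pool name, and it assumes the usual FreeNAS layout with a .system dataset at the pool root):

  # list the .system dataset and its children, with space used
  zfs list -r -o name,used syspool/.system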
 