This comment from @jpaetzel in bug report <https://bugs.freenas.org/issues/6788#note-2> made my hair stand on end:
By default we set the failmode to continue. If the system is under any write load when the pool goes unavailable it will likely start a death spiral towards a deadlock.

To reinforce that, I found a similar sentiment echoed in the forums:
The available options for failmode are "panic", "continue", and "wait". As I understand it, "panic" reboots the machine when the pool fails, "continue" lets reads continue but returns EIO for any writes, and "wait" blocks all I/O access. ... The default is continue. But it doesn't do quite what it sounds like from your post. Asking around a year or so ago, the only answer I've really gotten back is "it doesn't do what we had hoped and it never will without restructuring ZFS". I could find almost no useful documentation on this property, nor could I find an example where it actually worked. So I have to think that it is useless. Do you have any detailed documentation on the property, or a link to someone that used it successfully?
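For concreteness, failmode is an ordinary pool property, so you can check and change it with the standard zpool commands. A minimal sketch, assuming a pool named "tank" (the pool name is a placeholder, not from the quotes above):

```shell
# Show the current failmode setting for the pool
zpool get failmode tank

# Change it to one of the three values the quote describes:
# wait (block I/O), continue (EIO on writes), or panic (crash/reboot)
zpool set failmode=wait tank
```

Of course, the whole point of the quotes above is that changing this property may not behave the way its documentation suggests once a pool actually fails under load.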
It seems like "panic" could result in the corruption of any other pools that aren't cleanly unmounted. Likewise, if a deadlock is the likely result of "continue," it seems like you're at risk of corrupting your other pools. I'm not sure about "wait" -- does it block all I/O, or just for that pool?
Is there really no way to force unmount a failed pool so that the system can keep working with the surviving pools safely?