Local replication concerns

Joined
Jan 4, 2014
Messages
1,644
This is an extract from the 11.3 release notes...

[Attachment: screenshot.485.png]


Yesterday, I did some stress testing, simultaneously replicating from two servers to a single target server. You can get a sense of what was happening from the image in this post.

In the end, I lost the target server's pool to a kernel panic. It's the first time I've seen one in the six years I've been working with FreeNAS, so it must be a fairly rare event. Mind you, the test conditions were quite unusual. Fortunately, as it was a backup server, no real harm was done. I was at a slightly elevated risk of losing one or more primary datasets while rebuilding the backup server, but all is good now.

Searching the forum for 'kernel panic', a corrupted snapshot seems to be a common trigger in the cases where it has occurred. Rare as it is, with this kernel panic issue still in the wild, it's a little worrying that it might rear its ugly head during local replication, which was only introduced in 11.3. It's one thing to lose a backup server designed as a replication target; it's quite another to lose the pool on a primary server because of local replication.
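For context, replication tasks (local or remote) are just ZFS send/receive under the hood, so a local replication boils down to something like this (pool and dataset names here are placeholders, not my actual setup):

    # Snapshot the source, then pipe the stream straight into another pool on the same box
    zfs snapshot tank/data@auto-2020-07-01
    zfs send tank/data@auto-2020-07-01 | zfs recv -F backup/data

With remote replication, the receive side is a different machine; with local replication, both sides of that pipeline are the primary server, so a panic triggered anywhere along it takes the primary down with it.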

I'm curious to know what community members think about this.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Joined
Jan 4, 2014
Messages
1,644
sretalla said:
I had been doing local replication long before that
I guess what I'm getting at is that a kernel panic is rare, and a corrupted snapshot seems to be a trigger. While the risk of occurrence is similar whether the replication is local or remote, the consequences are far more devastating for local replication (on a primary server) than for remote replication (to a secondary server).
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
I had that issue for over a year until it was fixed (rather silently) in one of the latest updates (commit here).
The replication target would panic, and a scrub would send the box into a panic loop.
Was your target box running on an older version?
 
Joined
Jan 4, 2014
Messages
1,644
styno said:
Was your target box running on an older version?
No, all boxes involved are on the latest release, 11.3-U4.1. The test conditions were extreme, though.
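(For anyone comparing notes, the running release can be confirmed from the shell on FreeNAS:

    # Prints the release string, e.g. FreeNAS-11.3-U4.1
    cat /etc/version

)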
 
Joined
Jan 4, 2014
Messages
1,644
I wonder if what I experienced was in any way related to the serious issue described in this thread: Replicated pool: zeroed files - corruption undetected

EDIT: Seems the issues are not connected, as that one relates to pools with a 1M recordsize. Mine use the default recordsize of 128K.
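For anyone wanting to check whether their own datasets are in the affected configuration, the recordsize is easy to confirm from the shell (dataset name is a placeholder):

    # Default is 128K; the linked thread concerns datasets set to 1M
    zfs get recordsize tank/data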
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
That is the kind of news that keeps me on 11.2... I rely on my 3 FreeNAS servers for enterprise-class quality of service. I put everything I have in them and do not manage any data outside of them. They are the base on which I built my private cloud, and with a secondary server 400 km away, I do not want to end up needing a full re-sync between it and the main one.

Thanks for the warnings. I will stay on 11.2 for another long stretch, I guess. No problem: it is doing the job very well so far and has been for the last few years.
 