Always maintain a common snapshot for replication?

rockybulwinkle

Dabbler
Joined
Aug 2, 2021
Messages
25
We have two servers, one we will call "NAS_A" running Core U5, and the other "NAS_B" running Core U5.1.

NAS_A pushes automatic hourly snapshots to NAS_B. Right now we are retaining the snapshots for 24 hours on both NAS_A and NAS_B.

The problem I see is that if NAS_A or NAS_B were down for more than 24 hours, I think they would lose their last common snapshot and the whole dataset would have to be transferred again. Even with less frequent snapshots and a longer retention, the point still stands: if a server were down for an extended time, the whole dataset would need to be retransferred. During that time we would be at a much higher risk of losing the pool, because we don't (yet) have a proper offsite backup and our dataset takes about a week to replicate from scratch.
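
To illustrate with the plain zfs CLI (pool/dataset names are placeholders): an incremental send only works while the base snapshot still exists on both servers.

Code:
# Incremental send: @hourly-old must still exist on BOTH servers.
zfs send -i tank/projects@hourly-old tank/projects@hourly-new | \
    ssh NAS_B zfs recv -F backup/projects

# If @hourly-old has been pruned on either side, the incremental send
# fails and the whole dataset has to be replicated from scratch with a
# full send (into a fresh dataset, or after destroying the stale copy):
zfs send tank/projects@hourly-new | ssh NAS_B zfs recv backup/projects_fresh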

If that is the case, is there a way for the replication to prevent deletion of these automatic hourly snapshots when there is only one remaining snapshot in common?

I think this could be implemented by having the source and destination servers first put a hold on the most recent snapshot during replication, then release the hold on the snapshot that was held during the previous replication.

If this isn't easy to do in the GUI, I guess a decent alternative would be to do weekly or monthly snapshots+replications as well, so that it'd be extremely unlikely for all common snapshots to age out of retention before we could intervene. It wouldn't be 100% guaranteed, but it should be close enough...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You could do something fancy that preserves the snapshots if they haven't yet been replicated. This may be supported in the latest versions of TrueNAS, or in zrepl.

Alternatively, you can keep bookmarks after the snapshot is deleted. I'm not 100% sure what the behavior will be, but I think you'd lose any intermediate snapshots that are missing while still being able to replicate the newer ones that still exist.
 
Joined
Oct 22, 2019
Messages
3,641
I think this could be implemented by having the source and destination servers first put a hold on the most recent snapshot during replication, then release the hold on the snapshot that was held during the previous replication.
This is how I do it, in a script. However, it's rudimentary and I'm sending incremental snapshots to multiple (stored offsite) USB pools. In my case, I don't have automatic pruning nor any Replication Task tethered to a Periodic Snapshot Task. To do this solely with the GUI might be possible with the Save Pending Snapshots option. I don't know if it will address the issue of older snapshots being automatically pruned on NAS_B. :confused:
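
Adapted to your NAS_A → NAS_B setup, the core of the idea is something like this (placeholder names, minimal error handling; my actual script targets local USB pools rather than ssh and does more checking):

Code:
#!/bin/sh
# Sketch of the hold/release approach; names are placeholders.
DATASET="NAS_A/media"
PREV="$(cat /root/.last_replicated)"    # snapshot sent in the previous run
NEW="$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" | tail -1)"

# Hold the newest snapshot so no retention policy can destroy it.
zfs hold replication "$NEW"

if zfs send -i "$PREV" "$NEW" | ssh NAS_B zfs recv -F NAS_B/media; then
    # Success: the previous common snapshot is no longer needed here.
    zfs release replication "$PREV"
    echo "$NEW" > /root/.last_replicated
else
    # Failure: drop the new hold and keep the old one, so the last
    # known-common snapshot stays protected until the next attempt.
    zfs release replication "$NEW"
fi

# A matching hold on NAS_B's copy would be needed to stop pruning there, too.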

---

Alternatively, you can keep bookmarks after the snapshot is deleted. I'm not 100% sure what the behavior will be, but I think you'd lose any intermediate snapshots that are missing while still being able to replicate the newer ones that still exist.
"Bookmarks" would only serve to save space on the source (i.e, NAS_A), but would not address the issue of missing the complementary snapshot on the target (i.e, NAS_B), if NAS_B indeed does prune snapshots automatically.

The bookmark on NAS_A labeled NAS_A/media#backup_202111030600 requires there be a complementary snapshot on NAS_B labeled NAS_B/media@backup_202111030600 if you wish to use the bookmark as the base for an incremental replication.
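
With the plain zfs CLI, that workflow looks roughly like this (the second snapshot name is a hypothetical later one):

Code:
# On NAS_A: create a bookmark of the snapshot before it gets pruned.
zfs bookmark NAS_A/media@backup_202111030600 NAS_A/media#backup_202111030600

# Later, the bookmark can serve as the incremental source -- but only if
# NAS_B still has the matching snapshot to receive on top of:
zfs send -i NAS_A/media#backup_202111030600 NAS_A/media@backup_202111040600 | \
    ssh NAS_B zfs recv NAS_B/media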
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
"Bookmarks" would only serve to save space on the source (i.e, NAS_A), but would not address the issue of missing the complementary snapshot on the target (i.e, NAS_B), if NAS_B indeed does prune snapshots automatically.
Yeah, this is a pretty unusual scenario, because 24 hours retention is crazy short for most people (at least as the longest-lived set of snapshots).
 

rockybulwinkle

Dabbler
Joined
Aug 2, 2021
Messages
25
Yeah, this is a pretty unusual scenario, because 24 hours retention is crazy short for most people (at least as the longest-lived set of snapshots).
Yeah, I agree that 24 hours is very short. I do have longer-lived snapshots replicating as well, but the issue can still arise, depending on how (un)attentive the system administrator is, as long as these snapshots have a limited retention.

The main reason for the 24-hour retention is that we have nearly 200 datasets in this pool for different projects. However, most are not actively used but are being snapshotted anyway. I've read that the total number of snapshots should be kept under 10k for performance, though I'm not sure how true this is when most snapshots are empty.
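
For what it's worth, the total is easy to check (pool name is a placeholder):

Code:
# Count every snapshot in the pool:
zfs list -H -t snapshot -r tank | wc -l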

I know I could disable empty snapshots and increase retention, but that doesn't really solve the issue in the long term, as it almost guarantees that infrequently modified datasets will lose their complementary snapshot.

This is how I do it, in a script. However, it's rudimentary and I'm sending incremental snapshots to multiple (stored offsite) USB pools. In my case, I don't have automatic pruning nor any Replication Task tethered to a Periodic Snapshot Task. To do this solely with the GUI might be possible with the Save Pending Snapshots option. I don't know if it will address the issue of older snapshots being automatically pruned on NAS_B. :confused:
This is how I'd like to do it, but I don't see a way in the GUI. The "Save Pending Snapshots" option doesn't really help, as we're concerned about deletion of any complementary snapshots *before* the pending snapshots, not the pending snapshots themselves.

I think the solution may be zrepl with the source/pull configuration and the last_n pruning policy: https://zrepl.github.io/stable/configuration/prune.html#source-side-snapshot-pruning. It does not prune snapshots on the source side until the pulling side has replicated them, which should prevent the issue in my first post.
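
From my reading of those docs, the pruning block of the pull job would look something like this (untested sketch; job name, transport, and dataset names are made up):

Code:
jobs:
  - name: pull_from_nas_a
    type: pull
    connect:
      type: tcp
      address: "NAS_A:8888"     # example transport only
    root_fs: "NAS_B/replication"
    interval: 1h
    pruning:
      keep_sender:
        # Never prune a snapshot on NAS_A that hasn't been replicated yet.
        - type: not_replicated
        # Once replicated, keep the 24 most recent on NAS_A.
        - type: last_n
          count: 24
      keep_receiver:
        - type: last_n
          count: 24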

There is an unofficial plugin for zrepl on TrueNAS. I've looked at it before, but gave up when I saw TrueNAS had built-in replication. I'm not sure how well zrepl integrates with TrueNAS yet...
 