Snapshots - Replication tasks - and my Long-Term backup plan

Papa

Dabbler
Joined
Jan 21, 2012
Messages
29
Preparing to initiate my long-term backup plan and wanted some input on a potential issue with the snapshot/replication that will be my active agent.

In the new part of my home we built a security room - hidden and fire protected. I added wiring for lights/power and a network drop (it's not a room for occupants, more like a big fire safe).

My current TrueNAS server and the backup TrueNAS use a pull replication initiated by the backup server - working very nicely I might add. From spare parts, I created a third TrueNAS machine that will be in the fire room. I will replicate to it the same way as my backup machine, but only turn it on every five or six months for a day or two so it can pull the latest data, then I'll power it off - it becomes an air-gapped, long-term backup. The cool thing is it costs almost nothing!

My concern is that the snapshots on the Long-Term box will be out of date compared with the others. Will the system recognize that it just needs to update and pull down the latest snapshots?
I've run into some challenges/issues while learning full replication between my two original machines, so I feel the need to ask the question.

Thanks, OH so much for your time. :wink:

Low
 
Joined
Oct 22, 2019
Messages
3,641
My concern is that the snapshots on the Long-Term box will be out of date compared with the others. Will the system recognize that it just needs to update and pull down the latest snapshots?
If a "common" base snapshot is missing on the Long-Term box, then you will not be able to do an incremental send/recv, and it will have to run a full replication from scratch (if allowed in the configuration for the Replication Task.)
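To check in advance whether a common base snapshot still exists, you can compare the snapshot lists of both boxes. Below is a minimal sketch, not anything TrueNAS itself runs: the function name `latest_common_snapshot` and the list files are made up for illustration, and it only compares names. ZFS itself matches snapshots by GUID, so a matching name is a good hint rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: print the newest snapshot name found in both lists.
# $1 = file with the source's snapshot names, oldest first
# $2 = file with the destination's snapshot names
# Each file holds one name per line (the part after "@"), for example from:
#   zfs list -H -d 1 -t snapshot -o name -s creation sourcepool | cut -d@ -f2
latest_common_snapshot() {
    src_list=$1
    dst_list=$2
    common=""
    # Walk the source list oldest to newest and remember the last name
    # that also appears, as a whole line, in the destination list.
    while IFS= read -r snap; do
        if grep -Fxq -- "$snap" "$dst_list"; then
            common=$snap
        fi
    done < "$src_list"
    if [ -z "$common" ]; then
        return 1    # nothing in common: an incremental send is not possible
    fi
    printf '%s\n' "$common"
}

# Example with made-up snapshot names:
tmp=$(mktemp -d)
printf '%s\n' auto-20220301.0000 auto-20220901.0000 auto-20221231.0000 > "$tmp/src"
printf '%s\n' auto-20211201.0000 auto-20220301.0000 > "$tmp/dst"
latest_common_snapshot "$tmp/src" "$tmp/dst"    # prints auto-20220301.0000
rm -rf "$tmp"
```

If the function prints nothing and returns non-zero, the two boxes share no snapshot name, which is the situation where only a from-scratch replication would work.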
 

Papa

Dabbler
Joined
Jan 21, 2012
Messages
29
If a "common" base snapshot is missing on the Long-Term box, then you will not be able to do an incremental send/recv, and it will have to run a full replication from scratch (if allowed in the configuration for the Replication Task.)
Thanks Winnielinnie - that makes perfect sense to me - and in your "Opinion" does a full replication mean 1) removing all the snapshots and/or 2) wiping the dataset (or creating a second dataset) for a fresh replication?
 

Papa

Dabbler
Joined
Jan 21, 2012
Messages
29
Maybe I just answered it myself
[Attached image: 1649734383668.png]
 
Joined
Oct 22, 2019
Messages
3,641
Thanks Winnielinnie - that makes perfect sense to me - and in your "Opinion" does a full replication mean 1) removing all the snapshots and/or 2) wiping the dataset (or creating a second dataset) for a fresh replication?
A "from scratch" replication does not take advantage of the efficiency and time savings that incremental replications offer. (The difference can be orders of magnitude for larger sources.)

It's like "starting all over again" in creating a replica of the source on the destination, including every child and previous snapshot found on the source, while destroying snapshots and datasets on the destination that are not found on the source.

It's really a waste of time, and the preferable method is to always have a common base snapshot on the destination from which the send/recv can do an incremental replication using the source's base snapshot as the "starting point".

With a pruning and "expiration" policy on your Periodic Snapshot Tasks, you risk older snapshots being removed from the source, some of which might be required as the common base snapshot for the destination.

There are ways around this:
  • Don't set an expiration policy on your automatic snapshots (or set a very long expiration lifespan)
  • In the Replication Task, there's an option to "Hold Pending Snapshots" (though it's not clear if this works to prevent the above issue)
  • Use the "zfs hold" and "zfs release" features to manually protect snapshots until the next time you do your "Long-Term" replication [1]
  • Use the "zfs bookmark" feature to bookmark every single snapshot created [1]
[1] The last two options are not available in the GUI, and thus you must use the command-line and/or scripts. They require some advanced "know-how", especially the bookmarks feature.
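For the bookmark option, here is a rough sketch of what the commands look like. It only prints them rather than running them, and the helper names `bookmark_name_for` and `print_bookmark_commands` are invented for this example, as are the snapshot names. A bookmark can stand in as the incremental source for `zfs send -i` even after the snapshot it was made from has been destroyed on the source. Only the single increment from the bookmark to the named snapshot is sent.

```shell
#!/bin/sh
# Hypothetical helper: turn "pool/dataset@snapname" into "pool/dataset#snapname".
bookmark_name_for() {
    snap=$1
    printf '%s#%s\n' "${snap%@*}" "${snap#*@}"
}

# Hypothetical helper: print (do not run) the commands for bookmarking a base
# snapshot and later sending an increment that starts from that bookmark.
print_bookmark_commands() {
    base=$1    # snapshot already present on the Long-Term box
    new=$2     # newer snapshot to send incrementally
    bm=$(bookmark_name_for "$base")
    printf 'zfs bookmark %s %s\n' "$base" "$bm"
    printf 'zfs send -i %s %s\n' "$bm" "$new"
}

# Example with made-up names:
print_bookmark_commands sourcepool@auto-20220301.0000 sourcepool@auto-20221231.0000
# prints:
#   zfs bookmark sourcepool@auto-20220301.0000 sourcepool#auto-20220301.0000
#   zfs send -i sourcepool#auto-20220301.0000 sourcepool@auto-20221231.0000
```

The `zfs send` output would still need to be piped to a `zfs recv` on the destination, which is left out here because it depends on how the two boxes are connected.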
 
Joined
Oct 22, 2019
Messages
3,641
Here's an example of manually using the "zfs hold" feature. (I use made-up names below.)

Let's say the last time you ran a "Long-Term" replication was on March 1, 2022. The latest snapshot on your source was aptly named:
auto-20220301.0000

Back then on that day, immediately after the replication successfully finished, you protected this snapshot on your source from being deleted (either accidentally or due to the expiration policy):
zfs hold -r basesnap sourcepool@auto-20220301.0000

Much time has passed, let's say it's now December 31, 2022. And let's also say that your snapshot expiration policy is 6 months. Well, if you didn't protect that latest "base" snapshot of auto-20220301.0000, it would have been destroyed by September 2022! :oops: But luckily it's protected with the "hold" feature. This means that the latest snapshot on the Long-Term box is auto-20220301.0000, and thankfully you still have this common base snapshot on your source as well (because you protected it!) Whew! Close call!

So now you run another incremental replication from the source to the Long-Term box. It will use auto-20220301.0000 as the common base snapshot with which to do an incremental send/recv.

After it successfully completes, you now protect the new "most recent common base snapshot" on the source, which might look something like this:
zfs hold -r basesnap sourcepool@auto-20221231.0000

At this point, it's safe to remove the protection of the former (but no longer true) "most recent common base snapshot":
zfs release -r basesnap sourcepool@auto-20220301.0000

This means that since it's no longer protected, auto-20220301.0000 will be destroyed on the source when expired snapshots are pruned. However, the newest "base snapshot" of auto-20221231.0000 will be indefinitely protected.

Rinse and repeat.
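The hold/release rotation above could be wrapped in a small script so the order is always the same: hold the new base first, then release the old one. This is only a sketch under my own assumptions. The names `rotate_hold`, `run` and `DRY_RUN` are made up, and it defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rotate_hold TAG OLD_SNAP NEW_SNAP
# Holds NEW_SNAP first and only then releases OLD_SNAP, so there is never a
# moment without a protected base snapshot on the source.
rotate_hold() {
    tag=$1
    old=$2
    new=$3
    run zfs hold -r "$tag" "$new" || return 1
    run zfs release -r "$tag" "$old"
}

# Example with the made-up names from the post above:
rotate_hold basesnap sourcepool@auto-20220301.0000 sourcepool@auto-20221231.0000
# prints:
#   zfs hold -r basesnap sourcepool@auto-20221231.0000
#   zfs release -r basesnap sourcepool@auto-20220301.0000
```

To see which holds currently exist on a snapshot, `zfs holds -r sourcepool@auto-20221231.0000` lists them along with their tags.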
 

Papa

Dabbler
Joined
Jan 21, 2012
Messages
29
Great explanation!

Here is an interesting thought/question.

Remember we are talking about 3 different machines - Main - Backup - and Long-Term.

I create snapshots daily on the "Main" machine - which are pulled (as opposed to pushed) to the "Backup" machine for that replication.

If I were to snapshot the Backup machine every 4 weeks and have those snapshots expire in 6 or 8 months - and the Long-Term machine pulls those snapshots every - say 4 to 6 months - do the snapshots pulled over from the Main machine to the Backup machine conflict with those that are created to be pulled by the Long-Term machine?

Truly appreciate your in-depth response - you're very kind to lend your time and experience.

Lowell
 
Joined
Oct 22, 2019
Messages
3,641
If I were to snapshot the Backup machine every 4 weeks and have those snapshots expire in 6 or 8 months - and the Long-Term machine pulls those snapshots every - say 4 to 6 months - do the snapshots pulled over from the Main machine to the Backup machine conflict with those that are created to be pulled by the Long-Term machine?
Yes, they will "conflict". Snapshot replication only works in a single direction.

Your replication task will either abort with a message about extraneous snapshots on the destination, or it will "succeed" and forcefully destroy the extraneous snapshots on the destination. [1]

One way around this is to only create snapshots on the Main box, while the Backup and Long-Term boxes only receive replications of snapshots.

In a sense the Main box will populate the Backup and Long-Term boxes. You can even use the Backup box as a source that replicates to the Long-Term box, as long as the snapshots which exist on the Backup box were not created independently (i.e, they were all pulled in from the Main box.)

EDIT: For the record, I would not enable the option of "allow replication from scratch". It's a nuclear bomb solution to a problem that can (and should) be resolved with nuance. It's more important to make sure incremental replications work, rather than to have TrueNAS "just do it all over from scratch all over again!"

[1] For example, let's say you have a snapshot task on the Main box that every day creates snapshots named auto-daily-XXXXXXXX.XXXX, but on your Backup box you have a snapshot task that infrequently creates long-term snapshots named auto-long-XXXXXXXX.XXXX. The next time you try to replicate your daily snapshots from Main to Backup, it's likely to complain about the "auto-long" snapshots that exist on the destination (which do not exist on the source). Depending on how you configure the replication task and/or run the send/recv command-line, it will either abort the task or destroy the extraneous "auto-long" snapshots on the destination.
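Before running a replication, you could list which snapshot names exist only on the destination. The sketch below is a simplified illustration: the function name `extraneous_on_destination` and the list files are hypothetical, and it compares names only. Older snapshots that have already expired on the source will also show up, so the output is something to review by eye, not an automatic "delete these" list.

```shell
#!/bin/sh
# Hypothetical helper: print snapshot names present in the destination list
# ($2) but absent from the source list ($1). One name per line in each file.
# grep exits non-zero when it prints nothing, i.e. when nothing is extraneous.
extraneous_on_destination() {
    grep -Fxv -f "$1" "$2"
}

# Example with made-up names following the naming schemes described above:
tmp=$(mktemp -d)
printf '%s\n' auto-daily-20220410.0000 auto-daily-20220411.0000 > "$tmp/main"
printf '%s\n' auto-daily-20220410.0000 auto-long-20220401.0000 \
    auto-daily-20220411.0000 > "$tmp/backup"
extraneous_on_destination "$tmp/main" "$tmp/backup"    # prints auto-long-20220401.0000
rm -rf "$tmp"
```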
 

Papa

Dabbler
Joined
Jan 21, 2012
Messages
29
I understand! And now, with that knowledge and understanding, I have started moving forward, confident that what is being put in place will work.
Thank you so much for your time and insight. I will let you know in a few months how things turn out, but I am certain it will be a positive result.

Lowell
 