Initial Replication Task from On-site Data

lostinak

Cadet
Hello -

I have two TrueNAS systems connected over SSH within a site-to-site IPsec tunnel, and I would like to back up System A (source) to System B (target) through replication tasks. The two systems are > 2500 miles apart and it is not feasible to bring System A to System B. I have about 90% of the data already on System B.

I have successfully started the replication task on System A, confirming the setup works and the two systems can talk to each other. The issue is that it requires a "from scratch" replication because "no incremental base was found on dataset [target dataset]", and the "Synchronize Destination Snapshots with Source" option had to be enabled (this destroys all snapshots on the target dataset).

As this is several terabytes worth of data and the bandwidth of the IPSEC tunnel is severely limited, this could take months. As I have most of the data already on the target system, just not as snapshots from the source system, is there a way to take snapshots on System B and allow the replication task to see a base? The incremental data to be sent through the replication task will be small enough that after the initial backup, the limited bandwidth will not be an issue.

I also created a local replication task on System B and replicated the data to a new dataset, then targeted the new dataset with the remote replication task from System A, only to hit the same "no incremental base was found..." error when attempting the remote replication.

Thanks in advance - I hope this isn't something simple I am missing.
 

jgreco

Resident Grinch
As this is several terabytes worth of data and the bandwidth of the IPSEC tunnel is severely limited, this could take months.

Yes.

As I have most of the data already on the target system, just not as snapshots from the source system, is there a way to take snapshots on System B and allow the replication task to see a base?

No.

The problem is that what ZFS is tracking in a snapshot are changes to blocks of data. When a block of data changes, ZFS "knows" what the delta between the old block and the new block is, and as part of the replication process, it sends this along to the distant NAS unit. So what is being sent is a stream of changed blocks.

No matter how hard you try to "duplicate" the content on machine A and machine B, there will be differences in both the data inside the blocks (think: metadata) and also where on disk those blocks got stored, so all the block pointers for your "B" machine are different from the "A" machine. This makes it unfeasible to use replication against some sort of contrived base snapshot on the "B" machine.

What you probably want is an rsync job instead. Rsync is incredibly efficient at identifying differences within sets of files and propagating the FILES, rather than worrying about blocks. If you need snapshots on machine "B", make them on machine "B".
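For illustration only, a push over SSH could look something like this (the paths and the "backup-host" name are placeholders, not anything from this thread); --delete makes the destination mirror deletions on the source, so test with --dry-run first:

rsync -avz --partial --delete /mnt/tank/mydata/ backup-host:/mnt/tank/mydata/

The trailing slashes matter: they tell rsync to compare the contents of the two directories rather than nesting the source directory inside the target.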
 

lostinak

Cadet
Darn
No matter how hard you try to "duplicate" the content on machine A and machine B, there will be differences in both the data inside the blocks (think: metadata) and also where on disk those blocks got stored, so all the block pointers for your "B" machine are different from the "A" machine.
I was kind of afraid that might have been the answer.

Thank you for the response - appreciate the help - I'll take a look at Rsync tasks.
 
winnielinnie

Actually "Yes".

OHHHHHHHHHHHHHHHHhhhHhhHhhhhhhhh---hoohohohoh-hohohoh--hohoho-hohoohoh!!!!

WINNIE LINNIE BROUGHT THE BIG GUNS OUT! :cool: :eek: :oops:


...

...

...

...

Okay, so more accurately, "yes" for ZFS, but "no" when using TrueNAS as it is intended through its GUI.

This is, once again, a reason why TrueNAS needs to incorporate the use of the "ZFS Bookmarks" feature. We should be able to configure a "bookmark" to automatically be created with snapshots, if the user wants to enable such an option for their snapshot tasks and manual snapshots. And perhaps a way to manage "bookmarks" in bulk, as one would do with snapshots.

Long story short: If you had "bookmarks" for every snapshot created on your Source (A), then even if you later destroy those snapshots, the bookmarks remain. Bookmarks take up effectively zero space.
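For reference, a bookmark is created from an existing snapshot with a single command (the dataset and snapshot names below are placeholders):

zfs bookmark sourcepool/mydata@auto-2022-01-31 sourcepool/mydata#auto-2022-01-31

The # bookmark sticks around even after the @ snapshot it was made from is destroyed.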

But how does this help in your situation, so that you can do an incremental send from Source (A) to Backup (B), even after losing your "base" snapshots from Source (A)?

If you specify a bookmark as the "base snapshot" for the incremental replication, then the snapshot found on the Backup (B) side that correlates to the bookmark on Source (A) will be used as the base. (I've tested this out myself and it works as advertised.)

While this is "slower" than doing a pure "snapshot -> snapshot" incremental send (since it requires the Backup (B) server to communicate which records are in play), it still should be much faster than resorting to a full replication all over from scratch, which in your case can take months. :wink:

To quote my non-technical self from another thread:
winnielinnie said:
The bookmark (on the source's side) essentially instructs the destination side: "Hey, I still have a bookmark of an expired/deleted snapshot named #auto-2022-01-31. I see you have on your end the actual snapshot named @auto-2022-01-31. You must use that snapshot (on your side) as the 'base' snapshot for this incremental send. This will take longer than a normal incremental transfer, but it's for an emergency situation, and I would prefer not to have to start all over with a full replication from scratch. I would love to use my own @auto-2022-01-31 as the base for this incremental transfer, but it's been destroyed. I only have this bookmark."

I already submitted a feature request for TrueNAS to make use of ZFS's "hold" feature in the GUI, and I'm not even sure that will ever be implemented. So you can understand why I'm not motivated to submit a request to exploit ZFS's "bookmarks" feature, which I'd argue has a VERY IMPORTANT use-case.
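(For anyone unfamiliar: holds are one-liners at the command line, shown here with placeholder names. A held snapshot cannot be destroyed until every hold tag on it is released.)

zfs hold keepme sourcepool/mydata@auto-2022-01-31
zfs release keepme sourcepool/mydata@auto-2022-01-31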
 

jgreco

Resident Grinch
WINNIE LINNIE BROUGHT THE BIG GUNS OUT! :cool: :eek: :oops:

As long as you're offering to provide the forums support for it, ... yay.

Strikes me as a vaguely useful feature looking for a problem to solve, sort of like the occasional ZFS copies= debates that pop up. Just don't delete snapshots unnecessarily.

Circling back around to the OP, I will note that I propagate a backup copy of a 43TB dataset from the midwest to the west coast (about 2500 miles), and one of the really handy things about NOT doing replication is that you can potentially put a bunch of files on a large hard drive and move it that way.

Sneakernet, also summarized as "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. –Andrew Tanenbaum, 1981"
 

lostinak

Cadet
To quote my non-technical self from another thread:
Much appreciated, winnielinnie, but there are/were no snapshots from Source (A) on Target (B) - at least if I am understanding your solution correctly. Or can I bookmark any snapshot on Source (A) and send it to any snapshot (assuming both snapshots are of similar data) on Target (B), at any point in time? I tried doing this with some test data this afternoon and kept getting "doesn't match incremental source" errors, so it seems there needed to be at least one "shared" snapshot.

I had been doing what jgreco suggested and bringing large amounts of data over on big hard drives, but finally decided to get the site-to-site setup going and the two systems actually talking. I am not opposed to the Rsync solution - I just like replication for simplicity's sake.

Appreciate the responses! Thanks again.
 
winnielinnie
at least if I am understanding your solution correctly
It would have been viable had you been bookmarking every snapshot on Source (A).


or can I bookmark any snapshot on Source (A) and send it to any snapshot (assuming both snapshots are of similar data) on Target (B)
If a bookmark (of a previously destroyed snapshot) on Source (A) refers to the same snapshot that still exists on Backup (B), then this bookmark on Source (A) can be used as the "base snapshot" for an incremental replication. (Bookmarks take effectively zero space, so they can be left on the source pool indefinitely.)

Example 1:
Incremental replication, snapshot as the base from sourcepool. This assumes backuppool has the snapshot mydata@auto-2022-10-01:
zfs send -i sourcepool/mydata@auto-2022-10-01 sourcepool/mydata@auto-2022-12-01 | zfs recv backuppool/mydata

Example 2:
Incremental replication, bookmark as the base from sourcepool. This assumes backuppool has the snapshot mydata@auto-2022-10-01:
zfs send -i sourcepool/mydata#auto-2022-10-01 sourcepool/mydata@auto-2022-12-01 | zfs recv backuppool/mydata

In the first case, it works as normal. Only the differences between the two snapshots are sent to the destination.

In the second (emergency) case, no such base snapshot exists on the sourcepool. However, since the bookmark of @auto-2022-10-01 still exists as #auto-2022-10-01, it can be used as the "base", whereby backuppool communicates to the source: "I have the snapshot of your bookmark here. I will tell you what records exist in this snapshot, so you'll know how to send me an incremental replication. It'll be slower, but at least we won't have to transfer all the sourcepool's data from scratch!"



Even as snapshots are destroyed on Source (A), their bookmarks remain. A bookmark can be used in place of a snapshot as the base snapshot for such an incremental send. (Obviously more for emergency uses, since ideally you want Source (A) and Backup (B) to both share a common snapshot as a base.)

TrueNAS, unfortunately, does not implement this automatically.

Just letting you know it could be feasible to create an automatic script to bookmark all (or the most recent 10, 20, 30, 50, etc) snapshots on Source (A) on an automatic schedule. Then you can have a contingency plan if this were to ever happen again.
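As a rough sketch of what such a script could look like (the dataset name is a placeholder):

#!/bin/sh
# Create a matching bookmark for every snapshot of a dataset, so an
# incremental base survives even if the snapshot itself later expires.
DATASET="sourcepool/mydata"
zfs list -H -d 1 -t snapshot -o name "$DATASET" | while read -r snap; do
    mark=$(printf '%s\n' "$snap" | sed 's/@/#/')
    # Skip snapshots that already have a matching bookmark.
    zfs list -H -t bookmark -o name "$mark" >/dev/null 2>&1 && continue
    zfs bookmark "$snap" "$mark"
done

Schedule it as a Cron Task to run right after your snapshot task, and you keep an effectively zero-cost bookmark trail on the source.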

One caveat to this is that you would have to change or disable the expiration date for Backup (B)'s snapshots, so that you'll have a grace window just in case.

Or like @jgreco said, make sure no snapshots on Source (A) are removed that would affect your following incremental replications to Backup (B).
 
winnielinnie
lostinak said:
I appreciate the reply @winnielinnie - too bad it didn't work out this time.

Good information for the future for sure.
I feel your pain.

You might consider looking into creating a simple Cron Task that automatically makes bookmarks of existing snapshots on your Source pool, to have some sort of "contingency" in place.

However, this also assumes your Destination pool will not have its associated snapshots expired/destroyed.
 