ZFS send/receive (replication task) advice needed

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
I am hoping for some advice with replication tasks in TrueNAS Core 13.0-U3.1 (all servers listed in my scenario are running the same version of TrueNAS Core 13). I am attempting to replicate a large-ish (30 TB) dataset from a server in location A to a server in location B using an interim "portable" pool and a third server; moving either existing server to perform this task is not an option. Here is what I attempted and the problem I am now having:

Live data is stored at location "A" on server "a" on pool "Alpha" in dataset "live". I am currently running hourly snapshots on dataset "live", keeping them for 2 weeks - this began about 3 weeks ago, so let's say dataset "live" now has snapshots 101-300 on it at this time. (I suspect this is the cause of my error/problem below)

My interim step (to avoid sending 30 TB of data over the internet) was to build a temporary/portable server "c" with pool "Gamma" and take it to location "A". There, I set up a nightly replication task to replicate dataset "live" from server "a" pool "Alpha" to server "c" pool "Gamma" dataset "archive". This replication was set to keep all snapshots and was set up at the same time as the initial snapshot task on server "a". So, let's say dataset "archive" now has snapshots 1-300 on it at this time - all the snapshots ever created on dataset "live".

I have now driven server "c" with pool "Gamma" to location "B". I don't want two servers running at location "B" if it can be avoided, so I imported pool "Gamma" into server "b". In doing so, I can now see dataset "archive" on server "b" and all its snapshots (1-300).

So, I then go to create a new replication task on server "b" to continue the nightly replication from dataset "live" on pool "Alpha" on server "a" at location "A" (now over VPN/internet) and replicate it to local/server "b" on pool "Gamma" dataset "archive". I used the GUI for all this, and selected Recursive for the source location. But, when I go to create/save the replication task, I get the error:

Destination Snapshots Are Not Related to Replicated Snapshots

Destination dataset does not contain any snapshots that can be used as a basis for the incremental changes in the snapshots being sent. The snapshots in the destination dataset will be deleted and the replication will begin with a complete initial copy.

Is this because I have more snapshots on my pre-existing destination dataset than on my source dataset? Could it be something else I am missing, or is my basic premise in error? Is there any way around this versus starting over again? I really want to avoid sending all the initial data over the internet, but I also can't transport either the source or destination server.

In case my description was not very clear, here is an attempt at a diagram of what I attempted:

Location A > Server a > pool Alpha > dataset live > last two weeks of hourly snapshots on a 3-week old dataset
<== performing nightly replication task to ==>
Location A > Server c > pool Gamma > dataset archive > all snapshots from inception of dataset live, keep forever
<== drove Server c and pool Gamma to Location B ==>
Location B > Server b > newly imported pool Gamma > dataset archive > all snapshots from inception of dataset live
<== attempt to setup new replication task via GUI on server b ==>
Source: Location A > Server a > pool Alpha > dataset live > last two weeks of hourly snapshots on a now 3-1/2week old dataset
Destination: Location B > Server b > pool Gamma > dataset archive > first three weeks of hourly snapshots, now several days behind dataset live snapshots
 
Joined
Oct 22, 2019
Messages
3,641
This is hard to follow, and as someone who suffers dyslexia (I don't, but screw it, that's my excuse), it really can get confusing. o_O



<== performing nightly replication task to ==>
Location A > Server c > pool Gamma > dataset archive > all snapshots from inception of dataset live, keep forever
How can it have all snapshots? I thought the first week's worth of snapshots on live had already expired and were pruned?



<== attempt to setup new replication task via GUI on server b ==>
Source: Location A > Server a > pool Alpha > dataset live > last two weeks of hourly snapshots on a now 3-1/2week old dataset
Destination: Location B > Server b > pool Gamma > dataset archive > first three weeks of hourly snapshots, now several days behind dataset live snapshots
Why? I thought archive was already a complete replication from live? Was it the delay from the travel?



Regardless, incremental replications are impossible without a shared base snapshot. Perhaps the GUI's options won't suit your needs for this "one time thing".

You'll need to list all snapshots on both ends, and verify you do, in fact, have common base snapshots between live and archive.
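To make "verify you have common base snapshots" concrete, here is a minimal sketch of the check. The snapshot name lists would come from something like `zfs list -H -t snapshot -o name` on each server; the names below are invented for illustration, assuming the default TrueNAS periodic-snapshot naming scheme (`auto-YYYY-MM-DD_HH-MM`), which sorts chronologically.

```python
# Hypothetical sketch: given snapshot name lists from both servers,
# find the newest snapshot the two datasets have in common -- the
# candidate base for an incremental send.

def newest_common_snapshot(source_snaps, dest_snaps):
    """Return the newest snapshot name present on both sides, or None."""
    common = set(source_snaps) & set(dest_snaps)
    if not common:
        return None  # no shared base -> only a full initial copy is possible
    # Periodic-snapshot names sort chronologically (auto-YYYY-MM-DD_HH-MM),
    # so the lexicographic max is the newest common snapshot.
    return max(common)

# Example: source kept only the last two weeks; destination kept everything.
# Extra, older snapshots on the destination are harmless -- what matters is
# that at least one name (and its contents) exists on both sides.
source = ["auto-2023-10-25_00-00", "auto-2023-10-26_00-00", "auto-2023-10-27_00-00"]
dest   = ["auto-2023-10-01_00-00", "auto-2023-10-10_00-00", "auto-2023-10-25_00-00"]

print(newest_common_snapshot(source, dest))  # -> auto-2023-10-25_00-00
print(newest_common_snapshot(source, ["auto-2023-09-01_00-00"]))  # -> None
```

Note that the destination having *more* snapshots than the source is not a problem by itself; the error appears only when the intersection is empty.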
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
Yes, sorry it is confusing. The archive dataset has all the snapshots because it was in place at the original location since the creation of the live dataset, receiving nightly replications. And yes, the delay was due to travel. I think you answered my question, though. Going on the assumption that a nightly replication task can survive a loss of connection and then backfill the missing recent snapshots, I was hoping that ability would transfer with the archive dataset to the other server: it could just pick up where it left off and copy the missing recent snapshots, ignoring the fact that the live dataset does not have the older snapshots the archive one does.

So, at least for now, I am going to try putting the archive dataset back into server "c", where the local nightly replication tasks were running, and see if it can catch up on the missing recent snapshots, over the internet this time. If that works, it tells me I may be able to try this again with a portable pool/dataset, but I have to make sure the source and destination datasets have matching snapshots before moving the destination dataset to another server.
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
Thank you. I guess the crux of my question is - can I move a remote replication dataset from one server to another at some future point in time? So, say I've been doing remote replication tasks and the local/live dataset only keeps the last two weeks, but the remote one keeps all. Six months down the road, the server that holds the remote replication dataset dies or is dying. Can I move that dataset (and/or its pool) to another server and keep going, or do I need to start all over again with an initial replication task to the new destination server?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Can I move a remote replication dataset from one server to another at some future point in time?
Sure. If you still have the replicated target dataset available.

So, say I've been doing remote replication tasks and the local/live dataset only keeps last two weeks but the remote one keeps all. 6 months down the road, the server that holds the remote replication dataset dies or is dying. Can I move that dataset (and/or it's pool) to another server and keep going, or do I need to start all over again with an initial replication task to the new destinate server?
If the server is dying you can move the pool, i.e. the disk drives to a new server, import the pool, and continue your replication where you left off. Assuming getting the new server and migrating everything does not take longer than the last common snapshot on the source server takes to expire.

If "server is dying" means the pool data is lost, you will have to start over.
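The timing constraint above can be sketched roughly as follows, with assumed dates and the two-week retention from this thread. The last common snapshot is the newest one that had replicated before the destination pool left the site; once the source prunes it, only a full initial copy remains possible.

```python
from datetime import date, timedelta

# Rough model of the timing constraint, with invented dates. The source
# prunes hourly snapshots after a 14-day retention window.

RETENTION = timedelta(days=14)

def common_base_survives(last_common_taken, resume_date):
    """True if the source still holds the last common snapshot on resume day."""
    return resume_date - last_common_taken <= RETENTION

left_site = date(2023, 10, 20)               # destination pool exported/moved
print(common_base_survives(left_site, date(2023, 10, 22)))  # 2-day move -> True
print(common_base_survives(left_site, date(2023, 11, 10)))  # 3-week gap -> False
```

This matches the failure described earlier in the thread: a ~3-week round trip against a 2-week source retention guarantees the common base has expired by the time replication resumes.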
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
I really appreciate your feedback. I'm going to ask a follow-up with more details.

I attempted to do this, not because the "original" server was dying, but because it was an older server I used to accompany the pool of drives to the remote site to do the initial replication task. I then brought that pool of disks back to the "home" site and imported the pool into my main production TrueNAS server (all three servers running the same version of TrueNAS). The pool import worked fine. I could see the destination dataset and its snapshots, but when I attempted to set up a new replication task on the production server to go from the remote source dataset to the destination dataset on the newly imported pool, it said it could not do it without starting over, because the source dataset's snapshots did not match the destination dataset's snapshots.

Which is true, because the source dataset only keeps two weeks of snapshots, the destination dataset was set to keep all of them, and it took me more than two weeks to retrieve the pool with the destination dataset on it. So, does that mean that I cannot move a destination dataset to a new server and then set up a new replication task to continue the replications unless both source and destination datasets have 100% matching snapshots?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
They do not need 100% matching snapshots - that would render the destination retention period setting pretty meaningless - but they need at least one snapshot in common. If your source retention is two weeks and you take more than two weeks to re-arrange things, that common ground is lost.
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
OK, I am going to try this again. I may have just missed something the first time I tried it. Originally, my source retention is/was two weeks. I had left the destination dataset (with unlimited retention) at the remote site doing daily replications for about three weeks, then took about 24-36 hours to move the destination dataset to its permanent location and tried to import it into my "main" server then. So, they should have had some common snapshots - but maybe I missed or overlooked something. Thank you for your insight.
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
I took the destination server and pool to the location of the source dataset and started a new replication task Friday evening. The source dataset has been running hourly snapshot tasks for about two months now, set to keep the last two weeks. I created a new PULL replication task on the destination server, setting it to keep all snapshots. The first replication run completed without errors.

Before driving the destination server/pool/dataset back to my location, I thought I would try a test: I created a new PULL replication task using the existing source dataset and the existing destination dataset. I figured this test would simulate what happens when I attach the destination pool/dataset to my production server at my location and need to create a new replication task to continue the replication I started with the portable server/pool. But I get the same error as the first time I tried to continue the nightly replication tasks after moving the destination dataset from the portable server to the production server:
"Destination dataset does not contain any snapshots that can be used as a basis for the incremental changes in the snapshots being sent. The snapshots in the destination dataset will be deleted and the replication will begin with a complete initial copy."



So, I'm either still missing some important step, or my test scenario of creating a new replication task using the existing source and destination datasets is flawed.

I've attached screenshots of my source server snapshots and destination server snapshots - ones that I feel match by date/time, anyway. It looks like the sizes of snapshots from the same date/time do not match, though. I'm at a loss.
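A note on those mismatched sizes: the `USED` figure for a snapshot is the space unique to it relative to neighboring snapshots *on that pool*, so it routinely differs between source and destination even when the contents are identical. The pool-independent identity check is the snapshot `guid` property, which survives send/receive unchanged. A minimal sketch of that comparison, with invented sample output (on a real system the listings would come from something like `zfs list -H -t snapshot -o name,guid Alpha/live` on each side):

```python
# Hedged sketch: compare snapshots by GUID rather than by reported size.
# The tab-separated sample text below is invented for illustration; it
# mimics the shape of `zfs list -H -o name,guid` output.

def parse_name_guid(listing):
    """Parse `zfs list -H -o name,guid`-style output into {snap_name: guid}."""
    pairs = {}
    for line in listing.strip().splitlines():
        name, guid = line.split("\t")
        pairs[name.split("@", 1)[1]] = guid  # keep only the @snapshot part
    return pairs

def matching_snapshots(source_listing, dest_listing):
    """Snapshot names whose GUIDs agree on both sides."""
    src, dst = parse_name_guid(source_listing), parse_name_guid(dest_listing)
    return sorted(s for s in src if dst.get(s) == src[s])

source = "Alpha/live@auto-2023-11-05_00-00\t111\nAlpha/live@auto-2023-11-06_00-00\t222"
dest   = "Gamma/archive@auto-2023-11-05_00-00\t111\nGamma/archive@auto-2023-11-06_00-00\t999"
print(matching_snapshots(source, dest))  # -> ['auto-2023-11-05_00-00']
```

In the sample, the second snapshot has the same name on both sides but different GUIDs, so it would not serve as an incremental base; same-named snapshots with differing sizes are fine as long as the GUIDs agree.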
 

Attachments

  • Screenshot 2023-11-06 at 10.08.29 AM.png (149.8 KB)
  • Screenshot 2023-11-06 at 10.08.47 AM.png (155.4 KB)

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
My positive and reliable experience with replication tasks and different retention periods has been exclusively with PUSH. Both the source snapshot retention and the destination retention are controlled from the source side. Can you try that?
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
I was wondering that too. I will give that a try. Thanks.
 

allegiance

Explorer
Joined
Jan 4, 2013
Messages
53
So, I set up the replication task as a PUSH task on the source server, pointing to the original dataset for the source and the existing destination dataset for the destination. It gave me the same error ("Destination dataset does not contain any snapshots that can be used as a basis for the incremental changes in the snapshots being sent. The snapshots in the destination dataset will be deleted and the replication will begin with a complete initial copy."), so I hit "cancel". It still created the replication task, which I don't think happened in the past when I hit cancel.

Since the source and destination were still local to each other, I figured it would be quicker to just run that task then and let it start over, versus deleting the destination dataset and setting up a new task. To my pleasant surprise, it did not delete the destination dataset, but simply added the new snapshots to it.

I will be driving the destination dataset back to my location later today, where I will run another replication task or two as-is, then I will attempt to export the pool containing the destination dataset from my portable server, and import it into my production server, then edit the PUSH replication task to reflect the new destination server and try again.
 