ZFS replication from local to remote site - how to achieve initial data sync over the line without preseeding

mano (Cadet; joined Sep 29, 2023; 1 message)

Hi,

I’m missing some fundamental aspects here, so please could someone explain “step 1” of inter-site replication? I seem to have gotten myself to “step 2” successfully but missed the precursor. Rummaging through the iX documentation has not given me sufficient clarity. I also looked at https://www.truenas.com/community/threads/how-i-did-site-to-site-replication.22802/ and https://www.truenas.com/community/threads/initial-replication-task-from-on-site-data.105983/ but it is still unclear to me.

I set up two TrueNAS CORE 13 systems: a local one with 65 TB of storage and a remote one with 20 TB of storage available. There is a 100 Mbps layer 2 private 1:1 data link between the sites. There is currently no WAN optimisation in place.

I installed TrueNAS independently on each site with the same data source tree layout on each. Compression on, dedupe off.

As a test, I copied 750 GB of full VM backups to the local data source zpool01/vdata. Incremental backup files are added periodically to the same location on the local site.

I wish to replicate this data to the remote site after hours. An off-site mirror copy if you will.

I realise I can use rsync for this and was going to do so, but then saw the push replication task feature and tried that instead.

I am uncertain which approach, rsync or ZFS replication over ssh+netcat, would be more performant for my use case, so as an aside, if anyone has pointers to any comparisons that would be useful. The full Veeam backup files are 10 GB to 1.5 TB in size, with the forward incremental deltas smaller of course. A full backup of all my current data would comprise about 15 TB and the eventual GFS incremental deltas about 5 TB. While possible, it’s a bit impractical to pre-seed the remote site, so I wanted to copy everything over the data link.
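For a sense of scale, the raw transfer time for these volumes over the 100 Mbps link can be estimated as below. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes) and an ideal link running at full line rate, with no protocol overhead and no gain from compression. The function name and the `efficiency` parameter are illustrative, not part of any TrueNAS tooling.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to move size_tb terabytes over a link_mbps link.

    efficiency is the assumed fraction of line rate actually achieved
    (1.0 = ideal link, which real links will not reach).
    """
    bits = size_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400                        # seconds -> days


# Figures taken from the post: 15 TB full set, 5 TB of deltas, 750 GB test copy.
full_set_days = transfer_days(15, 100)
deltas_days = transfer_days(5, 100)
test_copy_hours = transfer_days(0.75, 100) * 24

print(f"15 TB full set : {full_set_days:.1f} days")
print(f"5 TB of deltas : {deltas_days:.1f} days")
print(f"750 GB test    : {test_copy_hours:.1f} hours")
```

At full line rate the 15 TB full set works out to roughly 14 days of continuous transfer, so limiting replication to an after-hours window would stretch that proportionally.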

In the TrueNAS replication configuration I selected the (local) Source and the (remote) Destination, and a once-daily replication schedule.

Other settings:
Recursive = unchecked
Include dataset properties = checked
Almost full filesystem replication = unchecked
Encryption = unchecked
Replication from scratch = checked
Run automatically = checked
Schedule = checked

The end result is that I have replicated a 25 GB snapshot to the remote site, which I infer contains yesterday’s incremental delta files from the local source.

However, judging from the 750 GB of initial files on local, I’m very clearly missing a copy of the original data at the remote site. What have I missed doing here?
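One way to sanity-check what actually arrived is to compare the `used` and `referenced` properties of the snapshot on each side: in ZFS, `used` counts only the space unique to that snapshot, while `referenced` is the total data reachable through it. The sketch below parses the tab-separated output of `zfs list -Hp -t snapshot -o name,used,referenced`. The sample line, including the snapshot name and the byte counts, is made up purely for illustration and is not taken from these systems.

```python
def parse_zfs_list(output: str) -> dict[str, dict[str, int]]:
    """Parse `zfs list -Hp -t snapshot -o name,used,referenced` output.

    -H prints tab-separated fields without a header line; -p prints exact
    byte counts instead of human-readable sizes.
    """
    sizes: dict[str, dict[str, int]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, used, referenced = line.split("\t")
        sizes[name] = {"used": int(used), "referenced": int(referenced)}
    return sizes


# Hypothetical sample output (name and numbers are placeholders).
SAMPLE = "zpool01/vdata@example-daily\t1000\t5000\n"

sizes = parse_zfs_list(SAMPLE)
for name, props in sizes.items():
    print(f"{name}: used={props['used']} B, referenced={props['referenced']} B")
```

Running the same command on the local and remote systems and comparing the `referenced` values would show whether the remote snapshot holds only the deltas or the full data set.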

I didn’t check Recursive because all files are to be stored in the same source “folder”, i.e. no “subfolders” will be created.

I also assumed that, because I had checked “Replication from scratch”, I would get a full replication to the remote site, but this didn’t happen as I expected.

I’m wondering whether I should have checked “Almost full filesystem replication” instead.

The reason I didn’t go this route is that it’s conceivable I may want to keep 48 hours of two-hourly snapshots at the local site in the future, while replicating only a single daily snapshot to the remote site. It seemed to me that if I chose “Almost full filesystem replication” I’d copy all snapshots made on local, and I don’t require that level of RPO at the remote site. I have since figured this is hardly an issue, as it appears I can create a replication task based on a specific named daily snapshot or schedule and thereby replicate only the daily local snapshot to the remote.
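The idea of replicating only the daily snapshots can be illustrated with a small sketch. It assumes the two-hourly and daily snapshot tasks use distinct strftime-style naming schemas; the schema strings and snapshot names below are hypothetical examples, not the names on my systems, and the helper function is not a TrueNAS API.

```python
from datetime import datetime


def matches_schema(snapshot_name: str, schema: str) -> bool:
    """Return True if snapshot_name parses under the strftime-style schema."""
    try:
        datetime.strptime(snapshot_name, schema)
        return True
    except ValueError:
        return False


# Hypothetical naming schema for the daily snapshot task.
DAILY_SCHEMA = "auto-daily-%Y-%m-%d_%H-%M"

# Hypothetical snapshot names as they might exist on the local dataset.
snapshots = [
    "auto-2h-2023-09-29_14-00",
    "auto-2h-2023-09-29_16-00",
    "auto-daily-2023-09-29_00-00",
    "auto-daily-2023-09-30_00-00",
]

# Keep only the snapshots a daily-only replication task would pick up.
daily_only = [s for s in snapshots if matches_schema(s, DAILY_SCHEMA)]
print(daily_only)
```

With 48 hours of two-hourly snapshots there would be 24 such snapshots on local at any time, of which only the ones matching the daily schema would be candidates for the remote site.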

I was also planning to keep a shorter snapshot lifetime (fewer snapshots) on the remote site than on the local site, simply because I presently have less space available at the remote.

Thus, given the above confusion, I am unsure how best to approach the initial sync of data to the remote site.

Thanks for any insights or suggestions.

This is what I presently have as snapshots on local – only one is a snapshot of vdata which is where the backup data I wish to replicate off site resides.

2023-09-29 15_19_11-ZFS replication from local to remote site - how to achieve initial data sy...png
 