SOLVED Question about Rsync and ZFS replication

Alpha-Inc. (Dabbler, joined Feb 15, 2021, 25 messages)

Hello everybody,

Recently my local IT store had a sale, and I bought a bunch of HDDs. They are now installed in my backup server, which will sit at my parents' house so that I have an off-site backup that protects against data loss from theft, fire or other events that aren't the system's fault. After installing the second server, I connected our two homes via a VPN, so my main server should be able to send backups to the remote location over the internet. For the backup itself, I'm not sure whether to use Rsync or ZFS replication.

AFAIK, Rsync is a great tool for syncing folders. In my case, the source would be the main server at home and the destination would be the backup server at my parents' house. Every time the Rsync task runs, it checks for differences and updates the destination to match the main server. Is that correct?
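To make the Rsync behaviour described above concrete, here is a minimal Python sketch of the decision Rsync makes on each run of a one-way mirror. It assumes Rsync's default "quick check" (a file is re-sent if it is missing on the destination or its size or modification time differs) and the `--delete` option (files gone from the source are removed from the destination). `FileInfo` and `plan_mirror` are hypothetical names; real Rsync additionally uses a delta-transfer algorithm to send only the changed parts of a file, which this sketch leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def plan_mirror(source: dict[str, FileInfo],
                dest: dict[str, FileInfo]) -> tuple[list[str], list[str]]:
    """Return (to_copy, to_delete) for a one-way mirror from source to dest.

    to_copy:   paths missing on dest, or whose size/mtime differ (quick check)
    to_delete: paths present on dest but no longer on source (like --delete)
    """
    to_copy = sorted(path for path, info in source.items() if dest.get(path) != info)
    to_delete = sorted(path for path in dest if path not in source)
    return to_copy, to_delete


src = {"a.txt": FileInfo(10, 100.0), "b.txt": FileInfo(20, 200.0)}
dst = {"a.txt": FileInfo(10, 100.0), "b.txt": FileInfo(20, 150.0),
       "old.txt": FileInfo(5, 50.0)}
print(plan_mirror(src, dst))
```

Note that both dictionaries have to be built by listing every file on both sides before any decision is made; that full walk is the part that gets slow on large trees.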

And how does ZFS replication work? I tried to read up on it, and if I understand correctly, ZFS uses snapshots, but how does this differ from Rsync?

This brings me to my main question: what is the best way for me to back up my main server to my remote backup server? Basically, I want the remote backup server to be identical to my main server. If any file gets deleted or altered, the same should happen to the files on my backup server (as the name suggests). It may also be important that my upload speed is 40 Mbit/s.
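The 40 Mbit/s upload speed mentioned above mostly matters for the first full copy; later runs only need to move what changed. A quick back-of-the-envelope sketch in Python, assuming decimal units (1 GB = 10^9 bytes) and a hypothetical `efficiency` factor for VPN and protocol overhead (1.0 is the theoretical best case). The 1000 GB and 10 GB figures are illustrative, not from this thread.

```python
def transfer_hours(gigabytes: float, uplink_mbit_s: float = 40.0,
                   efficiency: float = 1.0) -> float:
    """Estimate hours needed to push `gigabytes` (decimal GB) over an uplink.

    efficiency < 1.0 models VPN/protocol overhead (an assumed factor).
    """
    bits = gigabytes * 1e9 * 8
    seconds = bits / (uplink_mbit_s * 1e6 * efficiency)
    return seconds / 3600


# Initial seed of 1 TB at the full 40 Mbit/s: about 55.6 hours (2.3 days).
print(round(transfer_hours(1000), 1))
# A 10 GB incremental at the same speed: about 33 minutes.
print(round(transfer_hours(10) * 60))
```

If the initial seed takes too long over the internet, it can be worth doing the first copy with both servers on the same LAN before moving the backup box off-site.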
 

Etorix (Wizard, joined Dec 30, 2020, 2,134 messages)

ZFS replication relies on snapshots indeed. Since ZFS keeps a record of every change, it always knows the difference between two snapshots. Once the source and destination hold a common snapshot, further replications are incremental affairs: ZFS literally transmits only the changed blocks.
By contrast, rsync needs to traverse the source and destination filesystems in full to identify differences, and then transmit modified files.

In short, if both NAS use ZFS, choose replication over rsync.
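As an illustration of the incremental, snapshot-based transfer described above, here is a small Python sketch that only builds (does not run) the shell pipeline a manual replication would use: `zfs send` on the source piped over SSH into `zfs receive` on the backup server. The dataset names, snapshot names and host name are hypothetical; on TrueNAS the replication task assembles and runs the equivalent for you, so this is only meant to show what goes over the wire.

```python
import shlex
from typing import Optional


def replication_pipeline(dataset: str, new_snap: str, remote_host: str,
                         target_dataset: str, base_snap: Optional[str] = None,
                         raw: bool = False) -> str:
    """Build a `zfs send | ssh ... zfs receive` pipeline as a shell string.

    If base_snap is given, the send is incremental (-i) from that snapshot,
    which must already exist on both the source and the destination.
    """
    send = ["zfs", "send"]
    if raw:
        send.append("-w")  # raw send: encrypted blocks are sent as-is, still encrypted
    if base_snap is not None:
        send += ["-i", f"{dataset}@{base_snap}"]
    send.append(f"{dataset}@{new_snap}")
    recv = ["zfs", "receive", target_dataset]
    return f"{shlex.join(send)} | ssh {shlex.quote(remote_host)} {shlex.join(recv)}"


# First run: full send of the initial snapshot.
print(replication_pipeline("tank/data", "auto-2021-06-01", "backup-nas", "backup/data"))
# Later runs: only the blocks changed since the last common snapshot.
print(replication_pipeline("tank/data", "auto-2021-06-02", "backup-nas", "backup/data",
                           base_snap="auto-2021-06-01"))
```

Nothing here walks the file tree: ZFS already knows which blocks differ between the two snapshots. The optional `-w` flag requests a raw send, so an encrypted dataset arrives still encrypted and the receiving side does not need its key.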
 

Alpha-Inc. (Dabbler, joined Feb 15, 2021, 25 messages)

Great. Then I'll use ZFS replication.
At the moment I don't have any snapshots scheduled. Do I need to set that up, or will the ZFS replication task do it by default?
 

ragametal (Contributor, joined May 4, 2021, 188 messages)

I have experience with both.

I used to back up my TrueNAS system to a different Linux system via Rsync. Both systems were on the same network and connected via Ethernet. The process used to take hours just comparing the files; the data transfer itself was very quick.

I recently deployed another TrueNAS server at a friend's house to serve as the target for backups from my main TrueNAS server. I decided to use replication, and the same backup that used to take hours with Rsync now takes 2 minutes, even though it runs over the internet. It is lightning fast.

In my experience, with replication you get speedy backups, you keep your snapshots, and if you have encrypted the source dataset, the target dataset will also be encrypted. This last point is important because anyone with physical access to the remote server may be able to read your data if it's not encrypted.

With Rsync, you would have to find a different solution to encrypt the remote target folder to protect your data.
 

Etorix (Wizard, joined Dec 30, 2020, 2,134 messages)

Alpha-Inc. said: At the moment I don't have snapshots scheduled. Do I need to do that, or will the ZFS replication task do it by default?
Setting up a replication task may trigger a matching snapshot task, but you probably want to define your own snapshot policy first.
Say:
(hourly, retained for x days)
daily, retained for x weeks
weekly/monthly, retained for x weeks/months
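One way to read the policy above is as separate snapshot tasks, each with its own lifetime, with any snapshot older than its task's lifetime being pruned. A minimal Python sketch under that assumption; the concrete durations stand in for the "x" placeholders and are hypothetical, as are the snapshot names.

```python
from datetime import datetime, timedelta

# Hypothetical lifetimes standing in for the "x" values in the policy above.
POLICY = {
    "hourly": timedelta(days=2),
    "daily": timedelta(weeks=2),
    "weekly": timedelta(weeks=8),
}


def expired(snapshots, now, policy=POLICY):
    """Return names of snapshots older than their task's retention period.

    `snapshots` is a list of (name, schedule, taken_at) tuples, where
    `schedule` is a key of `policy`.
    """
    return [name for name, schedule, taken_at in snapshots
            if now - taken_at > policy[schedule]]


now = datetime(2021, 6, 15, 12, 0)
snaps = [
    ("hourly-1", "hourly", now - timedelta(hours=3)),   # within 2 days: kept
    ("hourly-2", "hourly", now - timedelta(days=3)),    # older than 2 days: pruned
    ("daily-1", "daily", now - timedelta(days=10)),     # within 2 weeks: kept
    ("weekly-1", "weekly", now - timedelta(weeks=9)),   # older than 8 weeks: pruned
]
print(expired(snaps, now))
```

Whatever values you pick, keep snapshots long enough that the source and the backup still share one at the next replication run; otherwise the next send cannot be incremental.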
 

ChrisRJ (Wizard, joined Oct 23, 2020, 1,919 messages)

ragametal said: That process used to take hours just comparing the files, the data transfer itself used to be very quick.
I remember a project where tens or even hundreds of thousands of PDF files were generated every day. The initial backup test needed about 30 hours just to scan for changed files (on a pretty beefy SAN). We changed the logic so that the target directory looked like /data/<YEAR>/<MONTH>/<DAY>/<COUNTER> (where the last part kept the number of files per directory below a certain threshold). All we needed then was a little script to determine the year, month and day and do a full backup of that path.
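The directory scheme described above can be sketched in a few lines of Python. The threshold value, the zero-padding of month and day, and the function names are assumptions for illustration; the post does not give them.

```python
from datetime import date
from pathlib import PurePosixPath

MAX_FILES_PER_DIR = 10_000  # hypothetical threshold; the post does not state one


def daily_backup_root(day: date, root: str = "/data") -> PurePosixPath:
    """The only subtree a daily backup has to copy: everything written on `day`."""
    return PurePosixPath(root, f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}")


def target_dir(day: date, file_index: int, root: str = "/data") -> PurePosixPath:
    """Directory for the file_index-th file (0-based) generated on `day`.

    The counter bucket keeps each directory below MAX_FILES_PER_DIR files.
    """
    counter = file_index // MAX_FILES_PER_DIR
    return daily_backup_root(day, root) / str(counter)


d = date(2021, 6, 15)
print(target_dir(d, 0))
print(target_dir(d, 25_000))
print(daily_backup_root(d))
```

Because each day's files land under one known path, the backup script never has to scan older directories for changes.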

Not directly relevant, but perhaps useful to someone.
 