Rsync vs. ZFS Snapshot Replication over LAN

Status
Not open for further replies.

Zohaib

Dabbler
Joined
Dec 18, 2013
Messages
15
Hi all,

I have two FreeNAS boxes. One has 8 TiB of used space and the other is empty. I want to replicate the 8 TiB of data to the empty box, but I am confused about which replication method to choose: rsync or ZFS replication. Can anybody help me pick one? Which is better, rsync or ZFS replication, and why?

Regards
Zohaib
 

Zohaib

Dabbler
Joined
Dec 18, 2013
Messages
15
In my case I am going with rsync; it's also better when there are connectivity issues.
 

Oded

Explorer
Joined
Apr 20, 2014
Messages
66
For the initial backup, the best approach I found (over LAN only!) was using the wget -m command. This mirrors a folder from one server to the other over FTP. The benefit is that no encryption or compression is done, so it is faster than rsync or scp. I got around 3-4x better speeds with this approach.

The downside of course is that it doesn't use encryption or compression :). I would never ever use this unless it's on the same LAN that is in my full control. Once it's out there, it's visible to anyone interested.

After you have made your first sync, I think rsync makes more sense for the reasons you gave above.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It can also backfire badly if you disable atime and your file times don't change. Whoops!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
wget defaults to retrieving files based on time stamps and size. If you change a file but have atime off and the file doesn't change size, then wget -m will skip it. So the first time you do a wget -m it works great. It downloads EVERYTHING. The next time it only does what it thinks are changes, which are found based on time stamps and size. But if your time stamps are off then you just lost your primary indicator. To add to the mess, some programs don't update the time stamp. I don't know how they do this, but I've seen it twice in Windows, which really sucks because programs that go by time stamps freak out.
 

Oded

Explorer
Joined
Apr 20, 2014
Messages
66
I agree. As I wrote above - this should be used only for the first sync. Once that is out of the way, replication or rsync is the better solution. I did notice however that it created the folders as new (timestamped to yesterday), but that makes sense because it builds the directory structure logically on the local machine; it doesn't copy the directory. If that can be a problem, then it's another reason to avoid wget.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
cyberjock said: "wget defaults to retrieving files based on time stamps and size. If you change a file but have atime off and the file doesn't change size, then wget -m will skip it."
If that were true, rsync would have the same problem. atime tracks access time, not file modification (mtime) or inode change (ctime). If you write to a file, all three will update unless atime is disabled. I don't think you can disable mtime or ctime.

It is possible to write to a file and not affect the mtime (or change it back to what it was before the write), but this would impact rsync as well as wget.
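
To illustrate the point above, here is a minimal Python sketch, not how wget or rsync are actually implemented. It makes a same-size edit, puts the mtime back, and shows that a size-and-mtime comparison sees no difference while a content checksum does. The function names and temporary file names are made up for the example, and SHA-256 is only a stand-in for whatever checksum a real tool uses.

```python
import hashlib
import os
import tempfile


def quick_check_differs(src, dst):
    """Return True if size or mtime differ (a size-and-timestamp test)."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size != b.st_size or a.st_mtime_ns != b.st_mtime_ns


def checksum_differs(src, dst):
    """Return True if the file contents differ, judged by SHA-256 digest."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).digest()
    return digest(src) != digest(dst)


def demo():
    """Edit a file without changing size or mtime, then run both checks."""
    fixed_ns = 1_000_000_000 * 10**9  # arbitrary fixed timestamp (year 2001)
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")  # hypothetical file names
        dst = os.path.join(d, "dst.bin")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"AAAA")
            os.utime(path, ns=(fixed_ns, fixed_ns))
        # Same-size edit on the source, then restore the old mtime.
        with open(src, "wb") as f:
            f.write(b"BBBB")
        os.utime(src, ns=(fixed_ns, fixed_ns))
        return quick_check_differs(src, dst), checksum_differs(src, dst)


if __name__ == "__main__":
    print(demo())
```

Any tool that decides by size and mtime alone would skip this file, which is why the issue is not specific to wget.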
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not necessarily. rsync can do matching by checksum of files on both ends, so file size and atime don't matter. Now you can choose to change the behavior, but the default is checksum. Most people choose to use checksum, which is precisely why rsync sucks big performance nuts when you have 10TB+ of data, because every single byte is checksummed and compared against the destination (which also must do all of its checksumming).

To make matters worse, the checksumming is single-threaded. Arrg.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
The default is file size and modified time:
From the BSD man page for rsync:
"Rsync finds files that need to be transferred using a "quick check" algorithm (by default) that looks for files that have changed in size or in last-modified time."
Rsync is slow when files have changed and need to be updated, but if the majority of files haven't changed in size or modification time, it's pretty quick compared to sending everything. ZFS replication will be faster.

Further, atime is not modified time. atime is access time. atime is informative, but can be safely disabled on most filesystems if you don't care to take the performance hit of updating metadata every time a file is opened. If modified time (mtime or ctime) were disabled on a dataset, rsync and wget would never transfer anything after the initial transfer.
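
For anyone who wants to see the modification timestamps move on their own system, here is a small Python sketch. It assumes a Unix-like OS, where st_ctime is the inode change time; the function and file names are made up for the example. It sets a file's mtime to a known old value, appends to the file, and reads the timestamps back.

```python
import os
import tempfile


def timestamps_after_write():
    """Give a file an old mtime, append to it, and return the timestamps."""
    old = 1_000_000_000  # arbitrary old timestamp, in seconds since the epoch
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("first version\n")
        os.utime(path, (old, old))  # set atime and mtime to the old value
        before = os.stat(path)
        with open(path, "a") as f:
            f.write("second version\n")
        after = os.stat(path)
        # mtime before the write, mtime after it, ctime after it
        return before.st_mtime, after.st_mtime, after.st_ctime


if __name__ == "__main__":
    print(timestamps_after_write())
```

The write moves mtime forward from the old value, and it does so whether or not the filesystem is recording atime.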

You can disable the time and size checking in rsync using '--ignore-times' (don't skip files that match in size and time), or compare by checksum rather than by modification time using '--checksum' (files differing in size will be transferred, files with the same size will be checksummed first). See: reference and the above man page.
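
As a rough sketch of how those modes map to command lines, the Python helper below only builds the argument list and does not run anything. The helper name, the mode names, and the source and destination paths are made up for the example. The flags are standard rsync options; '--size-only' is rsync's separate option for comparing sizes alone.

```python
def build_rsync_cmd(src, dest, mode="quick"):
    """Build an rsync argument list for one file-comparison mode."""
    flags = {
        "quick": [],                         # default: size + mtime check
        "ignore-times": ["--ignore-times"],  # don't skip size/time matches
        "checksum": ["--checksum"],          # decide by checksum, not mtime
        "size-only": ["--size-only"],        # decide by size alone
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["rsync", "--archive", *flags[mode], src, dest]


if __name__ == "__main__":
    # Hypothetical dataset path and destination host.
    for mode in ("quick", "ignore-times", "checksum", "size-only"):
        print(" ".join(build_rsync_cmd("/mnt/tank/data/",
                                       "backup:/mnt/tank/data/", mode)))
```

The resulting list could be handed to subprocess.run(cmd, check=True) once the paths are replaced with real ones.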
 