Increasing rsync performance

Status
Not open for further replies.

starfish

Dabbler
Joined
Feb 24, 2013
Messages
10
We run rsync from production XFS filesystems over 10GE to our FreeNAS storage. I'm hoping to decrease the time rsync takes to work out which files have changed. Would adding an L2ARC cache device help? If so, which is the better option: a PCIe SSD or standard 2.5" SSDs? Once rsync figures itself out, the writes are decent over 10GE (400-500MB/s).

The data set is currently over 100TB. Each volume we rsync is around 60TB or so.

I believe the rsync process initially reads metadata. If this could reside in cache the transfers hopefully will move quicker.

Thanks!

edit: increase <-> decrease :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Wait, you want to increase the time it takes rsync to run?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I don't think there is a way. I had experimented with rsync with about 20TB of data. Here's how I think rsync works:

1. The machine that starts the rsync task sends a message to the destination machine telling it to checksum its files. Both machines then begin checksumming their files. Note that this is ALL files, not just new or changed ones, because rsync doesn't trust the date/time stamps. This step also seems to be single-threaded. (I believe you can change this to use date/time stamps, but I'm not 100% sure and I couldn't use date/time stamps myself.)
2. Once both machines finish checksumming, the two file lists are compared to build the list of files to create or update.
3. The files are uploaded to the backup machine as appropriate. There seems to be some kind of checksumming of the data as it is sent, because one CPU maxes out during the transfer (on my i7 machine we couldn't get above about 28MB/sec on Gb).
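
For comparison with step 1: the rsync man page describes the default behaviour as a "quick check" that skips any file whose size and modification time match on both sides, and says full-file checksums are only used for that decision when -c/--checksum is given. Below is a minimal sketch of that comparison in plain shell, purely to illustrate the idea. quick_check_same is a made-up helper name, not part of rsync, and real rsync compares the values in its file list rather than calling test.

```sh
# Illustration only: mimic rsync's default "quick check" for one file pair.
# quick_check_same is a hypothetical helper, not an rsync feature.
quick_check_same() {
    qc_src=$1
    qc_dst=$2
    # A file missing on either side always needs a transfer.
    [ -f "$qc_src" ] && [ -f "$qc_dst" ] || return 1
    # Compare sizes; $(( )) strips any padding wc adds on some systems.
    qc_size_src=$(( $(wc -c < "$qc_src") ))
    qc_size_dst=$(( $(wc -c < "$qc_dst") ))
    [ "$qc_size_src" -eq "$qc_size_dst" ] || return 1
    # Equal mtimes means neither file is newer than the other.
    if [ "$qc_src" -nt "$qc_dst" ] || [ "$qc_dst" -nt "$qc_src" ]; then
        return 1
    fi
    return 0
}

# Example (paths are placeholders):
#   quick_check_same /source/directory/file /mnt/destination/file && echo "skip"
```

If that is what is happening here, the long start-up would be mostly directory traversal and stat calls rather than hashing, so it would be bound by metadata reads on both ends.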

I also had files with non-English characters in their names. These did not copy over properly with rsync. I'm not sure what the exact problem was or how to fix it, but I gave up on rsync once I determined that you will always be waiting on the checksummed file list and stuck with the 28MB/sec "speed limit".

Using ZFS replication I was able to get a constant 110+MB/sec for 5 days straight as I migrated servers.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Rsync has been around a LONG time and there are a LOT of options. I'd suggest doing some more reading/googling to get familiar with them. Like Unix, it is a very powerful and flexible tool. Maybe if you posted the command/options you're using, someone might have a more specific idea ;)

There's also Unison which hasn't been around as long as Rsync, but a lot of people prefer it. It also has a lot of options, so make sure you understand them or you could end up clobbering files in the wrong path. Unison isn't included by default unlike Rsync, so you'd need to install the package in a jail.
 

starfish

Dabbler
Joined
Feb 24, 2013
Messages
10
Thanks for the info.

Here's the command I'm using for rsync:

Code:
rsync -av --stats --progress -e ssh /source/directory user@1.2.3.4:/mnt/destination


To make it quicker, I have been running multiple instances on different directories within the "/source/directory" path. This helps to fill gig circuits, but it isn't as manageable as I'd like.
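
One way the multiple-instance approach could be made more manageable is to let find and xargs start one rsync per top-level subdirectory and cap how many run at once. This is only a sketch under assumptions: parallel_rsync, the RSYNC override variable, and the default of 4 jobs are made up for the example, and --stats/--progress are left out because the output of parallel runs would interleave.

```sh
# Sketch: one rsync per top-level subdirectory, a few at a time.
# parallel_rsync and the RSYNC override are hypothetical names for this example.
parallel_rsync() {
    pr_src=$1          # e.g. /source/directory
    pr_dest=$2         # e.g. user@1.2.3.4:/mnt/destination
    pr_jobs=${3:-4}    # how many rsync processes to run at once (assumed default)
    # List immediate subdirectories, NUL-separated so odd names survive,
    # and let xargs keep up to $pr_jobs rsync processes running.
    find "$pr_src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$pr_jobs" -I{} "${RSYNC:-rsync}" -a -e ssh {} "$pr_dest"
}

# Example:
#   parallel_rsync /source/directory user@1.2.3.4:/mnt/destination 4
```

Files that sit directly in the top-level directory are not covered by this, so a separate pass would still be needed for them.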

I'll check out Unison as well.

Thanks!
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
FWIW, I did some admittedly informal testing re: rsync to a local DAS RAID. The L2ARC in question is an old EVO840, 500GB, attached to SATA @ 6Gbit/s.

Stock Mini XL, no L2ARC: 6 hrs 25 min
w/ 500GB L2ARC, stock FreeNAS settings, 1st pass: 6 hrs 15 min
w/ 500GB L2ARC, stock FreeNAS settings, 2nd pass: 3 hrs 25 min

AFAIK, no other major processes were running, i.e. scrubs, bit-by-bit file comparisons between source and destination, and so on. But let me run a couple more trials. I was expecting an improvement from trial 1 to trial 2 but not a ~45% reduction in time needed. Actual file transfer quantities were similar, so that wasn't it either.

I'm going to experiment with some of the L2ARC settings to see how they affect the overall directory 'traversal' time. I suspect that keeping the metadata in the L2ARC will speed up not only rsync but also simple directory browsing by the user.
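
One setting that targets exactly this is the ZFS secondarycache dataset property, which accepts all, none, or metadata and controls what is eligible for the L2ARC. A rough sketch of adding a cache device and restricting it to metadata follows. The pool, dataset, and device names are placeholders, l2arc_metadata_only and the ZPOOL/ZFS override variables are made up for the example, and on FreeNAS the cache device would normally be added through the GUI rather than from the shell.

```sh
# Sketch: attach an L2ARC device and cache only metadata for one dataset.
# l2arc_metadata_only and the ZPOOL/ZFS overrides are hypothetical names.
l2arc_metadata_only() {
    l2_pool=$1       # placeholder, e.g. tank
    l2_dataset=$2    # placeholder, e.g. tank/backup
    l2_device=$3     # placeholder, e.g. da1
    # Add the SSD to the pool as a cache (L2ARC) device.
    "${ZPOOL:-zpool}" add "$l2_pool" cache "$l2_device" &&
    # Make only metadata from this dataset eligible for the L2ARC.
    "${ZFS:-zfs}" set secondarycache=metadata "$l2_dataset"
}

# Example:
#   l2arc_metadata_only tank tank/backup da1
```

The L2ARC only fills as blocks are read, so a first pass after this change would not be expected to show much benefit, which would be consistent with the 1st-pass and 2nd-pass numbers above.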
 