How to get 500MB/s+ replication speeds on 10GbE

Status
Not open for further replies.

Meyers

If you have a lot of data to move around between boxes and you're on a secure 10 GbE network, you can do some initial replication tasks manually and get vastly improved replication speeds. This doesn't use encryption, so I wouldn't use this method on an unsecured network.

I'm setting up two new storage servers for work. We have about 9TB of data. First I rsync'd the data to box1, then I used a replication task to replicate the datasets to box2 (both FreeNAS-11.0-U1, box2 being a warm standby). Both operations took days. I had to start over and didn't want to wait days again, so I decided to figure out if I could do the initial replication more quickly.

The replication task seemed to max out at about 500-600 Mbps (roughly 60-75 MB/s), which was surprising, as both of these boxes have 10 GbE interfaces. I was using fast encryption (not sure why I didn't turn this off), but even then ssh was using maybe 25% CPU, so I don't think that was the bottleneck.
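To put those numbers in perspective, here is a quick back-of-the-envelope sketch. The function names are made up for illustration, and it assumes decimal units (1 TB = 1,000,000 MB) and ignores protocol overhead, so treat the results as rough.

```shell
#!/bin/sh
# Convert a line rate in megabits per second to megabytes per second
# (8 bits per byte, protocol overhead ignored).
mbps_to_mbytes() {
    awk -v mbps="$1" 'BEGIN { printf "%.1f\n", mbps / 8 }'
}

# Estimate the hours needed to move $1 terabytes (decimal, 10^12 bytes)
# at a sustained rate of $2 MB/s (10^6 bytes per second).
transfer_hours() {
    awk -v tb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 3600 }'
}

mbps_to_mbytes 500     # 62.5
mbps_to_mbytes 600     # 75.0
transfer_hours 9 60    # 41.7
transfer_hours 9 500   # 5.0
```

So at around 60 MB/s, 9 TB works out to well over a day and a half per pass, while at 500 MB/s it is closer to five hours.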

I stumbled on a ticket (which I can't seem to find right now) that I think addressed this. It has to do with how FreeNAS runs the replication tasks: something in that process slows it down considerably.

What ended up working really well was using nc as the transport for the initial replication (among other things, nc can open basic TCP sockets and transfer data over them). Using nc, I was able to get 500 MB/s (about 4 Gbps), which is obviously quite the improvement.

First, I set up my snapshot tasks on box1 how I want them to be in production. I let FreeNAS take the initial snapshots.

Then on box2 (the receiver) I ran:

Code:
nc -w 120 -l 8888 | zfs receive poolname/dataset_being_replicated


I didn't have to create the dataset first.

Then on box1 (the sender) I ran zfs list -t snap | grep dataset_to_replicate to find the snapshot names for the dataset I wanted to replicate. I copied the latest snapshot name with the intent of replicating all snapshots up to that one. Then I ran:
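If you'd rather not eyeball the grep output, something like the following could pick the newest snapshot for you. This is only a sketch: latest_snapshot is a made-up helper name, and it assumes the list is fed to it sorted oldest first (which is what -s creation should give you).

```shell
#!/bin/sh
# Read snapshot names (one per line, oldest first) on stdin and print the
# newest snapshot that belongs to exactly the dataset given as $1.
# Snapshots of child datasets or similarly named datasets are skipped.
latest_snapshot() {
    awk -v ds="$1" 'index($0, ds "@") == 1 { last = $0 } END { if (last != "") print last }'
}

# Intended use on box1 (the dataset name is a placeholder):
#   zfs list -H -t snapshot -o name -s creation -r pool/dataset_to_replicate \
#       | latest_snapshot pool/dataset_to_replicate
```

The prefix match is done with index() rather than a regex so that dots and dashes in dataset names are taken literally.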

Code:
zfs send -R pool/dataset_to_replicate@auto-20170712.1333-1d | pv -ptera -s SIZE_OF_DATASET | nc -w 20 box2 8888


Running the data through pv shows transfer speed and time elapsed, and if you provide -s XXXg (the approximate size of the dataset) it will also give you a rough idea of how long the replication will take (it's not 100% accurate, but it's close). Without pv you won't get any output until the transfer is complete.
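One possible way to get a number for -s without guessing is to ask zfs for a dry-run estimate first. The sketch below assumes your zfs version supports send -n -v -P and that the parsable output ends with a line of the form "size<TAB><bytes>"; check that on your own system before relying on it. send_size_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the estimated total stream size in bytes from the output of a
# parsable dry-run send, read on stdin.
# ASSUMPTION: the summary line looks like "size<TAB><bytes>".
send_size_bytes() {
    awk '$1 == "size" { bytes = $2 } END { if (bytes != "") print bytes }'
}

# Intended use on box1 (the snapshot name is a placeholder). The dry run
# sends no data; 2>&1 is there in case the report goes to stderr.
#   SIZE=$(zfs send -nvP -R pool/dataset_to_replicate@snapname 2>&1 | send_size_bytes)
#   zfs send -R pool/dataset_to_replicate@snapname | pv -ptera -s "$SIZE" | nc -w 20 box2 8888
```

pv takes a plain byte count for -s, so the value can be passed through as is.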

Once this was finished, I set up a replication task in FreeNAS to take care of replication from now on. Everything seems to be working just fine.

Oh, you'll probably also want to run zfs set readonly=on pool/dataset on box2 after the initial replication is complete, because this is what FreeNAS does on the initial replication.
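To double-check that the property actually took, a small read-only check along these lines might help. check_readonly is a made-up name, and the function deliberately changes nothing itself; it just prints the command to run if the property isn't on.

```shell
#!/bin/sh
# Print "ok" if the dataset's readonly property is "on"; otherwise print
# the command that would set it. Nothing is modified by this function.
check_readonly() {
    state=$(zfs get -H -o value readonly "$1") || return 1
    if [ "$state" = "on" ]; then
        echo "ok"
    else
        echo "zfs set readonly=on $1"
    fi
}

# Intended use on box2 (the dataset name is a placeholder):
#   check_readonly pool/dataset
```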

Just wanted to post this because I had to dig and dig to figure this out (as always with FreeNAS / ZFS). If I did anything really stupid here, please let me know. I've been using FreeNAS for a couple of years now and I still feel like a complete novice just on the verge of totally destroying everything even though I'm pretty sure I'm past that point by now!

EDIT: I forgot to re-enable replication from box1 to box2 and when I did, replication started over from scratch again. I *think* this is because all the snapshots on box2 were stale, but I don't know for sure. Make sure you test this procedure well!
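One thing that might be worth checking before turning the replication task back on: incremental ZFS replication needs at least one snapshot that exists on both sides, so if box1 has already expired everything box2 holds, a full resend would be expected. The sketch below compares two snapshot lists; latest_common_snapshot and the /tmp file names are made up for illustration, and the ssh step assumes you have shell access from box1 to box2.

```shell
#!/bin/sh
# Given two files of snapshot names (the part after "@", one per line,
# oldest first), print the newest name in file $1 that also appears in
# file $2. Prints nothing if the two sides share no snapshot.
latest_common_snapshot() {
    awk 'NR == FNR { seen[$0] = 1; next }
         ($0 in seen) { last = $0 }
         END { if (last != "") print last }' "$2" "$1"
}

# Intended use (dataset names are placeholders):
#   zfs list -H -t snapshot -o name -s creation -d 1 pool/dataset \
#       | sed 's/.*@//' > /tmp/box1.snaps
#   ssh box2 zfs list -H -t snapshot -o name -s creation -d 1 poolname/dataset \
#       | sed 's/.*@//' > /tmp/box2.snaps
#   latest_common_snapshot /tmp/box1.snaps /tmp/box2.snaps
```

If this prints nothing, the two boxes have no snapshot in common, which would be consistent with replication starting over from scratch.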
 