I want to replicate the data from my not-yet-existing ZFS system to another (equally not-yet-existing) ZFS system, but over a very slow connection that I sometimes need exclusively for other things.
So my approach would be:
1. zfs send into file
2. rsync the file (with many interruptions from device reboots, connection drops, and me needing the connection for other things, but eventually it will arrive!)
3. zfs receive the file (finishing the replication with all snapshots and metadata intact).
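Something like this is what I have in mind for the simple version (untested sketch; pool, dataset, and host names are placeholders, and I'm assuming `zfs receive -s` so a broken transfer can later be resumed):

```shell
# Sketch of the simple approach -- run on the source host.
# "tank/data@snap1", "backuppool/data", and "backup" are placeholder names.
replicate_once() {
    zfs send tank/data@snap1 > /stage/repl.zstream          # 1. stream into a file
    rsync --partial /stage/repl.zstream backup:/stage/      # 2. resumes after interruptions
    ssh backup 'zfs receive -s backuppool/data < /stage/repl.zstream'   # 3. receive it
}
```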
But there is one thing I don't like about this: the connection is slow, and I don't want my RAIDZ HDDs spinning (using power, making noise) the whole time while the data trickles through the wire byte by byte.
So my second-level approach is this:
1. zfs send into a file on a small SSD, capped at a maximum file size
2. rsync the file (the HDDs can sleep during this, unless other tasks need them)
3. zfs receive the file
4. if step 3 produced a resume token, go back to step 1 with it! Otherwise we are done.
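Putting the loop together, assuming OpenZFS resumable send/receive (`zfs send -t <token>`, `zfs receive -s`, and the `receive_resume_token` dataset property) -- an untested sketch, all names and sizes are placeholders:

```shell
# Chunked, resumable replication sketch -- run replicate_chunks on the source host.
# "tank/data@snap1", "backuppool/data", "backup", and /ssd are placeholders.
replicate_chunks() {
    SRC=tank/data@snap1
    DST=backuppool/data
    CHUNK=$((4 * 1024 * 1024 * 1024))   # 4 GiB per chunk, sized for the small SSD
    TOKEN=""

    while :; do
        # 1. Produce at most one chunk. When head exits, the pipe closes and
        #    zfs send is stopped early (backpressure / SIGPIPE).
        if [ -z "$TOKEN" ]; then
            zfs send "$SRC"      | head -c "$CHUNK" > /ssd/chunk
        else
            zfs send -t "$TOKEN" | head -c "$CHUNK" > /ssd/chunk
        fi

        # 2. Ship it; --partial lets interrupted transfers pick up where they left off.
        rsync --partial /ssd/chunk backup:/ssd/chunk

        # 3. Receive with -s: a truncated stream fails here but leaves resume state.
        ssh backup "zfs receive -s $DST < /ssd/chunk" || true

        # 4. Ask the destination for a token; "-" means the receive completed.
        TOKEN=$(ssh backup "zfs get -H -o value receive_resume_token $DST")
        [ "$TOKEN" = "-" ] && break
    done
}
```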
My questions:
- What do you think about this approach? Could it work? Is it somehow bad practice?
- How can I enforce the maximum file size in step 1? When I pipe the zfs send output to dd and give it a size (count=...), will dd stop zfs send from producing more data once "the file is full", or will zfs send produce the entire stream while dd just discards the rest? That would be heartbreaking.
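  From a quick test with an ordinary pipe, it looks like no data is wasted: a pipe only buffers a small amount in the kernel, so the writer blocks as soon as the reader stops consuming, and once the reader exits the writer is killed by SIGPIPE. (`head -c` may be less surprising than `dd count=...` here, since dd counts *reads* and pipes can deliver short reads; GNU dd has `iflag=fullblock` to compensate.) A demonstration with an infinite producer:

  ```shell
  # 'yes' would write forever, but head exits after 1 MiB; the pipe then
  # closes and 'yes' is terminated by SIGPIPE instead of running on.
  yes | head -c 1048576 > /tmp/chunk
  wc -c < /tmp/chunk    # 1048576 bytes
  ```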
- Once I have that zfs-send-file chunk, can I somehow get the resumable token directly from it? Then I could parallelize the process (create the next chunk before the first one has been received).
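  If I understand the OpenZFS docs correctly, the token is created on the receiving side and stored in the `receive_resume_token` property of the target dataset -- it is not embedded in the stream file itself -- so I suspect I can only read it after a chunk has been received, e.g. with a helper like this (dataset name is a placeholder):

  ```shell
  # Prints the saved token after an interrupted 'zfs receive -s',
  # or "-" when no partial receive is pending. Placeholder dataset name.
  get_resume_token() {
      zfs get -H -o value receive_resume_token backuppool/data
  }
  ```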