I want to replicate the data from my not-yet-existing ZFS system to another (equally not-yet-existing) ZFS system, but over a very slow connection that I sometimes need exclusively for other things.
So my approach would be:
1. zfs send into file
2. rsync the file (with many interruptions from device reboots, connection drops, and me needing the connection for other things, but eventually it will arrive!)
3. zfs receive the file (finishing the replication with all snapshots and metadata intact).
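Something like this is what I have in mind for the simple version (untested sketch; pool, dataset, and host names are placeholders, and I'm assuming `zfs receive -s` so a broken transfer can later be resumed):

```shell
# Sketch of the simple approach -- run on the source host.
# "tank/data@snap1", "backuppool/data", and "backup" are placeholder names.
replicate_once() {
    zfs send tank/data@snap1 > /stage/repl.zstream          # 1. stream into a file
    rsync --partial /stage/repl.zstream backup:/stage/      # 2. resumes after interruptions
    ssh backup 'zfs receive -s backuppool/data < /stage/repl.zstream'   # 3. receive it
}
```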
But there is one thing I don't like about this: the connection is slow, and I don't want my RAIDZ HDDs spinning (using power, making noise) the whole time while the data trickles through the wire byte by byte.
So my second-level approach is this:
1. zfs send into a file on a small SSD, capped at a maximum file size
2. rsync the file (the HDDs can sleep during this, unless other tasks need them)
3. zfs receive the file
4. if step 3 produced a resume token, go back to step 1 with it! Otherwise we are done.
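Putting the loop together, assuming OpenZFS resumable send/receive (`zfs send -t <token>`, `zfs receive -s`, and the `receive_resume_token` dataset property) -- an untested sketch, all names and sizes are placeholders:

```shell
# Chunked, resumable replication sketch -- run replicate_chunks on the source host.
# "tank/data@snap1", "backuppool/data", "backup", and /ssd are placeholders.
replicate_chunks() {
    SRC=tank/data@snap1
    DST=backuppool/data
    CHUNK=$((4 * 1024 * 1024 * 1024))   # 4 GiB per chunk, sized for the small SSD
    TOKEN=""

    while :; do
        # 1. Produce at most one chunk. When head exits, the pipe closes and
        #    zfs send is stopped early (backpressure / SIGPIPE).
        if [ -z "$TOKEN" ]; then
            zfs send "$SRC"      | head -c "$CHUNK" > /ssd/chunk
        else
            zfs send -t "$TOKEN" | head -c "$CHUNK" > /ssd/chunk
        fi

        # 2. Ship it; --partial lets interrupted transfers pick up where they left off.
        rsync --partial /ssd/chunk backup:/ssd/chunk

        # 3. Receive with -s: a truncated stream fails here but leaves resume state.
        ssh backup "zfs receive -s $DST < /ssd/chunk" || true

        # 4. Ask the destination for a token; "-" means the receive completed.
        TOKEN=$(ssh backup "zfs get -H -o value receive_resume_token $DST")
        [ "$TOKEN" = "-" ] && break
    done
}
```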
My questions:
- What do you think about this approach? Could it work? Is it somehow bad practice?
- How can I enforce the maximum file size in step 1? When I pipe the zfs send output to dd and give it a size (count=...), will dd stop zfs send from producing more data once "the file is full", or will zfs send produce the entire stream while dd just discards the rest? That would be heartbreaking.
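  From a quick test with an ordinary pipe, it looks like no data is wasted: a pipe only buffers a small amount in the kernel, so the writer blocks as soon as the reader stops consuming, and once the reader exits the writer is killed by SIGPIPE. (`head -c` may be less surprising than `dd count=...` here, since dd counts *reads* and pipes can deliver short reads; GNU dd has `iflag=fullblock` to compensate.) A demonstration with an infinite producer:

  ```shell
  # 'yes' would write forever, but head exits after 1 MiB; the pipe then
  # closes and 'yes' is terminated by SIGPIPE instead of running on.
  yes | head -c 1048576 > /tmp/chunk
  wc -c < /tmp/chunk    # 1048576 bytes
  ```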
- Once I have that zfs-send-file chunk, can I somehow get the resumable token directly from it? Then I could parallelize the process (create the next chunk before the first one has been received).
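  If I understand the OpenZFS docs correctly, the token is created on the receiving side and stored in the `receive_resume_token` property of the target dataset -- it is not embedded in the stream file itself -- so I suspect I can only read it after a chunk has been received, e.g. with a helper like this (dataset name is a placeholder):

  ```shell
  # Prints the saved token after an interrupted 'zfs receive -s',
  # or "-" when no partial receive is pending. Placeholder dataset name.
  get_resume_token() {
      zfs get -H -o value receive_resume_token backuppool/data
  }
  ```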