Multiple zfs recv's in parallel using tee - what is the most OS-efficient way to do it?

Stilez

Since zfs send creates a stream that can be saved and reused, or piped to zfs recv, it's clearly possible to pipe zfs send through tee, and simultaneously create 2 copies of the source pool in parallel with one (slightly slower?) pass of the source.

I've seen several suggested syntaxes. Is any of them preferable for a source pool that's tens of TB in size and will take 2-3 days to replicate, perhaps because of how the OS handles each one differently? Or will they all perform near-identically and use near-identical resources?

  1. Tee with 1 redirection:
    zfs send SEND_ARGS | tee >(zfs recv RECV_ARGS1) | zfs recv RECV_ARGS2

  2. Tee with 2 redirections:
    zfs send SEND_ARGS | tee >(zfs recv RECV_ARGS1) >(zfs recv RECV_ARGS2) >/dev/null

  3. Second session and stdin:
    Start another session to the same host and run tty there to find its TTY name (eg /dev/ttyv2).
    In the new session run:
        zfs recv RECV_ARGS1 <&1
    In the original session run:
        zfs send SEND_ARGS | tee /dev/ttyv2 | zfs recv RECV_ARGS2

  4. Named pipes:
    mkfifo ~/pipe1 ~/pipe2
    zfs recv RECV_ARGS1 < ~/pipe1 & pid1=$!
    zfs recv RECV_ARGS2 < ~/pipe2 & pid2=$!
    zfs send SEND_ARGS | tee ~/pipe1 ~/pipe2 >/dev/null
    wait $pid1 $pid2
    rm ~/pipe1 ~/pipe2
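
To check that the named-pipe layout really delivers the same bytes to both receivers before committing a multi-day run to it, here is a minimal dry-run sketch. fake_send and fake_recv are hypothetical stand-ins for zfs send and zfs recv (they are not zfs commands), and the temporary directory is just somewhere to put the FIFOs.

```shell
# Dry run of the named-pipe fan-out with stand-in commands.
# ASSUMPTION: fake_send / fake_recv are placeholders for zfs send / zfs recv.
workdir=$(mktemp -d)

fake_send() { head -c 1048576 /dev/urandom; }   # 1 MiB of random test data
fake_recv() { cksum > "$1"; }                   # consume stdin, record its checksum

mkfifo "$workdir/pipe1" "$workdir/pipe2"

# Start both receivers first; each blocks until tee opens its FIFO for writing.
fake_recv "$workdir/sum1" < "$workdir/pipe1" & pid1=$!
fake_recv "$workdir/sum2" < "$workdir/pipe2" & pid2=$!

# tee copies the stream into both FIFOs; its own stdout is discarded.
fake_send | tee "$workdir/pipe1" "$workdir/pipe2" > /dev/null

# Wait for each receiver separately so each exit status can be inspected.
wait "$pid1"; rc1=$?
wait "$pid2"; rc2=$?

sum1=$(cat "$workdir/sum1")
sum2=$(cat "$workdir/sum2")
echo "receiver 1: rc=$rc1 cksum=$sum1"
echo "receiver 2: rc=$rc2 cksum=$sum2"

rm -rf "$workdir"
```

If the two checksum lines match, both branches saw an identical stream. Waiting on each PID separately also shows which receiver failed, which the process-substitution forms don't expose as directly.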

Bonus points if there are other options that would make it more efficient if added to the mix. Metric used: Time from kicking it off, to both copies being completed, with nothing else running.
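
For that metric, the stopwatch has to stop only when both receivers have exited, not when the send/tee pipeline returns. A rough sketch of how that could be measured, again with hypothetical stand-ins (send_cmd, recv1_cmd, recv2_cmd) in place of the real zfs commands, and with one receiver deliberately made slower:

```shell
# Wall-clock time from kick-off until BOTH receivers are done.
# ASSUMPTION: send_cmd / recv1_cmd / recv2_cmd are placeholders for the zfs commands.
send_cmd()  { head -c 1048576 /dev/zero; }
recv1_cmd() { cat > /dev/null; }
recv2_cmd() { cat > /dev/null; sleep 1; }   # pretend this receiver finishes later

tmp=$(mktemp -d)
mkfifo "$tmp/p1" "$tmp/p2"

start=$(date +%s)
recv1_cmd < "$tmp/p1" &
recv2_cmd < "$tmp/p2" &
send_cmd | tee "$tmp/p1" "$tmp/p2" > /dev/null
wait                       # returns only after both background receivers exit
end=$(date +%s)

elapsed=$((end - start))
echo "both copies done after ${elapsed}s"
rm -rf "$tmp"
```

The same start/wait/end wrapper could be put around each of the variants, as long as every receiver is a background job of the timing shell so that wait actually covers it.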

For info, TrueNAS 12-U1 (at the moment) is running off a mirrored 40 GB SSD + 256 GB ECC RAM + 8 core Broadwell-EP Xeon (E5 v4), so it's got some room for caching data in transit, but not 40 TB worth or even a fraction of it. Also, the pool and targets are deduped, so a lot of CPU goes into hashing data in the two zfs recv processes, which might be a soft limit.

I could also pipe one of them (or both??) across a 10GbE LAN to spread that CPU load across 2 systems. But that wouldn't tell me which version of the command is better, would it?
 