Copying Datasets to New Machine With zfs send/recv


ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
I'm not quite grokking the semantics of copying datasets around using zfs send and recv. Specifically, I think my familiarity with rsync is causing me confusion when thinking about copying datasets.

As I understand it, zfs send and recv operate on snapshots. This seems simple enough: Create snapshot of source dataset, zfs send it to a stream (over the network), zfs recv the stream into a destination dataset. That dataset then acquires a new snapshot. "Rollback" the destination dataset to that snapshot. Poof! You've replicated the dataset. Except...

Every time I've tried it, zfs recv creates new datasets, rather than tacking snapshots onto existing ones. If I specify an existing dataset, an error is reported:
Code:
cannot receive new filesystem stream: destination 'foo/bar/dataset' exists
must specify -F to overwrite it
Yeah, I know it exists, I'm trying to fill up that dataset with files.

The experiment I tried went as follows:
  • Using the GUI, create a recursive snapshot against the source dataset ( tank/home/user), named it xfer-test.
  • Using the GUI, create a dataset on the destination machine ( tank/recvtest).
  • On the source machine: zfs send -Rv tank/home/user@xfer-test | mbuffer -s 128k -m 1G -O [X.X.X.X]:9090
  • On the destination machine: mbuffer -4 -s 128k -m 1G -I 9090 | zfs receive -Fd tank/recvtest
What I expected was for all the files in tank/home/user to appear underneath tank/recvtest. Instead, two new datasets appeared -- tank/recvtest/home and tank/recvtest/home/user -- with the files appearing under the latter.

Okay, not what I expected, but the files are there and, after fixing up some ownership issues, I was ready to move them into their final home of tank/home/user (same as on the source machine). So I thought to use zfs send again to ensure all metadata got transferred. I created a new non-recursive snapshot named bouncehouse (containing all the ownership fixes) and attempted to send it over. And that's where I bumped into "dataset exists" errors (note I'm not sending with the -R flag in this case):

Code:
root@freenas# zfs send tank/recvtest/home/user@bouncehouse | zfs recv -nv tank/home/user
cannot receive new filesystem stream: destination 'tank/home/user' exists
must specify -F to overwrite it
warning: cannot send 'tank/recvtest/home/user@bouncehouse': signal received

Reading the man page, zfs receive -F says:
Code:
-F    Force a rollback of the file system to the most recent
      snapshot before performing the receive operation. If
      receiving an incremental replication stream (for example, one
      generated by "zfs send -R {-i | -I}"), destroy snapshots and
      file systems that do not exist on the sending side.

To solve the immediate problem, I suppose I could create a snapshot on tank/home/user (it's effectively empty at the moment) and then supply the -F flag. But clearly I'm not understanding some underlying concepts about what zfs send/recv are actually doing that would make such a rollback necessary (again, my thinking is probably clouded by trying to think of it in terms of rsync). Can someone point me in the right direction?
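Concretely, I think the workaround would look something like this (untested sketch, using the dataset names from my experiment above; with a full, non-incremental stream, recv -F replaces whatever is at the destination, so a placeholder snapshot shouldn't even be needed):

```shell
# Push the fixed-up copy into its final home. -F discards the
# (effectively empty) existing tank/home/user and replaces it with
# the received dataset.
zfs send tank/recvtest/home/user@bouncehouse | zfs recv -F tank/home/user
```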
 

ewhac

(It occurs to me that I could destroy/rename the tank/home/user dataset, rename tank/recvtest/home/user to tank/home/user, fix up the dataset properties (they're not identical), and hope Samba/CIFS doesn't throw a fit. But pretend I didn't say that. I still want to better understand what zfs send/receive are actually doing.)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
It's been a little while, but a zfs send stream is either a full stream, from the beginning of time up to the specified snapshot, or an incremental stream, from one snapshot to another. For a stream to be appended to the receiving dataset, the two sides need a common snapshot, and you can't just take a snapshot on the destination dataset and expect that to count.

If there is no data on the receiving dataset, just use -F (force) to overwrite it. From then on, just use incrementals.

So, if you are only planning this as a one-off, but the source dataset is busy and the copy may take time: take a snapshot and send it, and when that's done, send an incremental from the previous snapshot to a new one. Repeat until there are no changes you care about.
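A sketch of that workflow (hypothetical dataset and snapshot names, transport over ssh rather than mbuffer):

```shell
# 1. Initial full copy while the source stays live.
#    -F overwrites the empty destination dataset.
zfs snapshot tank/data@t0
zfs send tank/data@t0 | ssh newnas zfs recv -F tank/data

# 2. Catch-up pass: -i sends only the changes between two snapshots.
zfs snapshot tank/data@t1
zfs send -i tank/data@t0 tank/data@t1 | ssh newnas zfs recv tank/data

# 3. Repeat (@t1 -> @t2, ...) until the delta is negligible, then cut over.
```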

Or just let FreeNAS do it.
 

PhilipS

Contributor
Joined
May 10, 2016
Messages
179
What I expected was for all the files in tank/home/user to appear underneath tank/recvtest. Instead, two new datasets appeared -- tank/recvtest/home and tank/recvtest/home/user -- with the files appearing under the latter.

This is because of the -d option on your receive, which preserves the dataset path after the pool name. (Be aware that the built-in replication uses this option as well.)
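In other words (reusing the paths from the experiment above; sketch only):

```shell
# With -d, everything after the source *pool* name is appended to the
# receive target, which is why tank/home/user@xfer-test landed at
# tank/recvtest/home/user:
zfs send tank/home/user@xfer-test | zfs recv -Fd tank/recvtest

# Without -d, the stream goes exactly where you name it:
zfs send tank/home/user@xfer-test | zfs recv -F tank/recvtest/user
```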

the zfs send stream is from either the beginning of time, to the specified snapshot

This sounds like every snapshot will be sent up to the specified snapshot -- that may not be what you intended to say, so to clarify: only the specified snapshot is sent. If you are sending the latest snapshot, none of the earlier snapshots will be sent -- all the data will be there, you just can't roll back to earlier snapshots on the destination, since those snapshots do not exist there.

If you want all of them, then you need to start with the oldest snapshot and then incrementally send the next snapshot and so forth. The built-in replication script does this if you start from scratch. If you manually send the latest snapshot, then enable the built-in replication task, it will send any incremental snapshots created after that one, but none of the earlier ones. It isn't possible to send a snapshot that is older than one on the destination without a rollback/destroy of all the newer snapshots first -- using the -F option.
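For example (hypothetical names), carrying a whole snapshot chain over can be done with one full send followed by a single -I stream, which includes all intermediate snapshots between the two endpoints:

```shell
# Send the oldest snapshot first to establish the dataset...
zfs send tank/data@day1 | ssh newnas zfs recv tank/backup/data

# ...then every snapshot from @day1 through @day7 in one stream.
# (-I sends all intermediates; -i would send only the @day1->@day7 delta.)
zfs send -I tank/data@day1 tank/data@day7 | ssh newnas zfs recv tank/backup/data
```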
 

ewhac

What I've been trying to do is the initial transfer of data from the old NAS to the new one. So naturally I first created the users' home directories/datasets (one dataset per user), then tried to copy data from the old NAS in to them.

After some fiddling around, I think the key concept is that zfs send/recv copies datasets. It does not copy the contents of a dataset. ("It does, too!") Hold on a sec...

Imagine you have a mv/cp program, but all you can manipulate are directories. You can't use those commands to manipulate the files within those directories. Thus, if you attempt to move a directory to a location that already has a directory of that name, you could do one of three things:
  1. Place the source directory inside the destination directory;
  2. Replace the destination directory with the new one, discarding the previous directory and all its contents;
  3. Report an error, saying, "Destination directory exists."
This seems to be how zfs send/recv treats things. If there's no dataset already there, create it. If there is already a dataset there, report an error, and offer the option -F to blow away the existing dataset and replace it with the new one. zfs send/recv does not "scoop out" files from the source dataset and deposit them inside another dataset.

The rules seem to change slightly with incrementals. I haven't played with those yet, but so far it feels like applying diffs on top of a dataset which must already exist.
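That matches my reading of the man page: an incremental stream is a delta between two snapshots, and it only applies if the destination's most recent snapshot is the "from" side of that delta (hypothetical names, untested):

```shell
# The full copy establishes the common snapshot @a on both sides.
zfs send tank/data@a | zfs recv -F tank/copy

# The delta @a -> @b applies cleanly because the destination is at @a.
zfs send -i tank/data@a tank/data@b | zfs recv tank/copy

# If the destination had been modified since @a, recv would refuse
# unless rolled back with -F.
```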
 