I believe I've run into a bug in FreeNAS 8.2-beta3 with ZFS dataset replication when you have sub-datasets and recursive snapshots turned on. I can understand if the behaviour is there to prevent accidental data loss, but it also appears to have the unexpected side effect of not replicating the sub-datasets.
I have a ZFS dataset structure that looks like this:
Code:
(local system - charon)
NAME                      USED   AVAIL  REFER  MOUNTPOINT
data1                     826G   2.55T  51.0G  /mnt/data1
data1/vm_backups          141M   2.55T   136K  /mnt/data1/vm_backups
data1/vm_backups/email    112K   2.55T   112K  /mnt/data1/vm_backups/email
data1/vm_backups/unix     70.9M  2.55T  70.8M  /mnt/data1/vm_backups/unix
data1/vm_backups/windows  70.3M  2.55T  70.2M  /mnt/data1/vm_backups/windows
I don't know if it's necessarily a good idea to split my VMware VDR backups into separate datasets, but that's a discussion for another topic. (It made sense in my brain at the time.)
Anyway, I set up periodic recursive snapshots on data1/vm_backups.
This meant I ended up with a set of snapshots like this:
Code:
(local system - charon)
NAME                                            USED   AVAIL  REFER  MOUNTPOINT
data1                                           826G   2.55T  51.0G  /mnt/data1
data1/vm_backups                                141M   2.55T   136K  /mnt/data1/vm_backups
data1/vm_backups@auto-20120613.1133-1h             0       -   136K  -
data1/vm_backups/email                          112K   2.55T   112K  /mnt/data1/vm_backups/email
data1/vm_backups/email@auto-20120613.1133-1h       0       -   112K  -
data1/vm_backups/unix                          70.9M   2.55T  70.8M  /mnt/data1/vm_backups/unix
data1/vm_backups/unix@auto-20120613.1133-1h      64K       -  70.8M  -
data1/vm_backups/windows                       70.3M   2.55T  70.2M  /mnt/data1/vm_backups/windows
data1/vm_backups/windows@auto-20120613.1133-1h   64K       -  70.2M  -
(I was originally doing daily snapshots that expired after a few days, but I shortened the schedule for testing.)
I then went to the ZFS Replication tab and started entering the information:
Volume/dataset: data1/vm_backups (this is the only selectable option at this point)
Remote ZFS filesystem name: tank/charon
Recursively replicate and remove stale snapshot on remote side: YES (checked)
Initialize remote side for once: YES (checked)
And filled in the remote details.
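For context, here is roughly what I assumed recursive replication with those settings would boil down to on each run after the initial sync. This is just my mental model, not the actual command from the autorepl logs, and the snapshot names simply follow my auto snapshot naming:

Code:
# My assumption of what each recursive replication cycle should amount to
# (not the actual autorepl command; snapshot names follow my auto snapshot schedule)
zfs send -R -i auto-20120613.1133-1h data1/vm_backups@auto-20120613.1148-1h | \
    /usr/bin/ssh -i /data/ssh/replication atropos "/sbin/zfs receive -Fd tank/charon"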
After some testing, I found I was getting the dataset I specified, and I was certainly getting snapshots - old snapshots were even expiring - but I wasn't getting the recursive part: the sub-datasets never showed up on the remote side.
Code:
(remote system - atropos)
tank/charon                                   57.3K  1.15T  26.6K  /tank/charon
tank/charon/vm_backups                        30.6K  1.15T  30.6K  /tank/charon/vm_backups
tank/charon/vm_backups@auto-20120613.1133-1h      0      -  30.6K  -
tank/charon/vm_backups@auto-20120613.1148-1h      0      -  30.6K  -
I'd done some googling on how ZFS replication works, and the commands in the log files looked correct. I even tested them by hand, and they did run.
I then deleted the replication task, the periodic snapshot task, and all the existing snapshots, and tried the process manually: I created a recursive snapshot and replicated it, basing the command partially on what I'd seen in the log files, but without silencing errors or anything of the sort. It worked perfectly.
Code:
zfs snapshot -r data1/vm_backups@1
zfs send -R data1/vm_backups@1 | /usr/bin/ssh -i /data/ssh/replication atropos "/sbin/zfs receive -Fd tank"
Code:
(local system - charon)
NAME                        USED   AVAIL  REFER  MOUNTPOINT
data1                       813G   2.56T  51.0G  /mnt/data1
data1/vm_backups            141M   2.56T   136K  /mnt/data1/vm_backups
data1/vm_backups@1             0       -   136K  -
data1/vm_backups/email      112K   2.56T   112K  /mnt/data1/vm_backups/email
data1/vm_backups/email@1       0       -   112K  -
data1/vm_backups/unix      70.8M   2.56T  70.8M  /mnt/data1/vm_backups/unix
data1/vm_backups/unix@1        0       -  70.8M  -
data1/vm_backups/windows   70.2M   2.56T  70.2M  /mnt/data1/vm_backups/windows
data1/vm_backups/windows@1     0       -  70.2M  -
Code:
(remote system - atropos)
NAME                        USED   AVAIL  REFER  MOUNTPOINT
tank                       1.50T   1.15T  33.3K  /tank
tank/vm_backups             137M   1.15T  30.6K  /tank/vm_backups
tank/vm_backups@1              0       -  30.6K  -
tank/vm_backups/email      25.3K   1.15T  25.3K  /tank/vm_backups/email
tank/vm_backups/email@1        0       -  25.3K  -
tank/vm_backups/unix       68.8M   1.15T  68.8M  /tank/vm_backups/unix
tank/vm_backups/unix@1         0       -  68.8M  -
tank/vm_backups/windows    68.6M   1.15T  68.6M  /tank/vm_backups/windows
tank/vm_backups/windows@1      0       -  68.6M  -
So, I remembered one line that shows up in the log file whenever I set up a new replication task:
Jun 12 18:00:01 charon autorepl[70896]: Creating tank/vm_backups on remote system
This was something I had not done in my testing when logging in and trying it manually. So I got curious: what if I do the first replication of the periodic snapshot by hand, and then let the ZFS replication interface take over from there?
This appeared to work. It seems I also needed to set freenas:state=LATEST when doing this, after which things settled into a proper cycle of snapshot, replicate, wait, repeat. (A rough sketch of what I ran is below.)
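For reference, this is roughly what that manual first pass looked like. I'm reconstructing it, so treat it as a sketch rather than the exact commands; in particular, setting freenas:state on the local snapshot is my understanding of what the autorepl script checks on the next run, not something I've verified in the code:

Code:
# Rough sketch of the manual first replication before letting the GUI take over.
# (Reconstructed; the auto snapshot name is from the listings above, and setting
# freenas:state=LATEST on the local snapshot is my assumption about what autorepl
# looks for afterwards.)
zfs send -R data1/vm_backups@auto-20120613.1133-1h | \
    /usr/bin/ssh -i /data/ssh/replication atropos "/sbin/zfs receive -Fd tank/charon"
zfs set freenas:state=LATEST data1/vm_backups@auto-20120613.1133-1h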
I'm thinking the correct behaviour would be to detect sub-datasets and create those on the remote side as well as the dataset in question; something along the lines of the sketch below. That assumes the creation of the dataset on the remote end is intentional, and not some leftover workaround for very old versions of zfs send/receive.
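To illustrate what I mean, here's a sketch using my dataset and host names, purely as an example; I haven't looked at how autorepl actually creates the remote dataset:

Code:
# Sketch of the suggested behaviour: enumerate the local sub-datasets and
# pre-create each one on the remote side, instead of only the top-level dataset.
# (Illustrative only; uses my dataset/host names from above.)
for ds in $(zfs list -r -H -o name data1/vm_backups); do
    remote_ds="tank/charon/${ds#data1/}"
    /usr/bin/ssh -i /data/ssh/replication atropos "/sbin/zfs create -p ${remote_ds}"
done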