Replication problem, what to make of the errors?


Daniel Claesson

Dabbler
Joined
May 31, 2016
Messages
35
Hi all,

I have a problem with my replication tasks. I replicate my main FreeNAS box to a secondary FreeNAS box located in another part of the building.

I have a recursive snapshot of the whole ZFS volume on the main box, and I have created a replication task that sends that snapshot to the secondary box.
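For reference, this is roughly how I check what the snapshot task has created on each side; the pool name is mine, and the target path on the secondary box is assumed from my replication config:

# On PUSH: list everything the recursive snapshot task has created
zfs list -r -t snapshot -o name,creation vipera

# On PULL: the same check against the replication target
zfs list -r -t snapshot -o name,creation Tank/Rep-backup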

If I log in to my main box and check the status of the replication task, it is stated as "success / up to date".
But I still get e-mail notifications like these:

##
The replication failed for the local ZFS vipera/jails/plexmediaserver_1 while attempting to
apply incremental send of snapshot auto-20160812.2132-1w -> auto-20160813.2132-1w to 192.168.5.101
##
The replication failed for the local ZFS vipera/jails/.warden-template-pluginjail-clean-clone while attempting to
apply incremental send of snapshot auto-20160812.2132-1w -> auto-20160813.2132-1w to 192.168.5.101
##
The replication failed for the local ZFS vipera/jails/owncloud_1 while attempting to
apply incremental send of snapshot auto-20160812.2132-1w -> auto-20160813.2132-1w to 192.168.5.101
##
The replication failed for the local ZFS vipera/jails/transmission_1 while attempting to
apply incremental send of snapshot auto-20160812.2132-1w -> auto-20160813.2132-1w to 192.168.5.101
##

It sure looks like it's only the jails that are giving errors.
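If I understand the mechanism right, the failing step is essentially an incremental zfs send between the two snapshots named in the mail. Running one by hand might tell me more than the notification does; a rough sketch (dataset and snapshot names taken from the error above, receive path assumed from my config):

# Manual equivalent of the incremental send the task reports as failing
zfs send -i vipera/jails/plexmediaserver_1@auto-20160812.2132-1w \
         vipera/jails/plexmediaserver_1@auto-20160813.2132-1w \
  | ssh 192.168.5.101 zfs receive -F Tank/Rep-backup/jails/plexmediaserver_1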

Some info on systems and config:
- Periodic Snapshot Task config: Volume=vipera, Recursive=Yes, When=06:00-01:00 every day, Frequency=every 1 day, Keep=1 week, VMware sync=Yes

- Replication Task config: Volume=vipera, Remote host=IP of the secondary box, Remote ZFS volume=Tank/Rep-backup, Delete stale snapshots=Yes, Replication Stream Compression=lz4, no kB/s limit, allowed to run 24/7

HW Configs (short version):
- Main Box= Supermicro X9SRL-F, Intel Xeon E5-2620 v1, 32GB ECC RAM, Boot 160GB Intel 320 SSD, Storage 6x 2TB Seagate drives in RAIDZ2

- Secondary Box= HP ProLiant DL120 G6, Intel Pentium G6950, 12GB ECC RAM, Boot SanDisk 8GB USB, Storage 4x 2TB Seagate drives in RAIDZ1

Have I had a "brainfart" and messed up my replication task config, or what else could be making the replication of the jails fail?

Sorry for any bad English in this topic.

Best Regards
Daniel Claesson
Sweden
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Nothing obvious jumps out. Have you tried the "initialize the remote side" option? This will delete the data on PULL and start a fresh replication. Also, any chance of a network issue? Is it only happening to the jail datasets? (You can try creating separate replication tasks for the lower-level datasets as a test.)
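A quick way to see where the two sides have diverged is to compare the snapshot lists directly; something like this (dataset paths assumed from your post, adjust the prefixes to your layout):

# On PUSH: dump the jail snapshots, stripped of the pool prefix
zfs list -H -o name -t snapshot -r vipera/jails | sed 's|^vipera/||' | sort > /tmp/push.txt

# Fetch the same list from PULL, stripped of its prefix, and diff the two
ssh 192.168.5.101 "zfs list -H -o name -t snapshot -r Tank/Rep-backup/jails" \
  | sed 's|^Tank/Rep-backup/||' | sort > /tmp/pull.txt
diff /tmp/push.txt /tmp/pull.txt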
 

Daniel Claesson

Dabbler
Joined
May 31, 2016
Messages
35
Hi, and thanks for your reply.

The "initialize the remote side" seem to be a legacy thing, as it is not present in version 9.10
See documentation: http://doc.freenas.org/9.10/freenas_storage.html#replication-tasks
But here in the old docs it is present: http://olddoc.freenas.org/index.php/Replication_Tasks

Network issues shouldn't be a factor; I have a very basic setup: a single VLAN for the FreeNAS boxes and the clients, through managed HP switches. There are no other indications of a network error or disturbance: no dropped packets, low latency, and overall good performance.

As I see it, there is no other option than to redo the snapshots and the replication from scratch, with the datasets as separate tasks. I will keep this thread up to date on how things progress.
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
Whenever I have seen those error messages, it has been one of these things:

1) There is a dataset that I recently added on "PULL" within the dataset tree being replicated that is not on "PUSH". That can throw error messages (e.g. cloning a previous snapshot to pull files off of it and forgetting to delete it).

2) I know the system should clear out snapshots on PULL for datasets that no longer exist on PUSH, but I had to do it manually once to fix errors similar to what you're seeing.

Would it be possible for you just to delete all snapshots on PULL and start fresh?
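If you go that route, something like this on PULL should do it; eyeball the list from the first command before piping it into destroy (target path assumed from the thread):

# On PULL: list every snapshot under the replication target first
zfs list -H -o name -t snapshot -r Tank/Rep-backup

# Then feed the same list to zfs destroy, one snapshot at a time (destructive!)
zfs list -H -o name -t snapshot -r Tank/Rep-backup | xargs -n 1 zfs destroy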
 

Daniel Claesson

Dabbler
Joined
May 31, 2016
Messages
35
Hi,

I did a complete reinstall/reconfigure of the whole replication flow. On PULL I deleted the dataset and created a new one without any data on it.
After that I reconfigured PUSH with new snapshot and replication tasks, this time with every dataset as a separate task, before adding a snapshot and replication task for the whole pool.

But so far I still get some errors, though not the same ones as before. This error was generated last night.
##
The replication failed for the local ZFS vipera/ESXi-Storage while attempting to
send snapshot manual-20160220 to 192.168.5.101
##
Replication vipera/ESXi-Storage -> 192.168.5.101:TANK/Rep-backup failed: Failed: vipera/ESXi-Storage (manual-20160220)
##

This might be a one-time error, and things may start to work after a while, once more snapshot tasks have completed.
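One thing I plan to check is whether that old manual snapshot is the culprit; roughly like this (snapshot name taken from the error mail, destroy only if it turns out to be stale):

# On PUSH: does the manual snapshot from the error still exist?
zfs list -t snapshot | grep manual-20160220

# If it is stale, remove it so the task stops trying to send it (destructive!)
zfs destroy vipera/ESXi-Storage@manual-20160220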
Any thoughts on this?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Can you provide a screenshot of your snapshot jobs and replication jobs? Do you have replication running on each dataset individually, as well as a recursive one for the whole pool?
 

Daniel Claesson

Dabbler
Joined
May 31, 2016
Messages
35
Hi,

Of course I can, see below.
At the moment I only have individual replication tasks running on each dataset, and no recursive one for the whole pool. This seems to work; since my last reply I haven't received any more error messages.
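To keep an eye on it going forward, I compare the newest snapshots on each side; roughly like this (IP and target path as in my config):

# On PUSH: the most recent snapshots, sorted by creation time
zfs list -t snapshot -o name,creation -s creation -r vipera | tail -5

# On PULL: the newest auto-* snapshot should match the one on PUSH
ssh 192.168.5.101 zfs list -t snapshot -o name,creation -s creation -r Tank/Rep-backup | tail -5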

This is the view of "Periodic Snapshot Tasks" on PUSH:
Periodic_snapshot_task_PUSH.png

This is the view of "Replication Tasks" on PUSH:
Replication_tasks_PUSH.png

This is the view of "Snapshots" on PULL:
Snapshots_View_PULL.png
 