Replication error

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Hi,
I have set up replication between two FreeNAS storage systems. I am pushing data from one to the other, and everything seemed to be OK until I got this weird error:

Error
I/O error
most recent snapshot of Replica/ZFS does not
match incremental source.
Logs
[2020/05/15 00:34:48] INFO [Thread-98] [zettarepl.paramiko.replication_task__task_7] Connected (version 2.0, client OpenSSH_8.0-hpn14v15)
[2020/05/15 00:34:48] INFO [Thread-98] [zettarepl.paramiko.replication_task__task_7] Authentication (publickey) successful!
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.replication.run] For replication task 'task_7': doing push from 'production/ZFS' to 'Replica/ZFS' of snapshot='auto-2020-05-15_00-00' incremental_base='auto-2020-05-14_00-00' receive_resume_token=None
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.paramiko.replication_task__task_7.sftp] [chan 67] Opened sftp connection (server version 3)
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.transport.ssh_netcat] Automatically chose connect address '134.107.107.245'
[2020/05/15 00:34:58] ERROR [replication_task__task_7] [zettarepl.replication.run] For task 'task_7' unhandled replication error SshNetcatExecException(ExecException(1, 'I/O error\n'), ExecException(1, 'most recent snapshot of Replica/ZFS does not\nmatch incremental source\n'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py", line 142, in run_replication_tasks
run_replication_task_part(replication_task, source_dataset, src_context, dst_context, observer)
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py", line 203, in run_replication_task_part
... 6 more lines ...
ReplicationProcessRunner(process, monitor).run()
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/process_runner.py", line 33, in run
raise self.process_exception
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/process_runner.py", line 37, in _wait_process
self.replication_process.wait()
File "/usr/local/lib/python3.7/site-packages/zettarepl/transport/ssh_netcat.py", line 183, in wait
raise SshNetcatExecException(connect_exec_error, self.listen_exec_error) from None
zettarepl.transport.ssh_netcat.SshNetcatExecException: I/O error
most recent snapshot of Replica/ZFS does not
match incremental source

I do have snapshots that are created daily, but shouldn't this replication back up everything? Why does it suddenly not match the incremental source?
And how can I fix this?
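For reference, a rough way to compare what each side has from a shell (dataset names taken from the log above, standard zfs listing flags):

# On the source: list the snapshots of the replicated dataset
zfs list -t snapshot -o name,creation -s creation -r production/ZFS

# On the target: the newest snapshot here should be the incremental
# base named in the log (auto-2020-05-14_00-00)
zfs list -t snapshot -o name,creation -s creation -r Replica/ZFS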
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Does this have to do with the fact that I have set up automatic snapshots on the remote target storage, so a snapshot is taken there after the one being transferred from the primary storage?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Did you get any answers to this? What version are you on?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Does this have to do with the fact that I have set up automatic snapshots on the remote target storage, so a snapshot is taken there after the one being transferred from the primary storage?
That is the most likely scenario.
On the remote, you can delete the snapshots which were created automatically.
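Roughly like this from a shell on the remote (the snapshot name below is only an example, use whatever zfs list actually shows there):

# Find the snapshots that were created locally on the target
zfs list -t snapshot -r Replica/ZFS
# Destroy only the stray locally-created ones, e.g.:
zfs destroy Replica/ZFS@auto-local-2020-05-15_00-10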
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
In my case I erased everything and started from scratch. I made the first manual snapshot on the remote server (the secondary replication server), as the replication will not run if there is none. Then I created an automatic snapshot task on the source server (master) and a replication task that ships every day at 00:00h. I used the wizard. P.S. Now everything works perfectly. The problem was most likely due to the fact that if you create an auto snapshot on the target server, it conflicts with the snapshot from the source that is copied to the target... BTW I use the latest version, 11.3 if I recall.
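For anyone doing the initial seed by hand instead of through the wizard, it boils down to something like this (the hostname and snapshot names here are only placeholders):

# One-time full send of the first snapshot to seed the empty target
zfs snapshot production/ZFS@seed
zfs send production/ZFS@seed | ssh replica-host zfs recv -F Replica/ZFS

# Later runs only ship the difference between the last common snapshot and the newest one
zfs send -i production/ZFS@seed production/ZFS@auto-2020-05-17_00-00 | ssh replica-host zfs recv Replica/ZFS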
 
Joined
Jan 3, 2019
Messages
8
Hi,

We are running FreeNAS 11.3 and we have the exact same issue.
The interesting thing is that in our case, we don't have snapshots enabled on the remote host.

Any other ideas?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I think maybe some of the snapshots have expired and been destroyed on the source, so the replication cannot find what was last pushed to the remote.
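One way to check is to look on the source for the incremental base the log complains about; if it has been pruned, the task can only continue from a snapshot both sides still share, or with a new full send (the snapshot name below is the one from the log, the rollback target is a placeholder):

# On the source: is the incremental base from the log still there?
zfs list -t snapshot production/ZFS@auto-2020-05-14_00-00

# If it is gone, roll the target back to a snapshot both sides still have...
zfs rollback -r Replica/ZFS@<last-common-snapshot>
# ...or wipe the target dataset and let the next run do a full send again.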
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Hi,

We are running FreeNAS 11.3 and we have the exact same issue.
The interesting thing is that in our case, we don't have snapshots enabled on the remote host.

Any other ideas?
I have been running around trying to fix this issue. After many attempts, I ended up deleting all the snapshots on the source and setting up a completely new one, and on the target I set everything up again (SSH connection, SSH keys, and a single manual snapshot on the empty storage, as the appliance requires one) without any auto-snapshot task. After that everything worked as expected.

What I believe is that this appliance is missing some additional pre-checks that should run before you enable replication, as sometimes you really end up with a messy/partial replica on the secondary storage (as it was in my case). As Apollo suggested, it might be that some snapshots are missing, so it cannot figure out what was pushed and what was not. What I also noticed is that when you need to move an insanely large amount of data (I am talking about approx. 30 TB) to the secondary storage, and some files on the source are missing or have been deleted by users, I ran into errors saying those files cannot be moved as they no longer exist on the source... Again, I do not know why or how, but that was my case.

So try to remove all snapshots, start from scratch and set up replication again. I know this means you will lose the possibility of "point in time recovery", but that is the risk you will need to take.

Cheers.
 
Joined
Jan 3, 2019
Messages
8
OK, thank you for your reply. I'm unable to start from scratch at the moment, but I will consider your advice.

Additional information linked to @Apollo 's answer:
I now have the exact same error with another dataset with a single snapshot. I did this test to be sure that the data are identical on both sides.
 