Replication error

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Hi,
I have set up replication between two FreeNAS storage systems. I am pushing data from one to the other, and everything seemed to be OK until I got this weird error:

Error
I/O error
most recent snapshot of Replica/ZFS does not
match incremental source.
Logs
[2020/05/15 00:34:48] INFO [Thread-98] [zettarepl.paramiko.replication_task__task_7] Connected (version 2.0, client OpenSSH_8.0-hpn14v15)
[2020/05/15 00:34:48] INFO [Thread-98] [zettarepl.paramiko.replication_task__task_7] Authentication (publickey) successful!
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.replication.run] For replication task 'task_7': doing push from 'production/ZFS' to 'Replica/ZFS' of snapshot='auto-2020-05-15_00-00' incremental_base='auto-2020-05-14_00-00' receive_resume_token=None
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.paramiko.replication_task__task_7.sftp] [chan 67] Opened sftp connection (server version 3)
[2020/05/15 00:34:57] INFO [replication_task__task_7] [zettarepl.transport.ssh_netcat] Automatically chose connect address '134.107.107.245'
[2020/05/15 00:34:58] ERROR [replication_task__task_7] [zettarepl.replication.run] For task 'task_7' unhandled replication error SshNetcatExecException(ExecException(1, 'I/O error\n'), ExecException(1, 'most recent snapshot of Replica/ZFS does not\nmatch incremental source\n'))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py", line 142, in run_replication_tasks
run_replication_task_part(replication_task, source_dataset, src_context, dst_context, observer)
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py", line 203, in run_replication_task_part
... 6 more lines ...
ReplicationProcessRunner(process, monitor).run()
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/process_runner.py", line 33, in run
raise self.process_exception
File "/usr/local/lib/python3.7/site-packages/zettarepl/replication/process_runner.py", line 37, in _wait_process
self.replication_process.wait()
File "/usr/local/lib/python3.7/site-packages/zettarepl/transport/ssh_netcat.py", line 183, in wait
raise SshNetcatExecException(connect_exec_error, self.listen_exec_error) from None
zettarepl.transport.ssh_netcat.SshNetcatExecException: I/O error
most recent snapshot of Replica/ZFS does not
match incremental source

I do have snapshots that are created daily, but shouldn't this replication back up everything? Why does it suddenly not match the incremental source?
And how can I fix this?
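For reference, a rough way to compare what each side has from a shell (dataset names taken from the log above, standard zfs listing flags):

# On the source: list the snapshots of the replicated dataset
zfs list -t snapshot -o name,creation -s creation -r production/ZFS

# On the target: the newest snapshot here should be the incremental
# base named in the log (auto-2020-05-14_00-00)
zfs list -t snapshot -o name,creation -s creation -r Replica/ZFS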
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Does this have to do with the fact that I have set up automatic snapshots on the remote target storage, so a snapshot is taken there after the one being transferred from the primary storage?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Did you get any answers to this? What version are you on?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Does this have to do with the fact that I have set up automatic snapshots on the remote target storage, so a snapshot is taken there after the one being transferred from the primary storage?
That is the most likely scenario.
On the remote, you can delete the snapshots which were created automatically.
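Roughly like this from a shell on the remote (the snapshot name below is only an example, use whatever zfs list actually shows there):

# Find the snapshots that were created locally on the target
zfs list -t snapshot -r Replica/ZFS
# Destroy only the stray locally-created ones, e.g.:
zfs destroy Replica/ZFS@auto-local-2020-05-15_00-10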
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
In my case I erased everything and started from scratch. I made the first manual snapshot on the remote server (the secondary replication server), as the replication will not run if there is none. Then I created an automatic snapshot task on the source server (master) and a replication task that ships every day at 00:00h. I used the wizard. P.S. Now everything works perfectly. The problem was most likely due to the fact that if you create an auto snapshot on the target server, it conflicts with the snapshot from the source that is copied to the target... BTW I use the latest version, 11.3 if I recall.
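For anyone doing the initial seed by hand instead of through the wizard, it boils down to something like this (the hostname and snapshot names here are only placeholders):

# One-time full send of the first snapshot to seed the empty target
zfs snapshot production/ZFS@seed
zfs send production/ZFS@seed | ssh replica-host zfs recv -F Replica/ZFS

# Later runs only ship the difference between the last common snapshot and the newest one
zfs send -i production/ZFS@seed production/ZFS@auto-2020-05-17_00-00 | ssh replica-host zfs recv Replica/ZFS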
 
Joined
Jan 3, 2019
Messages
8
Hi,

We are running FreeNAS 11.3 and we have the exact same issue.
The interesting thing is that in our case, we don't have snapshots enabled on the remote host.

Any other ideas?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I think maybe some of the snapshots have expired and been destroyed on the source, so the replication cannot find what was last pushed to the remote.
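One way to check is to look on the source for the incremental base the log complains about; if it has been pruned, the task can only continue from a snapshot both sides still share, or with a new full send (the snapshot name below is the one from the log, the rollback target is a placeholder):

# On the source: is the incremental base from the log still there?
zfs list -t snapshot production/ZFS@auto-2020-05-14_00-00

# If it is gone, roll the target back to a snapshot both sides still have...
zfs rollback -r Replica/ZFS@<last-common-snapshot>
# ...or wipe the target dataset and let the next run do a full send again.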
 

dovla091

Cadet
Joined
Feb 5, 2020
Messages
9
Hi,

We are running FreeNAS 11.3 and we have the exact same issue.
The interesting thing is that in our case, we don't have snapshots enabled on the remote host.

Any other ideas?
I have been running around trying to fix this issue. After many attempts, I ended up deleting all the snapshots on the source and setting up a completely new one, and on the target I set everything up again (SSH connection, SSH keys, and a single manual snapshot on the empty storage, as the appliance requires one) without any auto-snapshot task. After that everything worked as expected.

What I believe is that this appliance is missing some additional pre-checks that should run before you enable replication, as sometimes you really end up with a messy/partial replica on the secondary storage (as it was in my case). As Apollo suggested, it might be that some snapshots are missing, so it cannot figure out what was pushed and what was not. What I also noticed is that when you need to move an insanely large amount of data (I am talking about approx. 30 TB) to the secondary storage, and some files on the source are missing or have been deleted by users, I ran into errors saying those files cannot be moved as they no longer exist on the source... Again, I do not know why or how, but that was my case.

So try to remove all snapshots, start from scratch and set up replication again. I know this means you will lose the possibility of "point in time recovery", but that is the risk you will need to take.

Cheers.
 
Joined
Jan 3, 2019
Messages
8
OK, thank you for your reply. I'm unable to start from scratch at the moment, but I will consider your advice.

Additional information linked to @Apollo 's answer:
I now have the exact same error with another dataset with a single snapshot. I did this test to be sure that the data are identical on both sides.
 