Failing Replication

Jun 2, 2016
I have hourly snapshots replicating to an external drive that I rotate offsite. These snapshots are kept for two weeks.

Sometimes, after a long gap between rotations of the offsite drive, a full replication has to move a lot of data to a "new" external drive. This is often (but not always) accompanied by a replication failure in which a snapshot appears to nearly complete (it gets to 99%) and then abruptly fails. The failing snapshot is almost always one at the oldest end of the snapshot date range.

My best guess is that expired snapshots are deleted once daily based on date (i.e. not time), and that an active replication prevents the deletion. In other words, if a snapshot is due to be deleted the day after replication began and the replication runs past midnight, the deletion fails. The failed deletion leaves the snapshot intact, so FreeNAS keeps trying to replicate it. But something about the attempted deletion must make it impossible for the snapshot to replicate successfully, and this leads to a very irritating failure loop. The only apparent escape is to disable the replication task, wait for any active replication to fail, delete the expired snapshot, and restart the replication task.
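
To make the theory above concrete, here is a minimal Python sketch of the timing condition it describes. It is only an illustration of my guess, not how FreeNAS actually schedules deletion: the function names `expiry_by_date` and `at_risk` are made up, and the rule that a snapshot becomes deletable at midnight at the start of the day it turns two weeks old is an assumption.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=2)  # two-week snapshot lifetime, as described above


def expiry_by_date(created: datetime) -> datetime:
    """ASSUMED cleanup rule: a snapshot becomes eligible for deletion at
    midnight at the start of the calendar day on which it turns two weeks old
    (expiry decided by date, not by time of day)."""
    expires_on = (created + RETENTION).date()
    return datetime.combine(expires_on, datetime.min.time())


def at_risk(created: datetime, repl_start: datetime, repl_end: datetime) -> bool:
    """True if the assumed expiry moment falls while the snapshot is still
    being replicated, i.e. replication started before it and ends at or after it."""
    return repl_start < expiry_by_date(created) <= repl_end
```

Under this assumption, a snapshot taken at 13:00 on May 19 becomes deletable at 00:00 on June 2, so a replication running from 20:00 on June 1 to 03:00 on June 2 straddles that point and `at_risk` returns `True`. If the same replication finished at 23:00 on June 1, it would return `False`.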

Any suggestions? Does this theory hold water? Is this a bug worth filing with the developers?
