Replication Issues

talsit (Cadet, joined Dec 1, 2016, 1 message)
I've got two almost identical machines, both Dell R520s, each with 8 HDDs plus 1 SSD (for system/boot). My main has 8x8TB drives and my backup has 8x4TB. They are connected via a point-to-point 10GigE link.

I am trying to replicate the data on my main to my backup on a fortnightly basis, so I have set up replication tasks on my backup to pull the datasets from my main, using the SSH+NETCAT transport. There are 7 replication tasks in total, one for each parent dataset on the main. The initial replication always works fine. However, whenever the next replication runs 2 weeks later, at least one task fails. Today it's these 2:
  1. destination <local_dataset_A> contains partially-complete state from "zfs receive -s".
  2. most recent snapshot of <local_dataset_B> does not match incremental source.
My replication tasks are scheduled to start 2 minutes apart, beginning at the top of the hour. I always wait until all replication tasks have completed (plus 5 minutes) before shutting down my backup machine. On my main machine, all snapshot tasks have a retention policy of at least 2 months, and I never go more than 2 weeks between backups.

Problems like these, especially (2) above, happen every time. There is always at least one replication task that fails, and after hours of trying to fix it or get the replication to work, I give up and start the replication again from scratch.

I don't know what to try next, so any help investigating what's going on would be great.
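For anyone wanting to look at the same things, here is a minimal sketch of how the two errors could be inspected from the backup side. It is not what the replication tasks themselves run. It assumes Python 3 and the standard `zfs list` and `zfs get` commands; the function names, the `ssh_host` parameter and any dataset names passed in are made up for illustration. Error (2) is checked by comparing snapshot GUIDs on both sides, and error (1) by reading the `receive_resume_token` property, which is `-` when no partial receive state is saved.

```python
import subprocess


def run_zfs(args, ssh_host=None):
    """Run a zfs command locally, or on ssh_host (hypothetical) over SSH."""
    cmd = ["zfs"] + args
    if ssh_host:
        # Assumes dataset names contain no spaces or shell metacharacters.
        cmd = ["ssh", ssh_host] + cmd
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_snapshots(text):
    """Parse 'name<TAB>guid' lines into [(snapshot_name, guid), ...]."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, guid = line.split("\t")
        rows.append((name.split("@", 1)[1], guid))
    return rows


def list_snapshots(dataset, ssh_host=None):
    """Snapshots of one dataset (not its children), oldest first."""
    out = run_zfs(["list", "-H", "-p", "-t", "snapshot", "-d", "1",
                   "-o", "name,guid", "-s", "creation", dataset], ssh_host)
    return parse_snapshots(out)


def resume_token(dataset, ssh_host=None):
    """Return the saved resume token, or None if there is no partial state."""
    out = run_zfs(["get", "-H", "-o", "value", "receive_resume_token", dataset],
                  ssh_host).strip()
    return None if out == "-" else out


def newest_common(source, dest):
    """Name of the newest source snapshot whose GUID also exists on dest."""
    dest_guids = {guid for _, guid in dest}
    for name, guid in reversed(source):
        if guid in dest_guids:
            return name
    return None


def diagnose(source, dest):
    """Describe whether an incremental send can start from dest's newest snapshot."""
    if not dest:
        return "destination has no snapshots; a full send is needed"
    dest_name, dest_guid = dest[-1]
    if dest_guid in {guid for _, guid in source}:
        return f"ok: incremental can start from @{dest_name}"
    common = newest_common(source, dest)
    if common is None:
        return "no common snapshot; a full send is needed"
    return (f"mismatch: newest destination snapshot @{dest_name} is not on "
            f"the source; newest common snapshot is @{common}")
```

A "mismatch" result would point to a snapshot on the backup that the main no longer has (or never had). If `resume_token` returns a value, `zfs receive -A <dataset>` is the documented way to discard the saved partial state, although whether that is the right step here is only a guess.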
 