SOLVED Strange replication issue.

Status
Not open for further replies.

ornstedt

Dabbler
Joined
Apr 26, 2014
Messages
10
I had a motherboard failure on one of my two FreeNAS servers. The two servers replicate to each other, so the surviving host kept creating snapshots to replicate and, understandably, complained that replication was failing. However, after I got the broken server fixed and replication could continue, I kept getting these error messages about failed replication. When running zfs list -t all on both nodes I could see that all of the latest snapshots were already present on both sides.
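Roughly how the two sides can be compared from the shell (the dataset name and remote hostname here are only examples):

~# zfs list -H -t snapshot -o name -r Pool/Share | sort > /tmp/local-snaps    # snapshot names on this node
~# ssh replica-host zfs list -H -t snapshot -o name -r Pool/Share | sort > /tmp/remote-snaps   # same list from the other node
~# diff /tmp/local-snaps /tmp/remote-snaps    # no output means both sides hold the same snapshots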

I also noticed that the freenas:state property was incorrect: the snapshot from when the server broke was still marked as LATEST, and every snapshot after it was marked as NEW even though they had been replicated. I fixed this manually, but now none of my snapshots carry any state at all.
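For reference, this is roughly how that user property can be inspected and adjusted by hand (the snapshot name is just an example, and whether the middleware picks the change up is another matter):

~# zfs get -r -t snapshot freenas:state Pool/Share                  # show the state of every snapshot
~# zfs set freenas:state=LATEST Pool/Share@auto-20150904.0900-1m    # mark one snapshot as the latest replicated
~# zfs inherit freenas:state Pool/Share@auto-20150904.0900-1m       # or clear the property again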

I now get error messages because FreeNAS tries to replicate historical snapshots from the source server that were deleted on the destination long ago. Even so, all of the latest snapshots have been replicated.

Any clue what is wrong?
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
What FreeNAS version(s) and has either been updated since it was working properly?
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
My motherboard broke during upgrade to the latest version. (9.3 not 10)
Is that 201508250051 or 201509022158, and what version were you on before the upgrade? A lot has been going on with replication since 9.3.1!
 

ornstedt

Dabbler
Joined
Apr 26, 2014
Messages
10
Here is my upgrade history. Don't know exactly when each train was released.

freenas-boot
freenas-boot/ROOT
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201504152200
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201505010007
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201505100553
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201505130355
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201506232120
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201506292332
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-04-02-00:37:23
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-05-04-02:28:22
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-05-13-10:24:38
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-05-15-04:41:50
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-06-29-03:18:38
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-07-12-15:20:21
freenas-boot/ROOT/FreeNAS-9.3-STABLE-201509022158@2015-09-05-01:23:40
freenas-boot/ROOT/default
freenas-boot/grub
freenas-boot/grub@Pre-Upgrade-Wizard-2015-03-23_01:11:26
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201503270027
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201504100216
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201504152200
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201505010007
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201505100553
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201505130355
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201506232120
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201506292332
freenas-boot/grub@Pre-Upgrade-FreeNAS-9.3-STABLE-201509022158
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
It is certainly possible that snapshots created by the June release are handled badly by the August release, and you won't get any snapshots deleted on the sending server until the next (unreleased) release, so something like this is probably what caused the problem. At the least you may have to delete a lot of snapshots manually to get the machines synchronised and get rid of the undeletable ones. Also, the current replication code deliberately won't delete snapshots it did not replicate itself, so those will need pruning by hand. If possible it might be best to restart the whole process, but I would certainly wait for the next update.

The other thing is that the "latest snapshot replicated" field in the replication task's GUI is now meaningless: it will only show whatever was in that box the last time the June release was running. This does not affect which snapshot was actually replicated last.
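Manual pruning could look something like this (the dataset name and snapshot are placeholders; the dry-run flag lets you check before anything is actually destroyed):

~# zfs list -H -t snapshot -o name -r Pool/Share | grep '@auto-'    # list the automatic snapshots
~# zfs destroy -nv Pool/Share@auto-20150601.0900-1m                 # -n = dry run, -v = show what would be removed
~# zfs destroy Pool/Share@auto-20150601.0900-1m                     # actually remove it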

I am having difficulty with "both replicate to each other". Presumably no dataset is both sent in one direction and received in the other, since that would get confusing?
 

mjs66

Cadet
Joined
Aug 12, 2015
Messages
3
I'm having similar issues since upgrading to FreeNAS-9.3-STABLE-201508250051. My replication job had been working fine for over a year until the upgrade.

The process acts like it's trying to start over, as if it thinks the replica is new. I went ahead and removed all snapshots on both sides and let it start from scratch, but it kept failing. Looking at /var/log/messages, it seemed to expect the destination dataset to already exist on the receiving side, but it wasn't there yet. I created it with zfs create pool0/rep, and the sending side immediately started sending the replica data. I thought all was good until it finally finished, then it generated another failure and started trying to send the whole replica again from scratch.
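A rough sketch of that check on the receiving side (pool0/rep as above; exact steps will depend on your setup):

~# tail -f /var/log/messages    # watch the replication errors as they arrive
~# zfs list pool0/rep           # does the destination dataset exist yet?
~# zfs create pool0/rep         # if not, create it so the receive has somewhere to land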

I see there are a few new updates available so I'm trying those to see if it resolves the replication issues.
 

ornstedt

Dabbler
Joined
Apr 26, 2014
Messages
10
I have two separate replication sets. One in each direction. Both behave the same way.
 

ornstedt

Dabbler
Joined
Apr 26, 2014
Messages
10
After the latest updates it is still not working correctly. I have an old snapshot that has already been replicated, but FreeNAS still tries to replicate it. The rest of the snapshots were purged, but this one cannot be purged, even when I try to do it manually.

~# zfs destroy Pool/Share@auto-20150904.0900-1m
cannot destroy snapshot Pool/Share@auto-20150904.0900-1m: dataset is busy

Same goes for the opposite dataset on my other FreeNAS node.

Any suggestion?
 

ornstedt

Dabbler
Joined
Apr 26, 2014
Messages
10
Seems like the trick was to release the freenas:repl hold on the snapshot!
~# zfs holds Pool/Share@auto-20150904.0900-1m
NAME                              TAG           TIMESTAMP
Pool/Share@auto-20150904.0900-1m  freenas:repl  Fri Sep 4 9:00 2015
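For the record, releasing that hold and then destroying the snapshot goes roughly like this (zfs release takes the tag first, then the snapshot):

~# zfs release freenas:repl Pool/Share@auto-20150904.0900-1m    # drop the hold left by replication
~# zfs destroy Pool/Share@auto-20150904.0900-1m                 # now the destroy goes through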

Then it got purged directly!
 