Major replication issues - Oct-Nov 2015 builds

Status
Not open for further replies.

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Since the update in mid-October 2015, I've been experiencing major issues with replication. I've finally spent some time testing, and I've confirmed that replication is broken.

The bottom line: after a replication, the destination somehow gets unmounted. This causes the reporting collector to fail (collectd & statvfs console errors) and "replication failed" messages to be emailed. The replication itself does in fact complete successfully, but these secondary effects are very bothersome.

Furthermore, I've found that the replication issue also affects the creation of jails: if a replication happens while the template is being downloaded and the jail is being created, the source jail dataset disappears for a second and causes the "template not found" message. Disabling replication while creating the jail is a workaround that seems to work. Of course, once the jail is created and replication is re-enabled, the failure emails continue. And once jail snapshots are created, they cannot be deleted. Oh, and there's the issue that one must use an encryption cipher to replicate (setting the cipher to Disabled no longer works).

So far I've found that the following bugs all relate to these replication issues:
Bug #5293: Snapshot & Replication: backup pool on the Remote ZFS Volume/Dataset is empty (total loss of data).
Bug #12143: Replication sends incorrect "fail" emails and doesn't update last snapshot sent in GUI
Bug #12252: "Cannot destroy <long snapshot name>: snapshot has dependent clones use '-R' to destroy the following datasets"
Bug #12379: Replication fails with cipher set to Disabled

Steps to recreate
1. Create a pool called TestSource.
2. Create a pool called TestDestination.
3. Create 3 datasets (A, B, C) in TestSource.
4. Set up a snapshot task for TestSource (recursive, every 5 minutes, keep for 2 hours).
5. Set up a replication task from TestSource to TestDestination on 127.0.0.1 using standard compression and fast encryption (the Disabled cipher won't work).
6. In 5 minutes, TestDestination will show datasets A, B, C. In 10 minutes they will still show up in the storage pane, but 'ls /mnt/TestDestination' will return empty results.
7. If you now reboot, the TestDestination A, B, C datasets get added to the "Partition" tab under Reporting, and in the console you'll see "collectd 48717: statvfs(/mnt/TestDestination/A) failed: No such file or directory".
8. Try to create a jail; the source dataset will likely get stomped on during replication, causing the template to be lost. Disable replication, and then the jail can be created fine. Re-enable replication.
9. Once the jail is created and gets replicated, "replication failed" email alerts start appearing for the jail datasets.
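A quick way to spot the unmount symptom from step 6 is to compare what ZFS believes is mounted against what is actually on disk. This is a diagnostic sketch, not part of the original report; the `check_mounts` helper is illustrative, and the TestDestination names come from the repro above.

```shell
#!/bin/sh
# check_mounts reads "dataset<TAB>mountpoint" pairs on stdin (the format
# produced by `zfs list -H -o name,mountpoint -r <pool>`) and flags any
# mountpoint that no longer exists on disk.
check_mounts() {
    while read -r name mnt; do
        [ "$mnt" = "none" ] && continue          # skip unmountable datasets
        [ -d "$mnt" ] || echo "MISSING: $name ($mnt)"
    done
}

# Live usage on the box exhibiting the problem:
#   zfs list -H -o name,mountpoint -r TestDestination | check_mounts
# Illustrative input (the path should not exist on the machine running
# this, so a MISSING report is printed):
printf 'TestDestination/A\t/mnt/TestDestination/A\n' | check_mounts
```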
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That seems like it belongs in a bug report. If you file one, please make a note of it here.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
You mean like the 4 bug reports that I linked to? :smile: Maybe the links don't show up in tapatalk?
 

jamiejunk

Contributor
Joined
Jan 13, 2013
Messages
134
Replication in FreeNAS always seems to be hit or miss. Mine just stopped working out of the blue about two weeks ago. No changes to the system. It just doesn't want to do it anymore, apparently. :(
 

raymonvdm

Dabbler
Joined
Aug 24, 2011
Messages
14
I also found this topic due to issues with ZFS sync. It just stopped working after I moved the FreeNAS server to another room. (I think it was syncing when I shut it down for the move.)

I have three syncs set up, and one of them is working again after removing the holds, but the other two are not. I also noticed that the error from one sync entry is shown by both entries.

[Attachment: FreeNAS_Sync.png]


I think I know how it should work, but it doesn't.

Note: I'm currently running version "FreeNAS-9.3-STABLE-201511280648"
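For anyone else stuck on held snapshots: a sketch of how one might enumerate and release holds. The `freenas:repl` hold tag and the `tank/ds` dataset name are assumptions for illustration; check the TAG column of `zfs holds` output on your own system.

```shell
#!/bin/sh
# parse_holds reads `zfs holds -H <snapshot>` output
# (snapshot<TAB>tag<TAB>timestamp) and prints "tag snapshot" pairs in the
# argument order that `zfs release` expects.
parse_holds() {
    awk -F '\t' 'NF >= 2 { print $2, $1 }'
}

# Live usage (run as root; tank/ds is a placeholder dataset):
#   zfs list -H -t snapshot -o name -r tank/ds | while read -r snap; do
#       zfs holds -H "$snap"
#   done | parse_holds | while read -r tag snap; do
#       zfs release "$tag" "$snap"   # drop the hold so the snap can be destroyed
#   done
# Illustrative input:
printf 'tank/ds@auto-20151128.0000-2h\tfreenas:repl\tSat Nov 28 2015\n' | parse_holds
```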
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So if you are on the 1128 build, there is a chance the replication is going on in the background and the status message is erroneous (https://bugs.freenas.org/issues/12568). Check the network traffic on both boxes and the dataset usage on the PULL system.
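One way to do that check concretely (a sketch; `tank/backup` is a placeholder for your PULL-side dataset): sample the dataset's exact byte usage a few minutes apart with `zfs list -p` and see whether it grew.

```shell
#!/bin/sh
# grew compares two byte counts and reports whether usage increased,
# i.e. whether the PULL side is actually receiving data.
grew() {
    if [ "$2" -gt "$1" ]; then
        echo "usage grew: replication is writing data"
    else
        echo "no growth between samples"
    fi
}

# Live usage on the PULL system (-H: no header, -p: exact byte values):
#   before=$(zfs list -H -p -o used tank/backup)
#   sleep 300
#   after=$(zfs list -H -p -o used tank/backup)
#   grew "$before" "$after"
# Illustrative values (1 MiB then 2 MiB):
grew 1048576 2097152
```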
 

Blake1970

Dabbler
Joined
Nov 2, 2014
Messages
12
Does anyone know the status of the replication GUI issue? I see in systat -ifstat that it is replicating, and I also see it is replicating all the child snaps too. I hope this is fixed, since I upgraded to 0648 and changed the job not to replicate any child datasets, even though I have none under the replicated dataset. It seems all the snaps are being replicated anyway. When it finishes, I will test whether the next run replicates only the newest snap with the settings changed.
 

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
I'm so relieved I rolled back to the last 9.3 release after hitting issues with replication and the LSI driver in the first 9.3.1 version. 20150629 was a long time ago for these still not to be resolved, and with the delays to 10 I suspect there's little focus on getting the 9.3.1 issues sorted.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Blake1970 said:
Does anyone know the status of the replication GUI issue? I see in systat -ifstat that it is replicating, and I also see it is replicating all the child snaps too. I hope this is fixed, since I upgraded to 0648 and changed the job not to replicate any child datasets, even though I have none under the replicated dataset. It seems all the snaps are being replicated anyway. When it finishes, I will test whether the next run replicates only the newest snap with the settings changed.
It's ready for release:
https://bugs.freenas.org/issues/12568
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Looks like the update was just released today.
 

Blake1970

Dabbler
Joined
Nov 2, 2014
Messages
12
Thanks. I have a quick question for you on replication. When you set up replication on the PUSH side, you choose the push dataset and then specify the PULL (destination) volume/dataset, and it is supposed to create the destination dataset. Does your implementation work like that, or do you need to create the destination on the PULL side first?

Example:

Source: volume_1/data/test
Destination: volume_1/replication

This should result in a destination of volume_1/replication/test (without creating the destination manually). Mine is saying the dataset doesn't exist, and I have to create it manually. However, I know it is supposed to create it for you, and I think this is why my permissions are not following the replicated dataset.

Thoughts?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
The destination you specify needs to exist, but sub-datasets will be created. If your source dataset were volume_1/data, then a test sub-dataset would be created.
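A sketch of that mapping rule (the dataset names come from the example above; the `dest_path` helper is illustrative, not FreeNAS code): the chosen source maps onto the chosen destination, and only children *below* the source get created underneath it.

```shell
#!/bin/sh
# dest_path computes where a dataset lands on the PULL side: strip the
# replicated source root from the name and graft the remainder onto the
# destination root.
dest_path() {
    src="$1"; src_root="$2"; dest_root="$3"
    rest="${src#"$src_root"}"     # e.g. "/test" for a child, "" for the root itself
    printf '%s%s\n' "$dest_root" "$rest"
}

# Replicating source root volume_1/data to destination volume_1/replication:
dest_path volume_1/data/test volume_1/data volume_1/replication   # a child dataset
dest_path volume_1/data      volume_1/data volume_1/replication   # the source root itself
```

So choosing volume_1/data/test itself as the source root maps it directly onto the destination, which is why no "test" child appears there.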
 

Blake1970

Dabbler
Joined
Nov 2, 2014
Messages
12
So you are saying that it should have created volume_1/replication/test, and it does not. I have to create it in order for it to work, and then the permissions do not follow. I think the issues are related, but I have no idea how to fix replication not auto-creating the destination dataset.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
No, it's not going to create a sub-dataset for what you are trying to do. It doesn't know that you want "test" in a sub-dataset. You told it you want "test" mapped and replicated to "destination/something that does not exist", so that's what it will do. Think of it this way: you are mapping "source" to "destination", so whatever is in "source" will replicate to "destination". There isn't any magic here.
 

Blake1970

Dabbler
Joined
Nov 2, 2014
Messages
12
Sorry, I failed to mention that the replicated dataset exists.

Source: volume_1/data/test
Destination: volume_1/replicated (this exists)

Therefore I should end up with a destination like this: volume_1/replicated/test

FreeNAS should auto-create the "test" sub-dataset under replicated.

It used to do this, so I am not sure if an update broke it or what.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Have you tried with a trailing slash? volume_1/replicated/

I can't see how it would know to create a sub-dataset rather than replicating directly into that location. Unless you chose volume_1/data/ as the source, in which case I could see test being created on PULL.
 

Blake1970

Dabbler
Joined
Nov 2, 2014
Messages
12
Yes, I have tried. The reason it should work, and has worked in the past, is that you are replicating datasets, not data. When I specify a destination dataset, it is supposed to create a sub-dataset where I have specified.

I should be able to create a sub-dataset called "test" under my source "source_volume_1/dataset1", then set up a periodic snapshot and replication job as follows:

Replicate this: source_volume_1/dataset1/test

To this: destination_volume_1/dataset2 (as long as dataset2 exists on the destination, it should auto-create "test")

which should give a destination dataset of "destination_volume_1/dataset2/test".

Everything works if I create it manually; however, I think this is why my permissions are getting overwritten and do not follow. I end up with numeric owners like 10955 and 10500 on the dataset, snaps, and files.

How can I get someone from development to look at this? What is the process to bring it to their attention?

Thanks
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I think those are two different issues. The permissions are probably due to ID mapping. The datasets, I'm not sure.

Open a bug report (link at the top of this page) to report the issue.
 