ZFS Replication: Dataset becomes unavailable on receiving end after receiving snapshot.

gunnarsson · May 11, 2015

Hello all,

As the subject should suggest, I am having problems with ZFS replication where the dataset becomes unavailable on the receiving end once a new snapshot is received. If I connect via SSH and get a directory listing the dataset is not present or accessible.

The console shows errors along the line of the following, one msg per affected dataset every 10 seconds:

Code:

May 11 19:00:56 haugur2 collectd[8701]: statvfs(/mnt/vol1/cbdp/cbdp_photo/2014) failed: No such file or directory

If I reboot the box, all datasets are intact and up-to-date and everything seems to be working perfectly until the next snapshot is received and the datasets becomes unavailable again until next reboot.

I have been having this problem since I first set up my two boxes in September 2014 (9.2.8). Both boxes replicate data to the other and the data is shared via CIFS as read-only on the receiving end. Currently I am running FreeNAS-9.3-STABLE-201505040117.

It is an annoying and inconvenient problem as I would like to have the data accessible as read-only on the remote side but since I haven't had a lot of time available to put into this and since I know the data is getting replicated, I haven't done anything serious about it yet.

I have searched for information on the problem before but not found enough relevant information to put me on the right track to solving the problem. Initially I thought it might be a bug and if so then it would presumably get a fix soon enough but it doesn't look like a lot of other people are having this problem so now I am leaning more towards the explanation that I have made some sort of a configuration error.

Hardware: ASRock C2750D4I, 2x8GB ECC RAM, 3x 4TB WD RE4 (Raid-Z1)

I would appreciate any input and if more information is required to debug this, let me know.

dlavigne · May 17, 2015

Were you able to figure this out?

gunnarsson · May 17, 2015

Unfortunately, no. Although it is always lingering in my brain I can't say I've tried much though due to lack of time. I was hoping someone might come with suggestions that would put me on the track but I am aware that this might just be a freak case caused by me messing something up and thus potentially pretty difficult to figure out, so I understand it very well if people on here are not too keen on even starting to look at it :)

Initially when I set up my boxes I did a fair bit of fiddling around with permissions via the terminal by means of chmod instead of ACLs and it has occurred to me whether something I did then may have caused this. Most of the data on the boxes was copied from linux servers and I was trying to retain the users and permission structure from there, which in hindsight wasn't smart. After I discovered that there was a problem I thought it was a CIFS issue but after having tried to delete all the cifs shares and kill the cifs service I realised that was not the problem. Later I tried to create new datasets and followed the documentation as I possibly could, but ended up with the same results for those datasets. Ater reading some posts I started thinking whether manual replication has the same effect or not, but I haven't done any tests with that yet. I imagine I may have to start from square one with my boxes when I get the time to do so. I will probably get yet another box and start with that so I don't disrupt the replication / backup system I have in place already, since I at least know that the data is getting replicated ...it is just not possible to access it locally on the remote box.

In any case, thank you for responding dlavigne. If you have any ideas that you can throw me...where to look for clues etc, that would be much appreciated.

Sir.Robin · May 19, 2015

I might be mistaken, but i'm quite sure i have read that the filesystem will be unavailable while receiving.

Seems to me this could be your issue.
I also think that it is not supposed to work like you are doing it, sharing the datset beeing replicated on the pull side.
So samba will be sharing a dataset wich suddenly is unavailable.... ;)

Sent from my mobile using Tapatalk

gunnarsson · May 19, 2015

Thanks very much for your input Sir.Robin :)

Perhaps that is the case. How ever...it does not matter if CIFS (or any sharing of the datasets in question) is on or not. The datasets become inaccessible on the remote side (after the replication has been completed) even if I try and access it on the FreeNas machine that is carrying it, until after next reboot (or probably restarting some process which I haven't figured out how to do yet, or perhaps somehow re-mounting the dataset).

I would not expect the dataset to be accessible while it is being updated by the replicated snapshot on the remote machine, but I was thinking (or hoping) it should become available again after it has been updated. Maybe that's not so?
If the dataset should be available after it's been updated, then I feel like it shouldn't be a problem sharing it via CIFS or any other means as READ ONLY. I admit this is an assumption based on limited knowledge and understanding of it all.

Perhaps I will have to resort to different ways of cloning the data between the boxes, which I find a bit sad because I really love the idea of the automated replication in FreeNAS and how it works (apart from this problem of course). I guess I could implement the desired functionality myself , when I find the time to dig deeper into this.

Many thanks!

gunnarsson · Jul 26, 2015

Now over 3 months later I had some time to fiddle with this and actually found something out.

I went to the terminal of the machine on the receiving end of the replication and did:

Code:

zfs unmount /mnt/vol1/cbdp/

On this particular occasion the message I got back was:

Code:

cannot unmount '/mnt/vol1/cbdp': Device busy

Despite the "cannot unmount" error msg, immediately after this, the statvfs errors stopped and the dataset became available again after having been inaccessible following a successful replication.

Possibly a bug? ...or at least an incomplete feature?

philiplu · Oct 6, 2015

Forgive me necroing this post, but I ran into something very similar (with manual instead of GUI-based replication). My workaround was inspired by your most recent post - unmount the destination dataset before the replication, then remount afterwards. That avoids the collectd-statvfs error messages. See https://forums.freenas.org/index.php?threads/replication-causing-datasets-to-act-unmounted.38410/ for more info.

gunnarsson · Oct 11, 2015

Thanks philiplu :) , I appreciate you informing me that you have a similar problem and the work-around you came up with.
I haven't had time to look into this since I posted. When I have needed to access the data I have manually unmounted/remounted to be able to access the data. I imagine I will simply ditch the GUI way of setting up the replication, unless a fix comes before I get the time to sort it out. Your script will help me towards that. Thanks again!

Important Announcement for the TrueNAS Community.

ZFS Replication: Dataset becomes unavailable on receiving end after receiving snapshot.

gunnarsson

Cadet

Attachments

dlavigne

Guest

gunnarsson

Cadet

Sir.Robin

Guru

gunnarsson

Cadet

gunnarsson

Cadet

philiplu

Explorer

gunnarsson

Cadet

Similar threads

Important Announcement for the TrueNAS Community.

ZFS Replication: Dataset becomes unavailable on receiving end after receiving snapshot.

gunnarsson

Cadet

Attachments

dlavigne

Guest

gunnarsson

Cadet

Sir.Robin

Guru

gunnarsson

Cadet

gunnarsson

Cadet

philiplu

Explorer

gunnarsson

Cadet

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "ZFS Replication: Dataset becomes unavailable on receiving end after receiving snapshot."

Similar threads