Replication doesn't like me: datasets unmount no matter what!


JustinOtherBobo
Using FreeNAS 11.1-U6, I am seeding a backup by creating two pools on the same system and using zfs send | zfs receive to make them identical.
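For reference, the seeding step looks roughly like this from the CLI; the pool, dataset, and snapshot names here are placeholders rather than my exact ones:

Code:
# Take a recursive snapshot of the source dataset to use as the seed point
zfs snapshot -r poolpush/pushdata@seed
# Send the full recursive stream into the pool that will later move off-site
zfs send -R poolpush/pushdata@seed | zfs receive -F poolpull/pushdata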

The seeded pool then goes into a new system, and FreeNAS replication is configured to run back to it as the receiving pool.

Once replication starts, the datasets on the receiving system begin throwing errors like this:
Code:
collectd[92325]: statvfs (/mountpoint) failed: No such file or directory

At this point the datasets become invisible and appear empty when accessed via the CLI.

Snapshots remain accessible via the GUI and can be shared out if needed.

If the pool is exported and re-imported (sometimes it takes a couple of tries), the "disappeared" datasets become accessible once again... but only until the next replication run. zfs mount still shows the datasets as mounted.
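For anyone wanting to reproduce the check, this is roughly how I compare what ZFS claims with what the directory actually shows; the dataset names and paths are placeholders:

Code:
# ZFS insists the dataset is mounted...
zfs get mounted,mountpoint poolpull/pushdata
zfs mount | grep pushdata
# ...yet the directory itself comes up empty
ls -al /mnt/poolpull/pushdata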

What am I doing wrong, and how do I fix it?
 

JustinOtherBobo
Longer version:

FreeNAS 11.1-U6 fresh installs on older dual-Xeon HP workstations with 32+ GB of RAM, and mirrored pools ranging from 6 to 15 TB usable. Compression is on.

Replication repeatedly renders the received destination pool invisible. zfs mount shows the datasets as mounted, but navigating to them via the CLI doesn't show any files in them, and there is no .zfs folder. However, access to the received snapshots through the GUI remains available.

To confirm the issue, I repeatedly exported and imported the receiving pool (including a CLI pool rename in between) and ended up with the receiving pool fully operational, with all my datasets mounted and accessible.
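The export/import cycle with the rename was done roughly like this; the temporary pool name is just an example:

Code:
# Export the receiving pool, then re-import it under a temporary name
zpool export poolpull
zpool import poolpull poolpull_tmp
# Export again and re-import under the original name
zpool export poolpull_tmp
zpool import poolpull_tmp poolpull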

Then I set up a single recursive snapshot task for a dataset that is native to the receiving pool and not part of the replication.
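The CLI equivalent of that snapshot task would be something along these lines; the snapshot name is an example:

Code:
# Recursively snapshot only the native, non-replicated dataset
zfs snapshot -r poolpull/pulldata@manual-test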

All files remained visible.

12 hours later, once replication updates had taken place and new snapshots had arrived, all the previously visible and accessible replicated datasets started giving errors again:
Code:
collectd[92325]: statvfs (/mountpoint) failed: No such file or directory

Navigating to the mountpoint became fruitless, as nothing showed there. ls -al no longer shows the existing files, and trying to cd into .zfs gets you a message that .zfs doesn't exist in the affected folders!

The file structure remains visible when clicking on Storage in the GUI. Using the GUI to detach and re-attach the volume temporarily fixes the issue.

Thankfully the snapshots remain available via the GUI and can still be cloned and shared out to give access to the files in an emergency.

The receiving pool structure looks like this:
Code:
poolpull
--poolpull
----pushdata-replicated-to-poolpull
------pushdata-files-and-other-datasets-and-folders-replicated-recursively-from-poolpush/pushdata
----pulldata(native)
------pulldata-native-files-and-other-datasets-and-folders

The seeded datasets contain manual snapshots with hold tags, because they capture retention points going back to 2015. The issue also occurs with new datasets created and replicated without prior seeding.
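The hold tags were applied with plain zfs hold; the tag and snapshot names below are examples:

Code:
# Place a hold so the snapshot can't be destroyed by pruning
zfs hold retain2015 poolpull/pushdata@retention-2015
# List the holds on a snapshot
zfs holds poolpull/pushdata@retention-2015
# Release the hold once the retention point is no longer needed
zfs release retain2015 poolpull/pushdata@retention-2015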

Since disappearing datasets doesn't sound "right", I assume I must be missing or messing something up!

H e l p ?
 

JustinOtherBobo
Not really. At least not to a level where I feel fully confident it's resolved!

At this point the issue isn't happening.

That's after all five machines had all their snapshot and replication tasks deleted and were rebooted. Minimal snapshot schedules were then created, taking care to ensure that no replicated snapshots intermingle with non-replicated ones, and the replications were re-created and restarted.

I've been running a bit ragged and wasn't touching what wasn't complaining.

However, prompted by your question of whether this has resolved, I've just cloned a random backup snapshot on the main receiving server and shared it out as if I were performing a file recovery.
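The recovery-style clone was made roughly like this; the snapshot and clone names are examples:

Code:
# Clone a backup snapshot into a writable dataset that can be shared out
zfs clone poolpull/pushdata@auto-20181001.0000-2w poolpull/recovery-test
# The clone mounts like any other dataset
ls -al /mnt/poolpull/recovery-test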

I am planning to wait and see if this "active share in the receiving dataset" interacts with a receiving snapshot schedule and starts the disappearing thing again.

I am fully open to any ideas. I believe, though I am not sure, that this may all be related to the 4+ year old ACL corruption bug reports?

As mentioned, I would love to hear from anyone who has experienced and resolved this, and I see someone below also reporting a similar issue...
 

JustinOtherBobo
So this is to confirm that the "datasets won't mount" behavior is by design, to protect the replicated dataset, according to the notes in section 8.3 of the FreeNAS manual.

I would think we could just allow replication to delete any changes that had been made to the dataset being replicated; but I am sure other people would then complain about changes on the receiving side being lost :smile:
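For what it's worth, at the plain ZFS level the two sides of that trade-off look roughly like this; whether the FreeNAS middleware tolerates either being done by hand is my assumption, not something I've confirmed:

Code:
# Keep the destination strictly read-only so nothing can diverge locally
zfs set readonly=on poolpull/pushdata
# Or let each incremental receive forcibly roll back local changes first
zfs send -R -i @previous poolpush/pushdata@latest | zfs receive -F poolpull/pushdata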

However, I can ALSO confirm that a CLONED and shared-out snapshot from a dataset that is being replicated will ALSO act unmounted after a while (once a follow-up incremental replication takes place).

While the replication itself doesn't seem to be affected by the errors... one would still have a bit of a hard time fully using such a cloned snapshot long term.
 