ZFS Replication apply on target causes NFS failure

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Hi,

So I don't know if anyone else does this, but I have set up replication of a dataset that is used as a Plex Media Server library at both the source and the target end.
Source and target are running FreeNAS 11.3-U4.1.
The target Plex Media Server is installed on an Ubuntu 18.04.5 Server VM, with mounts pointing back to the (replication) target FreeNAS dataset presented as an NFSv3 share.
The mounts from Ubuntu work just fine initially - Plex can quite happily pick up the content and play it, and the ZFS replication itself works fine. However, during the replication cycle the mounts on Ubuntu stop working, although they still exist.
I've since worked out that the only way I can recover this situation so far is to restart the NFS services on the FreeNAS target.
I've also tried setting the target dataset to readonly=off and ensuring the replication doesn't set readonly=on, but this makes no difference.
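For anyone wanting to check the same thing, this is roughly what I ran on the target (the pool/dataset name here is just an example from my setup - substitute your own):

```shell
# Example only -- substitute your own pool/dataset name.
# Show what the readonly property is set to on the target dataset,
# and where that value came from (local, inherited, received, etc.):
zfs get -o property,value,source readonly vol1/Media

# Clear it manually if the replication has set it:
zfs set readonly=off vol1/Media
```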

Does anyone know if there is a way to stop this from happening? Thanks

[edit.. a bit more info]

Here's what I get when I try to remount the shares, having force-unmounted them:

andrew@firsplex01:/mnt$ sudo mount -av
/ : ignored
none : ignored
/mnt/Plex_Media_Files : already mounted
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.2,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.1,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.0,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,addr=192.168.5.201'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100005 vers 3 prot TCP port 695
mount.nfs: mount(2): Stale file handle
mount.nfs: backgrounding "192.168.5.201:/mnt/vol1/Media/Movies"
mount.nfs: mount options: "ro,noatime,nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5"
/mnt/Plex_Media_Movies : successfully mounted
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.2,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.1,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.0,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,addr=192.168.5.201'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100005 vers 3 prot TCP port 695
mount.nfs: mount(2): Stale file handle
mount.nfs: backgrounding "192.168.5.201:/mnt/vol1/Media/Music"
mount.nfs: mount options: "ro,noatime,nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5"
/mnt/Plex_Media_Music : successfully mounted
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.2,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.1,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,vers=4.0,addr=192.168.5.201,clientaddr=192.168.5.5'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5,addr=192.168.5.201'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 192.168.5.201 prog 100005 vers 3 prot TCP port 695
mount.nfs: mount(2): Stale file handle
mount.nfs: backgrounding "192.168.5.201:/mnt/vol1/Media/TV_Shows"
mount.nfs: mount options: "ro,noatime,nolock,bg,soft,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5"
/mnt/Plex_Media_TV_Shows : successfully mounted

The issue, I believe, is the "Stale file handle" errors in the log above.
Note also that when I switch on mountd logging at the FreeNAS end, it reports the following:

Aug 30 19:06:14 freenas2 mountd[83635]: mount request succeeded from 192.168.5.5 for /mnt/vol1/Media/Movies
Aug 30 19:06:14 freenas2 mountd[83635]: mount request succeeded from 192.168.5.5 for /mnt/vol1/Media/Music
Aug 30 19:06:14 freenas2 mountd[83635]: mount request succeeded from 192.168.5.5 for /mnt/vol1/Media/TV_Shows

But when I subsequently try to unmount, I get:

andrew@firsplex01:/mnt$ sudo umount -f /mnt/Plex_Media_Movies
umount: /mnt/Plex_Media_Movies: not mounted.

That would suggest that FreeNAS is reporting a successful NFS mount operation when in reality it failed (owing to the stale file handle).
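A quick way to see this from the client side (just a sketch - the mountpoint below is from my setup) is that a stat() against a dead mountpoint fails, typically with ESTALE, while a healthy mount answers normally:

```shell
#!/bin/sh
# Sketch: report whether a mountpoint's NFS file handle still answers.
# stat(2) on a stale NFS mount fails (ESTALE); it also fails if the
# path simply doesn't exist, hence "stale or unreachable".
check_handle() {
    if stat "$1" >/dev/null 2>&1; then
        echo "$1: handle OK"
    else
        echo "$1: stale or unreachable"
    fi
}

check_handle /mnt/Plex_Media_Movies
```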


Best Regards


Andy
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
I have a very similar issue to you.
I have a security camera recording via NFS to a dataset on one pool.
If I do a simple local replication to a second local pool, the camera loses its connection and is a pain to get recording again, often needing to reformat its recording dataset.
The next time I try this I will look at the NFS log.
 

firsway

Dabbler
Joined
Oct 20, 2018
Messages
32
Thanks for this. Meanwhile I have switched over to using rsync, which, whilst slower and more convoluted, at least does not cause this particular issue.
Has anyone else seen this issue? Does it sound like expected operation, or should I log it as a bug?
 

mingusrock

Cadet
Joined
Nov 3, 2020
Messages
1
Thanks both, I am experiencing very similar issues. I'm not sure of the exact chain of events that causes it, but it looks like the ZFS filesystem on the sync target doesn't always get remounted successfully after the snapshot has been replayed. zfs *thinks* it's mounted... but it's not:

Code:
fstore2# zfs list fstore2-pool1/nfsv4_root/packages

NAME                                USED  AVAIL  REFER  MOUNTPOINT
fstore2-pool1/nfsv4_root/packages  4.38M  30.7T   876K  /mnt/fstore2-pool1/nfsv4_root/packages  # NOT REALLY MOUNTED

fstore2# zfs mount fstore2-pool1/nfsv4_root/packages # NOW IT'S MOUNTED

fstore2# zfs mount fstore2-pool1/nfsv4_root/packages # DO IT AGAIN  ;-)
cannot mount 'fstore2-pool1/nfsv4_root/packages': filesystem already mounted


And the snippet from the zettarepl.log is below:

Code:
[2020/11/02 18:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2843.async_exec.126811] Running ['zfs', 'umount', 'fstore2-pool1/nfsv4_root/packages']
[2020/11/02 18:00:01] DEBUG    [replication_task__task_7] [zettarepl.paramiko.replication_task__task_7] [chan 4] Max packet in: 32768 bytes
[2020/11/02 18:00:01] DEBUG    [Thread-7916] [zettarepl.paramiko.replication_task__task_7] [chan 4] Max packet out: 32768 bytes
[2020/11/02 18:00:01] DEBUG    [Thread-7916] [zettarepl.paramiko.replication_task__task_7] Secsh channel 4 opened.
[2020/11/02 18:00:01] DEBUG    [Thread-7916] [zettarepl.paramiko.replication_task__task_7] [chan 4] Sesch channel 4 request ok
[2020/11/02 18:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2843.async_exec.126811] Reading stdout
[2020/11/02 18:00:01] DEBUG    [Thread-7916] [zettarepl.paramiko.replication_task__task_7] [chan 4] EOF received (4)
[2020/11/02 18:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2843.async_exec.126811] Waiting for exit status
[2020/11/02 18:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2843.async_exec.126811] Success: ''


[2020/11/02 19:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2845.async_exec.126828] Running ['zfs', 'umount', 'fstore2-pool1/nfsv4_root/packages']
[2020/11/02 19:00:01] DEBUG    [replication_task__task_7] [zettarepl.paramiko.replication_task__task_7] [chan 4] Max packet in: 32768 bytes
[2020/11/02 19:00:01] DEBUG    [Thread-7921] [zettarepl.paramiko.replication_task__task_7] [chan 4] Max packet out: 32768 bytes
[2020/11/02 19:00:01] DEBUG    [Thread-7921] [zettarepl.paramiko.replication_task__task_7] Secsh channel 4 opened.
[2020/11/02 19:00:01] DEBUG    [Thread-7921] [zettarepl.paramiko.replication_task__task_7] [chan 4] Sesch channel 4 request ok
[2020/11/02 19:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2845.async_exec.126828] Reading stdout
[2020/11/02 19:00:01] DEBUG    [Thread-7921] [zettarepl.paramiko.replication_task__task_7] [chan 4] EOF received (4)
[2020/11/02 19:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2845.async_exec.126828] Waiting for exit status
[2020/11/02 19:00:01] DEBUG    [replication_task__task_7] [zettarepl.transport.base_ssh.root@fstore2.foo.foofoo.lan.shell.2845.async_exec.126828] Error 1: "cannot unmount 'fstore2-pool1/....kages': not currently mounted\n"


The fix is to disable and re-enable the NFS share, I think either with or without a manual zfs mount alongside (so something in the enable step kicks things back into life, even though zfs reports that all is fine). Like you, I am now resorting to rsync, which is a shame.
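For anyone hitting the same thing, the manual recovery I described looks roughly like this (the dataset name is from my system, and the service commands are plain FreeBSD rc equivalents of toggling things in the UI, so treat this as a sketch and adjust to your setup):

```shell
# Sketch of the manual recovery (names from my setup; adjust to yours).
# 1. Remount the dataset that zfs wrongly believes is mounted:
zfs mount fstore2-pool1/nfsv4_root/packages

# 2. Kick the NFS daemons so clients pick up working file handles again
#    (roughly what disabling/re-enabling the share in the UI does):
service mountd onerestart
service nfsd onerestart
```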

I think the above is almost enough to start a bug report with; if anyone has more to add, feel free! :smile:
 