NFS clients time out with big volumes

Status
Not open for further replies.

Oko

Contributor
Joined
Nov 30, 2013
Messages
132
I have two identically configured FreeNAS file servers. The only difference is that one RAID-Z2 pool is 21 TB while the other is 7.2 TB. I have absolutely no problem mounting the entire 7.2 TB volume via NFS on my desktop. However, when I try to mount the larger volume it mostly hangs (not always; sometimes it mounts). Here is the relevant info.

This is the 7.2 TB file server:
Code:
 rpcinfo -p neill-zfs
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     7    111  portmapper
    100000    3     7    111  portmapper
    100000    2     7    111  portmapper
    100005    1   udp   4002  mountd
    100005    3   udp   4002  mountd
    100005    1   tcp   4002  mountd
    100005    3   tcp   4002  mountd
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp   4000  status
    100024    1   tcp   4000  status
    100021    0   udp   4001  nlockmgr
    100021    0   tcp   4001  nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr


and

Code:
[root@loom /]# showmount -e neill-zfs
Export list for neill-zfs:
/mnt/zfsneill 10.8.0.0,192.168.6.0


while this is the info for the larger 21 TB server:

Code:
[root@loom /]# rpcinfo -p gaia
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     7    111  portmapper
    100000    3     7    111  portmapper
    100000    2     7    111  portmapper
    100005    1   udp   4002  mountd
    100005    3   udp   4002  mountd
    100005    1   tcp   4002  mountd
    100005    3   tcp   4002  mountd
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp   4000  status
    100024    1   tcp   4000  status
    100021    0   udp   4001  nlockmgr
    100021    0   tcp   4001  nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr


and

Code:
[root@loom /]# showmount -e gaia
Export list for gaia:
/mnt/zfsauton 10.8.0.0,192.168.6.0


Even the showmount command takes a very long time against the 21 TB server.

Is this behavior to be expected with such a large volume? Should I create smaller volumes? The NFS client machines are Red Hat clones.
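
For context, the clients mount roughly like this; the mount points and options below are only an illustration, not my exact commands:

Code:
# Illustrative mount attempt from one of the Red Hat clients; the mount
# points and options are examples only.
mkdir -p /mnt/gaia
mount -t nfs -o vers=3,proto=tcp,retry=0 gaia:/mnt/zfsauton /mnt/gaia

# The same client mounts the smaller server without any hang:
mkdir -p /mnt/neill
mount -t nfs -o vers=3,proto=tcp neill-zfs:/mnt/zfsneill /mnt/neill

The retry=0 option just stops mount from re-trying a failed foreground mount for the default two minutes, which makes the failure quicker to reproduce on demand.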

Thank you.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
How about giving the forum rules a once-over and providing some of that info?
 

Oko

Contributor
Joined
Nov 30, 2013
Messages
132
How about giving the forum rules a once-over and providing some of that info?

Calm down, pal. It is Friday evening and I just finished a 14-hour shift. There is no reason for the hostility. If you own FreeNAS or this forum, please let me know so that I can move on.

Edit: Hardware specifications for both servers can be found in this

http://forums.freenas.org/index.php?threads/8-disks-recommended-zfs-storage-pool.17893/#post-97003

thread. I am using enterprise-level hardware. The bigger server has a 10 Gbit LAN controller and plenty of RAM and CPU. It is running 9.2.1 amd64. I am eagerly awaiting the 9.2.2 release to update, because LDAP is broken for me (I need TLS). Hardware, DNS, and firewalls are definitely not the problem.
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
I've seen similar issues with misbehaving DNS. I know you said DNS is not an issue, but try something like:

dig @1.1.1.1 -x 2.2.2.2

... where 1.1.1.1 is one of your DNS servers and 2.2.2.2 is the host you wish to resolve.

Try from both FreeNAS servers for sh%ts and giggles. You can use nslookup as well, but dig is your new best friend.

The size of your volume has nothing to do with NFS timeouts, so don't devote any energy in that direction. NFS does a reverse DNS lookup, so perhaps look there.
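
If you want to check several clients at once, something like this from each FreeNAS box works; the resolver and client IPs below are placeholders for your own:

Code:
# Run on both neill-zfs and gaia. 192.168.6.53 stands in for one of the
# Unbound resolvers; the loop IPs stand in for real client addresses.
for ip in 192.168.6.25 10.8.0.14; do
    echo "== ${ip} =="
    dig @192.168.6.53 -x ${ip} +short    # empty output = no PTR record
done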
 

Oko

Contributor
Joined
Nov 30, 2013
Messages
132
@aufalien Thank you so much for your kind answer. I just carefully went over the following thread

http://forums.freenas.org/index.php?threads/nfs-mount-times-out.7270/

which also points to a DNS problem. However, both the small and the large file server use the same cluster of Unbound resolvers, so I would not expect to see a difference in behavior. I started playing with the "Host name database". I just read, and you confirmed, that FreeNAS expects reverse DNS for clients, and half of my clients are on the VPN network (10.8.0.0) and do not have valid DNS records on my DNS cluster. I can easily create those. The thing that bewilders me is why neill-zfs has no problems with those VPN clients. I would also expect gaia to behave nicely on the 192.168.6.0/24 network, where all clients have valid DNS entries.

After creating a smaller data set on the large file server, it looks like I at least no longer have problems with the clients on the private network 192.168.6.0, which is also where the file server sits.

By the way, I spent the whole day tcpdump-ing my network. That is why I was so confident that there were no firewall or DNS issues.
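
For what it is worth, this is roughly the kind of filter I had running during the mount attempts (igb0 is just an example interface name; 4002 is the mountd port from the rpcinfo output above):

Code:
# Watch DNS, portmapper, mountd and NFS traffic while a client mounts.
# Replace igb0 with the server's actual interface.
tcpdump -ni igb0 'port 53 or port 111 or port 2049 or port 4002'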
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
@aufalien Thank you so much for your kind answer. I just carefully went over the following thread

http://forums.freenas.org/index.php?threads/nfs-mount-times-out.7270/

which also points to a DNS problem. However, both the small and the large file server use the same cluster of Unbound resolvers, so I would not expect to see a difference in behavior. I started playing with the "Host name database". I just read somewhere that FreeNAS expects reverse DNS for clients, and half of my clients are on the VPN network (10.8.0.0) and do not have valid DNS records on my DNS cluster. I can easily create those. The thing is that neill-zfs has no problem with those VPN clients.

After creating a smaller data set on the large file server, it looks like I at least no longer have problems with the clients on the private network 192.168.6.0, which is also where the file server sits.

So wait, are you saying that with the same server but a different, smaller data set, the NFS timeouts no longer occur?
 

Oko

Contributor
Joined
Nov 30, 2013
Messages
132
Yes, with smaller data sets I no longer get timeouts on the 192.168.6.0 network (the server is on that network). Even more amazingly, I have no problem mounting the smaller server neill-zfs, which is on 192.168.6.0/24, from the VPN clients on 10.8.0.0/24, which do not have valid DNS entries on my Unbound cluster (I actually have 3 DNS servers). Note that gaia is also on 192.168.6.0/24.
 

Oko

Contributor
Joined
Nov 30, 2013
Messages
132
Solved!!! DNS crap. I just added records for the VPN clients into the Host name database of gaia and it works like a charm now. Thank you so much, aufalien. It also looks like the smaller data sets helped a bit, or maybe after 16 hours I am just hallucinating.
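
For anyone who runs into the same thing: the entries are plain hosts-file style lines in the Host name database field (Network -> Global Configuration in the GUI, which as far as I understand ends up in /etc/hosts on the box). The names and addresses below are made up, not my real clients:

Code:
# One line per VPN client: IP address followed by hostname(s).
10.8.0.14   vpn-client14.example.org vpn-client14
10.8.0.15   vpn-client15.example.org vpn-client15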
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
Solved!!! DNS crap. I just added records for the VPN clients into the Host name database of gaia and it works like a charm now. Thank you so much, aufalien. It also looks like the smaller data sets helped a bit, or maybe after 16 hours I am just hallucinating.

Stoked you got it working. After a 14-hour shift and now fixing this, I think it's time to tie one on, it's margy time!
 