NFS Stuck under load

Joined
Jan 30, 2017
Messages
6
Hi,
We are running the following version, and the server is used ONLY for NFS (no other services are enabled):
------------------
Build FreeNAS-9.3-STABLE-201502142001
Platform Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Memory 131001MB
System Time Mon Jan 30 12:14:18 IST 2017
Uptime 12:14PM up 324 days, 16:40, 1 user
Load Average 1.39, 1.17, 1.15
------------------

Once in a while, the NFS mount on random clients (Ubuntu 12.04) "gets stuck", i.e. 'ls' or any other filesystem command against the NFS-mounted filesystem hangs and the filesystem becomes inaccessible.
There are no errors in the client-side logs; the filesystem simply stops responding, and the only way to recover (and remount it) is to reboot the client.
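A minimal sketch for spotting a hung mount from a script without locking up your own shell, assuming Python 3 on the client and the standard `ls` command: run `ls` in a child process and treat "no answer within N seconds" as stuck. The function name, the mount path and the 5-second default are illustrative choices, not something from this thread.

```python
import subprocess


def mount_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if `ls` on `path` finishes successfully within `timeout` seconds.

    The listing runs in a child process, so if the NFS mount is hung only the
    child blocks; the caller gets False back instead of hanging itself.
    """
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Ask the stuck child to die, but do not wait for it: a process blocked
        # on a hard NFS mount may not exit until the server answers, so it may
        # linger as an unreaped child for a while.
        proc.kill()
        return False
```

For example, `mount_responds("/mnt/nfs")` (hypothetical mount point) returns False either when the path is missing or when the listing does not complete in time, which mirrors the "ls hangs" symptom described above.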

The issue appears to occur (though this is not 100% confirmed) when the client is under heavy load, not necessarily while reading or writing this specific filesystem.

The number of NFSD threads configured on the FreeNAS is twice the number of CPUs in the FreeNAS server (I found in the documentation that this should be no higher than the number of CPUs, though that seems a little strange to me).

I would appreciate any help in debugging this painful issue.

Thanks
Laurence
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Welcome to the forums!

I found these two bug reports; they're at least tangentially related to the problem you're having:

NFSv4 stops working, cannot kill nfsd
NFS gets stuck if you reboot FreeNAS while NFS Server is Running

Have you considered simply upgrading FreeNAS? The version you're using is nearly two years old.

Good luck!
 
Joined
Jan 30, 2017
Messages
6
Thanks for the reply; however, these bugs appear to refer to server-side issues, where all clients hang or are unable to access the filesystem.
In my situation, only 1 or 2 clients have issues and all the other clients continue to work with no interruption.

Upgrading to a later version is possible, but disruptive, and I'm really not sure it would solve the problem.

Regards
Laurence
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Frustrating! I assume it's not always the same 1 or 2 clients, i.e., which clients get the NFS lock-ups is random?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Laurence said: "Totally random, sometimes only 1 client and sometimes 2, but never the same ones..."
Heh. I blame my grey hair on random, intermittent problems like these! Well... my daughters may be partially to blame as well. :D

Are there any reports of random NFS connection failures on the Ubuntu side of things? Wouldn't hurt to investigate the client side.
 
Joined
Jan 30, 2017
Messages
6
: )

We've been investigating the client side for months: not a single log entry indicating an issue, not even "server xxxx not responding"; absolutely nothing.
The mount simply becomes inaccessible and the client needs to be rebooted.

Are there any logs on the server that can be enabled?
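On the client side, it may still be worth filtering the kernel log for the Linux NFS client's usual server-health messages ("nfs: server ... not responding, still trying" and "nfs: server ... OK"). A minimal sketch, assuming the log text has already been collected separately (for example from `dmesg` or `/var/log/kern.log` on Ubuntu); it only pattern-matches lines and does not enable any extra logging. The function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Linux NFS client server-health messages, e.g.
#   "nfs: server nas1 not responding, still trying"
#   "nfs: server nas1 OK"
_NFS_MSG = re.compile(r"nfs: server (\S+) (not responding|OK)")


def nfs_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (server, status) pairs for every NFS health message found."""
    events = []
    for line in lines:
        match = _NFS_MSG.search(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

An empty result over the period when a mount hung would match what is described above: the client never logged that it considered the server unresponsive.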
 

snaptec

Guru
Joined
Nov 30, 2015
Messages
502
When you have these problems, is io_wait high? How is your network set up?

I had a problem where, under heavy load, clients took a really long time to complete an ls, although it did finish after a couple of minutes, even though the FreeNAS box and the network should still have had spare resources.
For example, the GbE network was at 600-700 Mbit/s and the FreeNAS CPU still had headroom.
Running the same ls directly on the FreeNAS box returned instantly.

The problem went away when I upgraded to 10 Gb Ethernet.
Now, even at 70-80% bandwidth utilization, everything works as it should, and there's no io_wait anymore.

I was on 9.3 when I had that problem, and it only affected Linux clients over NFS.


 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Not that I know of, at least not via the GUI.
 
Joined
Jan 30, 2017
Messages
6
We have a 1 Gbit network, but some of the clients sit on a 10 Gbit network and access the FreeNAS via a 1 Gbit switch connected to the 10 Gbit network.
Only the clients on the 10 Gbit network have this issue; however, they are also the only clients with high workloads.
Our reads are approximately 30 MB/s and our writes around 5 MB/s.
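For a rough sense of scale, those throughput figures can be converted into utilization of a 1 Gbit link. A minimal sketch, assuming decimal units (1 MB = 8 Mbit, 1 Gbit/s = 1000 Mbit/s) and ignoring TCP/IP and NFS protocol overhead; the helper name is hypothetical.

```python
def link_utilization(mbytes_per_s: float, link_mbit_per_s: float) -> float:
    """Percentage of a link's raw capacity used by a given throughput.

    Assumes decimal units (1 MB = 8 Mbit) and ignores protocol overhead.
    """
    return mbytes_per_s * 8 * 100 / link_mbit_per_s


read_pct = link_utilization(30, 1000)   # ~30 MB/s of reads on a 1 Gbit link
write_pct = link_utilization(5, 1000)   # ~5 MB/s of writes on the same link
```

Under these assumptions the reads come to about 24% and the writes about 4% of a 1 Gbit link, well below the 600-700 Mbit/s level mentioned earlier in the thread.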

Even when this issue occurs, all the other clients continue to function normally with no performance problems.
 

snaptec

Guru
Joined
Nov 30, 2015
Messages
502
I also had no problems with the other clients.
The next time it happens, have a look at the io_wait values on the Linux side. Maybe that will give a hint.
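One way to watch io_wait on a Linux client without extra tools is to sample the aggregate `cpu` line of `/proc/stat` twice and compute the iowait share of the elapsed CPU time (the same figure `top` and `vmstat` show as `wa`). A minimal sketch, assuming the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names and the one-second interval are arbitrary choices.

```python
import time
from typing import List


def parse_cpu_line(text: str) -> List[int]:
    """Return the first 8 counters of the aggregate 'cpu' line of /proc/stat.

    Fields: user, nice, system, idle, iowait, irq, softirq, steal.
    guest/guest_nice are skipped because the kernel already counts them
    inside user/nice.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: List[int], after: List[int]) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait()` in a loop on an affected client while the mount is stuck would show whether io_wait climbs the way snaptec describes.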


 

nitrobass24

Dabbler
Joined
Apr 25, 2017
Messages
19
Did you ever make any headway with this issue? I am dealing with something similar, but in my case I am on 9.10 and it happens on all of my clients (ESX, CentOS, Ubuntu) at the same time.
 
Joined
Jan 30, 2017
Messages
6
Unfortunately not. We have just purchased new hardware (SSD + 2x10Gb NIC) for the FreeNAS, are installing the latest stable version, and hope to migrate within the next 10 days.
 

millerdc

Cadet
Joined
May 1, 2015
Messages
6
One thing I had to do with my FreeNAS system was add two system tunables that helped with NFS traffic. I also have a TrueNAS Z30 and noticed I did not have the same issues with my NFS clients, so I compared my setup with what iXsystems had configured. I noticed two sysctl values set on the TrueNAS that were not set on the FreeNAS.

# raise the high-water mark for nfsd's TCP duplicate request cache
type: sysctl variable: vfs.nfsd.tcphighwater value: 100000

# expire entries in that cache sooner (timeout in seconds)
type: sysctl variable: vfs.nfsd.tcpcachetimeo value: 300
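To confirm the two tunables actually took effect, one option is to run `sysctl vfs.nfsd.tcphighwater vfs.nfsd.tcpcachetimeo` on the FreeNAS box and compare the output with the values above. A minimal sketch, assuming FreeBSD's usual `name: value` output format; it only parses text you pass in and does not read or change any settings itself. The function and constant names are hypothetical.

```python
from typing import Dict

# Values from the post above.
WANTED = {
    "vfs.nfsd.tcphighwater": "100000",
    "vfs.nfsd.tcpcachetimeo": "300",
}


def parse_sysctl(output: str) -> Dict[str, str]:
    """Parse FreeBSD `sysctl name ...` output ("name: value" per line)."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def mismatches(output: str, wanted: Dict[str, str] = WANTED) -> Dict[str, str]:
    """Map each tunable that differs from `wanted` to its current value
    ("<missing>" if it did not appear in the output)."""
    current = parse_sysctl(output)
    return {
        name: current.get(name, "<missing>")
        for name, value in wanted.items()
        if current.get(name) != value
    }
```

An empty dictionary means both tunables are live with the intended values.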

Most of my 30-plus NFS clients mount via NFSv4. I have had both the TrueNAS and FreeNAS systems for over two years now, and they have been mostly problem-free. Back in March I updated both to the latest 9.10 stable train. So far so good.
 

The Hobbyist

Cadet
Joined
Jun 19, 2017
Messages
9
I just wanted to log in and confirm that millerdc's answer resolved our issue as well. Hopefully it helps someone else too.
 