I've got a FreeNAS box running FreeBSD 11.1-STABLE (although this happened on earlier versions as well) serving user directories over NFSv3 to a cluster of Linux machines (which are used as a compute cluster). Directories on the cluster are automounted, including home directories. When someone starts a job on many of the cluster machines at the same time (a fairly common occurrence), a number of the jobs will fail to start, saying they failed to mount the user's home directory. Checking on the server,
dmesg will have an error like:
sonewconn: pcb 0xfffff801e5278570: Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences)
The address is close to mountd's, but doesn't exactly match anything in lsof -iTCP -sTCP:LISTEN -P when I check. netstat -s says there were listen queue overflows in TCP, and with netstat -Lan I can watch the listen queue for mountd go up to 183. (I'm forcing the NFS mount requests to be TCP, rather than UDP, with proto=tcp on the clients.)

I've tried increasing the size of the accept queue (kern.ipc.soacceptqueue) from 128 to 1024 and rebooted the machine, but netstat -Lan shows that the limit is still 128 for mountd:
Code:
tcp4 0/0/128 *.763
tcp6 0/0/128 *.763
(rpcinfo -p shows mountd is at port 763.)

How do I get mountd to either have a larger queue or to otherwise handle the request spikes from the cluster?
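
For anyone trying to reproduce this, here is a minimal sketch for logging the mountd listen queue while a job starts. It assumes the netstat -Lan line format shown above (protocol, then qlen/incqlen/maxqlen, then local address). The function name listen_queue and its port argument are made up for illustration, and 763 is only the port rpcinfo -p reported on this machine.

```shell
#!/bin/sh
# Print "proto qlen maxqlen" for listening TCP sockets on a given port.
# Reads `netstat -Lan` output on stdin; assumes lines shaped like:
#   tcp4  0/0/128  *.763
# (listen_queue is a hypothetical helper name, not part of any tool.)
listen_queue() {
    port="$1"
    awk -v port="$port" '
        # Match only tcp lines whose local address ends in ".<port>"
        $1 ~ /^tcp/ && $3 ~ ("\\." port "$") {
            split($2, q, "/")   # q[1]=qlen, q[2]=incqlen, q[3]=maxqlen
            print $1, q[1], q[3]
        }'
}

# Example: sample once a second while a cluster job starts.
# while :; do netstat -Lan | listen_queue 763; sleep 1; done
```

Comparing the qlen column against maxqlen over time should show whether the queue really fills during the spike, and whether a changed limit has actually reached the mountd socket.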