nfsd thread stuck

Joined
May 19, 2017
Messages
6
I'm running 9.10.2-U3 serving NFS to a couple dozen Linux clients over a 10Gb (copper) network. Under heavy load, all the clients seem to get stuck when doing file moves (rename), and on the file server I see a single nfsd thread eating 100% CPU for many (>10) minutes while none of the other nfsd threads are doing much.

When the workload is running normally, I see many nfsd service threads running, each using up to 40% CPU at times.

There are no errors in dmesg, /var/log/messages, etc.

Specs:
2x E5-2620 v2, 256GB RAM, Intel 10Gb NIC, 2x LSI 9207, 1x LSI 9300 (the Linux clients are mostly dual-socket machines with Intel NICs running CentOS 6/7).
2 ZFS pools:
72x 1TB Micro SSD, on 3 backplanes controlled by the 2x LSI 9207 controllers (7x raidz1 of 10 drives + 2 spares)
44x 6TB Seagate, on 2 backplanes controlled by the LSI 9300 (4x raidz1 of 10 drives + 4 spares)

NFS is serving one filesystem off the SSD pool and 3 filesystems off the spinning drives.

The issue happens on the SSD pool. The other pool is not heavily used while it is happening.

There is a nearly identical setup in another location which is not exhibiting the issue.

The client NFS mount options are: rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2
The nfsd settings are: v3 only, 40 servers.
Even though this is on a 10Gb network, jumbo frames are not enabled at the moment.

Is there any way to see what the one nfsd thread is doing when it is eating all the CPU, or to see what the other threads are waiting on?
Thanks
 
Joined
May 19, 2017
Messages
6
So this turned out to be multiple clients all doing repeated directory scans in the same directory, but the only way I found this out was to take a tcpdump capture and wade through the Wireshark output. Is there any better way to dig into what an nfsd thread is doing?
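
For anyone hitting the same thing, here is a rough sketch of how the "wade through the Wireshark output" step could be shortened: tally NFSv3 calls per client and procedure, so a flood of READDIR/READDIRPLUS from a few clients stands out. It assumes the capture has first been dumped to tab-separated text with something like `tshark -r capture.pcap -Y "rpc.msgtyp == 0" -T fields -e ip.src -e nfs.procedure_v3`; the file name and the tshark field names are assumptions on my part and should be checked against your Wireshark version. `tally_calls` is just an illustrative helper name. The procedure numbers come from the NFSv3 spec (RFC 1813).

```python
import sys
from collections import Counter

# NFSv3 procedure numbers from RFC 1813. Only a subset is named here;
# anything else is reported by its number.
NFS3_PROCS = {
    1: "GETATTR", 3: "LOOKUP", 4: "ACCESS", 6: "READ", 7: "WRITE",
    14: "RENAME", 16: "READDIR", 17: "READDIRPLUS",
}


def tally_calls(lines):
    """Count NFS calls per (client, procedure).

    Each input line is assumed to be "<client ip>\t<procedure number>".
    Lines that do not fit that shape are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        client, proc = parts[0].strip(), parts[1].strip()
        if not client or not proc.isdigit():
            continue
        counts[(client, NFS3_PROCS.get(int(proc), proc))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python nfs_tally.py calls.tsv   (script/file names are hypothetical)
    with open(sys.argv[1]) as fh:
        for (client, proc), n in tally_calls(fh).most_common(10):
            print(f"{n:8d}  {client:15s}  {proc}")
```

This only shows which clients are sending which calls; it does not say what the busy nfsd thread is doing inside the kernel, so it is a shortcut for the packet-capture approach rather than an answer to the question above.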
 