Kenny Simpson
I'm running 9.10.2-U3, serving NFS to a couple dozen Linux clients over a 10Gb (copper) network. Under heavy load, all the clients seem to get stuck when doing file moves (rename). On the file server, a single nfsd thread sits at 100% CPU for many (>10) minutes while none of the other nfsd threads do much.
When the workload is running normally, I see many nfsd service threads running and using up to 40% cpu at times.
There are no errors in dmesg, /var/log/messages, etc.
specs:
2x E5-2620 v2, 256GB RAM, Intel 10Gb NIC, 2x LSI 9207, 1x LSI 9300 (Linux clients are mostly dual-socket machines with Intel NICs running CentOS 6/7).
2 zfs pools:
72 1TB Micro SSD - as 3 backplanes controlled by the 2x LSI 9207 controllers (7x raid z1 of 10 drives + 2 spares)
44 6TB Seagate - as 2 backplanes controlled by the LSI 9300 (4x raid z1 of 10 drives + 4 spares)
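As a sanity check on the layout arithmetic above, here is a minimal Python sketch. The `Pool` class and its field names are purely illustrative. The capacity figure is nominal only: it counts one parity drive per raidz1 vdev and ignores ZFS metadata and padding overhead.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    vdevs: int            # number of raidz1 vdevs
    drives_per_vdev: int  # drives in each vdev
    spares: int           # hot spares
    drive_tb: int         # nominal size of one drive in TB

    def total_drives(self) -> int:
        return self.vdevs * self.drives_per_vdev + self.spares

    def data_drives(self) -> int:
        # raidz1 spends the equivalent of one drive per vdev on parity
        return self.vdevs * (self.drives_per_vdev - 1)

    def nominal_tb(self) -> int:
        return self.data_drives() * self.drive_tb


ssd = Pool("ssd", vdevs=7, drives_per_vdev=10, spares=2, drive_tb=1)
hdd = Pool("hdd", vdevs=4, drives_per_vdev=10, spares=4, drive_tb=6)

for p in (ssd, hdd):
    print(p.name, p.total_drives(), p.data_drives(), p.nominal_tb())
# ssd 72 63 63
# hdd 44 36 216
```

So the drive counts add up (72 and 44), with roughly 63TB and 216TB of nominal data capacity respectively.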
nfs is serving one filesystem off the ssd pool, and 3 filesystems off the spinning drives.
The issue happens on the ssd pool. The other pool is not heavily used during that time.
There is a nearly identical setup in another location which is not exhibiting the issue.
The client nfs settings are - rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2
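To make those options easier to read, here is a small Python sketch that splits the option string into a dict. The helper name `parse_mount_options` is made up for illustration. The unit conversion relies on the Linux NFS client expressing `timeo` in tenths of a second.

```python
def parse_mount_options(opts: str) -> dict:
    """Split a comma-separated mount option string into a dict."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True        # bare flag such as "rw" or "hard"
        elif value.isdigit():
            parsed[key] = int(value)  # numeric option such as rsize
        else:
            parsed[key] = value       # string option such as proto=tcp
    return parsed


opts = parse_mount_options(
    "rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,"
    "hard,proto=tcp,timeo=600,retrans=2"
)

print(opts["rsize"] // 1024, "KiB")  # 128 KiB
print(opts["timeo"] / 10, "s")       # 60.0 s
print(opts["hard"], opts["proto"])   # True tcp
```

In other words: 128 KiB reads and writes over TCP, a hard mount, and a 60 second timeout before each retransmission.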
nfsd settings are - v3 only, 40 servers.
Even though it is on a 10Gb network, jumbo frames are not enabled at the moment.
Is there any way to see what the one nfsd thread is doing while it is eating all the CPU, or to see what the other threads are waiting on?
thanks