NFS unresponsive during scrub

Status
Not open for further replies.

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
System currently has 11.1-U1 installed. I have a volume that is shared with VMware servers via NFS. Once a scrub started, disk latency reported by VMware shot through the roof (40+ seconds in some cases), and VMs using this share were virtually locked up. Immediately after the scrub completed, everything returned to normal. It seems like the scrub priority is set too high relative to other tasks.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Swap usage during the scrub?
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
Update to 11.1-U2. The defaults of two sysctls that affect system responsiveness during a scrub or resilver were reverted to their old values, which should make the system much more responsive during those operations.
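
If you can't update right away, the tunables can also be adjusted by hand. I believe the two in question are vfs.zfs.scrub_delay and vfs.zfs.resilver_delay, and that the old defaults were roughly 4 and 2 ticks; check the U2 release notes before relying on those exact numbers.

    # Show the current values (run from a shell on the FreeNAS box)
    sysctl vfs.zfs.scrub_delay vfs.zfs.resilver_delay

    # Restore the (assumed) old defaults so scrub I/O yields to normal I/O
    sysctl vfs.zfs.scrub_delay=4
    sysctl vfs.zfs.resilver_delay=2

To keep the values across reboots, add them as sysctl-type Tunables under System in the GUI.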
 

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
There's a 240GB SAS SSD configured as L2ARC. Swap stayed constant at ~40MB usage. System is the "Main" one in my signature. CPU usage was around 65%, aggregate disk throughput peaked at about 1.4GB/s.

Based on wblock's comment, I'll have to upgrade in the near future. It makes me wish there were a good way to do redundant failover, so I could avoid migrating all my VMs to a different FreeNAS server before the update.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
ESXi does sync writes. When there's a high I/O load on the disks and latency spikes, it can be lethal for sync write performance. Assuming your SSDs can do decent IOPS, I would carve off a 20GB partition (or a mirror if you have two SSDs) and add it to your ESXi volume as a SLOG.
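
From the CLI that's a one-liner; the pool and device names below are placeholders for whatever your setup actually uses (on FreeNAS you'd normally do the same thing through the Volume Manager by extending the pool with a Log device):

    # "tank" and the gptid names are placeholders -- substitute your pool/partitions
    zpool add tank log mirror gptid/xxxx gptid/yyyy   # or a single device if you have no mirror
    zpool status tank                                 # the new "logs" vdev should be listed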

Also, FreeNAS uses a very conservative default for the number of nfsd threads. You could go into the NFS service configuration and increase it from the default (which is still 4, I think) to something more like 10 or 20. Depending on where you are bottlenecking, that might help as well.
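
If you want to check the thread count before and after changing it, I believe the kernel NFS server exposes the limits as sysctls; the setting itself lives under Services -> NFS -> "Number of servers" in the GUI.

    # Assumed sysctl names for the kernel NFS server thread limits
    sysctl vfs.nfsd.minthreads vfs.nfsd.maxthreads

    # Per-operation server stats, useful for spotting an NFS-side bottleneck
    nfsstat -e -s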
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Swap stayed constant at ~40MB usage.

Stux's pagein Perl script might be useful to find out whether there is a temporal correlation between running a scrub and swap usage (re)starting in your FreeNAS environment. That might come in handy when ruling out possible root causes for the unresponsiveness.
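
If you don't want to dig up that script, even a crude logging loop (not Stux's script, just a sketch; "tank" is a placeholder pool name) is enough to line up swap usage against scrub progress after the fact:

    # Log a timestamp, swap usage and scrub progress once a minute
    while true; do
        date
        swapinfo -m
        zpool status tank | grep scan:
        echo
        sleep 60
    done >> /tmp/scrub_swap.log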
 
