NFS unresponsive during scrub

Status
Not open for further replies.

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
System currently has 11.1-U1 installed. I have a volume that is shared with VMware servers via NFS. Once a scrub started, disk latency reported by VMware shot through the roof (40+ seconds in some cases), and VMs using this share were virtually locked up. Immediately after the scrub completed, everything returned to normal. It seems like the scrub priority is set too high relative to other tasks.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Swap usage during the scrub?
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
Update to 11.1-U2. The defaults of two sysctls that affect system responsiveness during a scrub or resilver were reverted to their old values, which should make the system much more responsive during those operations.
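
If you can't update right away, the tunables can also be adjusted by hand. I believe the two in question are vfs.zfs.scrub_delay and vfs.zfs.resilver_delay, and that the old defaults were roughly 4 and 2 ticks; check the U2 release notes before relying on those exact numbers.

    # Show the current values (run from a shell on the FreeNAS box)
    sysctl vfs.zfs.scrub_delay vfs.zfs.resilver_delay

    # Restore the (assumed) old defaults so scrub I/O yields to normal I/O
    sysctl vfs.zfs.scrub_delay=4
    sysctl vfs.zfs.resilver_delay=2

To keep the values across reboots, add them as sysctl-type Tunables under System in the GUI.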
 

MasterTacoChief

Explorer
Joined
Feb 20, 2017
Messages
67
There's a 240GB SAS SSD configured as L2ARC. Swap stayed constant at ~40MB usage. System is the "Main" one in my signature. CPU usage was around 65%, aggregate disk throughput peaked at about 1.4GB/s.

Based on wblock's comment, I'll have to upgrade in the near future. It makes me wish there were a good way to do redundant failover, so I could avoid migrating all my VMs to a different FreeNAS server before the update.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
ESXi does sync writes. When there's a high I/O load on the disks and latency spikes, it can be lethal for sync write performance. Assuming your SSDs can do decent IOPS, I would carve off a 20GB partition (or a mirror if you have two SSDs) and add it to your ESXi volume as a SLOG.
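
From the CLI that's a one-liner; the pool and device names below are placeholders for whatever your setup actually uses (on FreeNAS you'd normally do the same thing through the Volume Manager by extending the pool with a Log device):

    # "tank" and the gptid names are placeholders -- substitute your pool/partitions
    zpool add tank log mirror gptid/xxxx gptid/yyyy   # or a single device if you have no mirror
    zpool status tank                                 # the new "logs" vdev should be listed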

Also, FreeNAS uses a very conservative default for the number of nfsd threads. You could go into the NFS service configuration and increase it from the default (which is still 4, I think) to something more like 10 or 20. Depending on where you are bottlenecking, that might help as well.
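
If you want to check the thread count before and after changing it, I believe the kernel NFS server exposes the limits as sysctls; the setting itself lives under Services -> NFS -> "Number of servers" in the GUI.

    # Assumed sysctl names for the kernel NFS server thread limits
    sysctl vfs.nfsd.minthreads vfs.nfsd.maxthreads

    # Per-operation server stats, useful for spotting an NFS-side bottleneck
    nfsstat -e -s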
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
Swap stayed constant at ~40MB usage.

Stux's pagein Perl script might be useful to find out whether there is a temporal correlation between running a scrub and swap usage (re)starting in your FreeNAS environment. That might come in handy when ruling out possible root causes for the unresponsiveness.
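
If you don't want to dig up that script, even a crude logging loop (not Stux's script, just a sketch; "tank" is a placeholder pool name) is enough to line up swap usage against scrub progress after the fact:

    # Log a timestamp, swap usage and scrub progress once a minute
    while true; do
        date
        swapinfo -m
        zpool status tank | grep scan:
        echo
        sleep 60
    done >> /tmp/scrub_swap.log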
 
