Slow NFS but SMB and local access are fast

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
I'm aware this subject gets done to death, but unfortunately I've been searching every corner of the internet for answers to no avail.

My issue is that NFS is incredibly slow. I've got one dataset being used by Plex and NextCloud, mounted directly into the jails, and local access is as expected for SATA. I've run iozone and I'm happy with those results: a 32GB test file done locally gives reads and writes consistent with local SATA.

Iperf3 tests to TrueNAS over gigabit Ethernet are full speed, 900+Mb/s both ways, so there are no networking issues.
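For reference, a test like that is typically run along these lines (a sketch only; the exact invocation may have differed, and the server IP is taken from the fstab entry later in the thread):
Code:
# on TrueNAS
iperf3 -s
# on the client: forward direction, then reverse
iperf3 -c 192.168.50.10
iperf3 -c 192.168.50.10 -R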

I've set up an SMB dataset outside that one just for testing, and a 9GB test file copied to and from the NAS is full speed on a Windows device connected via gigabit Ethernet: roughly 90MB/s, so around 720Mb/s, which seems fine for SMB. I'm happy with that.

An NFS test from both a Raspberry Pi 4 (a low-power device) and my Surface Pro 7 in Windows Subsystem for Linux (not low power) only gets about 3-4MB/s (29-34MB/s on a second check). Absolutely abysmally slow, and I have no idea why.
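Numbers like these can be reproduced with a simple timed copy over the mount, something like the following (mount point is illustrative, and the read-back may come from the client's page cache unless it is dropped first):
Code:
# write 4GiB over NFS; conv=fdatasync forces the data out before dd reports a rate
dd if=/dev/zero of=/mnt/shared/testfile bs=1M count=4096 conv=fdatasync
# drop the client page cache, then time a read back
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
dd if=/mnt/shared/testfile of=/dev/null bs=1M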

I set up another dataset outside this one to test NFS again and see no change, so it doesn't seem to be specific to the dataset; it's something to do with NFS.

Now, specs:
  • Motherboard: ASRock rack E3C224D2I
  • CPU: Intel Core i3-4170 CPU @ 3.70GHz
  • RAM: 16GB ECC DDR3 (max the board supports)
  • Hard drives: 6 X WD Red 3TB WD30EFRX-68EUZN0 in RaidZ2
  • Hard disk controllers: onboard SATA
  • Network cards: onboard Intel gigabit, just one port no teaming
I have disabled atime on the dataset, and that only gets me to the speed I see now (yes, it was even slower before that). I've spent hours researching ZFS to understand how write caching, sync writes and async writes work, and tried setting the dataset to Async only, but it makes absolutely no difference at all. 4MB/s is the fastest I see over NFS.
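For anyone wanting the CLI equivalents of those GUI settings, they would look roughly like this (dataset name taken from the export paths posted later in the thread; not necessarily the exact commands used):
Code:
# a sketch, assuming the dataset is Volume01/Shared as in the exports shown later
zfs set atime=off Volume01/Shared
zfs set sync=disabled Volume01/Shared
# confirm what is actually in effect
zfs get atime,sync Volume01/Shared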

I feel like a SLOG isn't the answer here. I should be seeing roughly the same speeds as SMB, right? Before working on adding things like SLOG I want to figure out what's slowing it down at the basic level.

So what can I try next?
 
Last edited:

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What are your NFS mount options? Typically, you'd want to include rsize=131072 and wsize=131072 as part of your mount options for best performance.
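On a Linux client that would look something like this (server IP and paths are illustrative, matching the poster's own examples further down):
Code:
# manual mount with the larger read/write block sizes suggested above
mount -t nfs -o rsize=131072,wsize=131072 192.168.50.10:/mnt/Volume01/Shared /mnt/shared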
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
What are your NFS mount options? Typically, you'd want to include rsize=131072 and wsize=131072 as part of your mount options for best performance.
Alright, progress! (But not resolution).

So I tell a lie: one of the tests from the Raspberry Pi, inside the SABnzbd application running on it, gave a speed of around 30MB/s, so about 240Mb/s. Still too slow for my liking.

With those settings you gave, I mounted again inside WSL on my Surface and the speed is back up to match that, around 30MB/s.

Which begs the question: what does that change? But anyway, that puts us on a level playing field with another device on the network, which eliminates one thing at least.

So where might I be losing all the rest of that speed? 30MB/s still seems very slow compared to the SMB speeds I can achieve.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
In the NFS service, how many servers do you have configured under the Number of servers field? You typically should set this equal to the number of cores on your CPU. I think the default is 2, but if you have more cores, you should increase this.
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
It is set to 4 which is equal to the number of cores I have.

I got excited for a moment that this might be it but nope, it's set correctly.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Actually, your CPU only has 2 cores, but uses HyperThreading to present 4 virtual cores, according to Intel ARK. You might be oversubscribing your CPU. Try dropping it down to 2.
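A quick way to compare logical CPUs against physical cores on a FreeBSD-based TrueNAS is something like the following (the cores/threads OIDs are an assumption and may not be present on every release):
Code:
sysctl -n kern.smp.cpus              # logical CPUs (4 here, because of HyperThreading)
sysctl -n kern.smp.cores             # physical cores, if the OID is available
sysctl -n kern.smp.threads_per_core  # threads per core, if the OID is available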
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
Ok I'll give it a try. In the tooltip for that setting it says:

Keep this less than or equal to the number of CPUs reported by sysctl -n kern.smp.cpus to limit CPU context switching.
Which returns 4.

With 2 set, I get a slightly reduced speed of about 28MB/s. Putting it back to 4 goes back up to 30-35MB/s.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, now we're in the domain of system tuning via System->Tunables. Under System->Advanced, try enabling Autotune, which will set several sysctl tunables for best system, TCP, and ZFS performance. On my Asrock E3C226D2I system, I also add the following tunables:
  • loader tunable cc_cubic_load="YES"
  • sysctl tunable hw.igb.rx_process_limit="-1"
  • sysctl tunable net.inet.tcp.cc.algorithm="cubic"
Reboot so the loader tunable takes effect and the autotune script is activated. (Equivalent config lines are sketched below.)
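For reference, those tunables correspond roughly to the following config lines if they were set by hand rather than through the Tunables UI (a sketch, not the exact files TrueNAS writes):
Code:
# /boot/loader.conf equivalent: load the CUBIC congestion-control module at boot
cc_cubic_load="YES"
# /etc/sysctl.conf equivalents
hw.igb.rx_process_limit=-1
net.inet.tcp.cc.algorithm=cubic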
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
Appears to make no change; if anything, a slight loss of 1-3MB/s. Removing the tunables, disabling Autotune (I'm not sure if that does anything) and rebooting again gets it back up to the reference speed.

That's unfortunate.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Hmm. What does nfsstat show on the client and server?
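Something along these lines would do (the exact flags vary slightly between the Linux and FreeBSD versions of the tool):
Code:
# on the Linux client: client-side RPC counters
nfsstat -c
# on the TrueNAS server: server-side counters
nfsstat -s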
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Also, are you trying NFS over Wi-Fi to either the Pi or the Surface?
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
All devices are wired gigabit. Iperf confirms this, so I know there aren't any weird speed/duplex settings negotiated wrong on any NICs. The Surface is connected via a Surface Dock with gigabit Ethernet to the same switch as TrueNAS, and the Raspberry Pis (there are actually several) are all connected via another gigabit switch upstream of the one TrueNAS is on. They are absolutely connected at gigabit, because that's the same path out to the internet for the Raspberry Pis and I can get (close enough to) gigabit speed with speedtest-cli out to the internet. I'm 100% sure we can rule out layer 1, but good call checking.

Attached is nfsstat, client on the left and server on the right. This is just after both devices were rebooted, which I assume is why the numbers are so low. Do you want me to do anything before running the command?
 

Attachments

  • Screenshot 2021-11-21 160259.png (222.3 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, so the client shows it's reauthenticating with every RPC call. This is obviously going to result in poor performance. Try adding timeo=600, retrans=2, and sec=sys to your mount options.
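Combined with the earlier rsize/wsize suggestion, an fstab entry might look roughly like this (server IP and mount point follow the poster's own example further down; not a verified config):
Code:
192.168.50.10:/mnt/Volume01/Shared  /mnt/shared  nfs  rsize=131072,wsize=131072,timeo=600,retrans=2,sec=sys  0  0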
 

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
Appears to make no change. In the absence of proper screen capture software, here is a video of nfsstat as the copy occurs on the client.

Edit: with all the aforementioned mount options enabled

 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@2fst4u For clarity, please post the contents of "/etc/exports" on TrueNAS and the mount command you're using on your RPi and the output of "nfsstat -m". Contrary to what you said in your OP re: "seeing roughly the same speeds as SMB", I'd expect the NFS xfer speeds always to be lower than your SMB xfer speeds on your TrueNAS.

All the mount options mentioned by Samuel are defaults on any up-to-date Linux distro. Here's my example:

Code:
chris@kubuntu:~$ sudo mount -vvv -t nfs 192.168.0.99:/mnt/Bpool/test /home/chris/NFS
mount.nfs: timeout set for Sun Nov 21 09:02:21 2021
mount.nfs: trying text-based options 'vers=4.2,addr=192.168.0.99,clientaddr=192.168.0.201'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'vers=4.1,addr=192.168.0.99,clientaddr=192.168.0.201'
chris@kubuntu:~$ nfsstat -m
/home/chris/NFS from 192.168.0.99:/mnt/Bpool/test
 Flags: rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.201,local_lock=none,addr=192.168.0.99

chris@kubuntu:~$ mount | grep NFS
192.168.0.99:/mnt/Bpool/test on /home/chris/NFS type nfs4 (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.201,local_lock=none,addr=192.168.0.99)
chris@kubuntu:~$ umount /home/chris/NFS


If by "tried setting the dataset to Async only", you mean you set "sync=disabled" for that dataset, then I would have expect the NFS xfer speed to increase.

FYI, the NFS server works in sync mode (the SMB server works in async mode); leaving sync=standard on the dataset then allows the client to work in sync or async mode. A Linux mount defaults to async on the client side.
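To illustrate, the dataset-side policy can be checked and switched like this (dataset name is an example, not necessarily the poster's):
Code:
zfs get sync Volume01/Shared
zfs set sync=standard Volume01/Shared   # default: honour whatever the client requests
zfs set sync=disabled Volume01/Shared   # treat every write as async, regardless of the client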
 
Last edited:

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
@2fst4u For clarity, please post the contents of "/etc/exports" on TrueNAS
Code:
root@Truenas:~ # cat /etc/exports
V4: / -sec=sys
/mnt/Volume01/Shared -alldirs -mapall="root":"wheel"
/mnt/Volume01/kubernetes -mapall="root":"wheel" 


The "kubernetes" one was the second export I created to see if anything changed but it's the same. "Shared" is the regular export that I'm using.
and the mount command you're using on your RPi and the output of "nfsstat -m".
The Raspberry Pis are mounting in Kubernetes and I haven't figured out how to add mount options there, so for consistency let's stick to the Linux on my Surface, where I know the mount options and the speeds are corroborated. This is /etc/fstab:
Code:
192.168.50.10:/mnt/Volume01/Shared /mnt/shared nfs Async,nolock,rsize=131072,wsize=131072,timeo=600, retrans=2,sec=sys

Contrary to what you said in your OP re: "seeing roughly the same speeds as SMB", I'd expect the NFS xfer speeds always to be lower than your SMB xfer speeds on your TrueNAS.
Yes, that's all well and good, but I don't think the disparity I'm seeing is normal. 90MB/s compared to 30MB/s doesn't seem like a simple matter of protocol inefficiency.
If by "tried setting the dataset to Async only", you mean you set "sync=disabled" for that dataset, then I would have expect the NFS xfer speed to increase.
Yes that's what I mean. I would too which is why I tried it.
FYI, the NFS server works in sync mode (the SMB server works in async mode); leaving sync=standard on the dataset then allows the client to work in sync or async mode. A Linux mount defaults to async on the client side.
This is why I forced sync=disabled, to ensure it was only operating in that mode, but to no avail.

Any thoughts?
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
The disparity may be greater than you expected. Between my NAS and a bog-standard Linux PC, I get 117MB/s for a single large-file xfer over SMB. For NFSv4, with sync=standard, the xfer is bursty and averages circa 68MB/s for the same file copy. As your SMB values are already lower than mine, what could you expect for NFS?

I had wondered if your NFS client was working in sync mode. One thing to check when you write data to your sync=disabled dataset is that there is zero ZIL activity on your pool, as shown by the "zilstat" command on TrueNAS.
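For example, something like this on TrueNAS while the NFS copy is running (one-second interval; zilstat option handling may differ slightly between releases):
Code:
# all-zero columns during the transfer mean no synchronous/ZIL writes are being issued
zilstat 1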
 
Last edited:

2fst4u

Dabbler
Joined
Mar 20, 2019
Messages
24
Definitely no hits on ZIL when I run zilstat. Zeroes across the board while it transfers.

I guess I just find it hard to believe NFS would be just one third of the speed of SMB. There's no encryption and no authentication (on mine), so I don't see why it would only reach about a quarter of the available network throughput with no contention. Even your example is double the speed I'm getting.

If anyone can corroborate and say "yeah, this is expected speed, there's nothing to fix" then I'll be happy to stand down and leave this here for posterity, but right now forgive me if I seem a bit sceptical. It just doesn't seem right.

If it is as expected for my situation, what could be done to improve throughput? What needs upgrading in my hardware? A better processor? More vdevs? The CPU doesn't seem to be getting hammered during a transfer, so I don't think the bottleneck is there. I want to build a new box from scratch, and if I can design it to be faster that would be ideal. I've had this one since before FreeNAS Corral and it's served me well, but it's big and not very energy- or space-efficient.
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
I'd be happier if you were testing xfers from a PC or laptop; I only have an RPi 3 so cannot offer any direct comparisons. I did boot into my Win 10 desktop install, which I hardly use, and set up WSL for the first time to install Debian. I soon discovered it had to be WSL2 in order to use mount.nfs, nfsstat, etc. I'm surprised your WSL didn't barf on that "Async" param. Not too scientific, but a quick test showed speeds not that dissimilar from Linux direct, in my case. A timed 10GB file copy to my NAS took 1m34.828s.

But I agree something seems amiss. I would discount a CPU bottleneck for this use; my 2-core, 4-thread Xeon E3-1220L has similar compute power to your i3. Your 6-disk RAIDZ2 streaming speed should enable near 1GbE wire-speed file reads/writes (async); even a single mirror vdev of WD Reds should do that. I suppose locks could reduce xfer speeds, but you don't seem to be using any (not a multiprotocol share?). You've not said whether reverting to NFSv3 made any difference. By the way, which version of TrueNAS are you running? I don't have any other ideas right now, but I would suggest reviewing your pool health and checking whether one or more HDDs are not performing well; see: https://klarasystems.com/articles/openzfs-using-zpool-iostat-to-monitor-pool-perfomance-and-health/
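As a starting point, a per-disk view during a transfer can be had with something like the following (pool name taken from the exports above; the -l latency columns assume a reasonably recent OpenZFS):
Code:
zpool iostat -v Volume01 5     # per-vdev/per-disk bandwidth and IOPS, every 5 seconds
zpool iostat -vl Volume01 5    # same, with latency columns; one slow disk will stand out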

The optimal design for a new box is of course very use-case dependent. Do you optimise for streaming or IOPS? Do you have heavy sync writes? How much RAM can you install, max number of HDDs/SSDs, etc.? This needs careful planning.
 

nasboy

Cadet
Joined
Mar 13, 2022
Messages
1
I am having a similar issue; it started when I upgraded from FreeNAS 11 (latest) to TrueNAS (latest). NFS client writes start out super fast, then come to a crawl after several seconds. I tested writing to the zpool locally (no issues) and CIFS writes from 3 different machines (Linux and Windows, no issues), but NFS writes from one host not only slow down, they start timing out all actions on all of my NFS mounts on multiple servers. NFS timeouts from multiple hosts on my monitoring system (check_mk) are how I found this issue.

I suspect there is a bug somewhere in the NFS daemon.
 