10g SMB inconsistent READs (writes excellent though)

SMnasMAN

Contributor
Joined
Dec 2, 2018
Messages
177
Hi, I’ve been testing my FreeNAS setup for about six months now, and the only issue / final hurdle I have is inconsistent 10G SMB read speeds (reads from FN). (Writes are always rock solid and consistent, 800-1100 MB/s, in all cases.) This SMB read issue seems totally random: at times the most I can get reading is 550-650 MB/s, other times I get the speeds I’m looking for, which are 800 MB/s to 1 GB/s+. (I watch gstat to confirm whether the file is coming entirely from ARC or from the disks; when the issue is occurring, it does not matter whether it’s a full ARC read or not.)
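(For reference, this is roughly what I’m watching when I say "ARC vs. disk"; nothing exotic, just gstat plus the ARC counters:)
Code:
# On the FreeNAS box, watch per-disk activity while the client reads;
# near-zero %busy during a re-read means the file is being served from ARC:
gstat -p
# ARC hit/miss/size counters (FreeBSD ZFS sysctls):
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses kstat.zfs.misc.arcstats.size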

I realize this sounds exactly like an MTU issue (i.e. transmit/MTU from FN is the problem), but I really don’t think that’s the case, as I have tried 1500, 9000, and 9014 (even direct NIC to NIC), each time confirming the MTU is properly set via don’t-fragment pings with 8972 (or 1472) byte payloads on both sides. 9K MTU only adds about a 5-10% boost, which is in line with what others have observed in my forum research.
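(For anyone wanting to replicate the MTU check, this is roughly what I run; the IPs below are just placeholders:)
Code:
# Windows client -> FreeNAS (don't fragment; 8972 = 9000 MTU minus 28 bytes of IP/ICMP headers):
ping -f -l 8972 192.168.10.10
# FreeBSD/FreeNAS -> client (DF bit set):
ping -D -s 8972 192.168.10.20
# For a 1500 MTU the payload would be 1472 instead of 8972.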

As this is a test setup (really almost a full rack of test equipment), I have a ridiculous amount of (unused) hardware and drives at my disposal until I settle on what will run FN (and the rest of my setup/servers). So I have no real files/data, except for large test .rar, .zip, and .dd files I’ve made.

I’ve confirmed I have the same issue on four different hardware setups running FN (2x Supermicro X9-based setups, 1x X10-based, 1x Xeon-D embedded system). All systems have 64 GB or more of ECC RAM, and the RAM is from the boards’ QVL.

The clients I’m testing from (i.e. the SMB clients doing the reading from FreeNAS) are X10-based, running either Win10 or Server 2016, fresh installs (2016 srv is usually a bit faster on R/W than Win10, so I generally use that in these tests). The clients have either large enterprise NVMe drives or HW RAID 0 with 4x or more HGST SAS3 enterprise SSDs. I have confirmed the clients can write above 1000 MB/s (AJA or AS SSD), and I watch Resource Monitor to be sure the disks aren’t being maxed out while doing these SMB copies from FN.

The pools on the various FreeNAS machines are either single-disk NVMe or an 8-disk stripe of HGST SAS3 enterprise SSDs. Benchmarks run directly on FN show very high and consistent speeds, as expected (well above what 10 Gbit would need).
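(The local benchmarks are basically simple dd runs on the pool, something like the below; pool/file names are just examples:)
Code:
# Write a large test file, then read it back (compression is off, so zeros aren't "free"):
dd if=/dev/zero of=/mnt/testpool/testfile.dd bs=1M count=32768
dd if=/mnt/testpool/testfile.dd of=/dev/null bs=1M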

For testing, I have atime disabled, compression disabled, and sync = disabled (not that writes are the issue).
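(These are set per dataset roughly like this; the pool/dataset name is just an example:)
Code:
zfs set atime=off testpool/smbtest
zfs set compression=off testpool/smbtest
zfs set sync=disabled testpool/smbtest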

The issue persists whether I am using a DAC directly between client and server or going through either of two 10G switches (the switches aren’t connected together, and aren’t connected to anything else).

I’ve tested/tried FN 11.1-U6, 11.2, and 11.2-U2.1.

The 10G NICs I’m using (and moving between machines) are Chelsio T520-CR and ConnectX-2 cards. (I try to stick with the T520s for FN, as I know those are the suggested NICs, but I have tried both.)

iperf from Server 2016 to FN = 9.4 Gbit/s
iperf from FN to Server 2016 = 4.5 Gbit/s
(This is consistent, always, but if I run 3 iperf jobs/threads, then of course FN to Server 2016 = 9.6 Gbit/s.)
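(The multi-stream test is just iperf3 with -P; the IP is a placeholder:)
Code:
# Single stream (FN -> Windows tops out around 4.5 Gbit/s for me):
iperf3 -c 192.168.10.20 -t 30
# Three parallel streams (this gets FN -> Windows back to ~9.6 Gbit/s):
iperf3 -c 192.168.10.20 -t 30 -P 3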


The only rhyme or reason is that it seems to go in "waves": for 15 or 30 minutes every test file I copy (from ARC or from disk) will have great speeds (800 MB/s+), then at some point, for another 15 or 30 minutes or so, every file I copy from FN will top out at 600 MB/s max. (Always, ALWAYS, writes to FN are fast, i.e. 850 MB/s+.)

I know SMB on FreeBSD is single-threaded per connection, but that doesn’t explain why I see great speeds for a while and then not (even on a single-CPU system; on a dual-processor system it’s possible the random affinity of that connection’s thread at the time could be causing this, but I’m seeing this on two single-CPU systems). Also, it’s not that smbd is maxing out a core (except when I see the fast speeds!); when it’s doing the slow ~600 MB/s, smbd sits at 50-65% of a core.
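(I’m watching smbd with top in thread / per-CPU mode while the copies run, roughly like this:)
Code:
# On the FreeNAS box, show threads and per-CPU load while a copy runs:
top -SHP
# When reads are fast, one smbd thread sits near 100% of a core;
# during the slow ~600 MB/s periods it's only at ~50-65%.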

I haven’t added any tunables or aux parameters (on any of these test installs), as I haven’t found any that aren’t from 4 or 5 years ago (and most experts seem to post that, as of later FN versions, you really don’t need / shouldn’t use any tunables).

Can anyone offer any ideas or things to try (or test) to help ID this temporarily reduced-speed 10G SMB read-only issue? I know this post may be a bit all over the place, but I didn’t know where to give specifics when working with so many different servers/devices on this one issue (without making the OP sooo long that no one reads it, which it may already be! lol). Please keep in mind I can test almost any scenario, as I have a lot of unused HW for now.

Thanks for reading; any ideas or help are much appreciated!

(BTW, everything is physical, nothing virtual.)
 
Joined
May 10, 2017
Messages
838
Don't have a solution, but I have experienced the same with FreeNAS and Linux servers: writes are close to line speed and most reads are usually 500-600 MB/s, though a few times I get better speeds. I suspect it's Samba-related.
 
Joined
Dec 29, 2014
Messages
1,135
The first thing I would suggest is to try iperf or iperf3 between the end stations. That way you can test the raw throughput capability of your network with synthesized traffic. It should look something like below.
Code:
This is the FreeNAS side:
Welcome to FreeNAS
$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.252.37, port 17545
[  5] local 192.168.252.27 port 5201 connected to 192.168.252.37 port 36389
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   897 MBytes  7.52 Gbits/sec
[  5]   1.00-2.00   sec  1021 MBytes  8.57 Gbits/sec
[  5]   2.00-3.00   sec  1005 MBytes  8.43 Gbits/sec
[  5]   3.00-4.00   sec  1024 MBytes  8.59 Gbits/sec
[  5]   4.00-5.00   sec   998 MBytes  8.37 Gbits/sec
[  5]   5.00-6.00   sec  1.01 GBytes  8.65 Gbits/sec
[  5]   6.00-7.00   sec  1017 MBytes  8.53 Gbits/sec
[  5]   7.00-8.00   sec  1.01 GBytes  8.66 Gbits/sec
[  5]   8.00-9.00   sec  1009 MBytes  8.46 Gbits/sec
[  5]   9.00-10.00  sec  1003 MBytes  8.42 Gbits/sec
[  5]  10.00-10.00  sec   181 KBytes  7.21 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  9.80 GBytes  8.42 Gbits/sec                  receiver
-----------------------------------------------------------

Code:
This is the ESXi host side:
[root@vm-ucs1:~] /usr/lib/vmware/vsan/bin/iperf3 -c 192.168.252.27
Connecting to host 192.168.252.27, port 5201
[  4] local 192.168.252.37 port 36389 connected to 192.168.252.27 port 5201
iperf3: getsockopt - Function not implemented
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   897 MBytes  7.52 Gbits/sec  8626536   0.00 Bytes    
iperf3: getsockopt - Function not implemented
[  4]   1.00-2.00   sec  1021 MBytes  8.57 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   2.00-3.00   sec  1005 MBytes  8.43 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   3.00-4.00   sec  1024 MBytes  8.59 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   4.00-5.00   sec   998 MBytes  8.37 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   5.00-6.00   sec  1.01 GBytes  8.65 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   6.00-7.00   sec  1017 MBytes  8.53 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   7.00-8.00   sec  1.01 GBytes  8.66 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   8.00-9.00   sec  1009 MBytes  8.46 Gbits/sec    0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   9.00-10.00  sec  1003 MBytes  8.42 Gbits/sec  4286340760   0.00 Bytes  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  9.80 GBytes  8.42 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  9.80 GBytes  8.42 Gbits/sec                  receiver

If that looks good, then you need to look at things like RAM, drive controllers, drives, and pool layout.
 

SMnasMAN

Contributor
Joined
Dec 2, 2018
Messages
177
> johnnie
> Don't have a solution, but I have experienced the same with FreeNAS and Linux servers: writes are close to line speed and most reads are usually 500-600 MB/s, though a few times I get better speeds. I suspect it's Samba-related.

Thanks johnnie, this helps me know I’m not crazy or missing something really obvious. (I have spent nearly a week trying to control for EVERY possible HW/SW element, and my results are consistent!)

> elliot
> The first thing I would suggest is to try iperf or iperf3 between the end stations.

Thanks elliot, solid troubleshooting step! I have of course used iperf and iperf3, in both directions, extensively in my testing. It does not appear to be a network throughput issue (i.e. regardless of FN server HW or client OS/HW, I get 9-9.6 Gbit/s in both directions, with both iperf2 and iperf3).

The only relevant network/iperf evidence is that with any Windows OS (Win10 or 2016 srv) I get 9.x Gbit/s from Windows to FN, but usually only 5-6.5 Gbit/s from FN to Windows. While this seems like it would exactly ID my issue, it appears to be a common issue with Windows outside of FreeNAS (I’ve researched this heavily, and in the VMware and pfSense forums this is common for Windows + iperf). However (and this is important), if I set the iperf jobs/number of threads to 2 or 3, I get the full 9.x Gbit/s from FN to Windows.

My understanding is that Win10 / Srv2016 make use of a newer SMB feature (SMB 3 multichannel) that allows large transfers to use multiple threads/connections, which speeds up 1 Gbit+ SMB copies.
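(If anyone thinks it’s worth testing, I believe the relevant option would be the smbd auxiliary parameter below; I haven’t tried it, and my understanding is it’s still considered experimental with Samba on FreeBSD, so treat this as a guess rather than a recommendation:)
Code:
# Candidate SMB auxiliary parameter (untested here; reportedly experimental on FreeBSD):
server multi channel support = yes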

Also, FTP transfers from FN to Windows (using the FileZilla client) max out at about 500-600 MB/s per SINGLE connection; as soon as I simultaneously FTP 2 or more files, I start to see Task Manager pinned at 9.9 Gbit/s.

All along my suspicion has been that it’s some problem with the SMB server on FN/FreeBSD that, for some reason, only *sometimes* uses multi-connection/parallel SMB transfers. (Please keep in mind that in all these tests I’m only copying one single large file, not multiple small files.)
(Anyone know how to test for this? I’ve tried looking at the network connections, hoping to see 2 or more when parallel SMB is working, but it still appears as one srcip:port / dstip:port pair in netstat or Resource Monitor.)
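(What I’ve been doing to check is roughly this, in case I’m just looking in the wrong place:)
Code:
# On FreeNAS, list TCP sessions to the SMB port while a copy runs:
netstat -an | grep 445
# Samba's own view of connected sessions:
smbstatus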

If anyone running 10 Gbit and seeing near-10 Gbit speeds has any tunables or smbd aux parameters they think I should try, please post. (As I said in the OP, I haven’t used any, really, as there aren’t any suggestions in RECENT posts, and most pros suggest tunables aren’t really needed/recommended as of more recent FN versions.)
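(For reference, the sort of thing those 4-5 year old threads list looks like the below; these are just examples of FreeBSD network sysctls with commonly posted values, not anything I’ve tested or am recommending:)
Code:
# FreeBSD sysctl/loader tunables that older 10G threads commonly mention (values illustrative only):
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144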

Of course, my tests from Server 2016 to Server 2016 SMB shares run consistently at ~1.1 GB/s in both directions.

(With FN I have also tried using NFS to Ubuntu, and speeds are better, but usually ~700-800 MB/s reading from FN.)

thanks

EDIT: I figured I would add some quick screenshots. I literally have hundreds of other screenshots from trying to ID this issue, but these are some quick ones I made just now; it only started doing it for about 10 minutes on one of two FN machines (both are fresh/clean installs with nothing running but SMB, SSH, and NFSd). You can see in the blue box where the issue started happening (top on the FN box shows nothing unusual), and right after, I did a copy from the 2nd FN machine just to verify it’s not something on the Windows box I’m testing from (the read from the 2nd server ran fast). While it shows 656 MB/s, it’s usually a bit worse when it happens (~450-600 MB/s), and even if I copy the same file RIGHT after (so that it comes from ARC the 2nd time and I see little/no disk I/O), the speed will still be "capped/slower". Thanks.
(Attachment: random speeds capCapture.JPG)
 
Last edited: