Hi, so I've been testing my FreeNAS setup for about six months now, and the only issue / final hurdle I have is inconsistent 10G SMB read speeds (reading from FN). (Writes are always rock solid and consistent, 800-1100 MB/s, in all cases.) This SMB read issue seems totally random: at times the most I can get on reads is 550-650 MB/s, other times I get the speeds I'm looking for, which are 800 MB/s to 1 GB/s+. (I watch gstat to confirm whether a file is served entirely from ARC or is coming off the disks; when the issue is occurring, it makes no difference whether it's a full ARC read or not.)
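For reference, this is roughly how I watch the disks during a copy; if the read is coming straight out of ARC, the pool disks stay near 0% busy:

```
# on the FreeNAS box, refreshing every second, physical disks only
gstat -p -I 1s
```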
I realize this sounds exactly like an MTU issue (i.e., transmit/MTU from FN is the problem), but I really don't think that's the case, as I have tried 1500, 9000, and 9014 (even direct NIC to NIC), each time confirming the MTU is properly set via don't-fragment pings with 8972-byte payloads (or 1472-byte for 1500 MTU) on both sides. 9K MTU only adds about a 5-10% boost, which is in line with what others have observed in my forum research.
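For anyone who wants the exact checks, these are the pings I mean (IPs are placeholders):

```
# from the Windows client: -f = don't fragment, -l = payload size
ping -f -l 8972 192.168.10.5

# from the FreeNAS shell: -D = don't fragment, -s = payload size
ping -D -s 8972 192.168.10.10
```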
As this is a test setup (really almost a full rack of test equipment), I have a ridiculous amount of (unused) hardware and drives at my disposal until I settle on what will run FN (and the rest of my setup/servers). So I have no real files/data, except for large test .rar, .zip, and .dd files I've made.
I've confirmed I have the same issue on 4x different hardware setups running FN (2x SM X9-based setups, 1x X10-based, 1x Xeon-D embedded system). All systems have 64 GB or more of ECC RAM, and the RAM is from each board's QVL.
The clients I'm testing from (i.e., the SMB clients doing the reading from FreeNAS) are X10-based, running either Win10 or Server 2016, fresh installs. (2016 Srv is usually a bit faster on R/W than Win10, so I generally use that in these tests.) The clients have either large enterprise NVMe drives or HW RAID 0 across 4x or more HGST SAS3 enterprise SSDs. I do confirm that the clients can write above 1000 MB/s (AJA or AS SSD), and I watch Resource Monitor to be sure the disks aren't being maxed while doing these SMB copies from FN.
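If anyone wants a scriptable version of that client-side write check instead of the GUI tools, something like Microsoft's diskspd should do it (the drive letter, file name, and sizes below are just examples):

```
# 1MB sequential writes for 30s to a 10GB test file, 2 threads, 8 outstanding
# IOs each, with software/hardware write caching disabled (-Sh)
diskspd -c10G -d30 -w100 -b1M -t2 -o8 -Sh D:\smbtest.dat
```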
The pools on the various FreeNAS machines are either single-disk NVMe or an 8-disk stripe of HGST SAS3 enterprise SSDs. Benchmarks directly on FN show very high and consistent speeds, as expected (well above what 10 Gbit would need).
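By "benchmarks directly on FN" I mean simple local sequential reads of the test files at the FN shell, roughly like this (the pool/dataset path is an example):

```
# local read that bypasses SMB entirely; with compression off, and provided
# the file isn't already cached in ARC, this reflects raw pool read speed
dd if=/mnt/tank/test/bigfile.dd of=/dev/null bs=1M
```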
For testing, I have atime disabled, compression disabled, and sync = disabled (not that writes are the issue).
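Concretely, the dataset settings are just these (tank/test is a placeholder dataset name):

```
# on the FreeNAS box
zfs set atime=off tank/test
zfs set compression=off tank/test
zfs set sync=disabled tank/test
```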
The issue persists whether I'm using a DAC directly between client and server or going through either of 2x 10G switches (the switches aren't connected together, and aren't connected to anything else).
I've tested/tried: FN 11.1-U6, 11.2, 11.2-U2.1.
The 10G NICs I'm using / have moved around are Chelsio T520-CR and ConnectX-2 NICs (I try to stick to the T520s for FN, as I know those are the suggested NICs, but I have tried both).
Iperf from Server 2016 to FN = 9.4 Gbit/s.
Iperf from FN to Server 2016 = 4.5 Gbit/s.
(This is consistent, always, but if I do 3x iperf jobs/threads, then of course FN to Server 2016 = 9.6 Gbit/s.)
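The iperf runs are nothing exotic, roughly these invocations (the IP is a placeholder):

```
# receiving side
iperf -s

# sending side: single stream for 30s (the case that tops out at 4.5 Gbit/s
# from FN), then three parallel streams (the case that hits 9.6 Gbit/s)
iperf -c 192.168.10.5 -t 30
iperf -c 192.168.10.5 -t 30 -P 3
```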
The only rhyme or reason is that it seems to go in "waves": for 15-30 min, every test file I copy (from ARC or from disk) gets great speeds (800 MB/s+); then at some point, for another 15-30 min or so, every file I copy from FN tops out at ~600 MB/s max. (Always, ALWAYS, writes to FN are fast, i.e., 850 MB/s+.)
I know SMB on FreeBSD is single-threaded per connection, but that doesn't explain why I see great speeds for a while and then not. (On a DP system it's possible that the random affinity of that specific connection's thread at the time could cause this, but I'm seeing it on 2x single-CPU systems too.) Also, it's not that smbd is maxing a core (except when I see the fast speeds!); during the slow ~600 MB/s copies, smbd sits at only 50-65% of a core.
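For those CPU numbers, I'm watching the smbd threads in FreeBSD's top during a copy, along the lines of:

```
# -S include system processes, -H show individual threads, -P per-CPU stats;
# run while a copy is going to see whether one smbd thread pins a core
top -SHP
```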
I haven't added any tunables or aux parameters (on any of these test installs), as I haven't found any that aren't from 4 or 5 years ago (and most experts seem to post that, as of later FN versions, you really don't need / shouldn't use any tunables).
Can anyone offer any ideas or things to try (or test) to help ID this temporarily-reduced-speed 10G SMB read-only issue? I know this post may be a bit all over the place, but I didn't know where to give specifics when working with so many different servers/devices on this one issue (without making the OP sooo long that no one reads it, which it may already be! lol). Please keep in mind I can test almost any scenario, as I have a lot of unused HW for now.
Thanks for reading, and I appreciate any ideas or help!
(btw, everything is physical, nothing virtual)