
High Speed Networking Tuning to maximize your 10G, 25G, 40G networks

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
jgreco submitted a new resource:

High Speed Networking Tuning - Make your 10G, 25G, 40G networking faster

Both FreeBSD and Linux come by default highly optimized for classic 1Gbps ethernet. This is by far the most commonly deployed networking for both clients and servers, and a lot of research has been done to tune performance especially for local area networks. The default settings are optimized to be efficient for both small and large servers, but because memory is often limited on smaller servers, some tunables that could improve performance for higher speeds (10Gbps and above) are not...

Read more about this resource...
 

smdftw

Cadet
Joined
Jul 2, 2022
Messages
1
When I try to set these tunables in SCALE I get the "Sysctl 'xxx' does not exist in kernel" message.

Are these tunables named differently in SCALE, or are they only applicable to CORE?
 
Joined
Dec 29, 2014
Messages
1,135
I am curious about the "Do not try to use copper 10GBase-T" comment in the resource (which was very helpful). I am sure that there are scars associated with that comment. I am starting to see more of my customers wanting to use 10GBase-T and I would be interested in hearing about your experiences.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I am sure that there are scars associated with that comment.

Actually not; I own a number of bits of gear with 10GBase-T ports, including a very nice X9DR7-TF+ board and some Dell PowerConnect 8024F switches. These were basically incidental acquisitions where I did not deliberately seek them out, and I generally use the ports as conventional 1G copper ports.

The most immediate arguments against 10GBase-T are:

1) that it consumes more power than an equivalent SFP+ or DAC setup. Which might seem like *shrug*, except that once you get to the point of burning several extra watts per port on a 48-port switch, it becomes a meaningful ongoing operational expense. Newer estimates are two to five watts per port, whereas SFP+ is about 0.5-0.7. It is worth noting that in a data center environment, if you burn five extra watts in equipment, there is usually about a five to ten watt cost to provide cooling as well. The electrical costs for 10GBase-T add up; there's a rough sketch of the math below.

2) that it experiences higher latency than an SFP+ or DAC setup. SFP+ is typically about 300 nanoseconds. 10GBase-T, on the other hand, uses PHY block encoding, so there is a 3 microsecond step (perhaps 2.5 us more than SFP+). This shows up as additional latency in 10GBase-T based networks, which is undesirable, especially in the context of this thread, which is about maximizing performance. I'm sure someone will point out that it isn't a major hit. True, but it's there regardless.

3) that people argue for 10GBase-T because they've already got copper physical plant. The problem is that this is generally a stupid argument. Unless you installed Cat7 years ago, your copper plant is unlikely to be suitable for carrying 10GBase-T at distance, and that janky old 5e or 6 needs to be replaced. Today's kids did not live through the trauma of the 90's, where we went from Cat3 to Cat5 to Cat5e as ethernet evolved from 10 to 100 to 1000Mbps. Replacing physical plant is not trivial, and making 10GBase-T run at 100 meters from the switch is very rough; Cat6 won't cut it (only 55m), you need Cat6A or Cat7 and an installer with testing/certification gear, because all four pairs have to be virtually perfect, and a problem with any one pair can render the connection dead.

By way of comparison, fiber is very friendly. OM4 can take 10G hundreds of meters very efficiently. It's easy to work with, inexpensive to stock various lengths of patch, and you can get it in milspec variants that are resistant to damage. You can run it well past the standards-specced maximum length in many cases.

On the flip side, 10GBase-T has the advantage of being a familiar sort of plug and play that generally doesn't require extra bits like SFP+ modules and may be easier to work with inside a rack.
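
To put a rough number on point 1: a back-of-the-envelope sketch, where the per-port delta, the cooling factor, and the power price are all just illustrative assumptions, not measurements:

Code:
# 48 ports, ~4 W extra per 10GBase-T port, ~1:1 cooling overhead, $0.12/kWh
# (all assumed figures for illustration)
echo "48 * 4 * 2 * 24 * 365 / 1000 * 0.12" | bc -l
# ~= 403, i.e. on the order of $400/year for one fully populated switch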

I am starting to see more of my customers wanting to use 10GBase-T

I think the big driver for many people is the familiarity thing I just mentioned; they can wrap their heads around 10GBase-T because at the superficial level it feels very much like 1GbE. There's a lot of FUD that has slowly percolated into the industry over the last few decades about fiber and fiber installers, because terminating fiber in the field is specialist work that requires expensive gear and supplies. However, these days you can often cheat and get factory-terminated prebuilt assemblies that can avoid the field termination work. Very easy to work with.
 
Joined
Dec 29, 2014
Messages
1,135
Thanks for the additional info. I have seen that in my own limited experience in my home office. The 10GBase-T liked most of the cables in a 10M trunk bundle (nice thing from fs.com), but not all of them. Perhaps it was the coupler in the no-punch panel, I don't know. Yes I can punch when needed, but it isn't one of my favorite leisure activities. All that was part of trying to consolidate some of the stuff in my home office to reduce the background noise, but I am straying too far off topic. Thanks again for the excellent clarification.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes I can punch when needed, but it isn't one of my favorite leisure activities.

Actually, this is where things are more likely to go off the rails. Category cable is all about twisted-pair characteristics, and, if you'll allow me a little inaccuracy for the sake of a layman's explanation, there are RF components to the issue as well as physical factors:

Category 3 cable, which any monkey should be able to terminate, operates at 16 MHz, handling 10Mbps, and due to the long twist length, it was very common to find people untwisting excessive amounts of cable and just punching it down. This continued at least into the Cat5 100Mbps era, and, among other things, both 10 and 100Mbps only used two of the four pairs, which made sloppiness somewhat more forgiving, since screwing up a pair still left you operable in many cases.

[Attached image: badtermination.jpg]

However, with 1GbE, better signal processing led to the use of all four pairs, AND all four pairs being used simultaneously in both directions. This is most of how we got to 1GbE without a significant increase in bandwidth; Cat5e was only 100MHz-350MHz depending on the era. Crosstalk (RF interference between pairs) and delay skew (differences in transmission time due to differing lengths of the pairs) became significant issues, though, and it therefore became necessary for installers to up their game on the quality of field terminations. You had to bring the twist almost all the way up to the terminals, and also make sure that you weren't causing pair lengths to differ, or shortening one conductor of a pair more than another. Messing this up would cause weird problems and failures.

With 10GbE, we have once again boosted the signaling bandwidth, to 500 MHz, and moved to a much more complicated encoding strategy that includes 16 discrete signal levels. This means that it is even more sensitive to field termination errors, and you really need a perfectionist-grade punch technique followed by a thorough cable test/certification to get this working reliably.
 

Tony-1971

Contributor
Joined
Oct 1, 2016
Messages
147
I was checking the values in my Core system, and there is this error:
Code:
root@freenas-sm[~]# sysctl net.inet.tcp.recvbuf_inc
sysctl: unknown oid 'net.inet.tcp.recvbuf_inc'

root@freenas-sm[~]# sysctl net.inet.tcp | grep -i recv
net.inet.tcp.recvspace: 524288
net.inet.tcp.recvbuf_max: 16777216
net.inet.tcp.recvbuf_auto: 1

The other OIDs are present.
Is it missing because
cc_cubic_load="YES"
cc_dctcp_load="YES"
are not present?

Best Regards,
Antonio
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, it looks like they rejiggered it in FreeBSD 13 to not be needed. That's what I get for doing my testing on TrueNAS CORE 12 I guess.

See


It's still fine, the _max value is really the more important bit.
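
If you want to sanity-check your own CORE box, the buffer limits that do still exist can be read like so (exact values will vary by release and by whatever tunables you have set):

Code:
sysctl net.inet.tcp.recvbuf_max net.inet.tcp.sendbuf_max
sysctl net.inet.tcp.recvbuf_auto net.inet.tcp.sendbuf_auto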
 
Joined
Dec 29, 2014
Messages
1,135

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Those are loadable tunables, not sysctl tunables.

I may have misunderstood the question. @Tony-1971 I thought you were asking if the _inc OID was missing because the modules weren't loaded. You can check what CC modules are loaded and which is active:

Code:
net.inet.tcp.cc.available: newreno
net.inet.tcp.cc.algorithm: newreno
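
If cubic (or dctcp) isn't listed under "available", one way to pull it in by hand for testing is something like this; the cc_cubic_load/cc_dctcp_load loader lines are what make it stick across reboots:

Code:
kldload cc_cubic
sysctl net.inet.tcp.cc.available         # cubic should now be listed
sysctl net.inet.tcp.cc.algorithm=cubic   # make it the active algorithm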
 
Joined
Dec 29, 2014
Messages
1,135
I think I need to tune something else. It looks like my box is still using newreno even though the other methods are available.
Code:
freenas2% kldstat | grep cc
 7    1 0xffffffff82e76000     1e90 cc_cubic.ko
 8    1 0xffffffff82e78000     2850 cc_dctcp.ko
freenas2% sysctl -a | grep '\.\cc\.'
net.inet.tcp.cc.dctcp.slowstart: 0
net.inet.tcp.cc.dctcp.shift_g: 4
net.inet.tcp.cc.dctcp.alpha: 1024
net.inet.tcp.cc.newreno.beta_ecn: 80
net.inet.tcp.cc.newreno.beta: 50
net.inet.tcp.cc.abe_frlossreduce: 0
net.inet.tcp.cc.abe: 0
net.inet.tcp.cc.available: newreno, cubic, dctcp
net.inet.tcp.cc.algorithm: newreno
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Do you have a sysctl that sets

net.inet.tcp.cc.algorithm=cubic

Maybe I didn't make that clear in my resource... mmm
 
Joined
Dec 29, 2014
Messages
1,135
Maybe I didn't make that clear in my resource... mmm
It is referenced in there, but it wasn't as obvious as the other things.
 

specter9mm

Dabbler
Joined
Apr 21, 2023
Messages
13
When I try to set these tunables in SCALE I get the "Sysctl 'xxx' does not exist in kernel" message.

Are these tunables named differently in SCALE, or are they only applicable to CORE?
Most of the kernel options there are specific to the BSD-based TrueNAS CORE. There are similar options for the Debian Linux-based TrueNAS SCALE, but they require some research.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Most of the kernel options there are specific to the BSD-based TrueNAS CORE. There are similar options for the Debian Linux-based TrueNAS SCALE, but they require some research.
On the 10G systems that I have switched to SCALE, these are the network sysctls I have implemented, FWIW. Make sure you add the modprobe tcp_dctcp to a post-init command.

Code:
net.core.netdev_max_backlog = 300000
net.core.optmem_max = 268435456
net.core.rmem_default = 212992
net.core.rmem_max = 134217728
net.core.somaxconn = 8192
net.core.wmem_default = 212992
net.core.wmem_max = 134217728
net.ipv4.tcp_congestion_control = dctcp
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_sack = 0
net.ipv4.tcp_wmem = 4096 65536 134217728
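
For what it's worth, the modprobe piece could live in a small post-init script along these lines (the script itself is just a sketch, not the official mechanism):

Code:
#!/bin/sh
# Sketch of a post-init script: load the DCTCP module, then select it.
modprobe tcp_dctcp
sysctl -w net.ipv4.tcp_congestion_control=dctcp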
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I think I need to tune something else. It looks like my box is still using newreno even though the other methods are available.
Code:
freenas2% kldstat | grep cc
 7    1 0xffffffff82e76000     1e90 cc_cubic.ko
 8    1 0xffffffff82e78000     2850 cc_dctcp.ko
freenas2% sysctl -a | grep '\.\cc\.'
net.inet.tcp.cc.dctcp.slowstart: 0
net.inet.tcp.cc.dctcp.shift_g: 4
net.inet.tcp.cc.dctcp.alpha: 1024
net.inet.tcp.cc.newreno.beta_ecn: 80
net.inet.tcp.cc.newreno.beta: 50
net.inet.tcp.cc.abe_frlossreduce: 0
net.inet.tcp.cc.abe: 0
net.inet.tcp.cc.available: newreno, cubic, dctcp
net.inet.tcp.cc.algorithm: newreno
In my environment on TrueNAS CORE systems, I found the htcp congestion control algorithm to yield the best results. You could give it a try and see if you have a similar experience. You'll need to load the htcp kernel module before switching to htcp:

Code:
kldload cc_htcp

sysctl net.inet.tcp.cc.algorithm=htcp


If that works out for you, add the following tunables:

Variable                     Value   Type     Enabled
cc_htcp_load                 YES     LOADER   yes
net.inet.tcp.cc.algorithm    htcp    SYSCTL   yes
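
After a reboot, a quick sanity check that both tunables took effect might look like this:

Code:
kldstat | grep cc_htcp               # confirms the LOADER tunable loaded the module
sysctl net.inet.tcp.cc.algorithm     # should report htcp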
 

specter9mm

Dabbler
Joined
Apr 21, 2023
Messages
13
Alright, I'm curious what else, if anything, I can do. I am topping out at 7.1Gbps. Attached are screenshots of my sysctl settings. The systems have identical processors, motherboards, memory, and SFP cards. In each instance the first connection is over my switch, which has plenty of capacity to handle multiple simultaneous 10Gbps streams. The second in each screenshot is a direct server-to-server interface connection. The sysctl options are the same on both, which I have also attached a screenshot of. MTU on the server-to-server link is set to 9000; through the switch it is 1500, as nothing else on my network is jumbo packet. I just can't seem to get close to 10Gbps.
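
For reference, the numbers come from plain iperf3 runs between the two boxes, something along these lines (the address is just a placeholder for the other server):

Code:
iperf3 -s                          # on the receiving server
iperf3 -c 192.168.10.2 -t 30       # on the sending server, single stream, 30 seconds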
 

Attachments

  • Screenshot 2024-02-22 221428.png
  • Screenshot 2024-02-22 221105.png
  • Screenshot 2024-02-22 220856.png

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Alright, I'm curious what else, if anything, I can do. I am topping out at 7.1Gbps. Attached are screenshots of my sysctl settings. The systems have identical processors, motherboards, memory, and SFP cards. In each instance the first connection is over my switch, which has plenty of capacity to handle multiple simultaneous 10Gbps streams. The second in each screenshot is a direct server-to-server interface connection. The sysctl options are the same on both, which I have also attached a screenshot of. MTU on the server-to-server link is set to 9000; through the switch it is 1500, as nothing else on my network is jumbo packet. I just can't seem to get close to 10Gbps.
The retransmits on that iperf screenshot (Retr 7149) show you've got some issues going on. The Retr column stands for retransmitted TCP packets and indicates the number of TCP packets that had to be sent again. The lower the value in Retr, the better; an optimal value would be 0, meaning that no matter how many TCP packets have been sent, not a single one had to be resent. A value greater than zero indicates packet loss, which might arise from network congestion (too much traffic) or from corruption, perhaps due to hardware issues.
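
If it were me, I'd start by checking the interface error counters on both boxes (and on the switch ports), and then see whether parallel streams behave any differently. A rough sketch; the interface names and the address are placeholders:

Code:
# On SCALE / Linux:
ip -s link show dev eth0                   # RX/TX errors and drops
ethtool -S eth0 | grep -i -e err -e drop   # driver-level counters
# On CORE / FreeBSD:
netstat -i                                 # Ierrs / Oerrs columns
# Then compare against a parallel-stream run:
iperf3 -c 192.168.10.2 -P 4 -t 30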
 

specter9mm

Dabbler
Joined
Apr 21, 2023
Messages
13
The retransmits on that iperf screenshot (Retr 7149) show you've got some issues going on. The Retr column stands for retransmitted TCP packets and indicates the number of TCP packets that had to be sent again. The lower the value in Retr, the better; an optimal value would be 0, meaning that no matter how many TCP packets have been sent, not a single one had to be resent. A value greater than zero indicates packet loss, which might arise from network congestion (too much traffic) or from corruption, perhaps due to hardware issues.
Yep, that's why I went direct interface to interface; the retransmits only happen through the switch. The only other difference is the GBICs: one machine has Finisar and one machine has Cisco GBICs. The fiber cable is only 0.5 meter, and I have tried swapping it with other 0.5 meter cables I have. The only other thing I can think of is to try Cisco-to-Cisco and Finisar-to-Finisar GBICs.
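
Before swapping modules around, I suppose I could also read the transceivers' own diagnostics, assuming the drivers expose them; the interface names here are placeholders:

Code:
# On SCALE / Linux: dump the SFP+ module's DOM readings (if the module supports it)
ethtool -m eth0
# On CORE / FreeBSD: some drivers (e.g. ixgbe) print module info with -v
ifconfig -v ix0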
 