High Speed Networking Tuning to maximize your 10G, 25G, 40G networks

Both FreeBSD and Linux come by default highly optimized for classic 1Gbps ethernet. This is by far the most commonly deployed networking for both clients and servers, and a lot of research has been done to tune performance especially for local area networks. The default settings are optimized to be efficient for both small and large servers, but because memory is often limited on smaller servers, some tunables that could improve performance for higher speeds (10Gbps and above) are not defaults, because they consume large amounts of memory. Many FreeBSD and Linux servers, especially virtual machines, do not have gobs of memory, so the defaults lean pragmatically towards the smaller end. A TrueNAS system, on the other hand, has at least 8GB of RAM, and often much more, and many users are interested in optimal performance.

Additionally, the congestion control algorithm used by TCP can play a large role in how quickly information is transmitted (and re-transmitted) over the network. While Linux defaults to cubic, FreeBSD is more conservative and has not (yet?) made this a default, instead relying on the older "newreno" algorithm. Newreno is very good at generalized congestion control on gigabit networks, but further research has made other options available as well.

The alternative congestion control kernel modules are not loaded by default in FreeBSD, and need to be loaded via loader tunables.

TrueNAS CORE (FreeBSD)
It is recommended to set both
cc_cubic_load="YES"
cc_dctcp_load="YES"

in the Tunables section as "loader" type and then reboot. You will then be able to set a sysctl tunable
net.inet.tcp.cc.algorithm=cubic
or
net.inet.tcp.cc.algorithm=dctcp
to enable your preferred congestion control module.

TrueNAS SCALE (Linux)
Linux defaults to cubic. To load dctcp, set a post-boot task of
modprobe tcp_dctcp
and also run that command from the shell prompt to load it right away. You will then be able to set a sysctl tunable
net.ipv4.tcp_congestion_control=cubic
or
net.ipv4.tcp_congestion_control=dctcp
to enable your preferred congestion control module.

Cubic is recommended for most uses, but high performance networking such as dedicated layer 2 storage networks handling block storage for virtual machines may benefit more from dctcp as both NFS and iSCSI use TCP and are impacted by minor improvements in TCP stack behaviour.

For high speed networking, you really need 10 gig or faster networking. Do not try to use link aggregation of sub-10G circuits. Do not try to use copper 10GBase-T. Jumbo frames are dumbo frames in many cases. These things are problematic. We have an excellent resource in the 10 Gig Networking Primer that will discuss everything you need to get an excellent physical network setup. A chain is only as strong as its weakest link, so if you are trying to get your crummy Aquantia adapter to work well, ... good luck, you'll need it.

ARC is incredibly important for high speed networks. Unless you have a lot of vdevs, most HDD arrays do not have enough oomph to sustain lots of traffic, and this gets worse if you have small files, fragmentation, or other seek-inducing workloads. When you have more ARC, there's more stuff cached and more room for readahead. You are much less likely to get great 10G performance on 16GB of ARC than on 64GB ARC. And remember that SCALE only uses half its memory for ARC by default, so you may need twice as much RAM.

For fast sequential file access, please remember to use a large ZFS record size (the "recordsize" dataset property) such as 1M on your datasets. This will force contiguous allocation and fetching of those blocks when possible.
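To get a feel for why a larger record size helps sequential reads, a rough count of records per file is enough. The sketch below is plain shell arithmetic; the helper name records_needed is hypothetical, and 128K is the stock ZFS recordsize default. The property itself would be changed with something like "zfs set recordsize=1M" on the dataset in question, and it only applies to data written after the change.

```shell
# records_needed FILE_BYTES RECORD_BYTES
# Number of ZFS records needed to hold a file, rounded up.
# (Hypothetical helper for illustration; not a ZFS tool.)
records_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

one_gib=$((1024 * 1024 * 1024))
records_needed "$one_gib" $((128 * 1024))    # 128K records (ZFS default)
records_needed "$one_gib" $((1024 * 1024))   # 1M records
```

For a 1 GiB file this prints 8192 and then 1024, so the 1M setting leaves one eighth as many records to locate and fetch.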

This brings us around to sysctl tunables. Most of the below buffer size tunables consume additional memory, and many do so on a PER CONNECTION basis. Just like ARC, you ideally need more memory for 10G. The larger your buffer sizes, the more work your kernel can do on your behalf to just keep data flowing smoothly.
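Because the cost is per connection, it can help to put a ceiling on it before picking values. This is a back-of-the-envelope sketch only: the helper name is hypothetical, the 16777216 figure is used as an assumed per-connection ceiling, and real connections autosize and rarely all sit at their maximum at once.

```shell
# worst_case_mib CONNECTIONS PER_CONN_BYTES
# Upper bound on socket buffer memory, in MiB, if every connection
# grew its buffers to the per-connection ceiling at the same time.
# (Hypothetical helper; an upper bound, not a prediction.)
worst_case_mib() {
  echo $(( $1 * $2 / 1048576 ))
}

worst_case_mib 200 16777216   # 200 connections at a 16MB ceiling
```

This prints 3200, i.e. a little over 3GB in the worst case for 200 busy connections, which is memory that is then not available to ARC.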

You may inspect the current values of sysctl OIDs at the TrueNAS shell prompt:

TrueNAS CORE (FreeBSD)
Code:
truenas# sysctl net.inet.tcp.cc.available
 net.inet.tcp.cc.available: newreno, cubic, dctcp
truenas# sysctl net.inet.tcp.cc.algorithm
 net.inet.tcp.cc.algorithm: newreno

This means the modules are loaded and available, but newreno is still active. You need to assign either "cubic" or "dctcp" to the sysctl tunable "net.inet.tcp.cc.algorithm" to make a different one active. Click on the in-line links for additional information on these great congestion control algorithms.

TrueNAS SCALE (Linux)
Code:
truenas# sysctl net.ipv4.tcp_available_congestion_control
 net.ipv4.tcp_available_congestion_control = reno cubic dctcp
truenas# sysctl net.ipv4.tcp_congestion_control
 net.ipv4.tcp_congestion_control = cubic

Linux defaults to cubic, and if you prefer dctcp, then you need to assign "dctcp" to the sysctl tunable "net.ipv4.tcp_congestion_control" to make that active. More detailed information about these algorithms is available from the in-line links in the TrueNAS CORE section above.

In general, if you are operating entirely on a non-routed switched network, doing iSCSI or TCP NFS, dctcp may be your best option. Where connections from the TrueNAS host across the Internet are involved, cubic may be better.
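The rule of thumb above, plus the "is it actually loaded" check, can be sketched as two small shell helpers. Both names (cc_available, pick_cc) are hypothetical, and the "switched"/"routed" labels are just shorthand for the two cases described above. The list argument is whatever the "available" sysctl printed on your system, in either the comma-separated or the space-separated form.

```shell
# cc_available ALGO "LIST"
# Succeeds if ALGO appears in LIST. LIST may be comma-separated
# ("newreno, cubic, dctcp") or space-separated ("reno cubic dctcp").
cc_available() {
  for a in $(echo "$2" | tr ',' ' '); do
    [ "$a" = "$1" ] && return 0
  done
  return 1
}

# pick_cc switched|routed
# Encodes the rule of thumb: dctcp on a non-routed switched storage
# network, cubic where traffic crosses routers or the Internet.
pick_cc() {
  case "$1" in
    switched) echo dctcp ;;
    routed)   echo cubic ;;
    *)        echo "usage: pick_cc switched|routed" >&2; return 2 ;;
  esac
}

want=$(pick_cc switched)
if cc_available "$want" "newreno, cubic, dctcp"; then
  echo "$want is loaded and can be made active"
fi
```

On a live system the list would come from "sysctl -n net.inet.tcp.cc.available" on CORE or "sysctl -n net.ipv4.tcp_available_congestion_control" on SCALE instead of the literal string used here.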

kern.ipc.maxsockbuf: 16777216

Is the maximum combined socket buffer size (recv + send). 16MB is reasonable for 10G. 32MB may be reasonable for 40G. Since this is a limit for the combined recv + send buffers, if you have a bidirectional application such as iSCSI or TCP NFS doing block storage, setting this somewhat higher may be sensible, especially if you have lots of memory to work with. Otherwise, don't go nuts; use observation of used buffer sizes in "netstat" to help guide reasonable choices.
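One common way to sanity-check numbers like these is the bandwidth-delay product: link speed multiplied by round-trip time gives the amount of data that can be in flight at once. This is a general rule of thumb brought in for illustration, not something the values above were derived from, and the helper name is hypothetical.

```shell
# bdp_bytes MBIT_PER_S RTT_MS
# Bandwidth-delay product in bytes: the amount of data "in flight"
# on a path of the given speed and round-trip time.
# 1 Mbit/s is 125000 bytes/s, i.e. 125 bytes per millisecond.
bdp_bytes() {
  echo $(( $1 * $2 * 125 ))
}

bdp_bytes 10000 1     # 10G link, 1 ms round trip
bdp_bytes 10000 13    # 10G link, 13 ms round trip
bdp_bytes 40000 6     # 40G link, 6 ms round trip
```

This prints 1250000, 16250000 and 30000000. Read that way, a 16MB buffer is enough to keep a 10G path full out to roughly 13 ms of round-trip time, which is generous for a switched LAN where round trips are usually well under a millisecond.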

net.inet.ip.intr_queue_maxlen: 2048

Is the maximum length of the IP input queue. The default may be a bit small for 10G; just go and set it to 2048. You can tell whether yours is too small by checking the value of "sysctl net.inet.ip.intr_queue_drops"; if it is greater than zero, try increasing maxlen to the next power of two.
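The "next power of two" step can be sketched in a few lines of shell. The helper names are hypothetical; the drop counter and current queue length are passed in as plain numbers so the logic can be tried without touching a live system.

```shell
# next_pow2 N
# Smallest power of two strictly greater than N.
next_pow2() {
  n=$1
  p=1
  while [ "$p" -le "$n" ]; do
    p=$((p * 2))
  done
  echo "$p"
}

# suggest_maxlen DROPS CURRENT_MAXLEN
# If any drops were counted, suggest the next power of two;
# otherwise keep the current value.
suggest_maxlen() {
  if [ "$1" -gt 0 ]; then
    next_pow2 "$2"
  else
    echo "$2"
  fi
}

suggest_maxlen 0 2048    # no drops: stay at 2048
suggest_maxlen 17 2048   # drops seen: move up to 4096
```

On CORE the two inputs would come from "sysctl -n net.inet.ip.intr_queue_drops" and "sysctl -n net.inet.ip.intr_queue_maxlen".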

net.inet.tcp.recvspace: 4192304

The default size of the receive buffer. See the next two entries before making any changes to this!

net.inet.tcp.recvbuf_inc: 524288

The receive buffer starts out small, and grows -- very rapidly -- out to recvbuf_max (below). As data flows in faster than the system can handle, the recvbuf is grown by this size. After roughly two dozen such overflows, starting from the recvspace above, it will have grown to 16777216. This happens very rapidly and there is no reason to get aggressive with this unless you have massive amounts of RAM you want to waste. IMPORTANT NOTE: recvbuf_inc is no longer present in FreeBSD 13/TrueNAS CORE 13. Do not worry, its absence is not a problem.
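A minimal sketch of that growth, assuming a simple linear model in which every overflow adds exactly recvbuf_inc until recvbuf_max is reached. The real kernel logic is more involved (and the tunable is gone in FreeBSD 13), so treat the count as an estimate; the helper name is hypothetical.

```shell
# increments_to_max START INC MAX
# Number of INC-sized steps needed to grow a buffer from START to
# at least MAX, rounded up. (Hypothetical helper, linear model only.)
increments_to_max() {
  echo $(( ($3 - $1 + $2 - 1) / $2 ))
}

increments_to_max 4192304 524288 16777216   # recvspace as listed above
increments_to_max 4194304 524288 16777216   # starting from an even 4MB
```

Under this model the values listed here take 25 steps, and an even 4194304 start takes 24, so the buffer reaches its ceiling after a couple dozen overflows either way.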

net.inet.tcp.recvbuf_max: 16777216

Is the maximum receive buffer size. 16MB is reasonable for 10G. 32MB may be reasonable for 40G. Don't go nuts; use observation of used buffer sizes in "netstat" to help guide reasonable choices. Yes, this is included in maxsockbuf. No, it doesn't really matter that you will never technically hit this exact number.
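One crude way to do that observation is to look at the Recv-Q and Send-Q columns of "netstat -an" while the system is under load and note the largest values. The filter below is a sketch, assuming the usual column order (Proto, Recv-Q, Send-Q, ...) with the connection state in the last column; the function name is hypothetical, and queue depth is only a proxy for how much buffer is actually in use.

```shell
# max_queues
# Reads "netstat -an" style output on stdin and prints the largest
# Recv-Q and Send-Q seen on established TCP connections.
max_queues() {
  awk '$1 ~ /^tcp/ && $NF == "ESTABLISHED" {
         if ($2 + 0 > r) r = $2 + 0
         if ($3 + 0 > s) s = $3 + 0
       }
       END { print r + 0, s + 0 }'
}

# Made-up sample input; on a live system use: netstat -an | max_queues
printf '%s\n' \
  'Proto Recv-Q Send-Q Local Address Foreign Address (state)' \
  'tcp4 0 52 10.0.0.5.22 10.0.0.9.51234 ESTABLISHED' \
  'tcp4 4096 0 10.0.0.5.2049 10.0.0.7.800 ESTABLISHED' \
  'tcp4 0 0 *.22 *.* LISTEN' \
  'udp4 9999 0 *.111 *.*' | max_queues
```

For the sample input this prints "4096 52". If the numbers seen under real load stay far below the configured maximums, there is little reason to raise them further.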

net.inet.tcp.sendbuf_inc: 32768
net.inet.tcp.sendbuf_max: 16777216
net.inet.tcp.sendspace: 2097152

These are the corresponding transmit values to the recvbuf values above.
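To see how a CORE system currently compares with the values discussed above, something like the following can be pasted at the shell prompt. It is a read-only sketch: it changes nothing, the function names are hypothetical, recvbuf_inc is left out because it no longer exists on FreeBSD 13, and any OID the running system does not have is reported as "n/a".

```shell
# print_tunables
# The values discussed above, one name=value pair per line.
print_tunables() {
  cat <<'EOF'
kern.ipc.maxsockbuf=16777216
net.inet.ip.intr_queue_maxlen=2048
net.inet.tcp.recvspace=4192304
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendspace=2097152
EOF
}

# compare_tunables
# Prints the current value next to the suggested one for each OID.
compare_tunables() {
  print_tunables | while IFS='=' read -r name want; do
    have=$(sysctl -n "$name" 2>/dev/null || echo "n/a")
    echo "$name current=$have suggested=$want"
  done
}

compare_tunables
```

Actual changes are still best made through the Tunables section as "sysctl" type so that they persist across reboots.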

There are also some per-device tunables for specific ethernet chipsets. I hope to fill some in here soon.
Author
jgreco

Latest reviews

The finer details that can help so much! Good to know the nitty gritty when we want to try and get the most out of our systems.
Invaluable resource and post. Most facilities in M&E live and die by storage and network. In all my experience ~30 some odd years, I didn't even know these particular tid-bits. Thanks you!!!