My problem with that logic is that a NIC should be handling the TCP/IP offload. So the MTU doesn't matter for a local network as much as people want to believe. Your example is for a virtual NIC, if I'm not mistaken. In that case (and this was the big push for MTU back in the day), the CPU did the actual TCP/IP processing work, so fewer packets were a major boon. This is why I mentioned that even 4-year-old hardware can saturate a Gb NIC.
I have yet to see someone change the MTU on physical hardware and see a significant increase, with one exception. That exception was a database server that always transferred data in 2048-byte blocks. So changing the connection between the database and the application server to 2048 bytes (plus the 14 bytes for the Ethernet frame header, if I remember correctly) gave slightly better results. But even then it was only a few percent, for a server that wasn't limited by network speed anyway.
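To put rough numbers on the "fewer packets" argument, here's a back-of-the-envelope sketch. The header sizes are textbook figures (20-byte IPv4 header, 20-byte TCP header without options, 18 bytes of Ethernet framing), not anything specific to the hardware discussed here:

```python
# Rough packet counts and wire overhead for a 1 MB bulk transfer at
# different MTUs, assuming IPv4 + TCP without options over Ethernet.

IP_HDR, TCP_HDR, ETH_FRAME = 20, 20, 18

def packets_needed(payload_bytes, mtu):
    mss = mtu - IP_HDR - TCP_HDR          # TCP payload carried per packet
    packets = -(-payload_bytes // mss)    # ceiling division
    wire_bytes = payload_bytes + packets * (IP_HDR + TCP_HDR + ETH_FRAME)
    return packets, wire_bytes

for mtu in (1500, 9000):
    pkts, wire = packets_needed(1_000_000, mtu)
    print(f"MTU {mtu:5}: {pkts:4} packets, {wire / 1_000_000:.1%} of payload on the wire")
```

So jumbo frames cut the packet count by roughly 6x, which mattered a great deal when the CPU paid a fixed cost per packet, and matters much less when the NIC absorbs that cost.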
But you are absolutely right that changing the MTU when you are having problems is a big mistake. Plenty of hardware doesn't handle jumbo frames correctly, and more still only supports certain sizes.
I remember buying 4 Intel Gb NICs and a $1000 Gb network switch back in 2003ish. With TCP/IP offload disabled, my CPU at the time would max out and I couldn't hit 400 Mb/sec; enabling it saved the day. I still have 2 of those PCI cards here somewhere. But since I was stuck on a crappy 32-bit/33 MHz PCI bus (which also hosted my SCSI controller), I couldn't regularly get ultra-high speeds except to/from my RAM disks.
Is this assessment wrong?
Yes, sorry, at least in part wrong. But at least nice and logical!
I'm going to try to keep this accessible for everyone else suffering through this. I'm including some helpful links and being a little more wordy than usual (is that possible?).
First, yes, the example was for entirely virtualized interfaces, which means that there was no actual pesky physical hardware to slow things down, or on the flip side, to speed things up with offload.
Second, hardware offload at gigabit speeds was much more important when we had slower CPUs. Modern CPUs do not necessarily need an assist and can do the heavy lifting at link saturation speeds. On the other hand, gigabit has been around for about 15 years now. The first generation of adapters, like the Tigon 1 (as found in the Netgear GA620; I still have some of those somewhere), came from an era where we were happy to get interfaces smart enough to handle bus mastering efficiently. The idea of offloading the entire processing into silicon was left to the realm of Windows and Novell drivers.
Now the real big problem is that actual full offload is impractical without the manufacturer's assistance in developing it. With only a few exceptions (I'm thinking of Kip Macy's Chelsio drivers), FreeBSD doesn't support TCP Offload Engines (TOE). Full offload is largely seen as something that's very complicated to do in a generalized manner for only modest performance improvements. The Linux people have fought against TOE for years, and the summary on Wikipedia is more complete than anything I could whip up in a few minutes. Please do check it out!
So what you are left with on FreeBSD is basically two major subcategories of non-full offload:
1) Checksum Offload - many better interfaces include support for TCP and UDP checksum offload. This can be a pretty big deal on slower CPUs and is less of a factor on faster ones. Kind of like how we just accept ZFS doing RAID parity calculations in software, but it makes us prefer faster CPUs.
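For the curious, here's the actual arithmetic the NIC takes over: the ones'-complement Internet checksum from RFC 1071. Done in software, it touches every byte of every packet, which is why it was such a big deal on slower CPUs:

```python
# The ones'-complement Internet checksum (RFC 1071) used by IP, TCP, and
# UDP. Checksum offload moves exactly this per-byte work into the NIC.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]       # sum 16-bit words
        total = (total & 0xFFFF) + (total >> 16)    # fold carries back in
    return ~total & 0xFFFF
```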
2a) TCP Segmentation Offload (TSO) - generally supported on the same sorts of chips that support checksum offload. The idea is that data written to a socket is handed to the device driver in larger chunks (64K?), and instead of the network stack chopping it up, updating the header fields, and queuing packets for transmit, all of that is left to the silicon. It basically handles the most common, rudimentary task... everything else, like retransmits, still has to be handled in software.
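A toy sketch of the segmentation step the hardware takes over (the function name and record layout are mine, purely for illustration; real hardware also rewrites IP IDs, checksums, and TCP flags):

```python
# TSO in miniature: take one large chunk from the socket layer and chop it
# into MSS-sized segments, each with a correctly advancing sequence number.

def tso_segment(payload: bytes, seq: int, mss: int = 1460):
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        segments.append({"seq": seq + off, "len": len(chunk), "data": chunk})
    return segments

# A 64K socket write becomes 45 wire-sized segments at a 1460-byte MSS.
segs = tso_segment(b"x" * 65536, seq=1000)
```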
2b) Large Receive Offload (LRO) - takes incoming packets and merges them back into larger chunks (more or less the opposite of TSO).
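And the receive-side mirror image, again just an illustrative sketch (the real thing lives in the driver or silicon and tracks per-flow state): packets are coalesced only when the next one's sequence number continues exactly where the previous one ended.

```python
# LRO in miniature: merge back-to-back in-order packets into one large
# chunk before handing it up the stack. Uses the same record layout as the
# TSO sketch above, purely for illustration.

def lro_merge(packets):
    merged = []
    for pkt in packets:
        last = merged[-1] if merged else None
        if last and last["seq"] + last["len"] == pkt["seq"]:
            last["data"] += pkt["data"]     # contiguous: coalesce
            last["len"] += pkt["len"]
        else:
            merged.append(dict(pkt))        # gap or new flow: start fresh
    return merged
```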
So anyway, TSO and LRO do work together to significantly reduce the benefits of jumbo frames. On a network segment where all jumbo traffic is local, that's nearly the end of the story for now... but if you have intervening equipment or WAN requirements, like routers, then it is absolutely jumbo for the win. TSO and LRO are basically drug dealers enabling us to continue our addiction to these teeny packets in a less painful manner.
1500 made lots of sense at 10 Mbps, but now we're coming into the era of 10 Gbps, fully 1000x faster. We ought to fix CRC32 (the use of which means the frame check loses effectiveness if you go much above a 9000-byte MTU) and go for massive packets.
I will note with some interest that a complex set of factors seems to have held us at around 1 Gbps for commodity connections for over a decade, and some small part seems to be that it is now pretty easy to cope with gigabit-level traffic even on smallish devices. By way of comparison, we made the jump from 10 Mbps (commodity in 1993) to 1 Gbps (1998, actually appearing as commodity by about 2003). So what I think may end up happening is that we'll eventually see the adoption of 10 Gbps as commodity over the next several years. That means the previous two order-of-magnitude increases took ten years total, while this single order-of-magnitude increase is likely to take more like fifteen (10 Gbps commodity around 2018?).
I think part of the trouble is that 1Gbps is really fast enough for so many purposes. You can run video over it, or remote desktops, or all sorts of things. Demand to go past 1Gbps is just kind of soft. So if we're lucky, we might see 100Gbps become a commodity when we're old men. Or possibly never.
As for checksum offload, that's generally useful for the obvious reasons.
Intel's Ethernet chips usually support checksum offload, TSO, and LRO... and do so with a driver authored by Intel. This means their desktop cards generally perform nearly as well under FreeBSD as their server cards.
So I rate your message as "in part wrong" because you never use TOE with FreeBSD (Chelsio excepted), but you do use lesser offload features such as checksum offload, TSO, and LRO. Those three form a core that gives you a good percentage of what TOE might do for you, so you are also "in part right."