I've spent a significant amount of time testing Intel and Chelsio 10GbE hardware, mainly on reasonably specced systems with a fair chunk of RAM, so I can't comment on lower-specced systems. However, the one thing I have found is that the auto-scaling window logic seems slow to react under a number of circumstances, i.e. by the time it has realized it needs to open up, the transfer is already complete. Disabling auto window scaling and sizing the buffers appropriately helped significantly and reduced the ramp-up often seen on performance graphs.
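Sizing the buffers appropriately generally means covering at least the bandwidth-delay product (BDP) of the path, since a window smaller than that caps single-connection throughput. Below is a back-of-the-envelope sketch, not a tuning tool; the 10 Gbit/s link, the example RTTs and the 64 KiB window are assumptions picked for illustration.

```python
# Bandwidth-delay product (BDP): how much data must be in flight to keep a
# link full. A socket buffer smaller than this limits throughput.


def bdp_bytes(link_bits_per_sec: float, rtt_sec: float) -> int:
    """Bytes in flight needed to saturate a link at a given round-trip time."""
    return int(link_bits_per_sec * rtt_sec / 8)


def max_throughput_bits(buffer_bytes: int, rtt_sec: float) -> float:
    """Upper bound on single-connection TCP throughput for a fixed window."""
    return buffer_bytes * 8 / rtt_sec


if __name__ == "__main__":
    link = 10e9  # assumed 10 Gbit/s link
    for rtt_ms in (0.1, 1.0, 10.0):  # assumed example round-trip times
        rtt = rtt_ms / 1000
        print(f"RTT {rtt_ms:>5} ms -> BDP {bdp_bytes(link, rtt) / 2**20:.2f} MiB")
    # A hypothetical 64 KiB window on a 1 ms path stays well below 10 Gbit/s.
    print(f"64 KiB @ 1 ms: {max_throughput_bits(65536, 0.001) / 1e9:.2f} Gbit/s")
```

If the buffer starts well below the BDP and only grows as the connection progresses, short transfers finish before reaching full speed, which matches the ramp-up described above.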
Yeah, that's kind of expected with bursty behaviour on TCP connections. It would be reasonable to bump net.inet.tcp.recvspace/sendspace on 10G platforms, but the hazard is that you're forcibly allocating memory to something that might never need it. That would be a problem on a filer with hundreds or thousands of connections.
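To put a rough number on that hazard, here is a worst-case sketch that simply multiplies the per-connection send and receive buffer sizes by the connection count. The 2 MiB per-direction value and the connection counts are hypothetical, and the figure assumes every connection's buffers are fully used at once.

```python
# Worst-case socket buffer memory if every connection uses its full
# send and receive allowance at the same time.


def worst_case_buffer_mem(connections: int, sendspace: int, recvspace: int) -> int:
    """Total bytes if all connections hit both buffer limits."""
    return connections * (sendspace + recvspace)


if __name__ == "__main__":
    MiB = 2**20
    per_side = 2 * MiB  # hypothetical bumped recvspace/sendspace value
    for conns in (100, 1000, 5000):  # hypothetical connection counts
        total = worst_case_buffer_mem(conns, per_side, per_side)
        print(f"{conns:>5} connections -> {total / 2**30:.1f} GiB")
```

The scaling is linear, so a buffer size that is harmless for a handful of bulk transfers becomes tens of gigabytes at filer-scale connection counts.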
Some other things that autotune ought to do:
hw.igb.max_interrupt_rate=16384
- doubles the maximum interrupt rate for igb, which will help with small-packet processing workloads
hw.ix.max_interrupt_rate=65536
hw.ix.enable_aim=0
- raises the ix interrupt rate cap and disables adaptive interrupt moderation (AIM); may improve ix (Intel X520) performance
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.random=0
- Only relevant in FreeBSD 11 and beyond. This may actually be a bug.
http://bsdrp.net/documentation/technical_docs/performance
net.inet.tcp.tcbhashsize=????
- Loader tunable (set in /boot/loader.conf, read-only at runtime). The default hash size of 512 is too small for systems with lots of TCP connections. There's no good reason not to go to 2048, but a large system might benefit more from 16384.
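For the igb and ix interrupt-rate settings in the list above, the sketch below converts each cap into the minimum spacing between interrupts and the average number of minimum-size packets each interrupt would have to service at line rate. Assumptions: igb runs at 1 Gbit/s and ix at 10 Gbit/s; the 8192 "before" value is just half of 16384, taken from the "doubles" wording rather than from the driver's actual default; and all traffic is treated as landing on a single queue.

```python
# Interrupt-rate cap arithmetic. A minimum-size Ethernet frame occupies
# 84 bytes on the wire: 64-byte frame + 8-byte preamble/SFD + 12-byte
# inter-frame gap.

WIRE_BYTES_MIN_FRAME = 64 + 8 + 12


def line_rate_pps(link_bits_per_sec: float) -> float:
    """Packets per second at line rate with minimum-size frames."""
    return link_bits_per_sec / (WIRE_BYTES_MIN_FRAME * 8)


def interrupt_interval_us(max_rate: int) -> float:
    """Shortest spacing between interrupts allowed by a per-second cap."""
    return 1e6 / max_rate


def packets_per_interrupt(link_bits_per_sec: float, max_rate: int) -> float:
    """Average packets per interrupt at small-packet line rate (one queue)."""
    return line_rate_pps(link_bits_per_sec) / max_rate


if __name__ == "__main__":
    cases = [
        ("igb, assumed previous cap", 1e9, 8192),  # half of 16384, per "doubles"
        ("igb, hw.igb.max_interrupt_rate", 1e9, 16384),
        ("ix, hw.ix.max_interrupt_rate", 10e9, 65536),
    ]
    for label, link, rate in cases:
        print(f"{label}: {interrupt_interval_us(rate):.1f} us between interrupts, "
              f"~{packets_per_interrupt(link, rate):.0f} packets each")
```

Halving the interval between interrupts means each one services roughly half as many small packets, which is one way to read why a higher cap helps small-packet workloads.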
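The tcbhashsize point comes down to hash-chain length: with connections spread evenly across buckets, each lookup walks about connections / hashsize entries. This sketch assumes an even spread and a hypothetical 50,000-connection server; the power-of-two check reflects the assumption that the kernel expects a power-of-two hash size (512, 2048 and 16384 all are).

```python
# Average chain length in a chained hash table such as the TCP
# connection lookup hash. Fewer entries per bucket means cheaper lookups.


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def avg_chain_length(connections: int, hash_size: int) -> float:
    """Connections per bucket, assuming the hash spreads them evenly."""
    if not is_power_of_two(hash_size):
        raise ValueError("hash size should be a power of two (assumption)")
    return connections / hash_size


if __name__ == "__main__":
    conns = 50_000  # hypothetical busy server
    for size in (512, 2048, 16384):
        print(f"tcbhashsize={size:>5}: ~{avg_chain_length(conns, size):.1f} per bucket")
```

Going from 512 to 2048 buckets cuts the average chain by a factor of four, and 16384 by a factor of 32, at the cost of a slightly larger bucket array.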