10Gb tunables on 9.10

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Someone, somewhere here on the forums posted a link to a bunch of tunables and such that allowed them to saturate 10Gb LAN over CIFS (I think it was CIFS). Does anyone know where that thread or the off-site link is? I'm trying to do internal testing and I am looking for the values. I could have sworn I bookmarked the info, but I cannot find it.

Thanks.
-Cyberjock
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Thank you @Mlovelace that is what I was looking for. Going to do some testing with this. ;)

If I can validate these changes are worthwhile and there's no apparent downside I am going to recommend them be the defaults.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Thank you @Mlovelace that is what I was looking for. Going to do some testing with this. ;)

If I can validate these changes are worthwhile and there's no apparent downside I am going to recommend them be the defaults.
The values seem reasonable given the person is running a properly spec'd system for 10GbE, and I believe net.inet.tcp.sendbuf_auto/recvbuf_auto are currently set to 1 by default.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thank you @Mlovelace that is what I was looking for. Going to do some testing with this. ;)

If I can validate these changes are worthwhile and there's no apparent downside I am going to recommend them be the defaults.

No, please, please, please do NOT just go and do that. What actually needs to happen is for someone to actually test them, on a somewhat slowish platform, starting at lower numbers than those and moving them upward. Just grabbing the numbers and making them stupid-big will be counterproductive for users who do not have a lot of memory but do have a lot of clients connecting.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
No, please, please, please do NOT just go and do that. What actually needs to happen is for someone to actually test them, on a somewhat slowish platform, starting at lower numbers than those and moving them upward. Just grabbing the numbers and making them stupid-big will be counterproductive for users who do not have a lot of memory but do have a lot of clients connecting.

And do you think I'm not going to go doing all of those things you want to be done? You disappoint me sir!
 

bartnl

Dabbler
Joined
Nov 1, 2015
Messages
17
I'm testing my FreeNAS setup (see sig) that so far cannot be saturated with these settings, so apparently it is slow enough for testing :eek:
Still, these settings are an improvement: I went from 180 MB/s to 300 MB/s. It is still bottlenecking somewhere in my FreeNAS system, as I get identical R/W speeds from both the single SSD and the 12x 4 TB pools from different sources.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I have a system that has a Nehalem Xeon CPU (it was the crappiest Xeon of the bunch) that works great for these kinds of tests. Although when I talked to someone about this they said that even my main system (E3-1230v2) is almost certainly going to be CPU bottlenecked. :/
 

wtfR6a

Explorer
Joined
Jan 9, 2016
Messages
88
I've spent a significant amount of time testing Intel and Chelsio 10gig hardware, mainly on reasonably spec'd systems with a fair chunk of RAM, so I can't comment on lower-spec'd systems. However, the one thing I have found is that the auto-scaling window stuff seems to be slow to react under a number of circumstances, i.e., by the time it has realized it needs to open up, the transfer is already complete. Disabling auto window scaling and sizing the buffers appropriately helped significantly and reduced the ramp-up you often see on performance graphs.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I've spent a significant amount of time testing Intel and Chelsio 10gig hardware, mainly on reasonably spec'd systems with a fair chunk of RAM, so I can't comment on lower-spec'd systems. However, the one thing I have found is that the auto-scaling window stuff seems to be slow to react under a number of circumstances, i.e., by the time it has realized it needs to open up, the transfer is already complete. Disabling auto window scaling and sizing the buffers appropriately helped significantly and reduced the ramp-up you often see on performance graphs.
What setting do you run with so we can do some side-by-side testing? Of course the final sysctls will have to be "tuned" for the system they are on but it would be good to see where you're at with yours.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And do you think I'm not going to go doing all of those things you want to be done? You disappoint me sir!

I'm not sure whether to apologize for that, or to point out that I simply took what you said at face value.

Either way, I've been unhappy with those defaults for some time now. It'd be really nice if autotune was updated to be a little smarter at the same time, but I know you're not a code guy.

What setting do you run with so we can do some side-by-side testing? Of course the final sysctls will have to be "tuned" for the system they are on but it would be good to see where you're at with yours.

It looks like maybe the ARC sizing stuff in autotune got made better, so the biggest hurts right now are that autotune's socket sizing should factor in whether or not there's 10G available in the system, and that the static L2ARC 10M/40M values are just crap. A modern L2ARC ought to be able to sustain 64M for write_max and 128M for write_boost, and even those numbers could be deemed conservative for a modern SSD.

It'd also be great if it kicked up the aggressiveness of the socket sysctl's in the case where you actually do have a larger memory system and 10G. On a large memory system, going out to a 16M or 32M buffer size isn't as damaging as it would be on an 8GB system.
 

wtfR6a

Explorer
Joined
Jan 9, 2016
Messages
88
I'll dig 'em out this weekend, Mlovelace. They were generated on my "old" X9 system; I built this X10 system (in my sig) since I moved and haven't looked at optimising 10gig on it, although I expect the same principles will apply somewhat. The other thing you have to bear in mind is that the network is a pipe: you can optimise the hell out of FreeNAS, but if your switches and the PC on the other end don't have compatible settings you are never going to see max/line speed. I certainly know that the Mac network stack as of Mountain Lion was heavily biased to optimise 1G and also needed some tweaks to make 10gig fly. I haven't spent much time with Windows so can't comment on that setup.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I'll dig 'em out this weekend, Mlovelace. They were generated on my "old" X9 system; I built this X10 system (in my sig) since I moved and haven't looked at optimising 10gig on it, although I expect the same principles will apply somewhat. The other thing you have to bear in mind is that the network is a pipe: you can optimise the hell out of FreeNAS, but if your switches and the PC on the other end don't have compatible settings you are never going to see max/line speed. I certainly know that the Mac network stack as of Mountain Lion was heavily biased to optimise 1G and also needed some tweaks to make 10gig fly. I haven't spent much time with Windows so can't comment on that setup.
No worries if the system is unavailable. I spent time optimizing my system and was curious where other people landed with their settings.

I understand why the defaults are low, you don't want to break lower-spec'd systems out of the box.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I've been running with these values for some time now and have been happy with my throughput. With the change to 9.10 I had to change kern.ipc.somaxconn to kern.ipc.soacceptqueue, as the sysctl was renamed in FreeBSD 10. Your mileage may vary depending on your system memory and the number of clients connecting to the system.

sysctl kern.ipc.soacceptqueue=1028
sysctl kern.ipc.maxsockbuf=33554432
sysctl net.inet.tcp.recvbuf_max=33554432
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.recvbuf_inc=524288
sysctl net.inet.tcp.recvbuf_auto=1
sysctl net.inet.tcp.sendbuf_max=33554432
sysctl net.inet.tcp.sendspace=2097152
sysctl net.inet.tcp.sendbuf_inc=262144
sysctl net.inet.tcp.sendbuf_auto=1
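As a sanity check on these numbers: the 32 MiB maxsockbuf/recvbuf_max ceiling above roughly matches the bandwidth-delay product (BDP) of a 10 Gb/s link. A quick back-of-the-envelope sketch (the RTT figures here are illustrative assumptions, not from the thread):

```python
# Back-of-the-envelope bandwidth-delay product (BDP) check.
# A TCP buffer smaller than the BDP caps throughput below line rate.
def bdp_bytes(link_bits_per_sec: float, rtt_sec: float) -> int:
    """Bytes that must be in flight to keep the pipe full."""
    return int(link_bits_per_sec / 8 * rtt_sec)

if __name__ == "__main__":
    link = 10e9  # 10 Gb/s
    for rtt_ms in (0.5, 5.0, 25.0):  # LAN, campus, WAN-ish RTTs (assumed)
        bdp = bdp_bytes(link, rtt_ms / 1000)
        print(f"RTT {rtt_ms:5.1f} ms -> BDP {bdp / 2**20:6.2f} MiB")
    # 33554432 bytes (32 MiB) keeps 10 Gb/s full out to ~26.8 ms of RTT.
    print(f"32 MiB buffer covers up to {2**25 * 8 / link * 1000:.1f} ms RTT")
```

So the 33554432-byte ceiling is generous for a LAN (sub-millisecond RTTs need well under 2 MiB) but leaves headroom for longer paths; the auto-scaling sysctls decide how much of that ceiling any one connection actually uses.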
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ive spent a significant amount of time testing Intel and Chelsio 10gig hardware mainly on reasonably spec'ed systems with a fair chunk of RAM so can't comment about lower spec'ed systems however the one thing I have found is the auto-scaling window stuff seems to be slow to react under a number of circumstances, i.e by the time its realized it needs to open up the transfer is already complete. Disabling auto window scaling and sizing the buffers appropriately helped significantly and reduced the often seen ramp up you see on many performance graphs.

Yeah, that's kind of expected with bursty behaviour on TCP connections. It would be reasonable to bump recvspace/sendspace on 10G platforms, but the hazard in doing this is that it means you're forcibly allocating memory to something that might never need it. This would be a problem on a filer with hundreds or thousands of connections.
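That hazard is easy to quantify. As a worst-case sketch, assume every connection gets committed at the full static recvspace/sendspace from the values quoted above (the client counts are illustrative assumptions):

```python
# Rough worst-case socket-buffer footprint when static buffer sizes are
# raised. Buffer sizes are the sysctl values quoted earlier in the thread;
# the connection counts are illustrative assumptions.
RECVSPACE = 4 * 2**20  # net.inet.tcp.recvspace = 4194304
SENDSPACE = 2 * 2**20  # net.inet.tcp.sendspace = 2097152

def worst_case_bytes(connections: int) -> int:
    """Upper bound: every connection committed at its full static size."""
    return connections * (RECVSPACE + SENDSPACE)

if __name__ == "__main__":
    for clients in (10, 100, 1000):
        gib = worst_case_bytes(clients) / 2**30
        print(f"{clients:5d} connections -> up to {gib:5.2f} GiB of socket buffers")
```

At 6 MiB per connection, a thousand clients could in principle pin nearly 6 GiB in socket buffers alone, which is why big static values are fine on a two-client lab box and dangerous as a shipped default.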

Some other things that autotune ought to do:

hw.igb.max_interrupt_rate=16384

- doubles the interrupt rate for igb, which will help with small packet processing workloads

hw.ix.max_interrupt_rate=65536
hw.ix.enable_aim=0

- may improve ix (Intel X520) performance

kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.random=0

- Only relevant in FreeBSD 11 and beyond. This may actually be a bug. http://bsdrp.net/documentation/technical_docs/performance

net.inet.tcp.tcbhashsize=????

- Tunable. The hash size of 512 is too small for systems with lots of TCP connections. No good reason not to go to 2048, but a large system might benefit more from 16384.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, that's kind of expected with bursty behaviour on TCP connections.

No, stupid, that's not expected, that's just a bad behaviour. Maybe instead you should get a fricking clue and actually look at the modern sysctls introduced post 1995.

In particular, if it needs adjusting, you could actually adjust it quickly. Maybe something like

net.inet.tcp.recvbuf_inc=2097152
net.inet.tcp.sendbuf_inc=4194304

With a particular emphasis on boosting the send speed to a client. I bet that'd work, you stupid green idjit. While yer at it, start off with something more like

net.inet.tcp.recvspace=2097152
net.inet.tcp.sendspace=4194304

rather than what @Mlovelace suggested, since it is more likely that the NAS will be sending files to end users than it is to be receiving files from clients. Now go get yourself some frakking coffee.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
rather than what @Mlovelace suggested, since it is more likely that the NAS will be sending files to end users than it is to be receiving files from clients.
I boosted receives over sends because this NAS is mostly a backup target with a smaller amount of sends to clients. What you suggested is how I have my home NAS tuned as it's almost all reads and makes sense for most NAS work loads.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I boosted receives over sends because this NAS is mostly a backup target with a smaller amount of sends to clients. What you suggested is how I have my home NAS tuned as it's almost all reads and makes sense for most NAS work loads.

And of course for that use case, that's exactly correct. That's why they call it tuning! ;-)
 