
LACP ... friend or foe?

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Too many LACP discussions lately. Here's what you need to know.

1) Link aggregation groups are an interesting way to boost network throughput. They're typically managed by a protocol called Link Aggregation Control Protocol (LACP). Link aggregation is designed to bundle several interfaces together so that networking gear, servers, etc. can benefit from multiple links. Link aggregation can be used as master/standby for network redundancy, or it can be used to increase network bandwidth. Master/standby is easily accomplished through "failover" mode. The remaining discussion in this post primarily focuses on increasing network bandwidth.

2) LACP REQUIRES BOTH SIDES OF A CONNECTION TO SUPPORT LACP. This means you need a managed or smart switch. Typical consumer grade switches do not support LACP.
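For illustration, the switch side of such a bundle on a managed switch looks something like this (a hypothetical Cisco IOS sketch; the interface range and channel-group number are examples, and "mode active" is what requests LACP):

Switch(config)# interface range GigabitEthernet1/0/1 - 2
Switch(config-if-range)# channel-group 1 mode active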

3) High speed Ethernet network performance is substantially impacted by out-of-order packet delivery. While this does happen quite a bit on long hauls across the Internet, it causes massive problems for TCP segment reassembly on the far end, and substantially reduces performance.

3a) As a result, IEEE 802.3ad reads:

This standard does not mandate any particular distribution algorithm(s); however, any distribution algorithm shall ensure that, when frames are received by a Frame Collector as specified in 5.2.3, the algorithm shall not cause:
a) misordering of frames that are part of any given conversation, or
b) duplication of frames.

The above requirement to maintain frame ordering is met by ensuring that all frames that compose a given conversation are transmitted on a single link in the order that they are generated by the MAC Client; hence, this requirement does not involve the addition (or modification) of any information to the MAC frame, nor any buffering or processing on the part of the corresponding Frame Collector in order to reorder frames.

3b) In FreeBSD, a well-designed hash is used to distribute flows between ports. All traffic for a given flow will be transmitted by one particular interface. The hash includes the Ethernet source and destination address and, if available, the VLAN tag, and the IPv4 or IPv6 source and destination address. This is a nice stateless method to manage distribution of flows. The astute reader will note that a stateful mechanism could possibly be used on a server, but FreeBSD fills many networking roles, and for many of those roles, a stateful mechanism would be inappropriate.
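As a quick sanity check on a FreeBSD/FreeNAS box, the configured hash layers are visible in the interface flags (a sketch assuming a lagg named lagg0, output abbreviated; l2 covers the Ethernet/VLAN fields, l3 the IP addresses, and l4 adds TCP/UDP ports):

# ifconfig lagg0
        (...)
        laggproto lacp lagghash l2,l3,l4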

3c) The attached switch probably implements a completely different hashing algorithm, so return traffic may well not flow back to the same interface.

3d) It is possible to instruct FreeBSD to use "roundrobin" hashing, to accomplish what most end users seem to expect LACP to do. Unfortunately, this ultimately results in violations of the misordering prohibition noted in 3a), and while it often appears to work swimmingly well on a lightly loaded network, it typically devolves into rapid hell as the network gets busy, because packet queuing delays become a major factor - just the thing you DON'T want to happen.

3e) If you don't understand that point 3) encompasses and requires all of this, please reread until you do.

4) So the basic problem many people experience is that in order to get a good distribution of traffic, you need more than just two clients. If you have two clients, there is a 50% chance that both flows will end up on the same link of a dual-link LACP connection. If you have three clients, at least two will end up on one connection while another probably has a connection all to itself - but! It is still possible for all three to end up on the same link! And what client generates network traffic 100% of the time? In practice, it is usually difficult to get link aggregation to work well until you have at least a dozen clients and a very busy network.
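To put rough numbers on it (assuming the hash spreads flows uniformly at random): with two clients on a two-link lagg, each flow independently lands on one of two links, so the chance that both share a link is 1/2. With three clients, the chance that all three pile onto one link is 2 x (1/2)^3 = 1/4, and even the "good" outcome is only ever a 2-and-1 split.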

5) LACP adds another layer of complexity to the networking configuration. In practice, we've seen users try to combine two different interface types (such as sis and em), and while this should theoretically work, it is a bad idea in practice. Ideally, a lagg interface should be created from two neighboring interfaces of the same type, configured identically, with no IP addresses on the members, just the "up" option.
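A minimal sketch of that from the FreeBSD command line (interface names and the address are invented for the example; on FreeNAS the equivalent is done through the GUI, and the switch ports must be configured for LACP as well):

# ifconfig em0 up
# ifconfig em1 up
# ifconfig lagg0 create
# ifconfig lagg0 up laggproto lacp laggport em0 laggport em1 192.168.1.10/24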

6) I find that most users would be better served by a quick upgrade to 10G connectivity to their server. Done properly, this allows a server to deliver 1G reliably to or from multiple clients simultaneously.

7) This discussion is also closely related to putting multiple interfaces on a single subnet, a strategy people often believe works until they find out it doesn't work the way they were expecting.

8) FreeBSD's implementation is handled through the lagg mechanism, which offers several operating modes besides (the standards-defined thing called) LACP. One exciting capability for anyone who builds redundancy into their networks is the failover configuration. When your network comprises two or more switches, you can hook your NAS up to both switches in a failover configuration. I use this aggressively in data center designs, along with ESXi failover: one switch generally handles storage and backend tasks, another handles Internet and upstream traffic, so storage traffic is normally localized on one switch, but in the case of a switch reload, crash, or failure, all traffic just moves to the other switch. This works fine with VLANs, etc., though the topic is perhaps a bit esoteric. Note that you need to set the sysctl net.link.lagg.failover_rx_all for this to work as transparently as possible.
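A sketch of such a failover lagg (invented interface names and address; em0 is the preferred port, em1 carries traffic only if em0's link drops, and the sysctl allows inbound traffic to be accepted on the non-active port, which smooths the transition):

# ifconfig lagg0 create
# ifconfig lagg0 up laggproto failover laggport em0 laggport em1 192.168.1.10/24
# sysctl net.link.lagg.failover_rx_all=1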

I will be aggressively pruning responses to this thread, especially pointlessly argumentative ones. It is what it is. I didn't write the standard though I agree with it, and also agree with the implementation decisions. I have been operating a LACP based network for many years and find the technology quite agreeable, but I also understand that it is hard for a newcomer to totally grasp. I don't mind expanding this if it turns out my explanation is deficient in some way (which it probably is).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,155
The only thing I'd add would be a short explanation and/or reference material as to why/where stateful flow control and distribution is inappropriate as a general solution.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Maintaining state can lead to a denial of service condition. Maintaining state is only appropriate when you have a controlled number of connections and the resulting connection table is a reasonable (manageable) size. This can be okay if you're running an application service on the server; for example, it is fine to have stateful firewalling protecting inbound connections to a Web server. However, if you put a stateful firewall on a router, you can easily melt the thing by flooding it with traffic for many different connections (in general, but especially if we consider line-rate spoofed traffic).

So, if you turn that around, it could be argued that you could certainly introduce statefulness for LACP, but if you did so in any context where the traffic was not strictly controlled (which would be almost all of the cases where FreeBSD is deployed for high volume traffic), you would be introducing a catastrophic potential attack vector. Further, it would require a lot of kernel resources to track. For most high volume servers, where you're connecting at greater-than-gigE speeds to a network, traffic is being generated by many separate flows, so the normal LACP hashing is quite sufficient to the task. Only in the specific case of NAS do we often experience a situation where users expect to be able to dominate an entire gigE port with a single client's data. And it's mostly the home users who are resistant to upgrading to 10GbE to fix the problem the natural way. So it's unlikely that any business is going to donate developer resources to add such an ugly feature to the link aggregation code.

So, more briefly, I see this as being an issue that only really impacts home users trying to use FreeBSD as a NAS. I expect that if someone wrote a competent flow based alternative to hashing for LACP that it could be contributed to FreeBSD and would eventually appear in FreeNAS, but this is such an edge case that I don't expect to see it happen anytime soon, because it makes so little sense in general.
 

Bmck26

Dabbler
Joined
Dec 9, 2013
Messages
48
Too many LACP discussions lately. Here's what you need to know.
I will be aggressively pruning responses to this thread, especially pointlessly argumentative ones. It is what it is. I didn't write the standard though I agree with it, and also agree with the implementation decisions. I have been operating a LACP based network for many years and find the technology quite agreeable, but I also understand that it is hard for a newcomer to totally grasp. I don't mind expanding this if it turns out my explanation is deficient in some way (which it probably is).

Have you ever experienced any latency during SSH sessions with LACP configured on FreeNAS? I'm using a Netgear ProSafe smart switch with LACP configured. Everything seems to be working as expected, with activity on all ports with multiple clients connected. However, there is some latency during SSH sessions, often resulting in a 10060 socket error, especially if the session is from a wireless laptop. I would love to use a 10G uplink switch, but the cost is still a little more than I would like to invest at the moment just to address a latency issue with LACP. I did see a 20 port D-Link stackable switch with 2 10G SFP+ ports on Amazon for $334 recently, so maybe I'll be able to find a decent used model for a good price later on.
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Hi All,

When I set up an LACP connection on FreeNAS I have no options; when I type ifconfig lagg1, I see someone has decided a hashing algorithm for me and I cannot pick a different one:

laggproto lacp lagghash l2,l3,l4

Most of the newer and smarter Cisco switches allow me to pick these hashing algorithms for LACP:

Switch(config)#port-channel load-balance ?
dst-ip Dst IP Addr
dst-mac Dst Mac Addr
dst-mixed-ip-port Dst IP Addr and TCP/UDP Port
dst-port Dst TCP/UDP Port
extended Extended Load Balance Methods
src-dst-ip Src XOR Dst IP Addr
src-dst-mac Src XOR Dst Mac Addr
src-dst-mixed-ip-port Src XOR Dst IP Addr and TCP/UDP Port
src-dst-port Src XOR Dst TCP/UDP Port
src-ip Src IP Addr
src-mac Src Mac Addr
src-mixed-ip-port Src IP Addr and TCP/UDP Port
src-port Src TCP/UDP Port

Switch(config)#port-channel load-balance extended ?
dst-ip Dest IP
dst-mac Dest MAC
dst-port Dest Port
ipv6-label IPV6 Flow Label
l3-proto L3 Protocol
src-ip Src IP
src-mac Src MAC
src-port Src Port
<cr>

==================================================

Is there any chance to just get src-dst-ip to work between Cisco and FreeNAS? I would never use round robin, as out-of-order delivery is evil and the Cisco switch will tell you that on the console or in the logs. I am using VLANs and multiple addresses on the FreeNAS side and the VMware side, and multiple datasets, with the IP addresses differing in the least significant bit. I think this is a bug with LACP in FreeBSD. The Cisco switch builds the LAGG, but some stats appear to be missing, e.g. traffic load balancing info.

Thanks,
Joe
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
The solution for me was to change the LACP hash on the Cisco switch to Src-Dst-Mac and all is well, 351 meg a second peak writes and 358 meg a second peak reads. iSCSI and NFS working fine. ;)
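For reference, on a Catalyst that hash is a single global command (as in the list earlier in the thread):

Switch(config)# port-channel load-balance src-dst-mac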

When I set up an LACP connection on FreeNAS I have no options; when I type ifconfig lagg1, I see someone has decided a hashing algorithm for me and I cannot pick a different one:
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, the options aren't really offered since FreeNAS is an appliance. Of course, if you want to code in the option of choosing the best setting, then I'm sure the developers would appreciate the pull request on GitHub. ;)

Honestly, I think you are the first person to ever ask for this. I'm a bit shocked too because it sounds like something that I'd expect to be in the GUI. I'm not an LACP wizard though, so I don't even know what all of the options are. :P
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Cyberjock,

I think the limitations are on the FreeBSD side of things. It does indeed work, but the feedback from the Cisco 3560 switch is not enough to see the packet distribution. In my lab I am going to install 2 FreeBSD 10.x systems and see how LACP works between them with a trunk and 4 VLANs for the 4 subnets. I hope to see some statistics on the Cisco end.

I truly hate the garbage tier-2 switches (Broadcom or Marvell) that say they can do X, Y and Z. When the rubber meets the road, I found out they only kind of do X, Y and Z... A 7 year old 3750g-24tss on eBay for $250 is a much better way to work in a lab than this crap from Linksys, Belkin, Dell or HP. What a mess! I thought LACP would level the switching field so everyone could port-channel without issues! WRONG! These same wannabe switch vendors are still trying to figure out VLANs and jumbo frames too! LOL

Thanks,
Joe
 

Nicolas1988

Dabbler
Joined
Mar 22, 2015
Messages
12
Hello,

Is it a good idea to use LACP with ESX?
Actually I have two ESX hosts with 20 VMs on them.

Thank you!
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Nicolas,

There is only ONE way to get LACP to work with VMware ESXi 5.0-5.5 when using a 4-port gig Ethernet adapter and FreeNAS 9.3 stable with iSCSI. I am sure a paid support ticket with TrueNAS would get you set up in less than 1 hour.

I hope VMware adds some brains to ESXi so it can do a better job of LACP and of load balancing network interfaces for iSCSI, NFS and vMotion. It seems that some of the big companies really lag behind where the software and hardware industry is going. If you try the same thing with Hyper-V you will have even more roadblocks and issues.

The best thing that can happen is FreeNAS 10 using FreeBSD 10.2 or higher. Once we get that, we can do virtual machines via the bhyve hypervisor and have the correct LACP hashing to get traffic back to FreeNAS 9.3 or 10.x.

http://bhyve.org/
https://en.wikipedia.org/wiki/Bhyve

Thanks,
The Average Joe
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
My experience with LACP is mixed. As the posts above say, you have little control over where traffic really goes, so it offers an unknown benefit in terms of bandwidth but a high headache factor when tracking down problems.

A MAC-based hash is the default, and if your traffic is routed, it might all pass through one link anyway, since the router's MAC is what gets hashed.
An IP-based hash would solve that, but as above, there is still no guarantee the load gets split.

The bottom line from our CCNP teacher is: if you need it for bandwidth, use something else/better. If you need it for redundancy, there are other alternatives.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@no_connection, :D a good one.

On a more serious note, your CCNP teacher should have analyzed with the class the scenarios where deploying LACP is advantageous and where doing so is clearly not...
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
He did go through that, and as I said, there are not many reasons to use it if you want bandwidth, especially with 1G links.
Go 10G instead.
You could find a use for it, and that's fine, especially if you tweak it to your needs and get a reliable gain from it.

For redundancy it's great, but not problem free.

So I pretty much affirm what jgreco wrote.
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
In a specialty use case where VMware ESXi 5.5 is trying to make multiple connections to multiple portals and doing the VMware round robin across all 4 ports to a FreeNAS box with four 1 gig NICs, I can see where this is desirable, and it works in my case. There are 100 other cases where the bandwidth is not using more than one of the 1 gig links. With 10 gig links I am sure that I can saturate 4 of those as well, as long as VMware also has 4 to use and all the stars line up. ;)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
In a specialty use case where VMware ESXi 5.5 is trying to make multiple connections to multiple portals and doing the VMware round robin across all 4 ports to a FreeNAS box with four 1 gig NICs, I can see where this is desirable, and it works in my case. There are 100 other cases where the bandwidth is not using more than one of the 1 gig links. With 10 gig links I am sure that I can saturate 4 of those as well, as long as VMware also has 4 to use and all the stars line up. ;)

Assuming that you mean on a single network, with FreeBSD, this is not going to work as expected, because the OS will pick one interface as being the correct link to the network, and even though you might have four interfaces on the FreeBSD box, traffic will egress through just one. Discussed in https://forums.freenas.org/index.php?threads/multiple-network-interfaces-on-a-single-subnet.20204/
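Concretely, a directly connected subnet gets a single route entry pointing at one interface, and all locally originated traffic for that subnet egresses there (a sketch with invented addresses and interface names; output abbreviated):

# netstat -rn | grep 192.168.1
192.168.1.0/24     link#1     U     em0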

VMware and some storage manufacturers have butchered it into working because the average storage admin is too much of a doof to also have to learn proper IP networking, but it doesn't work on FreeBSD because the technique is fundamentally broken, and LACP is designed to handle this sort of issue properly.
 

VictorR

Contributor
Joined
Dec 9, 2015
Messages
143
3c) The attached switch probably implements a completely different hashing algorithm, so return traffic may well not flow back to the same interface.

4) So the basic problem many people experience is that in order to get a good distribution of traffic, you need more than just two clients. If you have two clients, there is a 50% chance that both flows will end up on the same link of a dual-link LACP connection. If you have three clients, at least two will end up on one connection while another probably has a connection all to itself - but! It is still possible for all three to end up on the same link! And what client generates network traffic 100% of the time? In practice, it is usually difficult to get link aggregation to work well until you have at least a dozen clients and a very busy network.

ooooh boy, I wish I had read this before spending $3k on a Netgear XS728T 10GbE managed switch and staying up until 5am last night trying to get LACP to work between it and my 9.3.1 box (3 x Intel X540 DA2 - X540T2BLK). According to Netgear's manual, it is supposed to be simple....not so much

A Mac Pro direct-connected (via Sonnet Twin 10G Thunderbolt to 10GbE converter) to the NAS is getting ~860MB/sec reads and ~500-600MB/sec writes. The other 5 NIC ports are combined into a single LAGG/LACP channel. A same model Mac Pro connected (via same model Sonnet) to the XS728T gets 20MB/sec reads and writes.

Our situation is kind of unique, in that there will only be 6 clients using this network/NAS for online editing/post-production of ultra hi-def video. Depending on the codecs used, each camera shot could require 40-140MB/sec bandwidth. Multi-camera sequences require multiples of that. Final edit assemblies and color correction will be demanding. Granted, those high bandwidth periods will be few and far between, but they will happen.

Of course, I could simply direct-connect the six clients to the six available ports and be done with it. The problem arises when they add more users. And that will be happening soon.

I was too tired to go any farther. Heading back to the office now to try and make some sense of it all
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
This is the ONLY case I can get 4x the speed with 4 interfaces and FreeNAS :
*) VMware ESXi iSCSI <no LACP> Cisco <LACP> FreeNAS

For your Mac users, you might want to make 2 targets on FreeNAS and manually put half your users on one IP and half on the other IP so that both interfaces get used. LACP just makes it so that some traffic might get bound to a single NIC.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
According to Netgear's manual, it is supposed to be simple....not so much

Of course it's simple, after all, it's magic. :smile:

A Mac Pro direct-connected (via Sonnet Twin 10G Thunderbolt to 10GbE converter) to the NAS is getting ~860MB/sec reads and ~500-600MB/sec writes. The other 5 NIC ports are combined into a single LAGG/LACP channel. A same model Mac Pro connected (via same model Sonnet) to the XS728T gets 20MB/sec reads and writes.

20MB/sec? Something's wrong.

Our situation is kind of unique, in that there will only be 6 clients using this network/NAS for online editing/post-production of ultra hi-def video. Depending on the codecs used, each camera shot could require 40-140MB/sec bandwidth. Multi-camera sequences require multiples of that. Final edit assemblies and color correction will be demanding. Granted, those high bandwidth periods will be few and far between, but they will happen.

Of course, I could simply direct-connect the six clients to the six available ports and be done with it. The problem arises when they add more users. And that will be happening soon.

5 ports as a LAGG? Get rid of one. Or maybe even three. Betcha you have much better luck with a 20Gbps LAGG. The switch silicon probably doesn't even scale to handling 5 ports.
 