How to set up LACP/LAGG interface?

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,466
Apparently I need someone to break it down Barney-style for me, because I'm just not able to make this work. I'm wanting to set up both ports on my Chelsio T420-SO-CR NIC as a LAGG interface under SCALE 22.02.4. This was working under CORE 12 with exactly the same hardware (OK, with the exception of the boot SSD), but stopped working when I upgraded to SCALE. I reconfigured one of the ports on the NIC as its own interface, gave it a static IP (once someone else pointed out that "aliases" was the way to do that--hardly intuitive), and it's been working since then, but I'd like to get the LAGG back.

My switch is a Brocade ICX-6610-48P. It's running the latest firmware, and has LAGG configured on two ports--this configuration hasn't been touched since it was working with CORE:
Code:
lag TrueNAS dynamic id 1
 ports ethernet 1/3/7 to 1/3/8 
 primary-port 1/3/7
 deploy


So using the console menu, I choose option 1, configure network interfaces. In the menu that comes up, I delete enp130s0f4. Then I create a new interface of type LAGG, name bond0, protocol LACP, ports enp130s0f4 and enp130s0f4d1. Then I edit that interface and assign it an "alias" of the CIDR IP that had previously been assigned to enp130s0f4 (192.168.1.10/24). Then I apply the settings, and plug both ports from the NIC into the two designated ports on my switch.

At this point, it should work, right? I should be able to ping 192.168.1.10 from other hosts on my LAN, and ping them from the NAS. But no dice; I get timeouts. And yes, I have a default gateway (and DNS server, though it shouldn't be relevant here) set under Network -> Global. Surely I'm missing something simple here--any ideas?
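
One way to narrow this down from the NAS side is to read the report the Linux bonding driver exposes under /proc/net/bonding/bond0. Here is a minimal Python sketch, assuming the bond is named bond0 as above. The sample text is made up for illustration and only mimics the layout of that file, and the helper names are hypothetical. It flags a bond that is not in 802.3ad mode, member links that are down, and members that ended up in different aggregators, which usually suggests LACP did not negotiate with the switch.

```python
# Illustrative sample only: the layout mimics /proc/net/bonding/<bond>,
# but every value here is invented.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: enp130s0f4
MII Status: up
Aggregator ID: 1

Slave Interface: enp130s0f4d1
MII Status: up
Aggregator ID: 2
"""


def parse_bond_report(text):
    """Return (mode, members); members maps interface name -> dict of fields."""
    mode = None
    members = {}
    current = None  # name of the member section being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            members[current] = {}
        elif current is not None and key in ("MII Status", "Aggregator ID"):
            members[current][key] = value
    return mode, members


def lacp_problems(text):
    """Return a list of human-readable findings; empty means nothing obvious."""
    mode, members = parse_bond_report(text)
    problems = []
    if mode is None or "802.3ad" not in mode:
        problems.append(f"bond is not in 802.3ad (LACP) mode: {mode!r}")
    for name, fields in members.items():
        if fields.get("MII Status") != "up":
            problems.append(f"{name}: link is not up")
    agg_ids = {f.get("Aggregator ID", "?") for f in members.values()}
    if len(agg_ids) > 1:
        problems.append(f"members are in different aggregators: {sorted(agg_ids)}")
    return problems


if __name__ == "__main__":
    # On the NAS you would read the real report instead of SAMPLE, e.g.
    #   text = open("/proc/net/bonding/bond0").read()
    for problem in lacp_problems(SAMPLE):
        print(problem)
```

An empty result does not prove the LAG is healthy; it only rules out the three conditions checked here.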
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,740
Did you reconfigure your switch ports for LACP? I was able to get LACP working on SCALE in exactly this way. Switch is Cisco 2960L series.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,466
Did you reconfigure your switch ports for LACP?
I haven't touched the configuration on those ports since they were working as a LAGG under CORE. The single interface as I'm currently using it is plugged into a different port on the switch.
 

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
@danb35

EDIT: I kept testing as I wrote this post, and I think I figured out what the problem was; I'm betting it's the same one you ran into. I have rewritten this as a tutorial describing what I experienced and how I resolved it. I hope it helps others, as it took me hours to figure out what was a VERY simple problem that should be documented or caught by input validation.

TLDR: when entering the static IP for the bond interface (or any for that matter) you must include the subnet at the end of the IP in CIDR format!
(EDIT: I've since noticed that the GUI defaults to /24 when you create the alias there, so this is really only an issue for those of us doing CLI configuration.)
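
To illustrate why the prefix matters, here is a small Python sketch using the standard ipaddress module. The function name parse_alias is hypothetical, and this is not how TrueNAS validates input; it only shows the kind of check that would catch a bare address. Note that Python itself treats an address without a prefix as a /32, so a missing prefix silently produces a very different netmask.

```python
import ipaddress


def parse_alias(text):
    """Parse 'address/prefix'; reject a bare address instead of guessing a mask."""
    if "/" not in text:
        raise ValueError(f"{text!r} has no prefix length; write it as e.g. {text}/24")
    return ipaddress.ip_interface(text)


iface = parse_alias("192.168.1.10/24")
print(iface.netmask)                    # 255.255.255.0
print(iface.network.broadcast_address)  # 192.168.1.255

# Without a prefix, the ipaddress module assumes a single-host /32 network.
bare = ipaddress.ip_interface("192.168.1.10")
print(bare.netmask)                     # 255.255.255.255
```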

Did you get anywhere with this? I think I'm seeing the same behaviour. I'm not a Linux networking guru, but I'm starting to think there are some LACP issues in 22.02.4 and also that the networking in general might also have some bugs, though none that break "normal connectivity" in ordinary situations (see Point 1). As in your case, this exact hardware and config was functional with LACP on 12U2. I did a fresh build on this hardware using SCALE 22.02.4 and cannot get LACP to function.

Hardware platform:
Dell R210 II server using the onboard Broadcom NICs with a D-Link DGS-1210-28P switch, setting up a two-member LACP link.

Here is a summary of what I noticed:
POINT 1
On initial boot the switch is not configured for LACP. TrueNAS sets both interfaces to DHCP by default, both get an IP in the same subnet, and the GUI is up and available on both IPs.
---> Nothing about this struck me as unusual, but after messing around and setting one of the adapters to static, it won't let you go back and set it to DHCP, giving the error "interface_update.ipv4_dhcp: Only one interface can be used for DHCP."
+++> Both NICs were DHCP by default after install, which is a config that is not allowed later, so I feel pretty convinced this behaviour is a bug of some sort. Unfortunately I don't have a screenshot showing this, as I didn't realize the rabbit hole I was about to go down...

An additional nuance I ran into was that it won't let you give both interfaces static IPs (aliases) on the same subnet, but they can be on the same subnet when one is DHCP and the other is static.
**I'm not saying you should be able to do this, but the validation logic is not consistent in preventing it.
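
For what a consistent check could look like, here is a rough Python sketch. It is not TrueNAS's actual validation; the function name and the example addresses are hypothetical. The idea is to compare the networks of all configured addresses, however they were obtained, and report any pair of interfaces whose subnets overlap.

```python
import ipaddress
from itertools import combinations


def overlapping_aliases(aliases):
    """aliases maps interface name -> 'address/prefix'; return conflicting pairs."""
    nets = {
        name: ipaddress.ip_interface(cidr).network
        for name, cidr in aliases.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical addresses: it does not matter whether one came from DHCP.
print(overlapping_aliases({"eno1": "192.168.1.20/24", "eno2": "192.168.1.21/24"}))
print(overlapping_aliases({"eno1": "192.168.1.20/24", "eno2": "10.0.0.5/24"}))
```

Applying the same test to DHCP-assigned and static addresses alike would remove the inconsistency described above.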

OK, let's get to the LACP part and where I think the cause of the issues is:
STEP 1 - Here is my initial ifconfig output with eno1 set to DHCP and eno2 set to a static address (the address originally given to that interface by DHCP). Everything looks good and works (no LAGG set up on the switch yet).
1dhcp_2static.PNG


STEP 2 - I set up the LAGG on the switch and configured the following LACP interface (left). I got the following interface after applying (right):
bc_config_interface.png

This resulted in no connectivity, however, so I checked ifconfig and saw this:
d_ifconfig.PNG

I noticed that the MAC addresses are all different, even for eno1 and eno2. I also noticed the incorrect netmask and broadcast for the bond interface.

STEP 3 - I then decided to set up the bond interface as DHCP as shown below (left) and got the following interface (right):
1_config_interface.png

1_WorksDHCP_ifconfig.PNG

No surprise here: a new IP is given since it's a new MAC (the "generated bond MAC").
What is a surprise is that everything is working over LACP, and the netmask and broadcast are also correct.
....SEVERAL HOURS LATER.....
I noticed that the IP on the network interfaces page when using DHCP includes the /24 at the end. The subnet mask is not asked for or specified when entering a static IP address in the alias field of the bond configuration. When I set up the bond as below with a static IP that includes the /24, everything works perfectly.
3_config_interface.png

3_WorkStatic_ifconfig.PNG


One last note that might be an issue as well: you (and I) were reusing an IP that was previously used on an unbonded NIC, so it would be in the local ARP tables against the MAC of that physical interface. When you assign that IP to the bonded virtual MAC, the switch may not refresh its ARP table quickly enough; I noted my switch still had the old physical MACs cached. You might try clearing the ARP cache on the switch/devices if you are going to reassign the same IP from a physical to a virtual MAC, or try a different IP and see if you get connectivity then.
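
To check for a stale entry from another Linux host on the LAN, something like the following Python sketch could help. It parses text in the format printed by `ip neigh show` and reports whether an IP is still mapped to a MAC other than the one you expect. The function names, the sample line, and both MAC addresses are hypothetical, and this looks at a host's neighbour table, not the switch's own tables.

```python
def parse_ip_neigh(text):
    """Map IP -> (MAC, state) from `ip neigh show` style output."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if "lladdr" not in parts:
            continue  # e.g. FAILED or INCOMPLETE entries carry no MAC
        idx = parts.index("lladdr")
        if idx + 1 >= len(parts):
            continue  # malformed line, nothing after 'lladdr'
        table[parts[0]] = (parts[idx + 1].lower(), parts[-1])
    return table


def mac_mismatch(table, ip, expected_mac):
    """True if ip is cached with a MAC different from expected_mac."""
    entry = table.get(ip)
    return entry is not None and entry[0] != expected_mac.lower()


# Invented sample: the NAS IP still cached against an old physical MAC.
SAMPLE = "192.168.1.10 dev eth0 lladdr 00:07:43:aa:bb:01 STALE\n"
BOND_MAC = "02:00:00:aa:bb:02"  # hypothetical generated bond MAC

table = parse_ip_neigh(SAMPLE)
print(mac_mismatch(table, "192.168.1.10", BOND_MAC))
```

On a real host you would feed it the output of `ip neigh show` (for example via subprocess) instead of the sample string.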
 


danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,466
Did you get anywhere with this?
No, I've pretty well given up on it, at least for the time being.

when entering the static IP for the bond interface (or any for that matter) you must include the subnet at the end of the IP in CIDR format!
Yep, I'm aware of that.

I'm starting to think there are some LACP issues in 22.02.4 and also that the networking in general might also have some bugs
I think so too, though the LACP issues apparently aren't universal (Patrick was able to make it work, as he notes above). And then there are complicating factors:
  • TrueNAS insists on spamming the console with log messages about any kind of change to network status (and UPS status, and lots of other things that just don't need that level of attention). If you have several apps running (as I do), this can mean dozens of messages all over the screen, making the console menu/configuration system near-unusable.
  • But no problem, there's a GUI, right? Sure there is; it's pretty much what makes TrueNAS TrueNAS, and it's (at least arguably) easier to navigate than the console menu. But if you lose your network connection (e.g., because you're changing the network configuration and moving cables around), you can't use the GUI any more.
    • And here's where it gets weird. To try to address that problem, I set up a "management" interface. After all, I have four gigabit NICs in the machine that aren't doing anything, I have a managed switch, and I have a VLAN-capable router. So I set up a "management" VLAN, assigned a port on the switch to that VLAN, plugged one of the gigabit NICs into that port, and configured that NIC with a static IP on that network. Well and good--that interface is up, I can browse to it, log in to the UI, etc.--until I pull the plug on the 10G NIC. When I do that, I can no longer reach the "management" IP, even to ping it.
So, yeah: there's definitely some networking weirdness going on in SCALE. And since I don't really need LACP anyway, I haven't pursued it further.
 