LACP lag link MAC address same across SCALE installs

ckb2

Cadet
Joined
Apr 23, 2021
Messages
2
I have added a second SCALE box to my network and have run into an unexpected problem as a result. Both boxes are running 22.02 RC2. Everything worked fine with the first machine, which is set up with an LACP link and addressed via DHCP. When setting up that system, I noticed that SCALE creates its own MAC address for the LACP link (as discussed in Jira issue NAS-112401), but I didn't think much of it after updating my DHCP static mapping. This is what that looks like to my switch (lg2 is the link to the first box). Note the MAC address ending in 4333:

Code:
SSH@coresw#show mac-address
Total active entries from all ports = 40
MAC-Address     Port                 Type         VLAN
3cec.ef0d.2112  lg2                  Dynamic      10   
3cec.ef0d.2113  lg2                  Dynamic      10   
3ef6.d99a.4333  lg2                  Dynamic      10   


The issue came when I switched the second SCALE box over from a single link to LACP. As you might have guessed by now, its LACP link has the same MAC address as the first box's. Both are on the same VLAN, and this wreaks havoc with the DHCP server, making both machines inaccessible. From my switch's perspective, the 4333 MAC bounces back and forth between lg2 and lg3, with the MAC table oscillating between the following two states over time:

Code:
SSH@coresw#show mac-address
Total active entries from all ports = 38
MAC-Address     Port                 Type         VLAN 
3cec.ef0d.2112  lg2                  Dynamic      10   
3cec.ef0d.2113  lg2                  Dynamic      10   
3ef6.d99a.4333  lg3                  Dynamic      10  


or

Code:
SSH@coresw#show mac-address
Total active entries from all ports = 40
MAC-Address     Port                 Type         VLAN
0cc4.7a9d.80a0  lg3                  Dynamic      10   
0cc4.7a9d.80a1  lg3                  Dynamic      10     
3cec.ef0d.2112  lg2                  Dynamic      10   
3cec.ef0d.2113  lg2                  Dynamic      10   
3ef6.d99a.4333  lg2                  Dynamic      10   
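
For anyone who wants to confirm the same kind of flapping on their own switch, here is a rough Python sketch that compares several `show mac-address` dumps and reports any MAC seen on more than one port within the same VLAN. The function names (`parse_mac_table`, `find_flapping_macs`) are made up for illustration, and the parser assumes the four-column layout shown above; other switch firmware may format the table differently.

```python
import re
from collections import defaultdict

# Matches rows such as "3ef6.d99a.4333  lg2  Dynamic  10" (layout assumed
# from the switch output quoted in this post).
MAC_ROW = re.compile(
    r"^\s*([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+(\S+)\s+(\S+)\s+(\d+)\s*$",
    re.IGNORECASE,
)


def parse_mac_table(text):
    """Return {(mac, vlan): port} for every table row found in one dump."""
    table = {}
    for line in text.splitlines():
        match = MAC_ROW.match(line)
        if match:
            mac, port, _entry_type, vlan = match.groups()
            table[(mac.lower(), int(vlan))] = port
    return table


def find_flapping_macs(snapshots):
    """Return {(mac, vlan): [ports]} for MACs seen on more than one port."""
    ports_seen = defaultdict(set)
    for snapshot in snapshots:
        for key, port in parse_mac_table(snapshot).items():
            ports_seen[key].add(port)
    return {key: sorted(ports) for key, ports in ports_seen.items() if len(ports) > 1}


SNAPSHOT_A = """\
MAC-Address     Port                 Type         VLAN
3cec.ef0d.2112  lg2                  Dynamic      10
3cec.ef0d.2113  lg2                  Dynamic      10
3ef6.d99a.4333  lg3                  Dynamic      10
"""

SNAPSHOT_B = """\
MAC-Address     Port                 Type         VLAN
0cc4.7a9d.80a0  lg3                  Dynamic      10
0cc4.7a9d.80a1  lg3                  Dynamic      10
3cec.ef0d.2112  lg2                  Dynamic      10
3cec.ef0d.2113  lg2                  Dynamic      10
3ef6.d99a.4333  lg2                  Dynamic      10
"""

if __name__ == "__main__":
    print(find_flapping_macs([SNAPSHOT_A, SNAPSHOT_B]))
```

Fed the two tables above, only the 4333 address is reported, on both lg2 and lg3.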


From the NAS-112401 Jira issue, it sounds like the Core behavior was for the LACP interface to adopt the MAC of one of the member interfaces, which would avoid this issue. My questions are therefore:
  1. Is there a plan to implement some type of MAC address randomization or promotion so I can have multiple SCALE boxes with LACP and DHCP working on the same VLAN?
  2. Is there a way to change the virtual MAC of the LACP link either through the TrueNAS console or in Linux itself (such that it won't get reverted after a reboot/update)?
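
To make the "promotion" idea in question 1 concrete, here is a minimal Python sketch of one way a per-box bond MAC could be derived from a member NIC's hardware MAC. This is only an illustration under my own assumptions, not what SCALE or Core actually does: the helper name `promote_member_mac` is hypothetical, and setting the locally administered bit (rather than reusing the member MAC verbatim) is my choice so the bond address is marked as software-assigned.

```python
def promote_member_mac(member_mac: str) -> str:
    """Derive a bond MAC from a member NIC MAC (hypothetical helper).

    The result keeps the member's last five octets, so it differs between
    boxes whose NICs differ there, and marks the address as locally
    administered and unicast.
    """
    parts = member_mac.split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a colon-separated MAC address: {member_mac!r}")
    octets = [int(p, 16) for p in parts]
    # Set bit 0x02 (locally administered) and clear bit 0x01 (multicast)
    # in the first octet.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    # Member MACs taken from the switch tables above, rewritten in colon form.
    print(promote_member_mac("3c:ec:ef:0d:21:12"))  # a NIC behind lg2
    print(promote_member_mac("0c:c4:7a:9d:80:a0"))  # a NIC behind lg3
```

With the NIC addresses visible in my switch table, the two boxes would end up with different bond MACs (3e:ec:ef:0d:21:12 and 0e:c4:7a:9d:80:a0), which is all the DHCP server and switch need.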

Let me know if I can provide any additional information to help get to the bottom of this and thanks in advance.
 

ckb2

Some additional musings:

I tried forcing a change of the MAC with
Code:
ip link set dev eth0 address 3e:f6:d9:9a:43:34
and this successfully changed the MAC of the LACP interface, as confirmed by both ifconfig and my switch's MAC table. Unfortunately, it seems the DHCP client doesn't like manual ifupdown activity, so my bond0 interface never receives an IP.

After some poking around the TrueNAS middleware repository, it seems that pyroute2 is being used to manage networking, with its NDB module being called to create the bond interface (here). NDB can assign a MAC to virtual interfaces (see the docs), so this option could be added to the UI if desired. I haven't yet started digging into the NDB source to understand how it derives its default MAC for new interfaces, but that's probably my next step.
 