VLANs, Bridges, and LAG Interface best practice questions

beaster

Dabbler
Joined
May 17, 2021
Messages
27
I have a working TrueNAS 12.0-U3 installation.

The installation is as follows
(note: I used this reference documentation: https://www.truenas.com/community/threads/how-to-setup-vlans-within-freenas-11-3.81633/ )

Switch => LAG (gig24 + gig25) => trunk carrying VLANs 1-1000 => TrueNAS server

VLAN 1 is the native VLAN on the trunk (untagged)
VLAN 1 has no IP information on the switch side
VLAN 1 has no IP information on the TrueNAS server side

VLAN 3 is tagged on the trunk
VLAN 3 has a /24 IP interface on the switch side
VLAN 3 carries the management interface for the TrueNAS server and its default gateway, as well as a 10.0.0.0/8 summary route to the same gateway

I have a firewall VM operating with 3 NIC interfaces mapped to 3 bridges on the TrueNAS server.
The pfSense firewall has Bridge 400 (WAN), Bridge 2 (LAN) and Bridge 12 (DMZ).

The firewall works perfectly well in this setup

The TrueNAS server has a trunk LAG with 6 VLANs on it.
The TrueNAS server has Bridge 400 mapped to VLAN 400 on the LAG
(VLAN Tag: 400, VLAN Parent Interface: lagg0)
The TrueNAS server has Bridge 2 mapped to VLAN 2 on the LAG
(VLAN Tag: 2, VLAN Parent Interface: lagg0)
The TrueNAS server has Bridge 12 mapped to VLAN 12 on the LAG
(VLAN Tag: 12, VLAN Parent Interface: lagg0)

[Screenshot: interface configuration]


I also have
The TrueNAS server has Bridge 14 mapped to VLAN 14 on the LAG
(VLAN Tag: 14, VLAN Parent Interface: lagg0)
VLAN 14 has no IP address initially; however, I want to add a locally scoped address to that VLAN for LAN-local ARP and IP traffic.


If I try to add an IP address on the TrueNAS server to a second VLAN (in this case VLAN 3 holds the first IP address),
in this case to the VLAN 14 interface,
then after committing the changes the switch/server connection destabilizes.
I lose all connections other than the one to the TrueNAS server on VLAN 3, and I suspect this is because the ARP is still cached on the switch.

The IP switch, in this case a Meraki device, loses all other connectivity to the firewall and other hosts on all other VLANs.

The only immediately obvious thing is that all the VLAN IP interfaces on the TrueNAS server have the same MAC address.
However, the bridge interfaces each have a unique MAC address.

What I noted when I added the VLAN IP address for VLAN 14 was that the MAC address shown for VLAN 14 was the same as for VLAN 3, paired with the new TrueNAS IP address.
However, the whole setup failed shortly afterwards.
The clients table for the port showed the following for a short time:
IP address 10.10.3.13 on VLAN 3 mac address 40:16:7e:36:0e:50
IP address 10.10.14.2 on VLAN 14 mac address 40:16:7e:36:0e:50

I am wondering if the TrueNAS server is the issue here.

My question is as follows:
I want to set up an IP on other VLANs to talk locally to each subnet (i.e. not via the default gateway/route).

When adding a 2nd local IP address for a VLAN, should I be adding that IP address to
a) The Bridge 14 Interface (which has a unique MAC)
or
b) The VLAN 14 Interface. (which has a shared MAC)
or
c) I should be investigating Jails for this requirement

I am trying to figure out how TrueNAS deals with switching between the VLAN interface and the bridge, and whether that is even possible with the default packages/build in 12.0-U3.

My next option is to pull out a packet sniffer; however, if there is a quicker solution here I would be keen to test it out.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
General advice:
  • don't mix untagged and tagged traffic on the same port - just don't use the "native VLAN"
  • put the IP addresses on the bridge interfaces, not on the VLAN interfaces for all FreeBSD based systems
This setup works perfectly well here, although I use OPNsense, not pfSense. The reason for the IP address on the bridge requirement is that it is documented for FreeBSD that any bridge member interface must not have an IP address. It breaks multicast.
The reason not to use the "native VLAN" is that first pf can get confused (rules on the untagged interface "catch" tagged traffic) and second, as soon as you configure a bridge on the physical interface, you cannot use VLANs, anymore.

physical --> LAGG --> VLAN --> bridge

Layer 3 addresses on the topmost interface, i.e. the bridge, only.
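
In raw ifconfig terms that stack looks roughly like this (a minimal sketch only; TrueNAS builds the equivalent from the UI, and the interface names and address are examples):
Code:
# VLAN on top of the lagg, bridge on top of the VLAN
ifconfig vlan14 create vlan 14 vlandev lagg0 up
ifconfig bridge14 create
ifconfig bridge14 addm vlan14 up
# layer 3 goes on the bridge only - never on the VLAN or lagg members
ifconfig bridge14 inet 10.10.14.2/24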
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
> don't mix untagged and tagged traffic on the same port - just don't use the "native VLAN"
I'm not, and it's good to see I've got that part right.

> put the IP addresses on the bridge interfaces, not on the VLAN interfaces for all FreeBSD based systems
This is exactly the advice I was chasing. I was going to test this anyway, but given the amount of contradictory documentation around, it was not clear whether this was a waste of my time or not.

> IP address on the bridge requirement is that it is documented for FreeBSD that any bridge member interface must not have an IP address. It breaks multicast.
Not a problem for me, since most Meraki switches don't support PIM routing; the newer equipment does, but not the older stuff. In any case I don't need multicast.

@Patrick, thanks for the feedback; it basically supports what I suspected. I'll test this out and reply once I've done so.

> physical --> LAGG --> VLAN --> bridge
> Layer 3 addresses on the topmost interface, i.e. the bridge, only.

This simple information would make a world of difference for most users working with bhyve and VLAN trunking.
I'd recommend that something like this be pushed to the documentation as a reference use case, as it's the one area of virtual machine support that is the most problematic, and the one where people moving from VMware and KVM will have the most trouble finding parity.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
After some messing around I was able to get this configuration working with the IP addresses on the bridges.
It required a reboot and quite a while for the virtual machines to be detected by the switch and for external hosts to be able to ARP the virtual machines. I suspect that some sort of MAC/ARP flood is required from the hypervisor in order for this to happen more quickly.

The configuration is as follows

physical --> LAGG --> VLAN --> bridge
physical --> LAGG --> VLAN --> bridge
physical --> LAGG --> VLAN --> bridge
physical --> LAGG --> VLAN --> bridge

[Screenshot: network throughput graph during the transfer]
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
Sorry, the last post got cut off...
The configuration is as follows

physical --> LAGG --> VLAN --> bridge
physical --> LAGG --> VLAN --> bridge
VLAN 1 MTU 1500 configured but not used (untagged VLAN from L2 switch)
VLAN 2 MTU 1500 --> bridge default setup - 1 VM FIREWALL LAN gateway to L3 switch
VLAN 3 MTU 1500 --> bridge IP setup static 10.10.3.13/24 with gateway/routes to 10.10.3.1
VLAN 10 MTU 1500 --> bridge default setup - spare
VLAN 11 MTU 1500 --> bridge default setup - spare
VLAN 12 MTU 1500 --> bridge default setup - multiple VMs
VLAN 13 MTU 1500 --> bridge default setup - 1 VM FIREWALL DMZ to L2 switch VLAN
VLAN 14 MTU 9000 --> bridge IP setup DHCP 10.10.14.2/24 (setup for SMB/NFS sharing to servers with jumbo MTU)
VLAN 300 MTU 1500 --> bridge default setup - 1 VM FIREWALL WAN gateway to L2 switch VLAN & cable modem

The graph above shows a sustained transfer from a host on the switch moving the contents of a disk to the SMB process on the TrueNAS server.
Since this is single host/MAC to single host/MAC it won't load-balance across the LAG; however, that's fine.

It confirms the TrueNAS server can correctly switch/route traffic as needed, based on locally connected IP subnets vs. the default route/gateway.
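
A quick way to confirm that split from the shell is the routing table (a sketch; output not shown):
Code:
# connected subnets appear as link# routes and are used directly;
# everything else falls through to the default route
netstat -rn -f inet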

The pfSense firewall I have set up operates well; download tests sustain 200 Mbps ingress.
When pushing traffic into and out of the firewall, it's worth noting that transfer speeds into and out of the TrueNAS server are reduced.
The reductions are not massive, but they are noticeable; this is likely a byproduct of the bridging setup.
It's not clear whether the LAG is helping here or not; I would need to do more testing to validate that statement.

The following is a shot of the interface setup to confirm this configuration sticks and is working.

[Screenshot: interface configuration]


Hope this helps as a reference case for this sort of setup.
I moved my home lab and media setups from VMware to TrueNAS with virtualization in order to get away from ESXi.
This confirms it can be done, and is actually a lot simpler to maintain if you have some basic Linux / BSD experience.

In order to support ZFS on ESXi I was using a bunch of RDMs and other messy things that ultimately didn't add a lot of value and in the end were hard to update and maintain. The one benefit the old setup had was performance: the switching of traffic was at line rate in most instances.

For my requirements this is "good enough for now"
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Did you change the interface assignments of the VM's NICs from the VLAN to the bridge interfaces?
My VMs come online instantly.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
Did you change the interface assignments of the VM's NICs from the VLAN to the bridge interfaces?
My VMs come online instantly.
All the VMs that I had configured on the TrueNAS server were already set up with NICs based on Bridge XXX.
Aside from VLAN 1, it was not actually possible to choose VLAN XXX as the interface for the NIC in the configuration.

[Screenshots: VM NIC device configuration]
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
Did you change the interface assignments of the VM's NICs from the VLAN to the bridge interfaces?
My VMs come online instantly.
"My VMs come online instantly."

From reboot to reboot my VMs also come online quickly. However, if I make any sort of change to the TrueNAS server's network setup, there is normally a period of ~2-3 minutes of lost connectivity for all the VMs on the TrueNAS server; the rest of the network operates without incident.

When I first made the cutover to dual IP interfaces and changed those IPs from the VLAN to the bridge setup, all the VMs lost connectivity for ~5-10 minutes.

My firewall, which gets its WAN IP via DHCP through the bridge into a switched VLAN and then to a cable modem, seems to take the longest to solicit its IP allocation.

Once the setup is running, a reboot brings everything up very quickly and that is fine; I suspect insufficient time has elapsed for those ARP/MAC tables to time out, so the hosts can renew/forward without any issues.

This is why I suspect something is up with MAC/ARP flooding: I think the switch is not learning about the changes and is waiting until those hosts send some form of outbound traffic at layer 2, allowing them to initiate a DHCP request once the MAC is seen on the switch ports.

It's not enough of an inconvenience at the moment to warrant messing with a working setup. However, if there is some learning here, I would be keen to understand more about what is happening on the hypervisor side of the TrueNAS server that might be causing this.

The rule of thumb I am operating on is that any change to the network setup on the TrueNAS server will result in a 1-3 minute loss of connectivity for the virtual machines operating on the server. Provisioning new VMs does not trigger this, but any reconfiguration of the network setup seems to.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
It should be noted that 2-3 minutes after a change is when the TrueNAS server is back online and I can interact with it via the same trunk interfaces as the VMs it is hosting.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
For me and some other folks changing the TrueNAS network settings requires a reboot to get VMs and jails back online. Sorry, forgot to mention that. I rarely mess with the server itself.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
@Patrick M. Hausen, thanks for your advice here; it was invaluable and saved me stumbling around in the dark.
Hopefully others will find this setup useful to validate a working environment is possible.
My next plan is to upgrade my NIC to 10G dual-SR fiber sometime later this year, so I can move data around a bit quicker.
Looking at a used Intel X520-SR2 NIC for that...
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Your reply got me thinking. After I first hit that problem, and after some poking around, I took it as fact that changing the NAS network config deletes and recreates the bridges or some such. When stopping and restarting a VM did not restore its connectivity, I developed the habit of rebooting after major network changes.

But if your VMs come online after a couple of minutes, I probably should investigate further. I can (almost certainly) rule out ARP. I have a static MAC address for the bridge interface itself and of course the MAC addresses of the VMs don't change. They are part of the VM device config (Devices --> NIC --> Edit) - the MAC is right there and constant. You can manually change it or transfer it to a new installation if you move a VM.

So something else is "weird" here. I'll take that to the bhyve production users call tomorrow.

To "nail down" the bridge interface MAC addresses, add these two tunables to your NAS:
  • Variable: if_bridge_load, Value: YES, Type: LOADER
  • Variable: net.link.bridge.inherit_mac, Value: 1, Type: SYSCTL
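
On the FreeBSD side these correspond to the following entries (a sketch of the resulting config files):
Code:
# /boot/loader.conf - load the if_bridge module at boot
if_bridge_load="YES"

# /etc/sysctl.conf - a bridge inherits the MAC address of its first member
net.link.bridge.inherit_mac=1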

Kind regards,
Patrick
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
These are the system tunables I have set up. Some of these I was just trying to see if they resolved my problems; some are based on other FreeBSD threads and may well not be functional/useful.

[Screenshot: system tunables]
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
The "EM load updated" one was super important, however, as my default NIC driver was malfunctioning, resulting in only a trickle of traffic going through the system. The moment I patched the driver and added the tunable, I was able to push traffic at line rate. The em driver was quite out of date in the TrueNAS 12.0-U3 build. I did not patch anything else in the OS itself.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
"I can (almost certainly) rule out ARP "

I have the ability to raise the MAC learning timers on the switch and the ARP timeouts on the remote hosts to work around this too.
However, I think the MAC/ARP issue may be in the bridge/VLAN itself on the TrueNAS server and not in the switch.

i.e. the TrueNAS bridge is not learning the remote switch MAC, and not forwarding the host's DHCP solicitation, until some processes have completed.

It feels like it can take 1-2 minutes for a process to start before things begin listening/learning that the switch is present via the VLAN/bridge.
Almost like a secondary process has to start/load before that part can begin.

The other obvious observation is that any time the configuration is changed, the host seems to flush all of those processes and reload the updated configuration, rather than updating the existing processes' memory state in place.

This is not based on research or hard fact, but is more of a qualitative observation about what happens each time I click save in the Network Config UI.
Certainly, if I had the time, I could validate what is happening with better test data, i.e. running an external continuous ping after a config save to capture that 1-3 minute restoration time. (I actually did that previously but did not save the output.)
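
One quick check short of a full capture would be the bridge's own MAC table (a sketch; bridge14 is my example name):
Code:
# list the MAC addresses the bridge has learned on each member port;
# re-run after a config save to watch the table repopulate
ifconfig bridge14 addr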

I work for a large CA-based network equipment vendor that uses a microkernel version of BSD as the hidden base OS, so this is similar to past issues I've had to deal with blindly: collecting qualitative data about what is happening and having the engineering team set up test cases to validate the user experience.
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
General advice:
  • don't mix untagged and tagged traffic on the same port - just don't use the "native VLAN"
  • put the IP addresses on the bridge interfaces, not on the VLAN interfaces for all FreeBSD based systems
This setup works perfectly well here, although I use OPNsense, not pfSense. The reason for the IP address on the bridge requirement is that it is documented for FreeBSD that any bridge member interface must not have an IP address. It breaks multicast.
The reason not to use the "native VLAN" is that first pf can get confused (rules on the untagged interface "catch" tagged traffic) and second, as soon as you configure a bridge on the physical interface, you cannot use VLANs, anymore.

physical --> LAGG --> VLAN --> bridge

Layer 3 addresses on the topmost interface, i.e. the bridge, only.

Hi Patrick,
Part of your feedback above is in regard to "it breaks multicast".

I have a requirement to run multicast UPnP to one of my virtual machines over the bridge interface, on a VLAN that the NAS does not have an IP interface on. I've done a bunch of testing, and it looks like none of the interfaces are forwarding multicast correctly to any of the virtual machines.

I've tried multicast within the same VLAN to the virtual machine via the adjacent switch, which should be pure layer 2 forwarding.
I've tried multicast across VLANs to the virtual machine via the adjacent switch's VLAN IP interface, which is layer 3 forwarding.
In the second case I tested both IGMP snooping and flooding mode.

I confirmed the interfaces are enabled for multicast in ifconfig on the NAS; however, I am not super clear whether this should or should not actually work.
In either case the VLAN and the bridge interface in question have no IP address on them. That bridge is for use by a single VM (pfSense).

I'll run some more PCAPs on the NAS to see if I can figure out which interface specifically is not forwarding things correctly.
But if there is some reference learning here, let me know.
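
For reference, the captures I have in mind look like this (a sketch only; interface names are examples, and 239.255.255.250:1900 is the standard SSDP group/port for UPnP discovery):
Code:
# capture UPnP/SSDP multicast on the VLAN first, then on the bridge, and compare
tcpdump -ni vlan12 -e udp port 1900 and host 239.255.255.250
tcpdump -ni bridge12 -e udp port 1900 and host 239.255.255.250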
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
@beaster, let's start with the output of your "ifconfig" on the TrueNAS.

I run mDNS in all of my VMs and all of my jails, plus of course IPv6, so there is a lot of multicast going on that works perfectly well here.
 

Spyderdyne

Cadet
Joined
Jan 9, 2017
Messages
4
General advice:
  • don't mix untagged and tagged traffic on the same port - just don't use the "native VLAN"
  • put the IP addresses on the bridge interfaces, not on the VLAN interfaces for all FreeBSD based systems
This setup works perfectly well here, although I use OPNsense, not pfSense. The reason for the IP address on the bridge requirement is that it is documented for FreeBSD that any bridge member interface must not have an IP address. It breaks multicast.
The reason not to use the "native VLAN" is that first pf can get confused (rules on the untagged interface "catch" tagged traffic) and second, as soon as you configure a bridge on the physical interface, you cannot use VLANs, anymore.

physical --> LAGG --> VLAN --> bridge

Layer 3 addresses on the topmost interface, i.e. the bridge, only.
You are speaking strictly from the TrueNAS perspective, aren't you? These happened to catch my eye:
  • don't mix untagged and tagged traffic on the same port - just don't use the "native VLAN" - Ummm, what? This is literally what it's for. 802.1Q is attempting to provide a carrier band/control band and pipe data bands across it. You know, like a fiber trunk, basically. This also counts as hardware segmentation for security purposes, and allows us to run management traffic and workloads across the leaf/spine architecture backbone transparently (VLAN 1) without forcing us to encapsulate SNMP, low-level infrastructure DHCP, TFTP, etc...


  • put the IP addresses on the bridge interfaces, not on the VLAN interfaces for all FreeBSD based systems - Huh? So, I can have up to 4,096 VLANs in various states of "trunk" (802.1q tagged) or "access" (in BSD is there even a port mode = access? Almost looks like it skips .1Q entirely...) and you are telling me that I need some combination of bridge/phy interfaces defined 4,096 times because the logical thing I am defining shouldn't be given an address? That's not feasible so I must be misunderstanding something here.

    ALSO

    so, say I have a couple of LACP LAG groups of 2 (or more) physical interfaces each. Each LAG goes from PF/OPN to a spine switch stack. Are you saying that not only is the logical VLAN forced to a single bridge on OPNsense because it has a default gateway address, but I can only use that VLAN on members of that bridge... and if I want that VLAN on a second bridge to somewhere else, or to complete the HA ring of my spine stack, I have to what, create a second VLAN with the same ID and a different gateway address somehow?

    Methinks BSD networking is confusing sir! LOL

    Thanks.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You are speaking strictly from the TrueNAS perspective, aren't you?
Of course. This is the TrueNAS community forum and the OP asked for help with their setup on TrueNAS.

Methinks BSD networking is confusing sir! LOL
Not quite as much as you think. Let's go through your points (and the stack) layer by layer - your questions seem to me to mix up different things so they are difficult to answer concisely. Of course FreeBSD and hence TrueNAS is not a switch. So it's not as flexible and scalable as one. It's a file server and/or hypervisor host. So it does "host networking" mostly. And I have not experienced Linux to be much different.

I'll use Cisco terminology because that is what I am familiar with.

A port that carries only untagged frames and is assigned a single fixed VLAN on the switch is an access port.
A port that carries tagged frames is a trunk port.
Two or more ports combined for resiliency and distribution are a port-channel or a link aggregation - some vendors call this a trunk.
A port-channel can be an access or a trunk port.

So let's start with the port-channel. You create an interface of type lagg and assign a number of physical interfaces as members to that. Pick a protocol (LACP in most cases) and a distribution hash. Perfectly well supported and not any different from e.g. Cisco:
Code:
interface Port-channel2
 description TrueNAS CORE
!
interface GigabitEthernet0/7
 description TrueNAS CORE
 channel-group 2 mode active
!     
interface GigabitEthernet0/8
 description TrueNAS CORE
 channel-group 2 mode active

That's just layer 2 aggregation like in any other network or host OS. So I think we can leave that out of the VLAN and bridge discussion from here on. Whenever I write "physical" that can be a single port or a lagg - all the same (as long as the other side agrees on the LACP).
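
The FreeBSD end of that port-channel is just as compact (a sketch; em0/em1 are example NICs):
Code:
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport em0 laggport em1 up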

Now if the host uses this port to access the net we assign an IP address and we are done.

So what about VLANs? There are two scenarios when you want to use them. The more simple one: the host wants to communicate over that VLAN to provide services to a number of clients somehow connected to the layer 2 infrastructure. So obviously the host needs a virtual "vlanX" interface that you can assign an IP address to.

In TrueNAS you create these, give them a name like "vlan10", map them to a certain parent interface (physical) and set a VLAN tag, e.g. 10. Then you can assign an IP address and the frames will reach your switch infrastructure tagged and can be received by all access ports you have in your DC that are assigned the same VLAN. Trunk ports are an inter-switch thing or switch-server thing. You don't use trunks for clients, do you?
And if the TrueNAS host does not communicate in VLAN x, why should it care about it? You don't need to create an interface for every VLAN you have in your infrastructure. Only for those you want the host to be able to communicate in via a trunk port, i.e. tagged frames.
If it's just a handful and you have the ports, of course you can connect the host to one access port per VLAN and not care about VLANs at all.

Now on to the second use of host networking - and here FreeBSD is just a bit different in terminology, the fundamental concepts are the same. Providing layer 2 access to VMs or FreeBSD native containers called "jails". You need a virtual switch for that, right? I assume you are familiar with VMware networking? Actually a FreeBSD virtual switch behaves more like a port group than a vswitch, because you cannot partition it further using VLANs. Confusing? I mean a virtual switch in FreeBSD is just one switch - one broadcast domain. So to connect VMs to N different interfaces - be they physical or VLANs, aggregated or not - you need N virtual switches. A limit, yes, but not an unreasonable one - how many VLANs hosting VMs does a single hypervisor host run? 5? 10? In the TrueNAS context, please? Anyone providing multi-tenant private cloud services with tenant isolation and BGP and hundreds to thousands of VMs will be running VMware or OpenStack or some such and not TrueNAS.

Oh, and in FreeBSD the virtual switches are called "bridges". Because a (layer 2) switch is a bridge and a bridge is a switch. "Switch" is just a marketing term. I have Dr. Radia Perlman on my side - this is a hill I would die on.

So the VMs have a (para)virtual interface inside for the guest OS to configure as it pleases and an outside "plug" that needs to be bridged to get layer 2 connectivity to "something". So to connect a VM to the VLAN 10 which runs on the "vlan10" interface which sends the frames over the link aggregation "lagg0" you create "bridge10" and put the "vlan10" interface and all VM interfaces you want to connect in there. See how it's a virtual switch, really? I hope so. And of course you can create a virtual switch/bridge without a physical member at all. Just VMs. And then one VM with two interfaces providing routing and firewalling for all others connected to that "connectionless" bridge.
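
In ifconfig terms (a sketch; the tap interfaces stand in for the VM "plugs" bhyve creates):
Code:
ifconfig bridge10 create
# one member per VM NIC plus the uplink VLAN
ifconfig bridge10 addm vlan10 addm tap0 addm tap1 up
# a bridge with no physical/VLAN member at all is fine too: VMs only
ifconfig bridge99 create up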

These are the basic concepts and I don't think they are confusing at all. Pretty much a 1:1 mapping of the network stack to OS artefacts.

The reason why this is so frequently a cause of problems and vivid discussion with TrueNAS is manifold:
  • iXsystems decided to hide all of this for the most part and silently create a "bridge0" covering the supposedly single physical interface and all jails and VMs. And also there is not much documentation on it in the TrueNAS context. There is of course the FreeBSD handbook.
  • If you create additional virtual switches/bridges to assign VMs to different physical/VLAN interfaces sometimes that "magic" runs wild and creates a bridge across multiple physical ports.
  • While FreeBSD is perfectly capable of running STP (thank you, Dr. Perlman! :wink: ) the default setting is "off". So frequently people trying out more complicated setups end up with broadcast storms.
  • Many users don't know all of these concepts inside out like I claim to do.
Having said that I will finish with the two or three things that are definitely FreeBSD specific, that iXsystems for some reason ignore in TrueNAS and that have become a pet peeve of mine.

If you have a virtual switch/bridge with a couple of VM endpoints and a physical/VLAN as members and you also want the TrueNAS host to be able to communicate in that same VLAN, then it is mandatory that the IP address is configured on the virtual switch/bridge interface and not on the VLAN. That's what I am repeating over and over here. This is an idiosyncrasy of the FreeBSD network stack. If you run layer 3 addresses on a bridge member, multicast will break. Documented, confirmed by developers, just the way the FreeBSD stack works.
But let's revisit Cisco about that - do you put an IP address on an access or trunk port in VLAN X? Of course not. You have:
Code:
interface VLAN 10
 ip address 1.2.3.4 255.255.255.0

And what is "interface VLAN 10"? Your virtual switch/bridge all over again.

So in TrueNAS just put all layer 3 config on the bridge IF if there is a bridge IF. And a bridge IF is only necessary if you connect VMs or "jails". If it's just for the host, create VLAN, set IP address, go.

And yes, TrueNAS does it all wrong when you let the default "magic" work. You have one interface, you set up your NAS, you assign an IP address, you create some VMs not knowing about all this bridging stuff - you end up with the IP address on the physical and a bridge spanning that physical and a couple of VMs. Wrong wrong wrong. Why is this still the case, then? Because most people run only IPv4 and multicast is not nearly as essential here as with IPv6. So users only notice when running IPv6 or setting up mDNS or similar services for easy service discovery in a home LAN.

The second FreeBSD special: once a physical interface is member of a bridge (virtual switch) you cannot put VLANs on it. Plain does not work. That's why I wrote: physical/lagg --> VLAN --> bridge --> VM. Simple as that. N VLANs for VMs, N bridges (virtual switches).

And last: you need to disable hardware offloading because e.g. TCP checksum calculation should only take place when the host is the endpoint of the TCP connection. If you have a bridge on a physical interface that is not the case. The frame needs to go up to the bridge and then (via MAC address, of course) to the destination VM or the destination host physical/VLAN but it needs to cross the physical once without any hardware offloading interfering. That's why you must disable it.
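
For example (a sketch; the exact capability flags vary by NIC driver):
Code:
# disable checksum offload, TSO and LRO on the interface that carries the bridge
ifconfig lagg0 -rxcsum -txcsum -tso -lro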

That leaves open the philosophical question if you should run VLAN 1 untagged across trunk ports. I and all network professionals I know seem to agree that you should not. Definitely not if you have untrusted hosts connected to trunk ports. You don't want them to throw untagged frames at your management plane that are not limited by allowed-vlans (again Cisco speak).
I don't run untagged frames across trunk ports in my infrastructure at all. I have a dummy VLAN that is not used anywhere and I run switchport trunk native vlan 1001 on all trunk ports.
You can have as many access ports as necessary assigned to VLAN 1 to get at your management plane - I really don't see a problem with that. But you do you - if all hosts are trustworthy ...

And so due to my experience and the mentioned limitation that you cannot run VLANs and untagged on a physical IF if that IF will simultaneously be a bridge member, I recommend to people here to run either all untagged (an "access port") or all tagged (a "trunk port") in TrueNAS, too. Most server mainboards come with 2-4 interfaces, anyway, so you can get your untagged access to the TrueNAS control plane and have a separate interface for all your VM VLANs to your heart's content.


Hope that explains most if not all issues.
Patrick
 

beaster

Dabbler
Joined
May 17, 2021
Messages
27
All of this made sense to me.
None of the recommendations are unreasonable or difficult to implement.
All of your best practices are ones that I use in my installation at home as well as at work so I'd have to agree with you on all counts.
 