TrueNAS Core 13 - Broken bridge config triggered by bhyve/VM crash

jpdw

Joined Sep 26, 2022
Hi (apologies if this has been covered elsewhere; I couldn't find anything directly relevant by searching).

TrueNAS Core 13 seems to misconfigure the bridge shared by the jails and VMs after a bhyve crash, leaving the jails running but disconnected from the network.

There are 2 physical interfaces and 2 VLAN interfaces:
igb0 - connected to the main network, with no VLANs. igb0 has a static IP address (10.1.1.32/24) configured in Network/Interfaces.
igb1 - configured for a network carrying 2 VLANs (11 and 99) that do not route to the 10.1.1.0/24 network, but currently NOT connected: I've physically disconnected it to help debug this problem.
vlan11 - assigned to VLAN 11 on igb1. Down because igb1 is disconnected.
vlan99 - assigned to VLAN 99 on igb1. Down because igb1 is disconnected.

There are 3 running jails, each with a single VNET interface (configured by default, I think), with the IPv4 interface set to 'vnet0' and the network properties' interfaces set to 'vnet0:bridge0'.

There are 2 VMs:
- A FreeBSD VM, configured but not running, with a VirtIO interface on igb0 plus interfaces on 'vlan11' and 'vlan99'
- A running Windows 10 VM with a VirtIO interface on igb0

(All config has been done via the UI, not via the command line.)

In the "working" state, bridge0 exists, is up, and has all the member interfaces I'd expect:

Code:
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 58:9c:fc:10:ff:e1
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 11 priority 128 path cost 2000
    member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 10 priority 128 path cost 2000
    member: vnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 9 priority 128 path cost 2000000
    member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 8 priority 128 path cost 2000
    member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 1 priority 128 path cost 20000
    groups: bridge
    nd6 options=9<PERFORMNUD,IFDISABLED>


But there is a 'random' problem with Windows or bhyve crashing (not sure which; it might be both). TrueNAS keeps running and the jails all keep running, BUT when bhyve starts again, bridge1 is created, the VM is assigned to bridge1, and the igb0 connection is moved from bridge0 to bridge1. That leaves the jails on bridge0 without a connection to igb0:

Code:
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 58:9c:fc:10:ff:e1
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 12 priority 128 path cost 2000
    member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 11 priority 128 path cost 2000
    member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 8 priority 128 path cost 2000
    groups: bridge
    nd6 options=9<PERFORMNUD,IFDISABLED>
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 58:9c:fc:10:ff:fa
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: vnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 10 priority 128 path cost 2000000
    member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 1 priority 128 path cost 20000
    groups: bridge
    nd6 options=9<PERFORMNUD,IFDISABLED>
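To spot this state without eyeballing the output, here is a minimal Python sketch (my own, not something TrueNAS provides) that parses the bridge sections of `ifconfig` output and flags any bridge that holds jail interfaces but has lost the uplink. It assumes the jail-side interfaces follow the `vnetN.N` naming seen above and that `igb0` is the uplink; the sample text is a trimmed copy of the broken output.

```python
import re

# Trimmed copy of the broken-state output above (only the lines the parser needs).
SAMPLE = """\
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
    member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
    member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    member: vnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
    member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
"""

HEADER = re.compile(r"^(\S+): flags=")    # e.g. "bridge0: flags=8843<...>"
MEMBER = re.compile(r"^\s+member: (\S+)")  # e.g. "    member: igb0 flags=..."
JAIL_IF = re.compile(r"vnet\d+\.\d+")      # jail interfaces as named above (assumption)


def bridge_members(ifconfig_text):
    """Map each bridgeN in `ifconfig` output to its list of member interfaces."""
    members = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(1)
            # Only track bridge sections; any other interface header resets state.
            current = name if name.startswith("bridge") else None
            if current is not None:
                members[current] = []
            continue
        member = MEMBER.match(line)
        if member and current is not None:
            members[current].append(member.group(1))
    return members


def orphaned_jail_bridges(members, uplink="igb0"):
    """Return bridges that carry jail interfaces but not the uplink."""
    return [
        bridge
        for bridge, ifaces in members.items()
        if any(JAIL_IF.fullmatch(i) for i in ifaces) and uplink not in ifaces
    ]


if __name__ == "__main__":
    found = bridge_members(SAMPLE)
    print(found)
    print("jails cut off on:", orphaned_jail_bridges(found))
```

On the box itself you could feed it `subprocess.run(["ifconfig"], capture_output=True, text=True).stdout` instead of the sample; for the working-state output it should report no orphaned bridges.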


After a further reboot of TrueNAS, the bridging is back to how it should be, but rebooting the whole system because Windows (or bhyve) has had an issue shouldn't be necessary. I know the trigger is Windows/bhyve failing (and I'm trying to find the root cause of that), but something in TrueNAS also seems not to handle it correctly, and severing the jails compounds the problem. I'm also assuming the bridge reconfiguration is a symptom of Windows/bhyve failing and not the other way round.
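If it turns out to be safe to repair this by hand rather than reboot, here is a minimal sketch of the idea: it builds (but does not run) the FreeBSD `ifconfig <bridge> deletem <if>` / `ifconfig <bridge> addm <if>` commands that would move every member of the stray bridge back onto bridge0, restoring the working layout. The membership dict is copied from the broken output; whether the TrueNAS middleware tolerates bridges being changed behind its back is an untested assumption, and none of this persists across a reboot.

```python
# Membership copied from the broken `ifconfig` output above (assumption: igb0 is
# the uplink and bridge0 is the bridge the jails are configured to use).
BROKEN_STATE = {
    "bridge0": ["vnet0.3", "vnet0.2", "vnet0.1"],
    "bridge1": ["vnet0", "igb0"],
}


def consolidation_commands(members, target="bridge0", uplink="igb0"):
    """Return ifconfig commands that move every member of any *other* bridge
    holding the uplink back onto `target`. Nothing is executed here."""
    cmds = []
    for bridge, ifaces in members.items():
        # Leave the target itself and any unrelated bridges alone.
        if bridge == target or uplink not in ifaces:
            continue
        for iface in ifaces:
            # An interface can be a member of only one bridge, so remove it first.
            cmds.append(f"ifconfig {bridge} deletem {iface}")
            cmds.append(f"ifconfig {target} addm {iface}")
    return cmds


if __name__ == "__main__":
    for cmd in consolidation_commands(BROKEN_STATE):
        print(cmd)
```

For the broken state above this yields four commands that move vnet0 and igb0 off bridge1 and onto bridge0. Destroying the leftover bridge1 (`ifconfig bridge1 destroy`) is deliberately left out, since I don't know whether the middleware still expects it to exist.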

Any clues/suggestions?

Some specific questions...

1/
Should I configure the bridge explicitly by hand and set it manually for all jails and VMs?

2/
Should I configure a second bridge (with a number unlikely to be auto-assigned, e.g. bridge9) specifically for the Windows VM, so that TrueNAS has no need to touch the 'main' bridge if Windows/bhyve fails?

Thanks!