10 Gbps in bridge drops when client reboots

Status
Not open for further replies.

Jon K

Explorer
Joined
Jun 6, 2016
Messages
82
I have an atypical networking setup and I realize that this is not supported. I have two ESXi hosts that are directly connected to my FreeNAS 9.10 box. The FreeNAS box has a dual port SFP+ Chelsio adapter with the interfaces bridged with the following "post init" script:

ifconfig cxgbe1 mtu 9000; ifconfig cxgbe1 up; ifconfig bridge create; ifconfig bridge0 addm cxgbe0 addm cxgbe1 up

Here is a diagram of the situation:
FreeNAS_10Gbps_Diagram.jpg


I am running FreeNAS-9.10.2-U1 (86c7ef5) as of 2-13-2017. I don't remember this happening on 9.10-STABLE-201606270534, which I had been running since 7-10-2016.

When I reboot ESX Host 1, the bridge interface on the FreeNAS node seems to go down or become inaccessible to ESX Host 2. I am remote from this setup, so it's hard to test live since I lose my access for a moment. When I reboot the ESXi host, I see on the FreeNAS console that the corresponding port (cxgbe0) reports down. Right now, with both hosts up, cxgbe0 and cxgbe1 are both up, and the bridge0 interface is up as well. In my GUI I see the following:

TK7X5cY.png


I only see that cxgbe0 has the IP associated with it, then cxgbe0 and cxgbe1 live together as bridge0. Does this mean that the IP is bound to cxgbe0 exclusively and the bridge is fully dependent on that interface? If that's the case, then that's why my storage goes down from ESX Host 2's perspective.

Thoughts?
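
One way to check which interface actually holds the address is to parse the `ifconfig` output from the FreeNAS shell. This is a minimal sketch, assuming FreeBSD-style `ifconfig` output (interface name at the start of a line, `inet` lines indented below it); `iface_for_ip` is a hypothetical helper name, not a FreeNAS tool.

```shell
#!/bin/sh
# Hypothetical helper: print the interface that carries a given IPv4 address.
# Reads ifconfig-style text on stdin; $1 is the address to look for.
iface_for_ip() {
    awk -v ip="$1" '
        { c = substr($0, 1, 1) }
        # A non-indented, non-empty line starts a new interface block.
        NF && c != " " && c != "\t" { iface = $1; sub(/:$/, "", iface) }
        # Indented "inet <addr> ..." lines belong to the current interface.
        $1 == "inet" && $2 == ip { print iface }
    '
}

# Usage on the FreeNAS box (address taken from this thread):
#   ifconfig | iface_for_ip 172.16.100.102
```

If this prints cxgbe0 rather than bridge0, the address is tied to that one physical port.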
 

Jon K

OK - so, I think I figured this out.

I have 3 interfaces of interest: cxgbe0, cxgbe1, and bridge0. Bridge0 is created at boot via a post-init script.

I had originally set this up following a thread I created many months ago: https://forums.freenas.org/index.php?threads/bridging-10gbe-with-chelsio-t420-cr.44605/

The problem is that I was setting the IP up on cxgbe0 (00:07:43:11:22:50) and thus my NFS server for datastore mounts on ESXi hosts points to 00:07:43:11:22:50. So, this is important: ESX1 is cabled to cxgbe0 and ESX2 is cabled to cxgbe1 - if the IP is configured on cxgbe0 then whenever ESX1 reboots, cxgbe0 loses the link status and 00:07:43:11:22:50 no longer replies at 172.16.100.102.
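
To watch this happen, the link state of each port can be pulled out of `ifconfig` while a host reboots. This is a rough sketch, assuming the physical ports print a FreeBSD-style `status:` line (`active` or `no carrier`); `link_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the "status:" value for one interface.
# Reads ifconfig-style text on stdin; $1 is the interface name.
link_status() {
    awk -v want="$1" '
        { c = substr($0, 1, 1) }
        # A non-indented, non-empty line starts a new interface block.
        NF && c != " " && c != "\t" { iface = $1; sub(/:$/, "", iface) }
        # Strip the "status:" label and print what remains.
        iface == want && $1 == "status:" { $1 = ""; sub(/^ +/, ""); print }
    '
}

# Usage on the FreeNAS box while ESX1 reboots:
#   ifconfig | link_status cxgbe0
```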

Although vMotion/NFS/etc. all worked like this, I couldn't truly reboot ESX1. I could reboot ESX2 because cxgbe1 had no IP configured and ESX1 maintained its connection on cxgbe0, where the IP lives. This won't work in an HA cluster, obviously.

So what I did, against recommendation, was delete the configuration for cxgbe0 from the GUI. That obviously dropped all my NFS mounts from both hosts (I had to svMotion all my VMs to iSCSI LUNs on the FreeNAS box to do this with no downtime). Once I removed the IP config, I SSH'd into the FreeNAS box and ran:

Code:
ifconfig bridge0 inet 172.16.100.102/24; ifconfig cxgbe0 up; ifconfig cxgbe1 up


After that, I can now do ifconfig and see:

Code:
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
		ether 02:33:08:54:b4:00
		inet 172.16.100.102 netmask 0xffffff00 broadcast 172.16.100.255
		nd6 options=1<PERFORMNUD>
		id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
		maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
		root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
		member: cxgbe1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
				ifmaxaddr 0 port 4 priority 128 path cost 2000000
		member: cxgbe0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
				ifmaxaddr 0 port 3 priority 128 path cost 2000


I could probably turn down maxaddr to limit the bridge's MAC address cache, but it's not a big deal for this "static" interface. Anyway, with this done, I can now reboot EITHER half of the 2-port bridge and not lose connection on the other half! What a relief.
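
Since the address was set by hand over SSH, it presumably has to go into the post-init script as well to survive a reboot. Below is a sketch of what that script might look like, using the interface names and address from this thread. The `run` wrapper and the `DRY_RUN` switch are hypothetical additions so the commands can be previewed before they touch the network, and setting MTU 9000 on both ports is an assumption based on the bridge showing mtu 9000.

```shell
#!/bin/sh
# Sketch of a post-init script that puts the IP on bridge0 instead of cxgbe0.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bridge_setup() {
    # Both member ports need the same MTU before they join the bridge.
    run ifconfig cxgbe0 mtu 9000 up
    run ifconfig cxgbe1 mtu 9000 up
    run ifconfig bridge0 create
    run ifconfig bridge0 addm cxgbe0 addm cxgbe1
    # The address lives on the bridge, so it does not depend on either port's link.
    run ifconfig bridge0 inet 172.16.100.102/24 up
}

bridge_setup
```

With the default `DRY_RUN=1` this only prints the five `ifconfig` commands, which makes it easy to review them before committing to a change that can cut off remote access.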
 