SweetAndLow
OK, let me start by saying I don't work with this stuff for a living and it's basically all brand new to me. After some time fighting with things I have finally gotten everything set up and running (I think; it's only been a couple of hours). I wanted to make a post documenting the struggles I had and the way things behaved.
My goal: I don't exactly need a 10gig network card, but I thought it would be fun and something new to play with. I already own the switch and was using LACP with my 1gig interfaces. Moving to 10gig removes the network bottleneck, so now my pool is the slow piece of the puzzle.
Hardware:
X10SRL, E5-1620v3, 128GB Samsung memory, Ubiquiti UniFi Switch 48, Supermicro 846 chassis, LSI 9211i (20.00.07.00 firmware), Chelsio T520-CR
I initially bought the Chelsio T420-CR, but the eBay seller sent me the T420-SO-CR (no on-board memory and supports fewer connections) by mistake, so I returned it. After that I noticed that the T520-CR is basically the same price now, so that is what I eventually bought.
There are 3 parts to this adventure.
1. Broadcast storm
In my setup I have several jails that use VNET/VIMAGE. When I first installed the new NIC I just plugged in both interfaces and did not configure any link aggregation on the switch or in FreeNAS; I figured I'd set it up as I went. Having 2 interfaces on the same subnet isn't necessarily correct, but I didn't think it was going to cause any problems. Oh boy, was I wrong. In the process of setting everything up I needed to restart my Unifi5 jail so I could access the web UI and set up the ports on the switch for aggregation. Well, when you restart a jail and there are 2 interfaces on the same subnet, this somehow creates a loop that broadcast storms the entire network and takes the whole thing down. On top of this, STP did not catch the loop, I guess because it was in the jail networking stack or something (if anyone knows more, please comment below). At one point my wifi access point even went offline from all the traffic; it showed red in UniFi.
Well, after unplugging everything and rebooting things I got the network back up. To configure things I first set up aggregation on the switch ports, then plugged in the 10gig card and connected it to the switch. From there I used the console to configure LACP link aggregation and set up the interface. I used IPv4 and DHCP, since the DHCP server on my network assigns IPs by MAC address. So at this point I thought I had things working.
2. Disable offloading
Taking a little side step, I noticed that in the console I was seeing:
Code:
cxl0: tso4 disabled due to -txcsum.
cxl0: tso6 disabled due to -txcsum6.
cxl0: enable txcsum first
cxl1: tso4 disabled due to -txcsum.
cxl1: tso6 disabled due to -txcsum6.
cxl1: enable txcsum first
This really confused me, but after some research I found out that it's most likely caused by my jails and how they use their own networking stack. So in the end this is no big deal, other than some performance loss from having no offloading.
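To see which offloads actually ended up disabled, the `options=` line of `ifconfig` can be checked. Below is a minimal Python sketch of that check. The function names and the list of offloads to look for are illustrative choices, not anything FreeNAS provides; the sample string is the cxl0 options line from the `ifconfig` output in this post.

```python
import re

# Offload capability names as they appear in FreeBSD ifconfig output.
# Which ones to check is an illustrative choice.
OFFLOADS = ("RXCSUM", "TXCSUM", "TSO4", "TSO6", "LRO")

def enabled_options(ifconfig_text):
    """Return the capability names from the interface 'options=' line.

    The pattern is anchored to the start of a line so that the
    'nd6 options=' line is not picked up by mistake.
    """
    match = re.search(r"^\s*options=[0-9a-f]+<([^>]*)>", ifconfig_text, re.MULTILINE)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}

def offload_report(ifconfig_text):
    """Map each offload name to True (enabled) or False (not listed)."""
    enabled = enabled_options(ifconfig_text)
    return {name: name in enabled for name in OFFLOADS}

# Options line copied from the cxl0 output in this post.
sample = (
    "options=ac00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
    "VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>"
)

for name, is_on in offload_report(sample).items():
    print(f"{name}: {'on' if is_on else 'off'}")
```

On the sample line only RXCSUM is reported as on, which matches the console messages about txcsum and TSO being disabled.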
3. Link State changed UP/DOWN
Now for the one that caused me the most problems. After I got the LAG set up and the jails up and running, I kept seeing these log messages spamming /var/log/messages.
Logs from the FreeNAS server in /var/log/messages:
Code:
Jan 14 10:45:09 tubby kernel: cxl0: link state changed to DOWN
Jan 14 10:45:09 tubby kernel: cxl0: link state changed to DOWN
Jan 14 10:45:10 tubby kernel: cxl0: link state changed to UP
Jan 14 10:45:10 tubby kernel: cxl0: link state changed to UP
Jan 14 10:45:10 tubby kernel: cxl0: link state changed to DOWN
Jan 14 10:45:10 tubby kernel: cxl0: link state changed to DOWN
Jan 14 10:45:11 tubby kernel: cxl0: link state changed to UP
Jan 14 10:45:11 tubby kernel: cxl0: link state changed to UP
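To put a number on how badly a link is flapping, the kernel lines above can be counted per interface. This is a rough Python sketch, assuming the log format shown in this post; the function name and the choice to collapse the doubled lines into one event are my own assumptions, not part of any FreeNAS tooling.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Jan 14 10:45:09 tubby kernel: cxl0: link state changed to DOWN"
LINK_RE = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")

def count_link_events(lines):
    """Count link state changes per interface.

    Each message shows up twice in the log, so a line that repeats the
    last seen state of an interface is not counted again.
    """
    last_state = {}
    events = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match is None:
            continue
        iface, state = match.groups()
        if last_state.get(iface) != state:
            events[iface] += 1
            last_state[iface] = state
    return events

# Sample lines copied from the log excerpt in this post.
sample_lines = [
    "Jan 14 10:45:09 tubby kernel: cxl0: link state changed to DOWN",
    "Jan 14 10:45:09 tubby kernel: cxl0: link state changed to DOWN",
    "Jan 14 10:45:10 tubby kernel: cxl0: link state changed to UP",
    "Jan 14 10:45:10 tubby kernel: cxl0: link state changed to UP",
    "Jan 14 10:45:10 tubby kernel: cxl0: link state changed to DOWN",
    "Jan 14 10:45:10 tubby kernel: cxl0: link state changed to DOWN",
    "Jan 14 10:45:11 tubby kernel: cxl0: link state changed to UP",
    "Jan 14 10:45:11 tubby kernel: cxl0: link state changed to UP",
]

for iface, count in count_link_events(sample_lines).items():
    print(f"{iface}: {count} link state changes")
```

To run it against the real log, pass the open file instead of the sample list, for example `count_link_events(open("/var/log/messages"))`. An interface that only shows up here with a handful of events after a reboot is probably fine; one that racks up events every second is flapping.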
Logs from the Ubiquiti switch in /var/log/messages:
Code:
Jan 14 16:14:40 US-48 daemon.notice switch: TRAPMGR: Link Up: 0/50
Jan 14 16:14:40 US-48 daemon.notice switch: TRAPMGR: Link Down: 0/50
Jan 14 16:14:41 US-48 daemon.notice switch: TRAPMGR: Link Up: 0/50
Jan 14 16:14:41 US-48 daemon.notice switch: TRAPMGR: Link Down: 0/50
Jan 14 16:14:42 US-48 daemon.notice switch: TRAPMGR: Link Up: 0/50
Jan 14 16:14:43 US-48 daemon.notice switch: TRAPMGR: Link Down: 0/50
These would never stop. When I pinged the server I got about 25% packet loss, so the interface really was going down and affecting my network. When I unplugged cxl0 the messages went away and everything worked just fine using the cxl1 interface of the LAG. It still confuses me why it was only the first interface. I took the card out and tried it in a Linux server with a MikroTik switch, and it worked just fine on both interfaces, so I knew the card and the direct attach copper cables I was using were not faulty. So I just started searching and reading. It took me about a week, but I eventually found this thread:
https://community.ubnt.com/t5/UniFi...nt-2m-DAC-gt-Chelsio-T520-CR-link/m-p/2494470
And it was almost my exact problem. I bought some Chelsio transceivers, some 10Gtek transceivers for Ubiquiti, and some multimode fiber. I hooked everything up and instantly all my problems went away. There must be some kind of compatibility issue between Ubiquiti switches and Chelsio cards, at least with the direct attach copper cables I was using. There is talk of vendor lock-in with SFP+ stuff, and the feeling I got when looking things up was that it was a thing of the past. Well, it turns out it is not.
Logs and ifconfig output after switching to fiber:
Code:
Jan 15 17:04:45 tubby kernel: cxl0: link state changed to UP
Jan 15 17:04:45 tubby kernel: cxl0: link state changed to UP
Jan 15 17:06:26 tubby cxl1: transceiver unplugged.
Jan 15 17:06:26 tubby kernel: cxl1: link state changed to DOWN
Jan 15 17:06:26 tubby kernel: cxl1: link state changed to DOWN
Jan 15 17:06:31 tubby cxl1: 10Gbps SR transceiver inserted.
Jan 15 17:10:34 tubby kernel: cxl1: link state changed to UP
Jan 15 17:10:34 tubby kernel: cxl1: link state changed to UP

root@tubby:~ # ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=ac00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>
	ether 00:07:43:31:44:00
	inet 192.168.1.183 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect
	status: active
	groups: lagg
	laggproto lacp lagghash l2,l3,l4
	laggport: cxl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: cxl1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

root@tubby:~ # ifconfig cxl0
cxl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=ac00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>
	ether 00:07:43:31:44:00
	hwaddr 00:07:43:31:44:00
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet 10Gbase-SR <full-duplex,rxpause,txpause>
	status: active

root@tubby:~ # ifconfig cxl1
cxl1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=ac00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>
	ether 00:07:43:31:44:00
	hwaddr 00:07:43:31:44:08
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet 10Gbase-SR <full-duplex,rxpause,txpause>
	status: active
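The thing to look for in the lagg0 output is that every `laggport` line carries all three of ACTIVE, COLLECTING and DISTRIBUTING. Here is a small Python sketch that checks exactly that. The function names are made up for illustration, and the second sample (a port with empty flags) is a hypothetical example of a degraded port, not output captured from my system.

```python
import re

# Matches lines such as:
#   "laggport: cxl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>"
PORT_RE = re.compile(r"laggport: (\S+) flags=[0-9a-f]+<([^>]*)>")

# Flags a port shows in the healthy output in this post.
HEALTHY = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}

def lagg_port_flags(ifconfig_text):
    """Return {port name: set of flag names} for each laggport line."""
    ports = {}
    for port, flags in PORT_RE.findall(ifconfig_text):
        ports[port] = {flag for flag in flags.split(",") if flag}
    return ports

def unhealthy_ports(ifconfig_text):
    """Return the sorted names of ports missing any of the HEALTHY flags."""
    ports = lagg_port_flags(ifconfig_text)
    return sorted(name for name, flags in ports.items() if not HEALTHY <= flags)

# laggport lines copied from the lagg0 output in this post.
healthy_sample = """\
laggport: cxl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: cxl1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""

# Hypothetical degraded case for comparison (not from the original logs).
degraded_sample = """\
laggport: cxl0 flags=0<>
laggport: cxl1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""

print("healthy sample, unhealthy ports:", unhealthy_ports(healthy_sample))
print("degraded sample, unhealthy ports:", unhealthy_ports(degraded_sample))
```

Feeding it the full text of `ifconfig lagg0` works too, since lines that are not `laggport` lines are simply ignored.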
I'm sure I have forgotten some stuff, so I might update this post. I hope this helps someone in the future. Feel free to ask questions, because I spent way too much time working through everything and debugging.