LACP Flapping

ali_v001

Dabbler
Joined
Aug 10, 2020
Messages
37
Hi, I keep getting link-flapping issues. My NIC ran flawlessly for over 12 months; it's been through multiple versions of FreeNAS and TrueNAS and is always kept up to date.

Recently it's started flapping a lot. The machine will be fine for a few hours, then the links flap and drop out altogether, leaving me to reboot just for the same thing to happen again. To test, I just swapped my 10tek DACs for Intel-compatible fibre modules, and the same thing happens.

Any ideas? I've ruled out the switch (a MikroTik CRS305-1G-4S+IN), as I have other machines connected to it that work fine.

So the only things I can think of that are left are TrueNAS software bugs or a faulty NIC. It ran faultlessly for over 12 months, so why now? I doubt it could be an incompatibility.

The NIC is a Chelsio T520-CR, currently running the U7 release.

This is the alert I receive after rebooting; shortly afterwards the ports drop again:

The following alert has been cleared:
* These ports are not ACTIVE on LAGG interface lagg0: cxl0, cxl1. Please check cabling and switch.
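Rather than waiting for the alert, you can check the member ports directly: on FreeBSD, `ifconfig lagg0` prints one `laggport:` line per member with its flags, and a healthy LACP member typically shows `ACTIVE,COLLECTING,DISTRIBUTING`. The Python sketch below parses that text and lists ports missing the `ACTIVE` flag, which is what the alert above reports. It assumes FreeBSD-based TrueNAS CORE (which the `cxl0`/`cxl1` names from the cxgbe driver suggest) and the `laggport: <name> flags=<hex><FLAG,...>` line format; the helper names and the sample text are hypothetical and illustrative, not taken from the poster's system.

```python
import re

# Matches FreeBSD ifconfig lagg member lines such as:
#   laggport: cxl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
# (line format assumed from FreeBSD lagg output; adjust if yours differs)
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=[0-9a-fA-F]+<([^>]*)>")


def lagg_port_flags(ifconfig_output: str) -> dict[str, set[str]]:
    """Map each lagg member port name to the set of flag names ifconfig reports."""
    ports = {}
    for match in LAGGPORT_RE.finditer(ifconfig_output):
        name, flags = match.groups()
        ports[name] = {flag for flag in flags.split(",") if flag}
    return ports


def inactive_lagg_ports(ifconfig_output: str) -> list[str]:
    """Return member ports that lack the ACTIVE flag, sorted by name."""
    return sorted(
        name
        for name, flags in lagg_port_flags(ifconfig_output).items()
        if "ACTIVE" not in flags
    )


# Illustrative output only: cxl1 has dropped out of the LACP bundle.
sample = """lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    laggproto lacp lagghash l2,l3,l4
    laggport: cxl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: cxl1 flags=0<>
"""
print(inactive_lagg_ports(sample))  # ['cxl1']
```

On TrueNAS SCALE (Linux) the bonding driver reports member state in a different format, so this parser would not apply there.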
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Most faster-than-gigabit Ethernet cards run hot. A dual X520, for example, will happily dissipate about 12 watts, and if you haven't arranged for airflow over cards like HBAs and Ethernet cards, you can inadvertently cook your controller.

You haven't described the chassis in use, but a common problem is for people to place these cards in tower/deskside chassis where the cards end up lying horizontally, and then to use normal solid PC PCIe slot blanks. This won't work. At least not well.

Some random Google Image searching around yielded this photo:

[Image: Lenovo-ideacentre-Y900-GTX-1080-12.jpg]


and you will notice both the black vented PCIe slot blanks, AND the use of vented brackets for the two cards (maybe just one double card, dunno).

The other thing that needs to happen, besides venting, is for there to be actual airflow across the cards and out the back. Failure to arrange cooling is the most common cause of HBA/ethernet card failures, and, unfortunately, it IS possible to do permanent damage to these cards if you cook them too hot.
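To put a rough number on "actual airflow", the steady-state heat balance for air passing over a card is P = ρ · c_p · V̇ · ΔT, so the flow needed is V̇ = P / (ρ · c_p · ΔT). The sketch below solves this for the roughly 12 W mentioned above at a few air temperature rises. It assumes sea-level, room-temperature air (ρ ≈ 1.2 kg/m³, c_p ≈ 1005 J/(kg·K)) and, unrealistically, that all of that air actually sweeps the card's heatsink. In a real case much of the fan airflow takes an easier path around the card, which is why getting air directed across the card matters more than the total fan rating.

```python
# Steady-state heat balance for air crossing a card: P = rho * cp * Vdot * dT
# Assumed air properties near sea level and room temperature.
AIR_DENSITY = 1.2           # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K)
M3S_TO_CFM = 2118.88        # 1 m^3/s expressed in cubic feet per minute


def required_airflow_cfm(power_w: float, temp_rise_k: float) -> float:
    """Idealised airflow (CFM) that removes power_w with a temp_rise_k air temperature rise.

    Assumes every bit of the air passes over the heatsink; real installations need more.
    """
    if temp_rise_k <= 0:
        raise ValueError("temperature rise must be positive")
    flow_m3s = power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * temp_rise_k)
    return flow_m3s * M3S_TO_CFM


for rise in (5, 10, 20):
    print(f"12 W, {rise} K rise: {required_airflow_cfm(12, rise):.1f} CFM")
```

This prints about 4.2, 2.1 and 1.1 CFM respectively: small figures on paper, which is exactly why the limiting factor is usually where the air goes rather than how much of it the fans move.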

Note that airflow includes topics such as dust remediation. I've come across horribly dusty servers over the years, one of my favorites being

[Image: filthy-tq-chassis.jpg]


For cleaning, lock up the fans with a spudger, and then use something like a DataVac ED-500-ESD, or a can of air if you must.
 

ali_v001

Hi, interesting, and thanks! I did wonder about overheating, especially with it happening over time.
The case is a Fractal XL R2, and it already has vented PCI slot blanking plates.

I have now rearranged my PCI slots: I've put the two SSDs at the top with no room between them, to save air space for the NIC and HBA. Both the HBA and NIC now have at least one slot's worth of gap above and below. I'll see how this runs and report back. If that doesn't help, I may have to add more fans.
The inside of the chassis is squeaky clean and really roomy, so I should be able to get plenty of circulation.

What are your thoughts on PCI slot air blowers? Something like an Antec Cyclone blower? It's just the first thing that popped up on Google, but it might be neat, as I could blow air directly at the NIC with one of those.

This chassis probably isn't ideal (it is a desktop case); however, it's about the biggest I could find and has plenty of fan slots. I could still fit two more fans at the top.

Do you think I have damaged the NIC in some way? Or would it just not work at all if I had really fried it?

I initially ruled out overheating because this thing held its own over the whole summer: the machine suffered some far-from-ideal temperatures without a glitch. However, it's very cold now, yet it's playing up during the coldest part of the year.
 

jgreco

"Antec Cyclone blower?"

These (probably including Antec's) seem to all be produced by just a handful of manufacturers in Asia, and they will typically seize up in a year or three.

Here in the shop, we have several little 4" AC desktop fans; they're metal, with a base that lets you point them straight down, which is super-convenient when working on 1U servers. Most of these now seem to be made of plastic and powered by USB. You might want to see if you can find something like that and position it to temporarily blow air across the card while it's inside the case. Use electrical or double-sided tape to hold it in place if needed. Arranging cooling on a permanent basis inside PC cases can be annoying and frustrating, and, if it turns out that cooling wasn't the problem, a waste of time, too. Hence temporary fans. :smile:

"Do you think I have damaged the NIC in some way? Or would it just not work at all if I had really fried it?"

This stuff is all susceptible to being partially fried. The transistor count in modern electronic devices is amazing, and the fact that anyone can reliably etch billions of transistors blows my mind. SMD components are tiny. They make TVs out of LEDs?!?!? And the risk of one marginal or busted component ruining the whole show makes it just amazing that any of it works at all, ever.
 

ali_v001

Just reporting back: after reconfiguring the PCI slots, the NICs stayed up overnight for the first time in a while.
All I have done is give them more space and, while I was in the chassis, give it a quick spray of air, although it was already pretty clean.
I'll try to leave it on for 7 days and see if it drops at all, but it's looking promising so far.
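For a week-long soak test like this, sampling which lagg members are `ACTIVE` every minute or so and counting the transitions gives a firmer answer than noticing whether the alert fired. The Python sketch below does only the counting; it assumes you already have samples as (timestamp, set of ACTIVE port names) pairs, for example gathered from periodic `ifconfig lagg0` output, and the function name `count_drops` and the sample data are hypothetical and illustrative.

```python
from datetime import datetime, timedelta


def count_drops(samples, ports):
    """Record when each tracked port goes from ACTIVE to not ACTIVE.

    samples: (timestamp, set of ACTIVE port names) pairs, oldest first.
    ports:   lagg member names to track, e.g. ("cxl0", "cxl1").
    Returns {port: [timestamp of the first sample where it was seen inactive]}.
    """
    drops = {port: [] for port in ports}
    previous = None
    for timestamp, active in samples:
        if previous is not None:
            for port in ports:
                # A drop is an ACTIVE -> inactive transition between consecutive samples.
                if port in previous and port not in active:
                    drops[port].append(timestamp)
        previous = active
    return drops


# Illustrative data only (not from this thread): hourly samples,
# cxl1 is missing at hours 3 and 4, which counts as a single drop.
start = datetime(2000, 1, 1)
samples = [
    (start + timedelta(hours=h), {"cxl0"} if h in (3, 4) else {"cxl0", "cxl1"})
    for h in range(6)
]
result = count_drops(samples, ("cxl0", "cxl1"))
print({port: len(times) for port, times in result.items()})  # {'cxl0': 0, 'cxl1': 1}
```

Counting transitions rather than inactive samples means a port that stays down for several samples is reported as one drop, which matches how a flap is usually described.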
 

ali_v001

Hey, just updating here: no problems since. I did buy one of those cheap PCI slot blowers to be on the safe side (and also so I know I've got air being pulled over the NIC). It's very quiet as well, and it pulls some serious air for a few pennies. It may or may not last that long, but we'll see; it's super cheap to replace if I get a year or so out of it.
So I'd definitely put it down to the NIC overheating.
Thanks for the help!
 