Critical Interrupt #0xfe Asserted Bus Degraded.

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
I switched over to SCALE nightlies on Friday, and ever since then "Critical Interrupt #0xfe Asserted Bus Degraded" keeps showing up in my alerts. I have six of them, but they seem to be sporadic: three on August 15th at 13:11, 13:15, and 13:17, then nothing until August 20th, when they happened at 11:56, 11:57, and 12:17, and nothing after that. Googling the error brings me to two Reddit threads about FreeNAS/TrueNAS where people just suggest it's a cabling problem. None of my pools are degraded and I don't notice any performance issues. I have 14 drives connected to my HBA and 3 connected to the SATA bus (two of the leads on the SFF-8087 cable were bad and I haven't felt like swapping them out yet, so I just used SATA cables). I don't remember ever seeing this in Arch Linux.

Anyone know how to track this down?
 

shadofall

Contributor
Joined
Jun 2, 2020
Messages
100
not sure why it would have started with the nightlies. but i had that with a Dell-branded HBA, it would lock in to an x4 PCIe link. swapped in a non-branded LSI card with the same chipset and the error went away and the link was reported at full width. no clue if it was just a faulty card, some weird limit Dell puts on a Dell card running on non-Dell hardware, or some strange Debian thing. but check your link speed and width using lspci
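For anyone who would rather script the check than read lspci output by eye, here is a minimal Python sketch. It assumes a Linux kernel that exposes the `current_link_width` and `max_link_width` attributes under `/sys/bus/pci/devices`; the function names and the `root` parameter are made up for illustration and are not part of any TrueNAS tooling.

```python
from pathlib import Path


def read_attr(dev, name):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def downgraded_links(root="/sys/bus/pci/devices"):
    """List (address, current_width, max_width) for links running below max width.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real system the default path is the one to use.
    """
    results = []
    base = Path(root)
    if not base.is_dir():
        return results
    for dev in sorted(base.iterdir()):
        cur = read_attr(dev, "current_link_width")
        mx = read_attr(dev, "max_link_width")
        if cur is None or mx is None:
            continue  # not a PCIe device, or attribute not exposed
        try:
            cur_w, max_w = int(cur), int(mx)
        except ValueError:
            continue  # unexpected content, skip rather than guess
        if 0 < cur_w < max_w:
            results.append((dev.name, cur_w, max_w))
    return results


if __name__ == "__main__":
    for addr, cur_w, max_w in downgraded_links():
        print(f"{addr}: running x{cur_w}, capable of x{max_w}")
```

Note that `max_link_width` describes what the card itself can do, so a x8 card sitting in a slot that is only wired for x4 will also be listed. A hit here narrows things down but does not by itself say whether the card, the slot, or the board is at fault.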
 

brando56894

Yep, looks like you're right. I bought this one off of eBay about 2 years ago. Apparently it has been limiting me since then, but I never noticed. This may also be the root cause of my (and a handful of others') issue where it wasn't being detected quickly enough.

43:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
Subsystem: Broadcom / LSI SAS 9201-16i
...
LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-

LnkSta: Speed 5GT/s (ok), Width x4 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Edit: swapping PCI slots didn't help. I had it in an x8 slot and moved it to a dedicated x16 slot but it made no difference.
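To put a number on the x4 vs x8 difference in the output above, here is a rough Python sketch that pulls the `LnkCap` and `LnkSta` lines out of `lspci -vv` text and estimates the theoretical per-direction bandwidth. The regular expressions are written against the two lines quoted in this post and are an assumption about the format in general; the function names are made up for illustration. The encoding factors (8b/10b up to 5 GT/s, 128b/130b above that) are the standard PCIe figures.

```python
import re

# The two relevant lines from the lspci -vv output quoted above.
SAMPLE = """\
LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
LnkSta: Speed 5GT/s (ok), Width x4 (downgraded)
"""

LNKCAP = re.compile(r"LnkCap:.*?Speed ([\d.]+)GT/s, Width x(\d+)")
LNKSTA = re.compile(r"LnkSta:.*?Speed ([\d.]+)GT/s[^,]*, Width x(\d+)")


def lane_gb_per_s(speed_gt):
    """Theoretical GB/s per lane, per direction, after line encoding."""
    # 2.5 and 5 GT/s links use 8b/10b encoding; 8 GT/s and up use 128b/130b.
    num, den = (8, 10) if speed_gt <= 5 else (128, 130)
    return speed_gt * num / den / 8  # divide by 8 to go from bits to bytes


def link_report(text):
    """Compare negotiated link (LnkSta) against capability (LnkCap)."""
    cap = LNKCAP.search(text)
    sta = LNKSTA.search(text)
    if not cap or not sta:
        return None
    cap_speed, cap_width = float(cap.group(1)), int(cap.group(2))
    sta_speed, sta_width = float(sta.group(1)), int(sta.group(2))
    cap_bw = lane_gb_per_s(cap_speed) * cap_width
    sta_bw = lane_gb_per_s(sta_speed) * sta_width
    return {
        "cap_width": cap_width,
        "sta_width": sta_width,
        "cap_gb_s": cap_bw,
        "sta_gb_s": sta_bw,
        "downgraded": sta_bw < cap_bw,
    }


if __name__ == "__main__":
    r = link_report(SAMPLE)
    print(f"capable: x{r['cap_width']} ({r['cap_gb_s']:.1f} GB/s)")
    print(f"running: x{r['sta_width']} ({r['sta_gb_s']:.1f} GB/s)")
```

For the card above this works out to about 2 GB/s at x4 versus 4 GB/s at x8, per direction and before protocol overhead, so the link is running at half of what the card advertises.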
 

shadofall

you might be right, because i had swapped that Dell card with the generic LSI card i had in my OMV machine, and i think that's when i started having issues with the pools not being available at boot every now and then on the OMV system
 

brando56894

I know the firmware on my HBA is the latest, but I think the BIOS is a version or two behind, so maybe flashing the newest one will fix it. When I bought the card, it was already flashed to IT mode. I hope it's not faulty, because I really don't wanna drop like $480 on a new one. I could always take my chances and buy a used one off of eBay, but then I run the risk of the same problem, I guess.
 

shadofall

not sure. my BIOS was up to date (Dec 2020) at that time, and i'm pretty sure the HBA was on the latest too. since i couldn't find a definitive answer online i just went back and got a card that matched the one i swapped in. pretty sure it used the same driver as well, so I've always leaned towards either a bad card or Dell adding something to it that restricts width when it's not paired with a Dell MB. either is sadly possible. luckily i only run an 8i, so it was significantly cheaper than a 16i
 

brando56894

I'll try flashing the new BIOS at some point, and also try the card in my desktop rig just to make sure it's not the motherboard, though I don't see why it would be. Probably an issue with the card itself. Thanks for the ideas.
 