Solarflare NIC causing kernel panic under load

allen402

Cadet
Joined
Jan 16, 2023
Messages
3
Hello all, I am running TrueNAS SCALE Bluefin with a Solarflare S7120 dual-port 10 Gbps NIC. When I transfer a large number of files, the system freezes after about 5 seconds and then kernel panics. This happens on both the Bluefin and the Angelfish releases. I tried to force a crash using iperf3, but it ran for 10 minutes at the full 10 Gbps without any issues. When I switch to the 1 Gbps NIC on the motherboard, the upload goes smoothly. No other hardware change I made had any effect; the issue only occurs with the Solarflare NIC. Any ideas on what the problem could be?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Per the forum rules, please supply the rest of the hardware configuration and the pool layout.

Grasping at straws (without further information), it is possible that you have a memory problem. The 1 Gbps NIC simply does not use as much memory under load as the 10 Gbps NIC does.

Have you run a very long memtest session?
 

allen402

Cadet
Joined
Jan 16, 2023
Messages
3
Oh sorry about that! The computer model is a Dell Precision T7910.
Hardware:
Dual E5-2667v4
256GB DDR4 ECC 2400
60 GB Boot SSD
4x12TB Seagate Exos HDD in a RAIDZ1

I haven’t run a memtest on it, but I did run the Dell Hardware Diagnostic and it returned no issues on the memory. I’ll run the memtest now.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Thanks for the info. Since you are using real server hardware (with ECC memory), it is probably not the memory, but there is little harm in running a check.

If the memory test does not return anything, my only other suggestion is heat.

Does the Solarflare card get hot?

Perhaps it needs some extra cooling. Or perhaps the system in general does, as older servers sometimes collect dust that should be blown out before re-use.
 

allen402

Cadet
Joined
Jan 16, 2023
Messages
3
Yeah, no issues in the memory so far, but the card does get hot enough that I added a fan over the heatsink. However, checking the temperatures, it seems the fan didn't help much: the regulator temperature sensor reports 71 °C at idle.
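
In case it helps anyone chasing the heat theory, here is a minimal sketch for logging temperatures while a transfer runs, so the last readings before a freeze are on record. It assumes the NIC driver exposes its sensors through the standard Linux hwmon interface under /sys/class/hwmon, which I have not confirmed for this card. The function names (`read_hwmon_temps`, `poll`) are made up for illustration.

```python
import glob
import os
import time


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {(chip, label): degrees_C} for every temp*_input under base.

    Assumption: sensors follow the usual hwmon layout, where tempN_input
    holds millidegrees Celsius and tempN_label (optional) holds a name.
    """
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(base, "hwmon*"))):
        chip = _read(os.path.join(chip_dir, "name")) or os.path.basename(chip_dir)
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            raw = _read(input_path)
            if raw is None:
                continue  # some sensors return an error when not populated
            try:
                value = int(raw) / 1000.0
            except ValueError:
                continue
            stem = input_path[: -len("_input")]
            label = _read(stem + "_label") or os.path.basename(stem)
            temps[(chip, label)] = value
    return temps


def poll(count=60, interval=1.0, base="/sys/class/hwmon"):
    """Print one timestamped line per sensor, `count` times."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S")
        for (chip, label), value in sorted(read_hwmon_temps(base).items()):
            print(f"{stamp} {chip} {label}: {value:.1f} C", flush=True)
        time.sleep(interval)
```

Calling `poll(count=600)` from a shell on the NAS and redirecting the output to a file on the boot SSD should leave a record of the temperatures right up to the freeze, though a hard panic may lose the last second or two of output.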
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I don't have any other ideas. Perhaps someone else can make a suggestion.
 