Intel 10Gb NICs drop under heavy load

Status
Not open for further replies.

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
We have a recently deployed FreeNAS 11.1 system that serves data to a single Windows host over iSCSI from zvol extents/targets. The original NIC was an Intel X520-DA2 with a 10G-SR fiber connection through an HP 2920 switch to the host. After 20-45 minutes of heavy load (4-5 Gbit/s of continuous sequential iSCSI traffic), the connection would drop completely, and both the FreeNAS system and the HP 2920 switch would log the interface going offline and coming back online shortly thereafter, almost as if the NIC driver had crashed. This obviously wreaks havoc with the VM running on that host when the storage is ripped out from underneath it.

Our first troubleshooting steps were to replace the fiber cables, then the SFP+ modules at both ends, then the 2920 switch (there are two stacked to choose from), and finally the NIC itself, swapping in an X710-DA2 that we also had in stock. We can reproduce the issue on demand within that time window. We also tried switching media from fiber to copper by replacing the NIC with an Intel X540-T2 connected to the same switches, with the same behavior. I don't have any other brands like Chelsio on hand to test with. We also tried disabling TSO, and the drop/reset (or whatever is happening) still occurs.

It looks a lot like the issue referenced here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221919, but I don't know enough about .diff files and compiling to even try the patch.

Any help would be greatly appreciated.

Thanks,
Kurt
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Do they have proper airflow over their heatsinks? That's also a possibility.

You can see if 11.0 works for you, with the older driver, until this gets fixed in 11.1.
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I forgot to mention that we were originally running 11.0 and upgraded to 11.1 as another troubleshooting step. As for the heat, during one of the NIC swaps we tried a different slot with even more airflow.

Thanks!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, that's weird. Intel 10GbE has been working well for many people on 11.0.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
With the X710, have you updated the NVM image to 6.01 and tried ixl driver version 1.9.5?
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I did upgrade to the latest NVM image posted a couple of days ago. sysctl shows I am running: Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.7.12-k
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
I also received this error in my nightly logs...

> ixl0: WARNING: queue 1 appears to be hung!
> ixl0: WARNING: Resetting!
> ixl0: Malicious Driver Detection event 2 on TX queue 4, pf number 0
> ixl0: MDD TX event is for this function!


It corresponds to the time when I lose all iSCSI connectivity.
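
For anyone who wants to line these events up against the times iSCSI drops, here is a minimal Python sketch that pulls hung-queue and Malicious Driver Detection events out of log lines. The regular expressions are modeled only on the messages quoted above, and `scan_ixl_events` is a hypothetical helper name, not part of FreeNAS or the ixl driver. Other driver versions may word these messages differently.

```python
import re

# Patterns modeled on the ixl messages quoted in this thread (assumption:
# other driver versions may format them differently).
HUNG_RE = re.compile(r"(ixl\d+): WARNING: queue (\d+) appears to be hung!")
MDD_RE = re.compile(
    r"(ixl\d+): Malicious Driver Detection event (\d+) "
    r"on TX queue (\d+), pf number (\d+)"
)


def scan_ixl_events(lines):
    """Return (kind, interface, queue) tuples for hung-queue and MDD events."""
    events = []
    for line in lines:
        # A single log line can carry more than one message, so use finditer.
        for m in HUNG_RE.finditer(line):
            events.append(("hung", m.group(1), int(m.group(2))))
        for m in MDD_RE.finditer(line):
            events.append(("mdd", m.group(1), int(m.group(3))))
    return events


# Sample input: the four lines from the nightly log above.
sample = [
    "ixl0: WARNING: queue 1 appears to be hung!",
    "ixl0: WARNING: Resetting!",
    "ixl0: Malicious Driver Detection event 2 on TX queue 4, pf number 0",
    "ixl0: MDD TX event is for this function!",
]

for kind, iface, queue in scan_ixl_events(sample):
    print(f"{iface}: {kind} event on queue {queue}")
```

The function accepts any iterable of strings, so an open log file can be passed in directly instead of the sample list.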
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
pm sent
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That seems to match the bug report. You might want to ping it so they know it's not an isolated case.
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
OK, added a +1 to the FreeBSD bug report.

I am trying a driver fix sent by bigphil, but before I installed it the FreeNAS box had an "Unauthorized system reboot". Where in the logs should I look to see whether this issue caused the system to panic or whether it was something else?
 

kurtc

Dabbler
Joined
Dec 17, 2017
Messages
39
Looks like I now have a crash under the same iSCSI load, even after swapping out the 10Gb Intel NICs for Chelsio.

[Attachment: Screen Shot 2017-12-29 at 12.24.31 AM.png]
 

paranaz

Cadet
Joined
Jan 27, 2018
Messages
2
Hello all, I'm experiencing the same issue with a Dell R640 and the Intel X710 adapter with latest fw; the box crashes after a few minutes of heavy load and I get the "Malicious Driver Detection" message in the syslog.

I noticed this at boot:
Code:
ixl1: fw 6.80.48603 api 1.7 nvm 6.00 etid 800034e6 oem 18.4608.6
ixl1: The driver for the device detected a newer version of the NVM image than expected.
Please install the most recent version of the network driver.


Is there a way to install the latest ixl driver? It looks to me like it's compiled into the kernel, so the kldload/tunable trick doesn't work. I'm using FreeNAS 11.1-U1; the nightly train and the HEAD train have an old driver too.
Code:
dev.ixl.1.fw_version: fw 6.80.48603 api 1.7 nvm 6.00 etid 800034e6 oem 18.4608.6
dev.ixl.1.%desc: Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.7.12-k


Cheers
 

mxj

Cadet
Joined
Feb 19, 2018
Messages
1
I wish I had done my research on the X710 cards... :mad:

Using driver 1.7.12:
root@fnas02:/var/log # dmesg | grep 710
ixl0: <Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.7.12-k> mem 0xf8800000-0xf8ffffff,0xf97f0000-0xf97f7fff irq 32 at device 0.0 on pci4
ixl1: <Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.7.12-k> mem 0xf9800000-0xf9ffffff,0xf97f8000-0xf97fffff irq 32 at device 0.1 on pci4

I got "Malicious Driver Detection" messages in my logs:
Feb 15 19:10:31 fnas02 ixl1: MDD TX event is for this function!ixl1: WARNING: queue 3 appears to be hung!
Feb 15 19:10:37 fnas02 ixl1: MDD TX event is for this function!ixl1: Interface stopped DISTRIBUTING, possible flapping
Feb 15 19:13:19 fnas02 ixl1: MDD TX event is for this function!ixl1: WARNING: queue 6 appears to be hung!
Feb 15 19:14:58 fnas02 ixl1: MDD TX event is for this function!ixl1: Malicious Driver Detection event 2 on TX queue 774, pf number 1

bigphil wrote: "With the X710 have you updated the NVM image to 6.01 and tried ixl driver version 1.9.5?"

I'm new to FreeNAS. How do I update the NVM image?
Does anyone have the 1.9.5 driver compiled?

Thanks!
//m
 

jvioncek

Cadet
Joined
Nov 21, 2017
Messages
8
Same issue with the X722. Has anyone found a fix, or does anyone know whether the issue has been resolved in 11.1-U2?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
> Same issue with the X722. Has anyone found a fix, or does anyone know whether the issue has been resolved in 11.1-U2?

11.1-U2 includes the latest ixl driver for that card (1.9.5). I'm not sure if the X722 has a serviceable firmware, but if it does you should update it to the latest. You can also try the firmware for the XL710 and see if it detects the X722 as a compatible update.
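
After upgrading, one way to confirm which driver is actually loaded is to read `dev.ixl.N.%desc` (as shown earlier in the thread) and compare the version against 1.9.5. Below is a minimal Python sketch, assuming the description string keeps the "Version - x.y.z" format seen above. `parse_ixl_driver_version` and `driver_at_least` are hypothetical helper names, not part of FreeNAS.

```python
import re


def parse_ixl_driver_version(desc):
    """Extract (major, minor, patch) from a dev.ixl.N.%desc string, or None."""
    # Assumption: the string contains "Version - x.y.z", as in this thread.
    m = re.search(r"Version - (\d+)\.(\d+)\.(\d+)", desc)
    return tuple(int(part) for part in m.groups()) if m else None


def driver_at_least(desc, minimum=(1, 9, 5)):
    """True if the driver version in desc is >= minimum (default 1.9.5)."""
    version = parse_ixl_driver_version(desc)
    return version is not None and version >= minimum


# Description string as reported earlier in this thread.
desc = "Intel(R) Ethernet Connection XL710/X722 Driver, Version - 1.7.12-k"
print(parse_ixl_driver_version(desc))  # (1, 7, 12)
print(driver_at_least(desc))           # False
```

Comparing tuples of integers avoids the string-comparison trap where "1.10.0" would sort before "1.9.5".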
 