Dedicated iSCSI Network with VLAN and LACP config not working.

eptesicus

Dabbler
Joined
Sep 12, 2018
Messages
20
I am building a new NAS, running TrueNAS Core 13.0-U5, to serve flash storage over iSCSI to my vSphere cluster. Below is the current networking config on the new NAS. The management piece works, but the iSCSI network config does not. I have confirmed with other hardware that the switch side is configured correctly, but for whatever reason, with TrueNAS I am not able to communicate with the SAN over my iSCSI VLAN. The iSCSI config must have VLAN 54 tagged and the NICs in an LACP/dynamic LAGG. Even when testing with only one NIC configured in TrueNAS and no LACP on the switch side, I was unsuccessful. I've also removed the VLAN from both sides of the config without success.

What must I do differently to get the iSCSI network config working? No MAC addresses show up under VLAN 54 in the MAC address table of my switches.

Management Network:
Interface: ix0
IP Address: 10.0.50.35/24
This is connected back to my switch on an interface with VLAN 50 untagged as the native VLAN. The default route in my global config is the gateway on this network, 10.0.50.1.
[Attachment: 1686090348533.png]

iSCSI Network:
Interface: mlxen0 & mlxen1
VLAN: 54
LAGG: LACP
I have my 40GbE NICs configured with LACP and on VLAN 54 which is my iSCSI VLAN. All interfaces are up, and the port-channel on the switch side is working as expected with this VLAN as confirmed with other hardware (not TrueNAS). There is no gateway for this network. Setting the IP on the VLAN versus the LAGG makes no difference. Yes, MTU should be set to 9216 to match the switch-side config.
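
For reference, one way to check that a 9216-byte MTU actually passes end to end is a ping with the Don't Fragment bit set and a payload sized to fill the MTU exactly. The sketch below only does the arithmetic and prints the command. It assumes IPv4 without options and FreeBSD's `ping` (`-D` for Don't Fragment, `-s` for payload size); the peer address 10.0.54.10 is a placeholder, not something from this setup.

```python
# Sketch: size a don't-fragment ping so the IP packet exactly fills the MTU.
# Assumes IPv4 with no options (20-byte header) and an 8-byte ICMP header.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(peer: str, mtu: int, count: int = 3) -> str:
    """Build a FreeBSD ping command: -D sets Don't Fragment, -s sets payload size."""
    return f"ping -D -s {max_icmp_payload(mtu)} -c {count} {peer}"


# 10.0.54.10 is a hypothetical peer on the iSCSI VLAN.
print(df_ping_command("10.0.54.10", 9216))
print(df_ping_command("10.0.54.10", 1500))
```

The host MTU counts the IP packet only, so the frame on the wire is larger by the Ethernet header and the VLAN tag. If a switch counts those headers toward its own 9216 limit, a host MTU of 9216 would exceed it; whether that applies here is not established in this thread.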
[Attachments: 1686090479822.png, 1686090513715.png]

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The LACP driver depends on support from the Ethernet device driver, and a number of Ethernet cards do not support full functionality when you start twisting all the knobs to 11. I am not certain that the Mellanox cards can handle LACP, VLAN tagging, and jumbo frames all at the same time. You should probably rip apart the LAGG and see if you can get it working with just a single mlxen interface directly on VLAN 54, then see if it works with MTU 9216, then try again with LAGG on both interfaces. My suspicion is that it will break when you introduce the LAGG.

I would only expect the Chelsio or Intel cards to work with such a complicated setup. It's a nightmare of layering issues involving firmware offload implementation quirks, and the Mellanox driver is already known to be subpar (think: now supported by Nvidia).
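
As a rough sketch of that isolation sequence, the Python below only prints the `ifconfig` commands for each stage so they can be run by hand from the console. Assumptions not taken from the thread: the address 10.0.54.35/24 is a placeholder, the clone names `vlan54` and `lagg0` are illustrative, and on TrueNAS the middleware owns the network configuration, so anything typed this way is a temporary test that may be overwritten or lost on reboot.

```python
# Sketch: print FreeBSD ifconfig commands for a staged isolation test.
# Nothing is executed; the commands are meant to be reviewed and run manually.
from typing import List, Tuple

Stage = Tuple[str, List[str]]


def isolation_plan(nics: List[str], vlan_id: int, cidr: str, jumbo_mtu: int) -> List[Stage]:
    """Return (stage name, commands) pairs: one NIC + VLAN, then jumbo MTU, then LAGG."""
    first = nics[0]
    vlan_if = f"vlan{vlan_id}"  # illustrative clone name

    # Stage 1: one physical NIC, tagged VLAN, standard MTU.
    stage1 = [
        f"ifconfig {first} mtu 1500 up",
        f"ifconfig {vlan_if} create vlan {vlan_id} vlandev {first} inet {cidr}",
    ]

    # Stage 2: same layout, raise the MTU on the parent first, then the VLAN.
    stage2 = [
        f"ifconfig {first} mtu {jumbo_mtu}",
        f"ifconfig {vlan_if} mtu {jumbo_mtu}",
    ]

    # Stage 3: tear down the VLAN, build an LACP lagg, put the VLAN on top of it.
    ports = " ".join(f"laggport {nic}" for nic in nics)
    stage3 = [f"ifconfig {vlan_if} destroy"]
    stage3 += [f"ifconfig {nic} mtu {jumbo_mtu} up" for nic in nics]
    stage3 += [
        "ifconfig lagg0 create",
        f"ifconfig lagg0 laggproto lacp {ports} up",
        f"ifconfig {vlan_if} create vlan {vlan_id} vlandev lagg0 inet {cidr}",
        f"ifconfig {vlan_if} mtu {jumbo_mtu}",
    ]

    return [
        ("single NIC, tagged VLAN, MTU 1500", stage1),
        ("single NIC, tagged VLAN, jumbo MTU", stage2),
        ("LACP lagg, tagged VLAN, jumbo MTU", stage3),
    ]


# 10.0.54.35/24 is a hypothetical address for the iSCSI VLAN.
for name, commands in isolation_plan(["mlxen0", "mlxen1"], 54, "10.0.54.35/24", 9216):
    print(f"# {name}")
    for command in commands:
        print(command)
```

Connectivity would be checked after each stage (for example, whether a MAC address appears in the switch table for the VLAN) before moving on; the first stage that fails points at the feature to suspect.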
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Unless I am misunderstanding you:
You have the native VLAN on the switch ports as 54. As it's the native VLAN, the switch is expecting untagged packets.
You are tagging the packets on the TN as 54...
 

eptesicus

jgreco said:
The LACP driver depends on support from the Ethernet device driver, and a number of Ethernet cards do not support full functionality when you start twisting all the knobs to 11. I am not certain that the Mellanox cards can handle LACP, VLAN tagging, and jumbo frames all at the same time. You should probably rip apart the LAGG and see if you can get it working with just a single mlxen interface directly on VLAN 54, then see if it works with MTU 9216, then try again with LAGG on both interfaces. My suspicion is that it will break when you introduce the LAGG.

I would only expect the Chelsio or Intel cards to work with such a complicated setup. It's a nightmare of layering issues involving firmware offload implementation quirks, and the Mellanox driver is already known to be subpar (think: now supported by Nvidia).

Thanks for the info. I've already confirmed that the setup does not work with a single interface and the port-channel removed on the switch side.

This NAS was supposed to have a Chelsio card, but it was DOA so I'm awaiting a replacement which may take a couple weeks. I'm too impatient for that, so I'm hoping I can get it working with this card for now, then replace with the Chelsio later.
 

eptesicus

NugentS said:
Unless I am misunderstanding you:
You have the native VLAN on the switch ports as 54. As it's the native VLAN, the switch is expecting untagged packets.
You are tagging the packets on the TN as 54...

The switch interface configs have the VLAN tagged, not set natively.

I've already tried removing the VLAN config entirely, with no luck. I don't believe I can set VLAN 54 as native on the switch interfaces, because these are MX9116N switches in a Dell MX7000 chassis using SmartFabric. It appears that I can only tag or untag the VLAN, not make it native.
 

jgreco

eptesicus said:
This NAS was supposed to have a Chelsio card, but it was DOA so I'm awaiting a replacement which may take a couple weeks. I'm too impatient for that, so I'm hoping I can get it working with this card for now, then replace with the Chelsio later.

I'm pretty sure that you will find at least one of the complications to be a fatal problem. Try doing without them one at a time.

What happened back in the day with the 1G Intel drivers, as an example, was that certain complicated configurations would work fine, but things broke if you introduced (I don't recall exactly) VLANs into a LAGG setup. The LAGG is an artificial virtual Ethernet interface in software, while the VLAN tagging was offloaded to the Ethernet hardware, so the Intel driver and the LAGG driver both needed patching to pass the tagging through in a manner that the hardware offload could support. There are frequently combinations of complications that result in busted networking, but it has gotten better over the years.
 

NugentS

eptesicus said:
The switch interface configs have the VLAN tagged, not set natively.

I've already tried removing the VLAN config entirely, with no luck. I don't believe I can set VLAN 54 as native on the switch interfaces, because these are MX9116N switches in a Dell MX7000 chassis using SmartFabric. It appears that I can only tag or untag the VLAN, not make it native.

You did say "This is connected back to my switch on an interface with VLAN 54 untagged as the native VLAN."
 

eptesicus

NugentS said:
You did say "This is connected back to my switch on an interface with VLAN 54 untagged as the native VLAN."

That was a typo and I've corrected it. Thanks.

I've now set up only a single interface, on a different VLAN (51) that has a gateway and works elsewhere.

Here's the port config on the switch:
Code:
interface ethernet1/1/17:1
 description "VII-SAN01 iSCSI"
 no shutdown
 switchport mode trunk
 switchport trunk allowed vlan 51
 negotiation on
 flowcontrol receive off


Here's the output of ifconfig for the mlxen0 interface and VLAN 51 on the NAS:
Code:
mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9216
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether e4:xx:xx:xx:xx:xx
        media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>

vlan51: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9216
        options=680703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether e4:xx:xx:xx:xx:xx
        inet 10.0.51.99 netmask 0xffffff00 broadcast 10.0.51.255
        groups: vlan
        vlan: 51 vlanproto: 802.1q vlanpcp: 0 parent interface: mlxen0
        media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
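

The `options=` line for mlxen0 above shows several VLAN and segmentation offloads enabled in hardware, which is the kind of layering jgreco describes earlier in the thread. As a hedged sketch, the Python below parses such a line and builds a single `ifconfig` command that switches those offloads off for a test. The capability-to-flag mapping follows the option names in FreeBSD's ifconfig(8); whether the Mellanox driver honours each one, or whether this changes anything here, is an assumption the thread does not confirm.

```python
# Sketch: turn an ifconfig "options=" line into a command that disables
# VLAN/TSO/LRO hardware offloads for a test. Nothing is executed here.
import re
from typing import List

# Capability names as printed by ifconfig, mapped to ifconfig(8) option names.
CAP_TO_FLAG = {
    "VLAN_HWTAGGING": "vlanhwtag",
    "VLAN_HWFILTER": "vlanhwfilter",
    "VLAN_HWTSO": "vlanhwtso",
    "VLAN_HWCSUM": "vlanhwcsum",
    "TSO4": "tso4",
    "TSO6": "tso6",
    "LRO": "lro",
}


def parse_capabilities(options_line: str) -> List[str]:
    """Extract the capability names between < and > on an options= line."""
    match = re.search(r"options=[0-9a-f]+<([^>]*)>", options_line)
    return match.group(1).split(",") if match else []


def disable_offloads_command(ifname: str, options_line: str) -> str:
    """Build 'ifconfig <if> -flag ...' for every enabled offload we know how to map."""
    flags = [
        f"-{CAP_TO_FLAG[cap]}"
        for cap in parse_capabilities(options_line)
        if cap in CAP_TO_FLAG
    ]
    return f"ifconfig {ifname} " + " ".join(flags) if flags else ""


# The mlxen0 options line as posted in this thread.
SAMPLE = (
    "options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,"
    "TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>"
)
print(disable_offloads_command("mlxen0", SAMPLE))
```

If traffic starts flowing with the offloads disabled, that would point at the driver or firmware offload path instead of the VLAN or switch configuration.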


There is something wrong with my NIC or some issue with TrueNAS. It doesn't matter if I tag or untag either end; nothing appears to work. I've even removed my management interface and changed the default gateway in the config to be on the 10.0.51.0/24 (VLAN 51) network, with no success.

I use Mellanox 10GbE SFP+ NICs in a couple of other TrueNAS SCALE boxes with no issues there...
 

jgreco

eptesicus said:
There is something wrong with my NIC

Yes. It isn't completely compatible with FreeBSD. The reason I recommend particular interface types in the 10 Gig Networking Primer is that they are known to work well under all conditions in both FreeBSD and Linux. Mellanox isn't one of them, just as many other kinds of Ethernet interfaces don't work {well, at all} in at least one of {FreeBSD, Linux}. Complaining about it here is unfortunately not going to get you very far. Mellanox authored their own drivers for FreeBSD, and with the merger with Nvidia, it is quite likely that the folks who wrote them are no longer at Mellanox.

Please consider moving to a card that is known to do more of its stuff correctly.
 

eptesicus

jgreco said:
Yes. It isn't completely compatible with FreeBSD. The reason I recommend particular interface types in the 10 Gig Networking Primer is that they are known to work well under all conditions in both FreeBSD and Linux. Mellanox isn't one of them, just as many other kinds of Ethernet interfaces don't work {well, at all} in at least one of {FreeBSD, Linux}. Complaining about it here is unfortunately not going to get you very far. Mellanox authored their own drivers for FreeBSD, and with the merger with Nvidia, it is quite likely that the folks who wrote them are no longer at Mellanox.

Please consider moving to a card that is known to do more of its stuff correctly.

I'm not complaining about compatibility issues. It's just surprising that the same model card with the same firmware works in another server that I also installed Core on this afternoon. I was able to get 40GbE networking working there as I wanted, but not on this hardware.

I'll consider this solved since it appears to be a hardware issue.
 

jgreco

Check your firmware versions. They're probably different. The Mellanox is yet another high-speed card that is incredibly twitchy about firmware versions, as they try to offload a bunch of work onto the card.
 