InfiniBand 40Gb Mellanox Installation on TrueNAS Scale?

RapidScampi

Cadet
Joined
Oct 15, 2021
Messages
9
Hi,

What's the issue with ConnectX-3? I haven't found an OS where the NICs don't work, at least when the ports are set to eth.
If you have trouble doing it in TrueNAS, install CentOS 7 and do it there. The setting only has to be made once; after that you can use the cards like any other NIC.

# mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2
# mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P2=2
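
For anyone wondering what the `=2` means: as far as I know, mlxconfig on ConnectX-3 uses 1 for InfiniBand, 2 for Ethernet and 3 for VPI (auto-sense). Here is a minimal sketch that decodes the value and prints, rather than runs, the two commands above. The helper names `link_type_name` and `print_set_eth_cmds` are made up for illustration and are not part of any Mellanox tool.

```shell
#!/bin/sh
# Map an mlxconfig LINK_TYPE value to a readable name
# (assumed mapping for ConnectX-3: 1=IB, 2=ETH, 3=VPI).
link_type_name() {
    case "$1" in
        1) echo "IB" ;;
        2) echo "ETH" ;;
        3) echo "VPI" ;;
        *) echo "unknown" ;;
    esac
}

# Print (do not execute) the commands that switch both ports of
# the given mst device to Ethernet.
print_set_eth_cmds() {
    dev="$1"
    for port in 1 2; do
        echo "mlxconfig -d $dev set LINK_TYPE_P${port}=2"
    done
}

echo "LINK_TYPE 2 means: $(link_type_name 2)"
print_set_eth_cmds /dev/mst/mt4099_pciconf0
```

Running `mlxconfig -d <device> query` before and after should show the stored setting. The new link type only takes effect after a reboot.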

I stopped using EoIB when I switched my PC to Manjaro; Ethernet is simply more convenient, and 40Gb eth is completely sufficient even for my iSCSI LUNs.

If you need more than 5 meters of cable, I can recommend a 40GBASE-eSR4 QSFP+ transceiver, and for connecting to a 10Gb switch, Mellanox breakout cables (40G > 4x10G).

Regards
I also use Manjaro on my desktop. I found that some of the Arch literature relating to IB setup is a bit outdated but it wasn't difficult getting the above CX3 set up on my desktop. I'm also connected to an IB switch using a QSFP+ active optical IB cable that's 40m long and there's barely any difference in latency and throughput between this and a direct-attached copper QSFP+ 0.5m IB cable.
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
@engineerdj - well spotted

It's worth noting that userspace configuration changes made via CLI don't persist through reboots. As such, you need to set a simple command to run on startup through the GUI. This is done as follows:

System Settings > Advanced > Edit Init/Shutdown Script
Type = Command
Command = ``modprobe ib_ipoib && modprobe ib_umad``
When = Post init
Enabled = Yes
Timeout = 10

This ensures that the NICs are loaded automatically after a reboot.
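
To confirm after a reboot that the post-init command did its job, you can look for the two modules in /proc/modules. A minimal sketch: `module_loaded` is a hypothetical helper, and its optional second argument exists only so it can be pointed at a test file instead of the real /proc/modules.

```shell
#!/bin/sh
# Return 0 if module $1 is listed in the modules file $2
# (defaults to /proc/modules, where each line starts with "<name> ").
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# Report the state of the two modules from the post-init command.
for mod in ib_ipoib ib_umad; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

If either module shows as not loaded, run the `modprobe` command by hand once to see the actual error message.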

Thanks for this. I had to put this aside for a sec but now have some more time to get back to it. I think where I left off was that I was able to get the card loaded and appearing in SCALE using your instructions above, but I don't think I was able to get it completely working possibly due to the mode that the card was in. I'll test this script out to see if it solves the persistence problem and keep at it.

I have other 25Gb Mellanox cards in Windows 10 workstations and verified that those are in Ethernet mode and working correctly when testing transfers between them. I just need to get this card working so I can do the same with SCALE. So close.
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
This is the return from running the modprobe commands in the CLI.

1651323850680.png


1651326621134.png


This is what then appears in SCALE.

1651323773550.png


I try to bring up the NIC running the command you indicated:

1651326275743.png


But I'm not seeing a driver version above, so I think that could be my problem: I see mlx5_core but no mlx5en.

1651335475473.png


Have a couple of copy jobs running, so I can't test the post init script yet.
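
One way to check which driver is actually bound to an interface, without installing anything, is to follow the `device/driver` symlink in sysfs. A minimal sketch: `iface_driver` is a hypothetical helper, and the sysfs root is a parameter only to make it testable. As far as I know, in mainline kernels the mlx5 Ethernet code is built into mlx5_core rather than shipped as a separate module, so a missing mlx5en entry is not by itself a sign of a problem.

```shell
#!/bin/sh
# Print the kernel driver bound to network interface $1.
# $2 is the sysfs root (default /sys); it is a parameter only to ease testing.
iface_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Report the driver for every interface the kernel currently exposes.
# Virtual interfaces such as lo have no backing device and show "none".
for path in /sys/class/net/*; do
    [ -e "$path" ] || continue
    name=$(basename "$path")
    echo "$name: $(iface_driver "$name")"
done
```

`ethtool -i <interface>` reports the same driver name along with the driver and firmware versions, if ethtool is available.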
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
Mellanox card appears in ifconfig, but isn't identified as ethernet.

Could be in IB mode by default, as noted by others, but can't install any of the Mellanox tools to change it.

Is there a tunable that can be used to modify this parameter?

I checked sysctl and I can't see anything related to Mellanox or mlx.

1651588441338.png
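
As far as I know there is no sysctl for this, since the port type lives in the card's firmware configuration. You can still see which mode a port came up in, because the kernel exposes each interface's hardware type in /sys/class/net/<interface>/type as an ARPHRD number (1 is Ethernet, 32 is InfiniBand). A minimal sketch, with `arphrd_name` as a made-up helper name:

```shell
#!/bin/sh
# Translate the ARPHRD_* number from /sys/class/net/<if>/type into a name.
arphrd_name() {
    case "$1" in
        1)   echo "Ethernet" ;;
        32)  echo "InfiniBand" ;;
        772) echo "Loopback" ;;
        *)   echo "other($1)" ;;
    esac
}

# Print the hardware type of every interface the kernel exposes.
for f in /sys/class/net/*/type; do
    [ -r "$f" ] || continue
    ifname=$(basename "$(dirname "$f")")
    echo "$ifname: $(arphrd_name "$(cat "$f")")"
done
```

A Mellanox port that shows up as InfiniBand here is still in IB mode and would need its link type changed with mlxconfig before it behaves like a normal Ethernet NIC.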
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
SCALE is too locked down to install the Mellanox tools. You'll need to temporarily boot from USB to another Linux distro to run mlxconfig -y -d /dev/mst/<device port PCI config path> set LINK_TYPE_P1=2.
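
On the live distro, a quick pre-flight check can save some head-scratching. The sketch below assumes the Mellanox firmware tools (which provide `mst` and `mlxconfig`) are what you plan to use. `missing_tools` is a hypothetical helper, and the glob for the device nodes is a guess based on the mt4099_pciconf0 name seen earlier in the thread.

```shell
#!/bin/sh
# Print each named tool that is NOT on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools mst mlxconfig)
if [ -n "$missing" ]; then
    echo "Missing tools (install the Mellanox firmware tools first):"
    echo "$missing"
else
    # 'mst start' (as root) creates the device nodes under /dev/mst.
    ls /dev/mst/*_pciconf* 2>/dev/null ||
        echo "No /dev/mst devices yet; run 'mst start' as root."
fi
```

Once a device node is listed, that path is what goes after `-d` in the mlxconfig command above.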
 

10H

Cadet
Joined
May 19, 2022
Messages
1
I would like to add another data point for someone still using the old GEN2 cards as mentioned in the thread.

The Mellanox ConnectX-2 EN example below works fine with TrueNAS SCALE 22.02.1 and is supported by the mlx4_core kernel driver in the mainline Linux kernel.

Code:
lspci -v
83:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
        Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
        Flags: bus master, fast devsel, latency 0, IRQ 36, NUMA node 1, IOMMU group 68
        Memory at fbe00000 (64-bit, non-prefetchable) [size=1M]
        Memory at fb000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at fbd00000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number 00-02-c9-03-00-54-b8-e6
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core



Code:
dmesg |grep mlx
[    2.788091] mlx4_core: Mellanox ConnectX core driver v4.0-0
[    2.788207] mlx4_core: Initializing 0000:83:00.0
[    5.136372] mlx4_core 0000:83:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[    5.235522] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[    5.235852] mlx4_en 0000:83:00.0: Activating port:1
[    5.236091] mlx4_en: 0000:83:00.0: Port 1: enabling only PFC DCB ops
[    5.248166] mlx4_en: 0000:83:00.0: Port 1: Using 24 TX rings
[    5.248256] mlx4_en: 0000:83:00.0: Port 1: Using 16 RX rings
[    5.248473] mlx4_en: 0000:83:00.0: Port 1: Initializing port
[    5.249124] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[    5.249810] <mlx4_ib> mlx4_ib_add: counter index 1 for port 1 allocated 1
[    5.250246] mlx4_core 0000:83:00.0 enp131s0: renamed from eth0
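
The "32.000 Gb/s available PCIe bandwidth" line in the dmesg output can be reproduced by hand: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, so only 8 of every 10 bits carry data. A small worked sketch, with `pcie_gen2_gbps` as a made-up helper name:

```shell
#!/bin/sh
# Usable PCIe 2.0 bandwidth in Gb/s for a given lane count:
# 5 GT/s per lane, times 8/10 for 8b/10b line encoding.
pcie_gen2_gbps() {
    echo $(( $1 * 5 * 8 / 10 ))
}

echo "x8 link: $(pcie_gen2_gbps 8) Gb/s"
echo "x4 link: $(pcie_gen2_gbps 4) Gb/s"
```

So an x8 slot leaves plenty of headroom for a 10GbE port, and by this arithmetic even an x4 electrical slot would still be above 10 Gb/s.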
 

cru-rootsyn

Cadet
Joined
Sep 12, 2023
Messages
5
I got the Mellanox cards flashed and did all of the above, and was still at a loss as to why I couldn't get the ConnectX-3 cards to work with a Mellanox SX6036 switch. I fought with it for a long time, and what none of the online resources mention is that an unlicensed switch will only work with InfiniBand. You have to license MLNX-OS in order to run the switch in VPI (dual IB and eth) mode or eth mode. Good luck navigating the NVIDIA site to find where you can buy this license.

In the end I was able to get it done and if you search and hack around enough you will find a way to get your switch unlocked. I spent many hours making these things work.
 

BloodyIron

Contributor
Joined
Feb 28, 2013
Messages
133
What steps did you need to take and what were the benchmark/testing results, etc? C'mon you gotta share more pls! XD
 