InfiniBand 40Gb Mellanox Installation on TrueNAS Scale?

roberto0610

Cadet
Joined
Nov 28, 2018
Messages
8
Hi everyone. Roberto here, trying to get my dual-port Mellanox InfiniBand card set up with my new TrueNAS SCALE box.
I am just testing the platform, with the idea of also using the same server as a Plesk server or even Jellyfin. My wife and I have a small photo and video studio.
Currently we have a TrueNAS 12 box hosting our production data. Recently we acquired a new Dell R730 with more storage space, more CPU power, and even one NVMe drive that we want to use as cache for all our video editing needs, proxy files, and 4K raw footage.

About the actual metal:
Dell R730XD with dual Intel E5-2620 v3 (six-core CPUs) [will upgrade to dual 22-core E5-2699 v4]
64GB DDR4 ECC RAM, 16x 8TB spinning SAS drives in NON-RAID / HBA mode
1x Samsung 6.4TB NVMe SSD as storage cache.
1x dual-port 10Gb Mellanox (internal), detected and working.
1x dual-port Mellanox 40Gb InfiniBand, detected as follows when using the lspci command:
Code:
03:00.0 InfiniBand: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand


What we want to achieve, when possible, is to get this TrueNAS SCALE server connected to our 40Gb InfiniBand network, initially to test the SCALE platform, and to use the same hardware in the future to host other small VMs for internal projects.

Any suggestions or recommendations will be much appreciated.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
InfiniBand is not officially supported, nor does it have many users here. It's Debian, so you might be able to hack something together at your own risk, but don't expect it to keep working, or many people to be able/willing to support you with it. It's a bad idea (tm)

I also don't get why the heck someone would buy ConnectX cards with InfiniBand instead of Ethernet over (Q)SFP+; the price difference is not that high
 

roberto0610

Cadet
Joined
Nov 28, 2018
Messages
8
I am sorry for not being clear. My Mellanox card has dual (Q)SFP+ ports and, to correct myself, I do plan to run them in Ethernet mode. I am just not sure what I need to do to install the driver for such a card. I'll try to compile the Mellanox drivers. Will post back my findings. Thanks.
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
I have no idea what you are doing, because there are 3 types of Mellanox cards:
- ETH only
- ETH/InfiniBand multi-mode (VPI)
- InfiniBand only

As far as I can find, your card doesn't even support Ethernet mode.
It also looks to be a ConnectX-2, which isn't supported out of the box.

So you're running an unsupported networking protocol with an unsupported network card.
Just don't.
 

shadofall

Contributor
Joined
Jun 2, 2020
Messages
100
What's the worst that could happen? :tongue:
Spacetime collapses in on itself. Always the worst case for any action ;)

In seriousness: the user thrashes the install/data trying to get it to work, repeatedly. Or the user gets it to work, but since it would all be outside the middleware, it gets undone on every update, leading to annoyance, frustration, and calls for help with unfavorable answers.
 

roberto0610

Cadet
Joined
Nov 28, 2018
Messages
8
You @ornias are very knowledgeable. @bodly and @shadofall, thank you all for your comments and for encouraging me onto the right path.
Please excuse me, as I thought all (Q)SFP+ cards from Mellanox had the same capabilities. I am replacing the MHQH29B-XTR (removed) with this other Mellanox model: CX354A. It's a ConnectX-3 FDR InfiniBand + 40GigE card.
As a side note: I have 2x Windows 10 workstations and one more Debian-based system as my current production NAS (Rockstor distro), and I have had it working at 40GbE for over 2 years now with the same CX354A, same firmware and hardware version.

I just did a clean install of TrueNAS SCALE with the CX354A installed and was not able to get it configured by default.
This is how it shows up in the terminal when typing # lspci | grep Mellanox:
Code:
03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]


Any guidance on how to get the driver installed will be much appreciated.
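For anyone following along, a quick way to see how far the in-kernel mlx4 driver gets with the card before any mode changes (a sketch; module and interface names below are the usual ones, yours may differ):
Code:
# did the driver bind, and did any network interface appear?
dmesg | grep -i mlx4
lsmod | grep mlx4
ls /sys/class/net
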
 

ornias

Wizard
Joined
Mar 6, 2020
Messages
1,458
Because the CX354A ships set to InfiniBand by default and needs its firmware configuration changed to run in Ethernet mode.

I would suggest Google and some work of your own, because this really is not that hard to solve, and you should already know it, as you have picked a dual-mode (VPI) card to replace an IB-only card.
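For the record, on a Debian-based box the usual route is the mstflint package, which provides mstconfig (the open-source counterpart of the mlxconfig commands quoted later in this thread). A minimal sketch, assuming the card sits at PCI address 03:00.0 as in the lspci output above, and ideally run from a scratch Linux install rather than by modifying SCALE itself:
Code:
apt install mstflint
# check the current port configuration
mstconfig -d 03:00.0 query
# 1 = InfiniBand, 2 = Ethernet; reboot afterwards for the change to take effect
mstconfig -d 03:00.0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
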
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
@roberto0610, did you get this config to work? I have two FreeNAS servers and a few Windows 10 clients. I also have 3 of the Mellanox dual-port combo cards. I would like to daisy-chain them to connect 3 machines together via three DAC cables, no switch, with one FreeNAS system acting as a bridge. I had given up and given the boards away, but they came back, so now I need to try again. I'd appreciate your feedback on the feasibility. The first step is to get my 10Gb stuff working; I hope to use the software bridge with that as well.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@ZiggyGT, did you try a Google search first? This is the first hit I got when searching for "Mellanox ethernet mode":


You just need a temporary Linux box with the appropriate Mellanox tools to change the mode of each NIC before deploying them to FreeNAS.
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
Thanks for the link. I tried this before, about a year ago; it really is a non-intuitive way to configure things. I could not get the card recognized on FreeNAS. I then tried it on Windows: the boards were properly recognized as Ethernet but did not work. I gave the cards to a friend because I was focusing on 10Gb. He never tried them, and he gave them back. Seeing this thread, which covers the same stuff, gave me some hope. I am also trying to get my second server back up; the last time I worked on it was November 2020, due to Covid. I am back at it.
 

RapidScampi

Cadet
Joined
Oct 15, 2021
Messages
9
Not sure why everyone's so flippant on this forum. It's not a ridiculous question, and considering SCALE is in beta, it strikes me as a pretty sensible time to test connectivity options.

Hopefully this provides something a little more constructive for others to have a dabble with. It assumes that your IB cards already have the correct firmware on them:

Check whether your IB card is listed as a network-class device (typically prefixed with ib):
Code:
ls /sys/class/net
eno1  eno3  kube-bridge    lo            veth39ad2f67
eno2  eno4  kube-dummy-if  veth27b1269c


Check which Infiniband modules are loaded with the kernel:
Code:
lsmod | egrep '(^mlx|^ib)'
mlx4_ib               225280  0
ib_uverbs 167936 1 mlx4_ib
ib_core 409600 2 mlx4_ib,ib_uverbs
mlx4_core 376832 1 mlx4_ib


The above is the typical output from Debian 10 and 11, and was copied from TrueNAS Beta 21.08 shell. There are two kernel modules which are shipped with Debian but not loaded by default. These are ib_ipoib and ib_umad. You can load them by modprobing the modules, and you can ensure persistence by adding them to the /etc/modules file so that they're loaded at start-up:
Code:
modprobe ib_ipoip
modprobe ib_umad
cat <<EOT >> /etc/modules
ib_ipoib
ib_umad
EOT


Check whether the ib interfaces are now present:
Code:
ls /sys/class/net
eno1  eno3  ibs5    kube-bridge    lo            veth39ad2f67
eno2  eno4  ibs5d1  kube-dummy-if  veth27b1269c


The console log in the TrueNAS GUI should indicate that it has found the interfaces:
Code:
Oct 15 09:48:20 nas1 kernel: mlx4_core 0000:84:00.0 ibs5: renamed from ib0
Oct 15 09:48:20 nas1 kernel: mlx4_core 0000:84:00.0 ibs5d1: renamed from ib1


Set the interface to connected mode at the shell and change the MTU to 65520, the maximum supported for IPoIB in connected mode:
Code:
echo connected > /sys/class/net/ibs5/mode
ifconfig ibs5 mtu 65520 up


I have two hosts on the same 1Gb copper network (192.168.201.0/24) and the same IB network (10.0.20.0/24). Both are a single switch hop apart. Here's a ping comparison:
Code:
PING 10.0.20.221 (10.0.20.221) 56(84) bytes of data.
64 bytes from 10.0.20.221: icmp_seq=1 ttl=64 time=0.098 ms
64 bytes from 10.0.20.221: icmp_seq=2 ttl=64 time=0.099 ms
64 bytes from 10.0.20.221: icmp_seq=3 ttl=64 time=0.099 ms
64 bytes from 10.0.20.221: icmp_seq=4 ttl=64 time=0.093 ms

PING 192.168.201.221 (192.168.201.221) 56(84) bytes of data.
64 bytes from 192.168.201.221: icmp_seq=1 ttl=64 time=0.296 ms
64 bytes from 192.168.201.221: icmp_seq=2 ttl=64 time=0.224 ms
64 bytes from 192.168.201.221: icmp_seq=3 ttl=64 time=0.237 ms
64 bytes from 192.168.201.221: icmp_seq=4 ttl=64 time=0.254 ms


Qperf is the best tool for testing transfer rates with IB cards. As we're using TCP only and drawing comparisons with a 1Gb copper connection, we'll only run tests that work with both interfaces. Qperf isn't installed on Scale or available in the repos, so you'll need to install it using the deb file (or add the repo to sources). Latest version at the time of writing:
Code:
wget http://ftp.br.debian.org/debian/pool/main/q/qperf/qperf_0.4.11-2_amd64.deb
dpkg -i qperf_0.4.11-2_amd64.deb


On one host, just run qperf with no arguments to put it in server mode, then run the tests from the other. Testing Ethernet first:
Code:
qperf 192.168.201.202 tcp_bw tcp_lat udp_bw udp_lat
tcp_bw:
bw = 110 MB/sec
tcp_lat:
latency = 116 us
udp_bw:
send_bw = 119 MB/sec
recv_bw = 14.3 MB/sec
udp_lat:
latency = 100 us


Compare that to IPoIB:
Code:
qperf 10.0.20.201 tcp_bw tcp_lat udp_bw udp_lat
tcp_bw:
bw = 2.06 GB/sec
tcp_lat:
latency = 19.4 us
udp_bw:
send_bw = 2.21 GB/sec
recv_bw = 2.21 GB/sec
udp_lat:
latency = 16.4 us


Interested to hear the thoughts and findings of others.
 

AdamR01

Cadet
Joined
Apr 22, 2021
Messages
5
I'm using an MCX354A-QCBT ConnectX-3 in Ethernet mode and it seems OK*. This card can only do 10Gb in Ethernet mode though, not 40Gb like in InfiniBand mode.

The drivers were available out of the box. I installed the Mellanox Firmware Tools deb into TrueNAS SCALE and was able to use it to adjust the port mode and flash firmware. I didn't leave it installed though, as I didn't want any issues with upgrades. Probably not a big deal, since I don't think it pulled in any extra dependencies, but it's been a while since I did it.

* I do have some weird PCIe link issues that I didn't have (or didn't notice) with other OSes on the same hardware. The card is in a PCIe 3.0 x8 slot.

Code:
truenas# dmesg | grep mlx
[    3.326391] mlx4_core: Mellanox ConnectX core driver v4.0-0
[    3.326462] mlx4_core: Initializing 0000:03:00.0
[    3.326573] mlx4_core 0000:03:00.0: enabling device (0000 -> 0002)
[   10.485791] mlx4_core 0000:03:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[   10.486145] mlx4_core 0000:03:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x8 link at 0000:00:03.1 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[   10.541473] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[   10.541748] mlx4_en 0000:03:00.0: Activating port:1
[   10.544719] mlx4_en: 0000:03:00.0: Port 1: Using 4 TX rings
[   10.544734] mlx4_en: 0000:03:00.0: Port 1: Using 4 RX rings
[   10.545045] mlx4_en: 0000:03:00.0: Port 1: Initializing port
[   10.545594] mlx4_en 0000:03:00.0: registered PHC clock
[   10.545965] mlx4_en 0000:03:00.0: Activating port:2
[   10.546787] mlx4_en: 0000:03:00.0: Port 2: Using 4 TX rings
[   10.546798] mlx4_en: 0000:03:00.0: Port 2: Using 4 RX rings
[   10.547477] mlx4_en: 0000:03:00.0: Port 2: Initializing port
[   10.549241] mlx4_core 0000:03:00.0 ens1: renamed from eth0
[   10.576801] mlx4_core 0000:03:00.0 ens1d1: renamed from eth0
[   10.580495] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[   10.581153] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
[   10.581163] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
[   71.258039] mlx4_en: ens1: Steering Mode 1
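If anyone wants to chase a link issue like that down, the negotiated versus maximum PCIe link can be read from a verbose lspci listing (slot address taken from the dmesg output above; yours may differ):
Code:
# LnkCap = what the slot/card can do, LnkSta = what was actually negotiated
lspci -vv -s 03:00.0 | grep -E 'LnkCap:|LnkSta:'
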
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
I had tried to get some Mellanox 40Gb cards working a long time ago, and I received another card free when it was shipped to me by mistake. This thread made me take a harder look at them. It looks like these won't be any faster than 10Gb like I had hoped, so I'll probably abandon the project to use them. I have $60 invested (mostly the cable). I cannot figure out whether these boards are compatible with each other. I see IB is not supported, but what is QDR-IB versus the other IB protocols? The cards look identical to a newbie like me. I had hoped to connect two FreeNAS servers together with these cards; if they only do 10Gb, I'll just use the 10Gb cards I have been using. Looking at the TrueNAS documents for compatibility, I did not see any Mellanox cards of any variety listed. For a while, before Chelsio, Mellanox seemed to be a favorite. Has that changed for the SFP+ 10Gb Mellanox X2 and X3 cards? I currently use them in my FreeNAS installations.
 

RapidScampi

Cadet
Joined
Oct 15, 2021
Messages
9
@ZiggyGT don't confuse IPoIB with an IB card in Ethernet mode. QDR has a maximum signalling rate of 40Gb/s (roughly 32Gb/s of usable data, because of its 8b/10b encoding). By using the included IPoIB drivers you can run TCP and UDP sockets on top of IB.

Ethernet typically needs to 'negotiate' a connection speed, and the most common steps are 1Gb/s, 10Gb/s, 40Gb/s and 100Gb/s. QDR with IPoIB will report a 40Gb/s link even though the transfer speeds are likely to fall short of that (32Gb/s is realistic). There is a CPU overhead in doing so, but if you've got the compute resource on your systems then this is a really cost-effective way of increasing your network speeds.

In my previous post I shared the transfer speeds recorded with qperf, without having done any tweaking or tinkering:
send_bw = 2.21 GB/sec
This equates to 17.68Gb/s, which is ~50% of what I'd expect for QDR and makes me think I'm only using two of the available four lanes.
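One way to check whether the IB link really came up at full width, without installing anything extra, is the rate exposed in sysfs (the HCA name mlx4_0 is an assumption; list /sys/class/infiniband to find yours):
Code:
# reports something like "40 Gb/sec (4X QDR)" when all four lanes are active
cat /sys/class/infiniband/mlx4_0/ports/1/rate
cat /sys/class/infiniband/mlx4_0/ports/1/state
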

The maximum theoretical throughput of 10GbE is 1.25GB/s, so there's a massive difference even with a connection running significantly slower than expected.

There's a great answer on Stack Exchange that explains the difference between IPoIB and ethernet mode HCAs:
https://stackoverflow.com/questions/6051832/difference-between-ipoib-and-tcp-over-infiniband

It's also worth noting that the MCX354A can be flashed with FDR firmware. FDR uses different encoding and allows speeds of up to 56Gb/s in IB; it also connects at 40Gb/s for Ethernet traffic, but with better real-world speeds. I've yet to experiment with this, but if you're looking to get every ounce of speed possible from your cards then it's worth taking a peek at.
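For the curious, the usual Mellanox Firmware Tools flashing flow looks roughly like the sketch below; the firmware image name is a placeholder, and you would want an image that matches the exact PSID that flint query reports for your card:
Code:
mst start
# note the device name and the card's PSID
flint -d /dev/mst/mt4099_pci_cr0 query
# burn a matching image (placeholder filename), then reboot
flint -d /dev/mst/mt4099_pci_cr0 -i fw-ConnectX3-rel-<version>-<PSID>.bin burn
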

Here's a short page that explains the different IB speeds (albeit only up to FDR):
https://www.advancedclustering.com/act_kb/infiniband-types-speeds/

Hope that adds some food for thought!

Cheers
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
@RapidScampi, thanks for the great response. I will take a look at the references. I was definitely confused about IPoIB versus Ethernet mode on an IB card. I am trying to recover my 10Gb network; if Mellanox X2 and X3 cards are not supported in TrueNAS, I am screwed. I have been using them with FreeNAS just fine. I will load one in my second machine and see what happens. After that I'll try to link the two servers with the 40Gb cards and try to get that speed.
 

engineerdj

Dabbler
Joined
Jan 17, 2022
Messages
21
@RapidScampi thanks to your write-up, I'm making some progress. Not quite there yet, but I'm finally seeing the interfaces in the TrueNAS SCALE GUI.

I did want to mention that there's a typo in your code section: it should be modprobe ib_ipoib rather than ib_ipoip, which I figured out in the process.
 

janos66

Dabbler
Joined
Feb 18, 2022
Messages
21
Hi,

What's the issue with ConnectX-3? I haven't found an OS where these NICs don't work, at least if the ports are set to ETH.
If you have trouble doing it with TrueNAS, then install CentOS 7; the setting only has to be done once, and then you can use the cards like any other NIC.

Code:
# with the Mellanox Firmware Tools installed, start MST first so the device nodes exist
mst start
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P2=2


I have stopped using EoIB since I switched my PC to Manjaro; it is simply more convenient, and 40Gb Ethernet is completely sufficient even for my iSCSI LUNs.

If you need more than 5 meters of cable, I can recommend 40GBASE-eSR4 QSFP+ transceivers, and for connecting to a 10Gb switch, Mellanox breakout cables (40G > 4x10G).

Regards
 

RapidScampi

Cadet
Joined
Oct 15, 2021
Messages
9
@engineerdj - well spotted

It's worth noting that userspace configuration changes made via CLI don't persist through reboots. As such, you need to set a simple command to run on startup through the GUI. This is done as follows:

System Settings > Advanced > Edit Init/Shutdown Script
Type = Command
Command = ``modprobe ib_ipoib && modprobe ib_umad``
When = Post init
Enabled = Yes
Timeout = 10

This ensures that the modules are loaded, and the interfaces come back, automatically after a reboot.
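If you also want the connected-mode and MTU settings from earlier in the thread to survive a reboot, they could be folded into the same kind of post-init command (the interface name ibs5 is assumed from the earlier output; adjust to match yours):
Code:
# load the IPoIB modules, give the interface a moment to appear, then re-apply connected mode and the large MTU
modprobe ib_ipoib && modprobe ib_umad && sleep 2 && echo connected > /sys/class/net/ibs5/mode && ifconfig ibs5 mtu 65520 up
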
 