TrueNAS-12.0-U1.1 Infiniband support?

alpha754293

Dabbler
Joined
Jul 18, 2019
Messages
47
I am testing out TrueNAS Core 12.0-U1.1 and trying to set up an NFSoRDMA share running on top of ZFS on my system.

Relevant hardware is as follows:

CPU: Intel Core i7-3930K
Motherboard: Asus X79 Sabertooth
RAM: 32 GB (4x 8 GB DDR3-1600 Crucial Ballistix Sport I think. I actually don't remember anymore)
GPU: Nvidia Quadro 600
OS drive: 1x HGST 1TB SATA 3 Gbps 7200 rpm HDD for OS
Hard drives: 4x HGST 1TB SATA 3 Gbps 7200 rpm HDD for data (will be configured in raidz, but I am trying to get Infiniband up and running first)
Infiniband card: Mellanox ConnectX-4 dual port VPI 100 Gbps 4x EDR Infiniband (MCX456-ECAT)

Infiniband switch: Mellanox MSB-7890 externally managed switch

I do have another system on the Infiniband network that's currently running OpenSM on CentOS 7.7.

Output from pciconf -lv is as follows:
Code:
mlx5_core0@pci0:3:0:0:  class=0x020700 card=0x001415b3 chip=0x101315b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = InfiniBand
mlx5_core1@pci0:3:0:1:  class=0x020700 card=0x001415b3 chip=0x101315b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network
    subclass   = InfiniBand


Output from dmesg | grep "Mellanox" is as follows:
Code:
mlx5en: Mellanox Ethernet driver 3.5.2 (September 2019)
mlx5en: Mellanox Ethernet driver 3.5.2 (September 2019)


kldstat doesn't show any IB kernel modules loaded.

I found this thread where they were able to get ConnectX-2 40 Gbps IB adapters to work in TrueNAS 11, but they don't really say how they got it up and running.

It just shows screenshots where they were able to add the Mellanox card as an interface, whereas I currently still can't do that.

(cf. https://www.truenas.com/community/threads/40gb-mellanox-card-setup.51343/post-447336)

According to the hardware release notes for FreeBSD 12.0 (Source: https://www.freebsd.org/releases/12.0R/hardware/#support), the mlx5en_core driver supports my card, but the notes don't say whether the ports need to be in ETH mode or whether they can stay in IB mode.

If anybody has any ideas as to how I can get my Infiniband up and running in TrueNAS Core 12, that would be greatly appreciated.

(Please also let me know if I need to just run the FreeBSD MLNX OFED driver installation procedure. My thought process was that since the mlx5en_core driver is already in TrueNAS 12, Infiniband would work "out of the box".)

If my thought process is incorrect in this regard, please feel free to let me know.

Thank you.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
TrueNAS should have all the Infiniband code available as kernel modules (ibcore, mlx5ib, ipoib, etc.), just not loaded by default. But as far as I can tell, it is only usable for IPoIB right now, since I don't think there are any RDMA services, such as NFS over RDMA, in TrueNAS itself. Though I've never used Infiniband myself, so I can't say much.
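
If you want to see what is actually shipped on disk, something along these lines should list the relevant module files (just a sketch):

Code:
# ls /boot/kernel | grep -E 'mlx|ipoib|ibcore'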
 

alpha754293

Dabbler
Joined
Jul 18, 2019
Messages
47
For those that might be interested, I got it up and running.

Please note that I am running OpenSM on a different system (I didn't work on getting OpenSM up and running on TrueNAS Core 12.0-U1.1), so take that for what it is worth.

(You MIGHT be able to get away with it if you can pass your Mellanox card (or at least one of its ports) through to a VM running CentOS or something, because getting OpenSM running on CentOS is ridiculously easy.)

To get OpenSM running on CentOS 7.7, do this:

Code:
# yum -y install opensm
# chkconfig opensm on


That's it.
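
On CentOS 7 the chkconfig call should just get forwarded to systemd anyway, so an equivalent way to enable it (and to confirm that the subnet manager actually came up) would be something like this; sminfo comes from the infiniband-diags package, which you may need to install separately:

Code:
# systemctl enable --now opensm
# systemctl status opensm
# sminfo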

Please also note that you CANNOT run OpenSM on ANY Windows version (even Server/Datacenter 2019); Mellanox doesn't support that. So if you are running a point-to-point, Windows-to-Windows connection, you WILL need to change the port type from IB to ETH, IF you are using VPI cards. Check Mellanox's website and documentation to make sure that the card you have (or are about to buy) supports this feature. In my testing, running the port in ETH mode slows it down by about 1-3%. YMMV.
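
For what it's worth, on a machine where the Mellanox firmware tools (MFT, i.e. mst/mlxconfig) ARE installed, flipping a VPI port between IB and ETH usually looks something like the sketch below. The device path is just an example (check mst status for yours), and the change only takes effect after a reboot:

Code:
# mst start
# mlxconfig -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2

Here LINK_TYPE_P1=1 means InfiniBand and LINK_TYPE_P1=2 means Ethernet; use LINK_TYPE_P2 for the second port.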

Also note that if you are running a Mellanox MANAGED switch (as opposed to an externally managed switch like the one that I am running), it will have OpenSM running on it (or you can configure it as such).

For me, to get my ConnectX-4 dual-port VPI 100 Gbps 4x EDR Infiniband cards working with an Infiniband switch, do the following:

Code:
# vi /boot/loader.conf

# Add these two lines to the bottom of the file:
mlx5ib_load="YES"
ipoib_load="YES"

# Save, quit, and reboot your system.
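
If you want to try it before rebooting, you should also be able to load the same modules by hand; a minimal sketch (dependencies such as ibcore should get pulled in automatically):

Code:
# kldload mlx5ib
# kldload ipoib
# kldstat | grep -E 'mlx5ib|ipoib'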


Also note that, as far as I have been able to tell, a lot of the Mellanox tools like mlxconfig, flint, mst, etc. don't work out of the box.

In fact, those tools appear to be absent from the "out of the box" driver that ships with TrueNAS 12.0 (or at least as far as I've been able to tell when I search the system for them).

Therefore, once you assign an IPv4 address to the IB port, the only way that I found to test it was regular/conventional ping.

(As such, ibping is also absent.)
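
For reference, the IPoIB interfaces show up as ib0/ib1 on the FreeBSD side, so a bare-bones test from the shell looks something like this (the addresses are just examples, and on TrueNAS you would normally persist the address through the web UI instead):

Code:
# ifconfig ib0 inet 192.168.100.2 netmask 255.255.255.0 up
# ping -c 4 192.168.100.1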

Once you have that set up, I also set up my pool and my dataset and then exported the dataset over NFS, because I wanted to test the NFSoRDMA transfer.
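
A quick way to sanity-check that the export is actually visible is to query it from the client side; the hostname here is just an example, and showmount comes with the usual NFS client tools:

Code:
# showmount -e truenas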

However, my mechanically rotating hard drives (on both the source end and the target end) are too slow to really make full use of the 100 Gbps interface/line speed.

I copied a 63 GiB file over in about 3 minutes 5 seconds, for an average data rate of about 346 MiB/s (roughly 2.7 Gib/s).

I'm not using SSDs because I've worn through the write endurance limit on 6 SSDs over the last 5 years, so I don't do that anymore.

*edit #2*
So it would appear that NFSoRDMA is NOT working because I cannot mount the NFS share with the rdma option.

Bummer.
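
For reference, this is roughly the kind of mount I was attempting from the Linux client; the hostname and paths are examples, and 20049 is the conventional NFS/RDMA port:

Code:
# mount -t nfs -o proto=rdma,port=20049 truenas:/mnt/tank/share /mnt/test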

I can write a 10 GiB file of zeros over NFS at 374 MB/s (3 Gbps) but I can read it back to /dev/null at 6.7 GB/s (53.6 Gbps).
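
Roughly the kind of test behind those numbers (a sketch; paths are examples):

Code:
# dd if=/dev/zero of=/mnt/test/zeros.bin bs=1M count=10240
# dd if=/mnt/test/zeros.bin of=/dev/null bs=1M

Note that reading back a file you just wrote can be served largely from the client's page cache, which may explain a read figure far above what the disks themselves can do.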

Note that that might actually be the limit of my HGST 1 TB drives as they're SATA 3 Gbps drives.

Currently they're in a 4-drive, striped ZFS pool; on the host system itself, I can only write at about 581 MB/s anyway, and read back to /dev/null at about 865 MB/s.

*edit*
ibstat works.

Code:
root@truenas[/]# ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0xe41d2d030066e1ca
        System image GUID: 0xe41d2d030066e1ca
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x2651e848
                Port GUID: 0xe41d2d030066e1ca
                Link layer: InfiniBand
CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0xe41d2d030066e1cb
        System image GUID: 0xe41d2d030066e1ca
        Port 1:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0xe41d2d030066e1cb
                Link layer: InfiniBand


*edit*
There is an mlx firmware tool that does ship with the OS.

I forget the name of it now, but if you build an index of all of the files installed by the OS and then grep it, you should find it.

For example:

Code:
# cd /
# find . > root.txt
# grep mlx root.txt
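
If you don't want the intermediate file, a one-liner along these lines does the same job (errors from unreadable directories are just discarded):

Code:
# find / -name '*mlx*' 2>/dev/null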


So you can do SOME firmware management stuff on it.

I forget the name of the firmware management tool now, but I did find one last night when I probed the system for anything and everything installed that contains "mlx" in its name.

(That was rather annoying to find, because it's not exactly well documented, and a lot of the documentation online refers to Mellanox's OFED driver as opposed to the driver that ships "in the box".)
 

alpha754293

Dabbler
Joined
Jul 18, 2019
Messages
47
So I just tried to run opensm via /etc/rc.d/opensm onestart and it returns this error:

Code:
eval: /usr/bin/opensm: not found.


So there you have it: OpenSM will not run.
 

alpha754293

Dabbler
Joined
Jul 18, 2019
Messages
47
TrueNAS should have all the Infiniband code available as kernel modules (ibcore, mlx5ib, ipoib, etc.), just not loaded by default. But as far as I can tell, it is only usable for IPoIB right now, since I don't think there are any RDMA services, such as NFS over RDMA, in TrueNAS itself. Though I've never used Infiniband myself, so I can't say much.

Thank you.

I got it up and running as much as I am able to get it up and running.

Pity that NFSoRDMA doesn't work and/or isn't supported.

Bummer.
 

beebski

Cadet
Joined
May 1, 2021
Messages
2
Thank you.

I got it up and running as much as I am able to get it up and running.

Pity that NFSoRDMA doesn't work and/or isn't supported.

Bummer.
Hey Alpha,

Just wondering, how did you configure this? I tried following your guide, but I can't get it working. ibstat shows the port is active, but on the Networking page the port shows Unknown. Can you please help me out?
 

alpha754293

Dabbler
Joined
Jul 18, 2019
Messages
47
Hey Alpha,

Just wondering, how did you configure this? I tried following your guide, but I can't get it working. ibstat shows the port is active, but on the Networking page the port shows Unknown. Can you please help me out?

How are you connecting your systems?

Are you using a switch or are you using direct attached cables in a peer-to-peer connection?

Do you have a subnet manager running on your network? (At least one of the systems will need to be running OpenSM in Linux because I don't remember if I ever got the subnet manager running in TrueNAS/*BSD.)
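
If you have a Linux box on the fabric with the infiniband-diags package installed, a quick sanity check would look something like this (just a sketch): sminfo should report the active subnet manager, and ibstat should show the port State as Active rather than Initializing.

Code:
# sminfo
# ibstat | grep -E 'State|SM lid'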

Thanks.
 