Building/Compiling FreeNAS with OFED/iSER support

republicus

Cadet
Joined
Sep 4, 2019
Messages
1
I have researched this topic as broadly as possible, both to learn how to make InfiniBand behave in FreeNAS and in the hope of not bothering anyone else for support.
Unfortunately, my research has left me more unsure than when I began, and I can't even be sure my efforts will be fruitful if I carry out the request below.

I currently have IPoIB working out of the box using the included mlx4 and ipoib modules for my 40Gb InfiniBand network card. What I would like, though, is iSCSI over RDMA (iSER) and NFS over RDMA.

It appears iSER can be compiled into the FreeBSD kernel (https://man.openbsd.org/FreeBSD-11.1/iser.4) or loaded as a module.
Since iSER appears to be neither compiled into the FreeNAS kernel nor available as a module, I would like to try building FreeNAS with it, or with the full OFED stack (https://wiki.freebsd.org/InfiniBand), which should also include iSER/RDMA support.
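From the man page, enabling it seems to come down to the standard FreeBSD mechanisms. A sketch based on iser(4), which I obviously cannot try on FreeNAS yet since neither the module nor the kernel option is shipped:

# build into a custom kernel: add this line to the kernel configuration file
device iser

# or load it as a module at boot: add this line to /boot/loader.conf
iser_load="YES"

# or load it at runtime for testing
kldload iser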

I found instructions for building FreeNAS (https://github.com/freenas/build), but they do not exactly align with the instructions for compiling a kernel with the support mentioned above.

I'm new to building FreeBSD/FreeNAS, so I suppose what I am really asking for is best practices for building FreeNAS with iSER/OFED support.

Could I first build FreeBSD with the kernel modifications and then proceed to build FreeNAS as detailed above?
Can the same steps also pull in additional packages such as ibstat and other InfiniBand diagnostics/tools (which may be moot, as they may already be included when OFED is compiled)?
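To make the question more concrete, here is the rough flow I imagine, with the build repo driving the image and the OFED/iSER bits enabled through the usual FreeBSD knobs. The make targets come from the repo README and the option names from the FreeBSD wiki/handbook; where exactly they hook into the FreeNAS build tree is precisely what I am unsure about:

# on a FreeBSD build host: check out and build the FreeNAS image
git clone https://github.com/freenas/build freenas-build
cd freenas-build
make checkout      # fetch the FreeNAS/FreeBSD sources (target name as per the repo README)
make release       # build the installer/upgrade images

# what I believe needs to change before "make release":
#   - WITH_OFED=yes in src.conf, so the OFED userland (opensm, ibstat, ...) gets built
#   - "device iser" plus the OFED options in the kernel configuration used by the build
# the correct place for these inside the build tree is the part I am asking about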

In the end, I am still not sure whether either approach (or both) will actually get iSER target support working. It appears some work was done first to introduce iSER initiator support, but I read that in old mailing lists, and it did not specifically mention OFED after Mellanox engineers started contributing. I am willing to experiment and report back for anyone else interested. Until then, I humbly ask for any guidance that might lead me there.

Thanks for reading.
 

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10

This ticket is closed now. :rolleyes:

I have had very good experience with iSCSI over InfiniBand, and it would be interesting to see whether iSER pushes this even further.

Since reading this blog post I cannot sleep anymore:
http://www.zeta.systems/blog/2016/09/21/iSCSI-vs-iSER-vs-SRP-on-Ethernet-&-InfiniBand/
:cool:

Last week I got ten ConnectX-3 cards and a 36x56Gbit/s switch for a price I could not say no to... I'm in the process of setting up a new test system. The WD RED SA500 drives are already installed; it would be nice to have iSER running as well...

The big question is:
Does anybody know whether there is a roadmap/timeline for iSER in FreeNAS?
Any news?

Thank you in advance.
Ralf
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
@Kris Moore
Do you have any kind of roadmap for iSER (and/or any of the other RDMA-enabled sharing options)?
Or is it only on the nice-to-have list?
 

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10
There is an additional question related to OFED:
As mentioned in the FreeBSD InfiniBand wiki, OFED is already included on the FreeBSD side, which means opensm is available out of the box there. This is missing in FreeNAS as well. If we are talking about iSER support, it would also make sense to ship the opensm binary.

At the moment I'm again testing a cluster setup with FreeNAS as the iSCSI target provider, together with multipath to increase the maximum throughput. In my test cluster the FreeNAS server has three dual-port ConnectX-3 cards connected to a very cheap Mellanox SX6790 switch, which provides 36x56Gbit/s ports. The FreeNAS IBoE setup has six subnets for that. Each cluster node has two dual-port ConnectX-3 cards and is connected to four of those subnets, which means that on the FreeNAS side each subnet has two endpoints. I'm just playing around with this setup.

[Attached image: FreeNAS-ZFS-iSCSI-Infiniband-Proxmox-Cluster.jpg]


If such an unmanaged switch is used, one subnet manager has to run for each subnet. At the moment I handle that on the "client" side, which means each Linux cluster node runs two opensm daemons for two subnets.
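Each instance is simply bound to one local port GUID, roughly like this (the GUIDs below are placeholders; the real ones can be read from ibstat):

# one opensm instance per InfiniBand subnet, each bound to a local port GUID (placeholder GUIDs)
opensm -B -g 0x0002c90300aaaaaa   # subnet manager for the fabric on port 1
opensm -B -g 0x0002c90300bbbbbb   # subnet manager for the fabric on port 2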

This is not ideal, because if I have to reboot one cluster node for updates or physical cleanup, I lose two subnets until the server is back up.
I could set up a dedicated Linux server with three dual-port ConnectX-3 cards, but that would waste six ports on the switch plus additional power. The next step for me is to compile OFED on a FreeBSD host against the running FreeNAS kernel and sources and try to get opensm running on FreeNAS.
Before I do that:
Would you consider shipping the opensm binary as well in an upcoming release?
From my perspective this would make sense.
 

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10

I just installed a FreeBSD 11.3 VM, compiled opensm from the latest FreeNAS 11.3 sources (OFED included), copied the binary to /usr/bin on FreeNAS, and voilà: opensm is now working as a service on FreeNAS 11.3.
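Roughly, the steps were the following; the paths are from memory and the opensm directory may sit elsewhere in your tree, so treat this as a sketch rather than a recipe:

# on the FreeBSD 11.3 build VM, with the FreeNAS 11.3 source tree checked out under /usr/src
find /usr/src -type d -name opensm      # locate the opensm build directory in the OFED userland
cd /usr/src/<path-reported-by-find>
make WITH_OFED=yes                      # build the opensm tool (the binary may land in the obj tree)
scp opensm root@freenas:/usr/bin/       # copy the binary onto the FreeNAS box
ssh root@freenas opensm -B              # start it in daemon mode to verify it comes up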

It would be interesting to know the current state of FreeNAS 12; I already found the development branch in the FreeNAS repository.
Is it worth downloading and compiling it for testing, or is it too early for that?
From my perspective, InfiniBand, iSCSI and iSER are key features for a NAS system to provide the fastest possible shared block storage for a VM cluster like Proxmox. Given the hardware I have available, I could also act as a tester for the upcoming release.
 

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10
Because there is still no answer to the roadmap question, I decided to install TrueNAS CORE for testing.
Most of the things that are essential in my setup are not working yet; it is still in an early state.

After that I tested OpenIndiana with Napp-IT. The problems there were the InfiniBand setup and OpenIndiana's poor documentation on Mellanox topics.

The next test was to install Proxmox on the storage server as well. Mellanox publishes a lot of documentation for Linux systems.
It was very easy to get the necessary pieces in place (drivers, RDMA, iSER, multipath, etc.), and finally I set up two iSCSI targets via targetcli, one for the SSD pool and one for the HDD pool.
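The target definition itself is only a handful of targetcli commands, roughly like the following; the IQN, the zvol path and the portal address are placeholders, and the real setup also needs the usual acls entries for the initiator IQNs. The important part is enabling iSER on the portal:

# inside the targetcli shell - names, IQN and IP address are placeholders
/> backstores/block create name=ssdpool-vm-114 dev=/dev/zvol/ssdpool/vm-114-disk-0
/> iscsi/ create iqn.2020-03.local.storage:ssdpool
/> iscsi/iqn.2020-03.local.storage:ssdpool/tpg1/luns create /backstores/block/ssdpool-vm-114
/> iscsi/iqn.2020-03.local.storage:ssdpool/tpg1/portals create 10.0.1.1 3260
/> iscsi/iqn.2020-03.local.storage:ssdpool/tpg1/portals/10.0.1.1:3260 enable_iser boolean=true
/> saveconfig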
I then used LIO from the Proxmox cluster when setting up new VMs. The result with iSER on Linux, compared to iSCSI over IPoIB on FreeNAS, was amazing.
I tested with fio on Debian VMs. With FreeNAS I got ~100Mb/s with 4k blocks and ~1.3GB/s with 4M blocks (only one run, no parallel testing). With iSER on Linux I got ~550-600MB/s with 4k blocks and ~3500MB/s with 4M blocks, running in parallel on three VMs across the three cluster nodes. The overall throughput on the server side was ~10GB/s. I have the impression that the mirrored SLOG devices (two Intel Optane P4800X) have more influence on Linux, judging by the overall result. A virtual Windows Server 2016 Standard boots in 4 seconds, a Debian 10 server in 3 seconds.
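The fio jobs inside the VMs were along these lines; the parameters are reconstructed from memory, so take them as an example rather than the exact benchmark:

# 4k random I/O against the VM's virtual disk (example invocation)
fio --name=rand4k --filename=/dev/vdb --direct=1 --rw=randrw --bs=4k \
    --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
# 4M sequential read (example invocation)
fio --name=seq4m --filename=/dev/vdb --direct=1 --rw=read --bs=4M \
    --iodepth=8 --runtime=60 --time_based --group_reporting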

Hopefully iSER will be available in an upcoming release so that I can switch back to FreeNAS.

Cheers
Ralf
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Nice - could you elaborate a bit more on your setup?
You are using Proxmox as the storage box since it has ZFS and runs on Linux? Why not a native Linux installation with ZoL then?
And what are your clients running? Also Linux?
 

getcom

Dabbler
Joined
Jun 3, 2019
Messages
10
Hi Rand,

I assume you mean the iSER clients? If so, the previous setup was a three-node Proxmox cluster with FreeNAS as the iSCSI target.
If you use Proxmox for the iSER target as well, the storage server can also be a cluster member.
The advantage is that I have one monitoring view for the whole cluster, and if I want I can even live-migrate a VM to the storage server.

Proxmox 6 is based on Debian 10 with HA components like ha-manager and corosync, plus a nice web UI.
I created the ZFS pools from scratch on the Proxmox storage server because OpenIndiana's ZFS is not compatible with FreeNAS and ZoL.
I did that on the console because I'm using partitions on the mirrored Optanes to have a very fast SLOG available for both ssdpool and hddpool.
From the targetcli or ZoL perspective it doesn't matter whether it is Proxmox or Debian; it is the same.
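For illustration, the pools were created along these lines; the device names and partition numbers are placeholders, and the real pools have more vdevs:

# SSD pool: mirrored vdevs plus a mirrored SLOG on Optane partitions (placeholder device names)
zpool create ssdpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd \
    log mirror /dev/nvme0n1p1 /dev/nvme1n1p1
# the HDD pool gets its SLOG from a second partition pair on the same Optanes
zpool add hddpool log mirror /dev/nvme0n1p2 /dev/nvme1n1p2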

targetcli has no GUI, so I configured the complete RDMA/iSER/iSCSI target setup manually. On the Proxmox cluster side I set up two iSCSI storage pools.
Creating a VM is then easy, because Proxmox uses the LIO API to create the iSCSI LUNs.

Regarding iSER/RDMA/InfiniBand there is no difference between Proxmox and any other Linux distribution. I normally use Debian for web application servers, Galera clusters and similar use cases. I was also thinking about switching to native FreeBSD, but the showstopper is always Mellanox: they are focused on Linux. The OFED drivers for FreeBSD are different from the Linux versions and a little outdated. If you want to use the latest ConnectX-3 firmware it is better to run Linux at the moment. FreeBSD has some advantages over Linux, but not when InfiniBand comes into the game. For everything network-related I normally run FreeBSD-based systems like pfSense or FreeNAS. What I run depends on the use case; every system has its pros and cons.

With FreeNAS I was using freenas-proxmox for the Proxmox integration: https://github.com/TheGrandWazoo/freenas-proxmox
It is also nice, but it does not set up multipath, and multipath is essential if InfiniBand is used. For the Proxmox setup I wrote a small tool that automatically creates the multipath config entries with the wwids and aliases like "ssdpool-iscsi-vm-114-disk-0" whenever a VM is created. I'm using two dual-port InfiniBand cards on the server (ConnectX-3) and the same on the clients (ConnectX-2), with four InfiniBand subnets managed on the storage server by opensm.
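The generated entries look roughly like this per LUN; the wwid below is a placeholder (the real one comes from multipath -ll), and the file location depends on the distribution:

# e.g. /etc/multipath/conf.d/vm-114-disk-0.conf - one entry per exported LUN
multipaths {
    multipath {
        wwid  36001405aaaaaaaaaaaaaaaaaaaaaaaaaa
        alias ssdpool-iscsi-vm-114-disk-0
    }
}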
As the ConnectX-2 cards are PCIe 2.0, the theoretical maximum throughput is ~25Gbit/s, and that is what I got with IPoIB in connected mode with an MTU of 65520. But with this MTU I saw SCSI errors on the client side with FreeNAS as the storage server. With an MTU of 49500 the SCSI errors were gone, but so was the performance. I never found out the reason; maybe it was a buffer problem somewhere.
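For completeness, connected mode and the large MTU were set on the Linux side roughly like this (ib0 is an example interface name):

# switch the IPoIB interface to connected mode and raise the MTU (Linux client side)
echo connected > /sys/class/net/ib0/mode
ip link set dev ib0 mtu 65520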
The iSER multipath setup does not have any of these problems. It just runs, and it is damn fast. I have never seen performance like that over an iSCSI network connection.

These are read/write tests directly on the storage server with 128k blocks:
Read test HDD ZFS Pool / 8 mirrored VDEVs, 16xHGST SAS3 10TB, with mirrored SLOG partition 2xIntel Optane P4800X:
Run status group 0 (all jobs):
READ: bw=13.4GiB/s (14.3GB/s), 13.4GiB/s-13.4GiB/s (14.3GB/s-14.3GB/s), io=801GiB (860GB), run=60001-60001msec

Read test SSD ZFS Pool / 4 mirrored VDEVs, 8xWD RED SA500 2TB, with mirrored SLOG partition 2xIntel Optane P4800X:
Run status group 0 (all jobs):
READ: bw=12.4GiB/s (13.3GB/s), 12.4GiB/s-12.4GiB/s (13.3GB/s-13.3GB/s), io=743GiB (798GB), run=60001-60001msec

Write test with activated caching:
Write test SSD Pool (sync=0):
Run status group 0 (all jobs):
WRITE: bw=8506MiB/s (8919MB/s), 8506MiB/s-8506MiB/s (8919MB/s-8919MB/s), io=498GiB (535GB), run=60001-60001msec

Write test HDD Pool (sync=0):
Run status group 0 (all jobs):
WRITE: bw=8244MiB/s (8645MB/s), 8244MiB/s-8244MiB/s (8645MB/s-8645MB/s), io=483GiB (519GB), run=60001-60001msec

This is fast enough to serve some more cluster nodes.

The storage server has two Xeon Gold 5222 CPUs and 512GB DDR4-2933 (16x Samsung M393A4K40CB2-CVF); the motherboard is a Supermicro MDP-X11DPH-T-O, the backplane a Supermicro pass-through BPN-SAS-846, the chassis a Supermicro CSE-846BE1C-R1K23B, and the HBA an LSI 9305-24i. The operating system runs on two mirrored Intel SSD DC P4511 1TB drives.

The InfiniBand switch is an MSX6790-FS2F with 36x56Gbit/s FDR IB ports (QSFP); the cards are 2x Mellanox ConnectX-3 VPI (server, originally from HP, flashed to the original Mellanox firmware and PSID) and 6x Mellanox ConnectX-2 VPI (clients), connected with 16x Mellanox QDR QSFP cables.
Ethernet connectivity is dual 10Gbit/s copper plus dual Intel X710 SFP+ on the server and clients for the LACP trunks, which are connected to two Cisco SG550XG-8F8T switches for the backup and client networks. The backup server is an additional 3U Supermicro server running FreeNAS with a Bareos network backup server in a jail.

The three Proxmox cluster nodes are 1U Supermicro X9DRW-7TPF boards (model SYS-6017R-72RFTP) with 2x Intel Xeon E5-2650 v2 2.6GHz 8-core CPUs (32 vCPUs), 256GB RAM (16x 16GB ECC REG PC3-8500R), dual onboard 1Gbit/s Intel i350 Ethernet and dual onboard 10Gbit/s Broadcom BCM57810S SFP+. The LSI SAS controller is flashed to IT mode (I wrote my own routine for this because it was a bit tricky to get working). The OS runs on a RAIDZ2 of four WD RED SA500 1TB drives.

Here are some early views:
[Attached images of the setup]


If you want more details on the setup, tweaks or performance tuning, please ask.

The next step is to set up Kubernetes with a Terraform Proxmox provider, together with the load balancer running on my pfSense CARP cluster, to get a fully automated container platform up and running. A ZFS HA storage environment is also on the roadmap (at the moment the iSER storage is a single point of failure). The second storage server is ready to run, except that the 16x 10TB HGST drives are still missing; they are hard to find at reasonable prices at the moment.


Cheers
Ralf
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Thanks a million for the detailed info, Ralf - if I need further info I'll PM you; I don't think that would be appropriate here :)
It seems my main issue is the compatibility with VMware that I am striving to maintain, since I run everything on it at the moment...
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Reopened the FR to make it voteable
 