TrueNAS Scale and 25/40GbE (Mellanox ConnectX-3/4) Setup, Benchmark and Tuning ...

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
This is a follow-up to my NVMe TrueNAS Scale storage build. If you want to read about my journey so far, start here:

———

I am posting my journey and learnings about 40GbE QSFP in this separate thread because it might be of general use to some. Disclaimer: I am no expert, but I learn as I go, so take everything with a grain of salt - input welcome!

In general I chose Mellanox ConnectX-3 PCIe cards because they are dirt cheap on eBay and other platforms (I purchased two for 35 € each), and for homelab / server use they offer an amazing price/performance ratio.

There are a ton of different variants out there (not even counting the individual OEM variants from HP, Lenovo, ...), so I browsed through the official tech specs and tried to assemble an overview for myself:

Pasted Graphic.png


According to most guides and forum posts, compatibility is best when the cards run in ETH (Ethernet) mode instead of IB (InfiniBand) mode, so that is what I am going to try first.

Another important finding: it does not seem to matter much which card you get, because they all have pretty much the same hardware and there is a way to flash each card with the latest 40GbE firmware, even cards from different vendors (IBM, HP, ...). Only the "Pro" cards have some extra hardware that offers a few more features (RDMA/RoCE v2), and of course you cannot flash a physical SFP+ port into a QSFP port. ;) Keeping that in mind I purchased two "MCX354A-FCBT" cards on eBay.

All ConnectX-3 cards support auto-negotiation and are backwards compatible. For "full speed" they are advertised as needing a PCIe 3.0 x8 slot. They also work in PCIe 2.0 slots and with fewer lanes (x1, x2, x4), but they require at least a mechanical x8 slot. In my case I use one card in my TrueNAS Scale machine in a PCIe 3.0 x16 (mechanical) / x4 (electrical) slot, and one card in my Windows machine, also in an electrically x4 slot. Quick recap of the (maximum) PCIe speeds:

Pasted Graphic_1.png
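Since the slot wiring limits throughput, it is worth confirming which PCIe link the card actually negotiated. On the TrueNAS (Linux) side something like the following shows it - a minimal sketch, where the PCI address 2a:00.0 is just an example and comes from the first command:

# find the PCI address of the Mellanox card
lspci | grep -i mellanox

# show the maximum (LnkCap) and currently negotiated (LnkSta) PCIe link
sudo lspci -s 2a:00.0 -vv | grep -E "LnkCap|LnkSta"

In a PCIe 3.0 x4 slot the card should report something like "Speed 8GT/s, Width x4", which is roughly 3.5 GB/s usable - so in my case the slot, not the NIC, is the ceiling at around 28 Gbit/s.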


I flashed both cards in my Windows 11 Pro machine and afterwards put one into the TrueNAS system. The process is quite straightforward:

1. Download MFT (Mellanox Firmware Tools) 4.22.0 for your system (Windows x64 in my case)

2. Download the newest firmware for the card (2.42.5000 for the MCX354A-FCBT)

3. Run CMD as administrator; the following commands are useful:
# list the installed Mellanox cards and their device names
mst status

# detailed firmware info
mlxfwmanager --query

# show the current card configuration (device name as reported by "mst status")
mlxconfig -d /dev/mst/mt4099_pci_cr0 query

4. Flash the firmware
# flash the downloaded firmware image
# (-allow_psid_change is needed when cross-flashing a card that carries a different vendor PSID)
flint -d mt4099_pci_cr0 -i "firmwarefile".bin -allow_psid_change burn

5. Change the port mode from IB to ETH
# for instance, switch both ports from VPI/auto to Ethernet only (2 = ETH);
# a reboot is required before the change takes effect
mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

... if you use Windows 11, also download WinOF 5.50.54000 (the Windows Server 2019 package works).

And then it should work!
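To double-check from the console that everything took effect (the mlxconfig change only applies after a reboot), the same tools can be used again - device name as reported by "mst status":

# firmware version and PSID should now show the stock Mellanox FCBT image
mlxfwmanager --query

# LINK_TYPE_P1 / LINK_TYPE_P2 should both read ETH(2)
mlxconfig -d /dev/mst/mt4099_pci_cr0 query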

Next I will post an update on speed, benchmarks and potential tuning!
 
Last edited:

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
After the flash, two things are important to check via the console:

1. The firmware was correctly flashed to the 40GbE model "MCX354A-FCB_A2-A5" (check the back of your card in case you have a newer revision than A5!) and to the latest firmware version "2.42.5000":

Screenshot 2022-11-23 183650_3.jpg


2. All ports work in ETH mode:

Screenshot 2022-11-23 183723_.jpg


I have to say that in Windows I sometimes had issues with error code 43 (PCIe device halted) and therefore trouble executing the console commands. I am still looking into whether the cards may be faulty or whether it is a driver issue in Windows. On the TrueNAS system (Debian-based) there have been no issues so far ...

When everything is running, it should look like this in Device Manager:

Screenshot 2022-11-23 184548.jpg


Screenshot 2022-11-23 184732.jpg


On the TrueNAS system I also verified the maximum supported speeds with "ethtool <network adapter>", which shows 40000 and 56000 Mb/s:

truenas_ethtool speed port 1.jpg
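For reference, the check itself is just the following (the interface name is only an example - use the name shown in the TrueNAS UI or in "ip link"):

# replace enp5s0 with your actual interface name
ethtool enp5s0          # supported link modes and currently negotiated speed
ip -s link show enp5s0  # MTU plus RX/TX error and drop counters

In the ethtool output, "Supported link modes" should list 40000baseCR4/Full (and 56000baseCR4/Full on the 56G-capable cards), and the "Speed:" line shows what the link actually negotiated.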
 

Glowtape

Dabbler
Joined
Apr 8, 2017
Messages
45
I'm still using the built-in drivers supplied by "Microsoft". I mean, the versioning follows the WinOF pattern, so I guess they're contributed.

Too bad that for PCIe 4.0 capable cards you'd need to spring some gold coins for a ConnectX-5.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Ok ... first speed benchmark!

Setup Machine A:
  • TrueNAS-SCALE-22.02.4
  • AMD Ryzen 5600 / ASRock Rack X470D4U / Mellanox ConnectX-3 (40GbE QSFP)
  • 3x NVMe (WD Red SN700 1TB, M.2 PCIe 3.0 x4) in RaidZ1

Setup Machine B:
  • Windows 11 Pro
  • Intel Core i9-9900K / ASUS ROG Maximus XI Hero / Mellanox ConnectX-3 (40GbE QSFP)
  • Samsung SSD 980 PRO 1TB, M.2

The machines are connected directly via a 1 meter DAC cable from Amazon ("40G QSFP+ DAC Cable - 40GBASE-CR4 Passive Direct Attach Copper Twinax QSFP Cable for Mellanox MC2206130-001, 1-Meter").

Settings Machine A:

Screenshot 2022-11-24 181852.jpg

Screenshot 2022-11-24 185128.jpg


Settings Machine B:

Screenshot 2022-11-24 181548.jpg


For my benchmark I used CrystalDiskMark 8 (64-bit) with the NVMe profile, accessing an SMB share on the TrueNAS Scale box:

Screenshot 2022-11-24 182836.jpg


Local speed on Samsung NVMe SSD (Windows):

Screenshot 2022-11-24 183148.jpg


For comparison, the speed over the Gigabit network connection (Intel NIC on the motherboard):

Screenshot 2022-11-24 183416.jpg


Mellanox to Mellanox transfer:

Screenshot 2022-11-24 183817.jpg


Windows File transfer READ (copy file from TrueNAS -> Windows):

Screenshot 2022-11-24 183922.jpg


Windows File transfer WRITE (copy file from Windows -> TrueNAS):

Screenshot 2022-11-24 184113.jpg


So without any tweaking, on pretty much default settings, I am already quite happy. My goal was to reach speeds of at least 10G (ideally above), and I think that is already the case.

Keep in mind that both Mellanox cards sit in PCIe 3.0 x4 slots (not the x8 the card supports), but I am also only using one of the two QSFP ports.
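One thing worth adding to these numbers: CrystalDiskMark over SMB mixes network, protocol and pool performance. To see what the link alone can do, an iperf3 run between the two machines takes the disks out of the equation. A rough sketch - it assumes iperf3 is available on both ends and that the TrueNAS interface answers on 10.10.10.1 (made-up address):

# on TrueNAS (server side)
iperf3 -s

# on Windows (client side): 4 parallel streams, 30 seconds
iperf3.exe -c 10.10.10.1 -P 4 -t 30

# same test in the reverse direction (server sends, client receives)
iperf3.exe -c 10.10.10.1 -P 4 -t 30 -R

With both cards limited to PCIe 3.0 x4, something in the 20-30 Gbit/s range would be the realistic ceiling here rather than the full 40 Gbit/s.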

During my research I found that there is a special Windows 11 Pro for Workstations edition. That one (like the Windows Server versions) supports SMB Direct and therefore RDMA, which the Mellanox cards also support:

Windows Server includes a feature called SMB Direct, which supports the use of network adapters that have Remote Direct Memory Access (RDMA) capability. Network adapters that have RDMA can function at full speed with very low latency, while using very little CPU. For workloads such as Hyper-V or Microsoft SQL Server, this enables a remote file server to resemble local storage. SMB Direct includes:

  • Increased throughput: Leverages the full throughput of high speed networks where the network adapters coordinate the transfer of large amounts of data at line speed.
  • Low latency: Provides extremely fast responses to network requests, and, as a result, makes remote file storage feel as if it is directly attached block storage.
  • Low CPU utilization: Uses fewer CPU cycles when transferring data over the network, which leaves more power available to server applications.
SMB Direct is automatically configured by Windows Server.

Now I could purchase that Windows 11 Pro for Workstations license ... but I kept investigating, and the current TrueNAS SCALE status is:

RDMA is a very useful technology for accessing data in RAM on another system. For accessing data on HDDs and Flash, there is only a minor benefit. TrueNAS SCALE will support RDMA in a future release based on customer/community demand.

So it is still under development. I am curious what speed advantages are possible with it, but it seems I have to wait a bit longer for an answer.

Any other suggestions to improve transfer speed with my setup? MTU or other settings ... ?
 
Last edited:

Glowtape

Dabbler
Joined
Apr 8, 2017
Messages
45
Regarding SMB Direct, both ends need to support it. TrueNAS, like any other *nix based system, uses Samba for SMB, and Samba doesn't support SMB Direct. Yet, anyway. But kinda lol, because it has been a decade-long odyssey that still hasn't gone anywhere.

As for RDMA itself, that doesn't necessarily require the Pro for Workstations version. As you read elsewhere, I'm using NVMe-oF for block IO between my Windows and the TrueNAS boxes, using RDMA. And I'm using the regular Pro version.

Regarding MTU, I've set it to 9014 bytes on both ends. There was a reason for that crummy number, but I forgot it.
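For anyone who wants to try it: on TrueNAS SCALE the MTU is best set on the interface in the web UI (so it survives reboots), but for a quick test from the shell, and for the Windows side via an elevated PowerShell, something along these lines works. Interface names are just examples, and the 9014 is presumably because some Windows drivers count the 14-byte Ethernet header, so it matches an MTU of 9000 on the Linux side:

# TrueNAS / Linux side (temporary until reboot - make it permanent in the web UI)
ip link set dev enp5s0 mtu 9000

# Windows side (elevated PowerShell); the registry keyword can vary per driver
Set-NetAdapterAdvancedProperty -Name "Ethernet 3" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

# verify that jumbo frames really pass end-to-end without fragmentation
ping -f -l 8972 10.10.10.1    # Windows ping: -f = don't fragment, -l = payload (8972 + 28 bytes of headers = 9000)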
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Some additional latency tests:

Gigabit NIC:

Screenshot 2022-11-24 192451.jpg


Mellanox <-> Mellanox:

Screenshot 2022-11-24 192604.jpg


On average 0.1ms lower latency ...
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
How are you connecting: a DAC cable, or via fibre and optics/transceivers? Do you use a switch in between?
PS thx for gathering this detail, I might have to order a couple myself to play.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
How are you connecting: a DAC cable, or via fibre and optics/transceivers? Do you use a switch in between?
PS thx for gathering this detail, I might have to order a couple myself to play.
For my initial testing both cards are connected directly, without an intermediate switch or other hardware.

The 40GbE QSFP DAC cable plugs directly into the Mellanox ports:
61HHZWl7afL._AC_SL1500_.jpg


This is good and cheap for shorter runs (up to about 10 meters) ... for anything longer you have to switch to fiber.

There are also QSFP-to-SFP+ cables / breakout splitters (QSFP is basically 4x SFP+ lanes in one cable). This way you can more easily connect the hardware to cheaper 10G switches.

I personally try to stay away from 10GBase-T (with the traditional RJ-45 plug) because it has higher latency and uses more power. If possible, go with QSFP / SFP+ all the way for anything above Gigabit Ethernet ...
 
Last edited:

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I have a good mixture of SR optics, DACs and even SFP+ to 10GBase-T transceivers for when I need to use twisted pair at 2.5G or even 10G. I just have never played with anything faster than 10G Ethernet yet. PS: I had issues with the 10 m DACs; I tend to use fiber above 5 m.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Since QSFP switches are either huge or (bought used on eBay) probably come with insane power consumption for my needs ... I will probably just connect the cards directly, since they have 2 ports each and I only need to connect 2 devices that benefit from the higher speeds to the TrueNAS server (1 PC, 1 Mac).

Looking into SFP+ switches, it gets easier: either the 4-port "MikroTik Cloud Router Switch CRS305" or the 3-port "QNAP QSW-308S" with an additional 8 Gigabit Ethernet ports. Both are below 200 Euro brand-new.

For connecting the Mac there are "Thunderbolt-to-SFP+" solutions like the "QNA-T310G1S".
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
Ok ... first speed benchmark! [...] Any other suggestions to improve transfer speed with my setup? MTU or other settings ... ?
Very impressive. I have a DAC and 2 QSFP+ dual port Mellanox cards. I'll add getting them to work back on my project list.
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
For logistics and compatibility reasons I will migrate away from the older QSFP standard towards SFP28.

For that reason I ordered some Mellanox ConnectX-4 (MCX4121A-ACAT) cards with dual SFP28 ports. I will post an update with the results once I receive them ...
 

ZiggyGT

Contributor
Joined
Sep 25, 2017
Messages
125
For logistics and compatibility reasons I will migrate away from the older QSFP standard towards SFP28.
Did not know what SFP28 was, so I looked here:
https://community.fs.com/blog/sfp-vs-sfp-vs-sf-p28-vs-qsfp-vs-qsf-p28-what-are-the-differences.html
Same form factor, just faster than SFP+? I was curious to understand the motivation for the change. Still puzzled, so I thought I'd just ask. What are the primary logistics and compatibility issues?
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
In a nutshell (it is much more complex in reality), SFP28 (25GbE) / QSFP28 (100GbE) is the successor to the QSFP (40GbE) path.

I am restructuring part of my home network layout and already have some SFP+ / SFP28 hardware. The main switch and network layout will be 10GbE based (SFP+), with some specific direct crosslinks between clients (SFP28 / aggregated).

I could have kept the ConnectX-3 / QSFP cards for that ... but I would have had to invest in some costly adapters (QSFP to SFP+ to fiber). Ultimately, with the recent price drop of ConnectX-4 cards, I chose to ditch QSFP altogether and jump on the SFP+/SFP28/QSFP28 bandwagon.
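One related note for the mixed setup: SFP28 ports do not always auto-negotiate down to 10GbE cleanly against SFP+ switches or DACs. If a link refuses to come up, forcing the port speed on the ConnectX-4 is one thing to try - a sketch, where the interface name is just an example:

# check which speeds the port supports / advertises
ethtool enp65s0f0

# force the port to 10GbE if auto-negotiation against SFP+ gear fails
ethtool -s enp65s0f0 speed 10000 autoneg off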
 

Glitch01

Dabbler
Joined
Aug 24, 2022
Messages
26
In a nutshell (it is much more complex in reality), SFP28 (25GbE) / QSFP28 (100GbE) is the successor to the QSFP (40GbE) path. [...]
Thanks for the write-up though. It helped me get my Mellanox MCX354A cards working on my TrueNAS. Switching to ETH made the cards show up in TrueNAS, and using the A2-A5 ROM took the connection status from 10GbE to 40GbE. Here are my benchmarks on PCIe 3.0 x8, MTU 9000, direct from the PC to TrueNAS. Transferring an 8 GB file, throughput was 2.88 GB/s from Windows 10 to TrueNAS.

Your write-up saved me a lot of research on cables and configuration.
1688957511819.png
 
Last edited:

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
Thanks for the write-up though. It helped me get my Mellanox MCX354A cards working on my TrueNAS. [...]
Glad that it was of help. What is your setup (CPU, MB, etc....)?
 

Glitch01

Dabbler
Joined
Aug 24, 2022
Messages
26
Glad that it was of help. What is your setup (CPU, MB, etc....)?
The TrueNAS system is an AMD Ryzen 1700, 32 GB DDR4-3200, ASUS Crosshair VI Hero (X370) motherboard, an LSI SAS 9300-8i controller, 6x WD helium 14 TB 7200 rpm HDDs for pool1, 4x WD helium 4 TB 7200 rpm HDDs for pool2, a 512 GB Samsung NVMe as cache/log for pool2, a 120 GB SATA SSD for the boot OS, a Mellanox MCX354A NIC, a dual 10 Gb Intel NIC, and an EVGA 650 W PSU.

The main PC is an AMD Ryzen 5700G, 64 GB DDR4-3600, ASUS Crosshair VIII Hero (X570) motherboard, a 1 TB Samsung 980 Pro NVMe, a 4 TB Acer Predator NVMe, an AMD RX 580 GPU, a 1 TB SSD, a 2 TB SSD, a 2 TB HDD, a dual 10GbE Intel NIC, and a Mellanox MCX354A NIC.

Still in the process of rearranging some things, but the original idea was to move everything from a 1GbE Cat5e network to a 10GbE network. With the Mellanox MCX354A cards being so affordable, it wasn't too much of a leap to work them into the 10GbE network build. I run a lot of VMs and typically move 100+ GB files back and forth between my machine, the NAS and the hypervisors. The second pool on the NAS is just some spare drives I am making use of; eventually they will be replaced by additional 14 TB HDDs added to pool1. I also have a RAM cache enabled for my local drives, which boosts reads/writes to/from the local drives but not direct reads/writes on the NAS. The dual 10GbE NICs will be moved from the NAS and my main PC to the hypervisors as an upgrade over the existing 2.5GbE and 1GbE.

Connections will go from a managed switch with 2x SFP+ 10GbE, 2x 2.5GbE and 8x 1GbE ports to a custom managed switch with 2x QSFP 40GbE, 8x 10GbE RJ-45 and 2x 1GbE ports.

RAM cache disabled for the local NVMe:
1688999415557.png


RAM cache enabled for the local NVMe:
1688999756634.png
 

pixelwave

Contributor
Joined
Jan 26, 2022
Messages
174
I am wondering how you achieve 4000 MB/s reads and writes to the TrueNAS system, especially since the pools are mostly HDDs?

I currently have issues with my ConnectX-4 cards: I see a weird behaviour where they do not achieve the full 10GbE down/up speed:
 

BloodyIron

Contributor
Joined
Feb 28, 2013
Messages
133
@Glitch01 are your ConnectX-4 cards in IB or ETH mode? I for one would like to know a lot more about your configuration and setup specific to the IB cards, please! Also, why does your iperf only go up to 24 Gbps? That seems low... (from my armchair, since I'm starting the IB adventure myself, hehe).
 