TrueNAS and RDMA and Win11 Workstation

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
Dear All,

I am new to the forum, but I have experience with TrueNAS. It is nice software indeed!

I am preparing a lab test with TrueNAS 13 Core and Windows 11 Workstation; the goal is to find the peak throughput of Windows 11 Workstation's RDMA support with TrueNAS.

There are rumors that TN 13 Core has hidden RDMA support that just has not been officially released yet.

I am going to test this out and share the results.

Here is my lab test device list:
Server
1. HP DL380 G9 with 2x Xeon E5-2670 v3
2. 64 GB DDR4-2133 RAM
3. 8x 200 GB 12 Gbps SAS SSDs in RAID-Z1
4. HP H240 HBA card

Network
5. HP 544+FLR 40G dual-port adapter (equivalent to a ConnectX-3 Pro, which supports RDMA and RoCE v2) in the HPE G9 server
6. Chelsio T580 dual-port 40G RDMA NIC in the Win11 workstation (the workstation runs a Xeon Silver)
7. A pair of 40G DAC cables (will try link aggregation)

I know that the Windows SMB Direct (RDMA) transfer cap is about 2.4 GB/s with Win10 Workstation between two directly connected workstations.

However, this time I am trying Win11 Workstation with the latest TrueNAS build to see if RDMA works.

I will keep this lab test posted.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
TrueNAS 13 has many improvements, including performance ones, but unfortunately still no RDMA.
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
mav, thanks for the heads-up... sad that RDMA is not yet supported. Is there a roadmap for RDMA support in an upcoming release?

Some of my parts have arrived... will share some photos shortly.
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
Some small parts arrived today.

The big one is delayed; will post when it arrives.

Cheers. :smile:


Image from iOS (4).jpg
Image from iOS (8).jpg
Image from iOS (7).jpg
Image from iOS (5).jpg
Image from iOS (6).jpg
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
I have finally finished all the configuration and setup in my lab, but the results have not yet reached my expectations.

I replaced the T580 in the Windows machine, which was causing very slow write speeds to the NAS. After switching to a ConnectX-3 Pro, the network bottleneck was solved.

Note: jumbo frames (9014) are configured; this is needed to reach this performance.
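As a sketch of the jumbo-frame setup, assuming a ConnectX-3 under FreeBSD/TrueNAS Core (the interface name and peer address below are placeholders; note that the Windows "Jumbo Packet" value of 9014 typically includes the 14-byte Ethernet header, which corresponds to MTU 9000 on the FreeBSD side):

```shell
# TrueNAS Core (FreeBSD) side: raise the MTU on the 40G interface.
# mlxen0 is a placeholder -- substitute your actual interface name.
ifconfig mlxen0 mtu 9000 up

# Verify jumbo frames pass end-to-end without fragmentation:
# 9000-byte MTU leaves 8972 bytes of ICMP payload after the
# 20-byte IP and 8-byte ICMP headers. -D sets the don't-fragment bit.
ping -D -s 8972 192.168.10.2
```

If the ping fails at 8972 but works at small sizes, some hop (NIC, switch port, or the Windows side) is still at the default MTU.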

iperf3 result:

With multiple threads, iperf3 only reaches a maximum of 20.5 Gbps, while a single thread manages only around 14.5 Gbps. Worse, after the NAS server sits idle overnight and I run iperf3 again, the speed drops by around 40% unless I reboot the server.
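For reference, a multi-stream test of the kind described above can be run like this (the server address and stream count are illustrative, not taken from the thread):

```shell
# On the TrueNAS box: start an iperf3 server.
iperf3 -s

# On the Windows client: 4 parallel streams (-P 4), a 2 MB socket
# buffer (-w 2M), 30-second run. 192.168.10.1 is a placeholder for
# the NAS address on the 40G link.
iperf3 -c 192.168.10.1 -P 4 -w 2M -t 30
```

Comparing a `-P 1` run against `-P 4` is the quickest way to tell whether the limit is a single stream (single core) or the link itself.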

photo with single thread:
1657321027872.png


Photo with the performance drop from 20.5 Gbps to only around 9 Gbps after the NAS sat idle for a single night. iperf3 also hangs mid-run unless I reboot the TN server.

iSCSI result:

The iSCSI result is acceptable; however, I believe the bottleneck is the network, which is capped at around 20 Gbps. I also believe the SLOG and cache have an effect. Result photo below:

1657321224358.png


CPU utilization is around 6% on TNC 13 during the transfer.

Next Step:

I am going to add an NVMe SLOG and possibly an L2ARC to TNC 13 and see if there is any performance boost. I will have the upgrade parts on hand in a couple of hours and will keep the thread posted.

Question:
1.

Is there any chance I can get closer to 40G speeds with the existing hardware, or is a hardware change a must? I don't think the CPU is the limit, since utilization is only around 6% during the transfer.

2.
As I have limited PCIe slots, is there an alternative option to install TNC 13 on the TF card slot available in the server?

3.
I am running Win11 Pro with the SMB Direct feature enabled. Would anyone suggest trying Win11 Workstation? I have heard the Workstation edition of W11 has some SMB v3 optimizations, such as multichannel, etc.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Chelsio NICs usually run pretty hot and require active airflow. Make sure you are not cooking one alive; make sure your desktop has a fan there. 12 Gbps is definitely low. I think you should be able to get close to 40 Gbps with a properly set up test. I'd look at per-core CPU utilization on both the Windows and TrueNAS sides (`top -SHIz`). With single-connection workloads it is very easy to hit a single-core bottleneck, especially if the CPU frequency is not very high.
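The suggested `top` invocation, annotated (a sketch for TrueNAS Core / FreeBSD; flag descriptions are from my reading of the FreeBSD `top` options):

```shell
# Per-thread CPU view on TrueNAS Core (FreeBSD):
#   -S  include system (kernel) processes
#   -H  show each thread on its own line
#   -I  hide idle processes
#   -z  hide the system idle process
top -SHIz

# Run this while iperf3 or a file transfer is active and watch the
# WCPU column: one thread pinned near 100% means a per-core
# bottleneck, even when overall CPU utilization looks low.
```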
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
Hmm... thanks mav for your reply.

I ran top with the result below; it does not seem to be a CPU issue causing the slow performance.

May I ask which part you suggest setting up/tuning? I have MTU 9014, and performance doubled after going from 1514 to 9014.

The ConnectX-3 Pro firmware is the latest 5700 build on both ends.

My Windows side definitely will not have a CPU issue, as it is a 2nd-gen Xeon Silver.

Autotune off and on makes no significant difference.

1657326957850.png


1657327217849.png
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
btw mav,

you are right, the Chelsio runs really hot and I was not using active cooling. I've just cooked a T580 o_O
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
I have got my SLOG and L2ARC NVMe drives installed, but no performance improvement in either SMB or iperf3.

Below is a 200 GB file move over SMB on Win 11 Pro; it took around 3 minutes.

1657335080706.png
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
This won't help your performance, but your L2ARC and SLOG should not be their own pools. They need to be added as vdevs of your "dataset" pool; right now they are not doing anything.

Just detach them both, go into your "dataset" pool, add a vdev, and you'll see the "cache" vdev (L2ARC) and the "log" vdev (SLOG) that you can now add as additional vdevs to your "dataset" pool. This is the only way to get your "dataset" pool to use them; otherwise they will sit there and do nothing.

It is recommended to have a mirrored SLOG, or one with power-loss protection, but for testing purposes what you have is fine. The SLOG will not be utilized at all for SMB, since a SLOG is only used for synchronous writes and SMB is inherently asynchronous (and shouldn't be changed to synchronous). So, as I said before, it won't help your performance... but the current setup is still wrong.
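The fix described above can also be done from the CLI. A minimal sketch, assuming the pool is named "dataset" and the NVMe devices appear as `nvd0`/`nvd1` (all placeholders; on TrueNAS it is usually better to do this through the UI so the middleware stays in sync):

```shell
# 1. First destroy/export the standalone pools that were mistakenly
#    created on the NVMe devices (UI: Storage -> Pools -> Export/Disconnect).

# 2. Attach the devices to the main pool as cache (L2ARC) and
#    log (SLOG) vdevs:
zpool add dataset cache /dev/nvd0
zpool add dataset log /dev/nvd1

# 3. Confirm they now appear under the "cache" and "logs" sections:
zpool status dataset
```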
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
@souporman , thank you for your message. :D

I realized about the cache and log drives later on; they are now bound to a SAS spinning-drive pool. While the spinning 12 Gbps drives only manage around 300 MB/s in Windows file transfers, after adding the NVMe cache drive the pool can reach as high as 1.4 GB/s.

My 12 Gbps SAS SSD pool, however, shows no difference with the additional cache drive, probably because my cache drive is a slow one, only 2 GB/s.

I might try a faster NVMe (P4600) cache drive later.

i have now my pool setup as below:

SAS 12Gbps spinning drives x 8
Screenshot 2022-07-16 151121.png


Screenshot 2022-07-16 151030.png
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
Hi Folks, performance report as below:

On Windows, I even tested Win11 Workstation, which supports SMB Multichannel, but still no improvement in file transfers, so I went back to 11 Pro.

The iperf3 4-process benchmark now gives a positive result: it can almost reach 40 Gbps, and I am happy with that.

iperf3, 4 processes, 2M block:
iperf3 throught put testing.png


11Pro concurrent Transfer:
Concurrent Transfer.jpg




SSD x 8 Pool in Z1
SAS x 8 RAID Z1.png


Single NVMe Pool
NVME x 1 Cache Drive.png
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
One strange thing I encountered: with a LAG between TNC and the Mellanox switch using LACP, I can never bring the link up.

Right now I am only using load balancing; no clue at the moment, so I will just leave this aside.
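For anyone hitting the same LACP issue, this is roughly what the FreeBSD side of an LACP lagg looks like (interface names and address are placeholders; the switch ports must be in an LACP/802.3ad channel group, not a static trunk, or the LAG will never negotiate):

```shell
# Create a lagg interface and bind both 40G ports to it with the
# LACP protocol. mlxen0/mlxen1 are placeholder interface names.
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport mlxen0 laggport mlxen1 \
    192.168.10.1/24 up

# Inspect negotiation state: healthy LACP member ports show flags
# like ACTIVE,COLLECTING,DISTRIBUTING in this output.
ifconfig lagg0
```

On TrueNAS this should normally be configured via Network -> Link Aggregation rather than raw `ifconfig`, so it persists across reboots.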

Lastly, sharing photos of our lab test equipment :)

It has been nice and enjoyable sharing here.


Image from iOS.jpg
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
My window side definitely will not have CPU issue as its a Xeon Silver Gen2.
That is not really an argument. I was talking not about total CPU usage, but about a single-core bottleneck. Watch your CPU usage in per-core mode; if even a single core is at 100%, it is likely a CPU bottleneck.
 

inno173

Dabbler
Joined
Jul 3, 2022
Messages
17
Dear mav, thanks for the heads-up.

By the way, iperf3 only reached around 40 Gbps after I reinstalled Windows; probably the drivers in the previous install had some limitation or whatever.

But I will try for better performance once I have the chance to get a better CPU.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
I am not saying the CPU is bad. I'm saying that some workloads may not scale to multiple cores, and if you are limited by one, it may explain everything. Higher clock rate may improve the numbers, but so may potentially some software tuning.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
@inno173 There are no restrictions on TrueNAS Core use. But Enterprise version makes it smoother with qualified hardware, additional enterprise features and support.
 