10Gbps Chelsio working at 3Gbps | Writing speed slows down after 10s

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
Hi everyone, I've been facing this low-performance issue for a while. I upgraded to TrueNAS Core 13.0 to see whether that would fix it, but it didn't. I hope you can help me. Below I detail my systems (both TrueNAS and host) and all the tests I did.

TRUENAS:
Asus P6X58D-E motherboard + Xeon X5650 + 24 GB DDR3 RAM
Asus R5-220 video card
500 W EVGA Power Supply
Chelsio S310-E CR
32GB SSD for OS
4x 1TB Seagate IronWolf in RAIDZ1 configuration

HOST:
MSI Tomahawk Z690 + Intel i7 12700K
Nvidia RTX 3070 8GB
64 GB DDR4 Ram
Sun Oracle 7051223 NIC (based on the Intel X520-DA2)
Windows 11 64-bit (not easy to get the X520 working :( )

I'm using a Zyxel XGS1250-12 managed switch. The TrueNAS box is connected to the SFP+ port via short-range fiber. The host is connected with a Cat7 cable through a 10Gtek RJ45 transceiver plugged into the X520 NIC.

>> I've tested the host with both Windows 11 and Linux, and both gave me the same result: after setting up jumbo frames, both inbound and outbound transfers run at 3 Gbps (more or less). There's no way to get a speed even close to 10 Gbps. These tests were performed using iperf3.
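To confirm that jumbo frames survive the whole path, a check like the following could be used (the 9000-byte MTU and the addresses are just placeholders):

  # From the Linux host: 8972-byte payload + 28 bytes of headers = 9000-byte
  # frame, sent with the Don't Fragment bit set. Replace the IP with the NAS address.
  ping -M do -s 8972 192.168.1.10
  # From the TrueNAS (FreeBSD) side, -D sets Don't Fragment:
  ping -D -s 8972 192.168.1.20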

I also tried to use the X520 NIC in the TrueNAS box, but I got the same problem.

>> Another issue I'm facing is that when writing to TrueNAS (via an SMB share), the write speed starts at over 300 MB/s but slows down to 100 MB/s after circa 10 seconds.
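To take the network out of this second issue, a local write test directly on the TrueNAS box would be something like this (the dataset path is a placeholder, and /dev/zero is compressible, so with lz4 enabled the result will be optimistic):

  # Write ~10 GB locally on the pool, bypassing SMB and the network entirely.
  # /mnt/tank/test is a placeholder path; adjust to the real pool/dataset.
  dd if=/dev/zero of=/mnt/tank/test/bigfile bs=1M count=10000 status=progress
  rm /mnt/tank/test/bigfile   # clean up afterwards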

There are two more tests I'd like to perform: 1) check the PCIe settings in the motherboard BIOS, and 2) connect the TrueNAS box directly to the host PC.
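For the first one, the negotiated PCIe link can also be read from within TrueNAS Core (FreeBSD) rather than the BIOS; something like this should show it (assuming the card is listed under its Chelsio vendor string):

  # List PCI devices with vendor strings and capabilities; the "link x..."
  # line in the Chelsio entry shows the negotiated PCIe width and speed.
  pciconf -lvc | grep -B3 -A8 -i chelsio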

Any help or suggestion is more than welcome.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
>> Another issue I'm facing is that when writing to TrueNAS (via an SMB share), the write speed starts at over 300 MB/s but slows down to 100 MB/s after circa 10 seconds.
Since ZFS transaction groups are flushed to disk in 5-second increments, I'd guess that means your pool only takes 2 transaction groups to run out of capacity to keep up with the RAM write cache, so ZFS tells the stream to slow down: it keeps getting new transactions while it can't get rid of the ones already being processed in time. (That would apply if you were doing sync writes.)
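You can check whether sync writes are actually involved on the dataset behind the SMB share with something like this (the dataset name is a placeholder):

  # Show the current sync setting of the dataset backing the SMB share
  zfs get sync tank/smbshare
  # For a quick test only: disable sync writes, repeat the transfer, then
  # put it back (don't leave sync disabled for data you care about).
  zfs set sync=disabled tank/smbshare
  zfs set sync=standard tank/smbshare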

>> I've tested the host with both Windows 11 and Linux, and both gave me the same result: after setting up jumbo frames, both inbound and outbound transfers run at 3 Gbps (more or less). There's no way to get a speed even close to 10 Gbps. These tests were performed using iperf3.
If you can't even get the network to operate at full speed, there's certainly something wrong going on at one end or the other.

4x 1TB Seagate IronWolf in RAIDZ1 configuration
That may be part of the problem too, depending on what you're moving... lots of small files or sync writes will punish RAIDZ, and that could be improved with mirrors instead.
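For comparison, a striped-mirror layout of the same four disks would look roughly like this (pool and device names are placeholders, and creating it means destroying and rebuilding the pool, so it's only illustrative):

  # Two mirrored vdevs striped together: better IOPS than one RAIDZ1 vdev,
  # at the cost of usable capacity (~2 TB instead of ~3 TB).
  zpool create tank mirror ada1 ada2 mirror ada3 ada4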

You might also consider performance tuning of the pool and/or dataset according to this (https://openzfs.github.io/openzfs-docs/Performance and Tuning/Workload Tuning.html#dataset-recordsize) and your workload specifics.
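For large sequential files, the main knob from that page is recordsize; the usual starting point would be something like this on the SMB dataset (placeholder name):

  # 1 MiB records suit large files; this only affects newly written blocks,
  # so re-copy the files after changing it.
  zfs set recordsize=1M tank/smbshare
  zfs get recordsize tank/smbshare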
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
In addition, the S310 is a very old card. I don't recall if it's the PCIe version, the number of lanes, or both, but I distinctly remember that it simply isn't capable of anything close to 10 Gbps.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
In addition, the S310 is a very old card. I don't recall if it's the PCIe version, the number of lanes, or both, but I distinctly remember that it simply isn't capable of anything close to 10 Gbps.
Thanks! I'm now waiting for an X540 card and planning to move the X520-DA2 to TrueNAS and put the X540-T1 in the host.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
Since ZFS transaction groups are flushed to disk in 5-second increments, I'd guess that means your pool only takes 2 transaction groups to run out of capacity to keep up with the RAM write cache, so ZFS tells the stream to slow down: it keeps getting new transactions while it can't get rid of the ones already being processed in time. (That would apply if you were doing sync writes.)


If you can't even get the network to operate at full speed, there's certainly something wrong going on at one end or the other.


That may be part of the problem too, depending on what you're moving... lots of small files or sync writes will punish RAIDZ, and that could be improved with mirrors instead.

You might also consider performance tuning of the pool and/or dataset according to this (https://openzfs.github.io/openzfs-docs/Performance and Tuning/Workload Tuning.html#dataset-recordsize) and your workload specifics.

Regarding the disks, I'm referring to a large file transfer, not to lots of small files.
Regarding the write cache, I don't see the RAM cache full at any time, so I'm not sure the cache is the problem: e.g. if I transfer a 10 GB file, it occupies at worst half of the RAM, but I experience this slowdown anyway.

Is there any test I could do?
 
Joined
Dec 29, 2014
Messages
1,135
Try an iperf3 test between the hosts. That will test the network with synthetic traffic (no disk I/O). If you can't get 90-95% of link speed there, then there is some kind of network problem. If you can, then the problem likely has to do with disk I/O, as @sretalla and @danb35 have suggested.
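To be concrete, I mean something like this (the IP is a placeholder):

  # On TrueNAS (server side)
  iperf3 -s
  # On the host (client side): a 30-second run, then with 4 parallel streams,
  # then with -R to test the reverse direction.
  iperf3 -c 192.168.1.10 -t 30
  iperf3 -c 192.168.1.10 -t 30 -P 4
  iperf3 -c 192.168.1.10 -t 30 -R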
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
likely has to do with disk I/O, as @sretalla and @danb35 have suggested.
To be clear, I'm saying the NIC itself is a limiter. It may not be the bottleneck here, but I don't believe it's possible to get over about 6 Gbit/sec out of a S310. I could be wrong, but that's the point I'm making.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
Try an iperf3 test between the hosts. That will test the network with synthetic traffic (no disk I/O). If you can't get 90-95% of link speed there, then there is some kind of network problem. If you can, then the problem likely has to do with disk I/O, as @sretalla and @danb35 have suggested.
I already did an iperf3 test, also by running Linux on the host instead of Windows. That's where the 3 Gbps speed comes from.
 
Joined
Dec 29, 2014
Messages
1,135
I already did an iperf3 test, also by running Linux on the host instead of Windows. That's where the 3 Gbps speed comes from.
If you can't get past 3 Gbps with iperf, then you have a hardware or network limitation. It sounds like your hardware has several points where it could be a limitation. Unfortunately, this kind of thing can end up being a "whack-a-mole" situation. Until you can get 95%-ish out of iperf, you are going to be constrained by that.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Did I just imagine another post in this thread that said that the motherboard would limit to no more than 4 Gbit/sec? I don't see it now, but I thought I had earlier.
 
Joined
Dec 29, 2014
Messages
1,135
Did I just imagine another post in this thread that said that the motherboard would limit to no more than 4 Gbit/sec? I don't see it now, but I thought I had earlier.
If you imagined it then so did I.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
Did I just imagine another post in this thread that said that the motherboard would limit to no more than 4 Gbit/sec? I don't see it now, but I thought I had earlier.
Yep, it has been deleted, I think. He confused bits with bytes.
I'll perform some more tests in the next few days and let you know!
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
If you can't get past 3 Gbps with iperf, then you have a hardware or network limitation. It sounds like your hardware has several points where it could be a limitation. Unfortunately, this kind of thing can end up being a "whack-a-mole" situation. Until you can get 95%-ish out of iperf, you are going to be constrained by that.
I just figured out that I put the Intel NIC in a PCIe 3.0 x1 slot (full-length). I read that since it has two SFP+ ports, the NIC shares the bandwidth between the two ports even if one of them is not used. That should mean 0.5 GB/s per port, or 4 Gbps. If this is true, moving the NIC to an x8 slot should make it work just fine.
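From a Linux live environment the negotiated link can also be read directly; something like this should do it (the PCI address is just an example):

  # Find the Intel NIC's PCI address, then read its negotiated link status;
  # LnkSta shows the actual width (x1/x4/x8) and speed currently in use.
  lspci | grep -i ethernet
  sudo lspci -vv -s 01:00.0 | grep -i -E 'LnkCap|LnkSta'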
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
Well, I just double-checked the PCIe thing, but nothing really improved. I also tested the host by running a Linux live distro and got the exact same results.
However, I noticed that the Chelsio NIC in my NAS runs very hot. I can't even touch the heatsink, even when the server is idle with no intensive network activity. I'm not sure, but I don't think this should be OK.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No, that doesn't sound OK. If you don't have the card in a chassis with good airflow, see if you can point a fan at the heatsink.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
No, that doesn't sound OK. If you don't have the card in a chassis with good airflow, see if you can point a fan at the heatsink.
The airflow is quite good: a 14 cm intake (pointed at the disks and the NIC) and a 12 cm exhaust (the CPU runs at 39 degrees Celsius when transferring large files for about 20 minutes).
It's strange that the NIC gets hot within a few minutes of switching on the NAS, even with very low network activity.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
No, that doesn't sound OK. If you don't have the card in a chassis with good airflow, see if you can point a fan at the heatsink.
I installed a second old Chelsio card in an old desktop, installed Linux, connected it to my 10 Gbps switch, and got it communicating with my TrueNAS at 9.6 Gbps.
I think I can isolate the network speed problem to my desktop. I plan to run the following tests:
1. Run a live Linux again and check whether the performance is comparable to Windows. If Linux works better, it should be a driver issue.
2. Double-check the PCIe lanes.
3. Verify whether the 10Gtek RJ45 transceiver is not working well with my Intel X520 (see the ethtool commands below).
4. Check whether the Intel NIC itself is not working well.
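For point 3, from the live Linux environment something like this should show what the X520 reports about the link and the 10Gtek module (the interface name is a placeholder):

  # Negotiated link speed and link detection
  ethtool enp1s0
  # Dump the SFP+ module EEPROM (vendor, part number and, if supported,
  # temperature and RX/TX power diagnostics)
  sudo ethtool -m enp1s0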

Any other suggestion?
 
Joined
Dec 29, 2014
Messages
1,135
It sounds like you have a bad combination of OS/hardware. Even good hardware is crappy if the drivers in the OS are crappy. The X520's are supposed to work well with FreeNAS/TrueNAS. I have Chelsio T580's in my FreeNAS units and they work great. The transceiver not working well seems less likely to me. They can have intermittent failures, but it is more often a binary situation. Maybe somebody else has a suggestion for a 10G NIC for Windows. I don't have any occasion to run that.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
It sounds like you have a bad combination of OS/hardware. Even good hardware is crappy if the drivers in the OS are crappy. The X520's are supposed to work well with FreeNAS/TrueNAS. I have Chelsio T580's in my FreeNAS units and they work great. The transceiver not working well seems less likely to me. They can have intermittent failures, but it is more often a binary situation. Maybe somebody else has a suggestion for a 10G NIC for Windows. I don't have any occasion to run that.
Thanks!
I'm waiting for a genuine Intel X540-T1 that I found at a good price. It's a little more recent and seems to be supported by Intel on Windows 11.
 

hellamasta

Dabbler
Joined
Apr 4, 2022
Messages
27
I figured out that the iperf3 speed is just a Windows driver problem. I ran two different live Linux distros on my desktop again, and with both of them the iperf3 results were just fine (9.45 Gbps).
The problem is that Intel's driver support for the X520 on Windows is discontinued. I found a driver on their website, but it's not working properly. I fear that I'm going to get the same problem with the X540 I'm waiting for.
Any suggestions for a fairly robust 10 Gbps NIC for Windows 11 desktops?
Thanks in advance!
 