Communication speed over 10Gb SFP+ connection

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
I have a performance problem...
The NAS in question is not my personal system (from my signature) so here are some details:
The system chassis is a Supermicro "SSG-6049P-E1CR60L" with sixty 12TB drives.
The system is running TrueNAS-12.0-U2.1 and has been upgraded to 256GB of RAM.
CPU is a pair of Xeon Gold 5222s, but CPU utilization never goes over 50%, and only for a few seconds at a time.
The storage pool is configured into 10 RAIDz2 vdevs (6 drives each) and is about 65% filled.
When transferring a large (8 TB) volume of data, read or write, the system slows to a crawl: Mbps speeds instead of Gbps.
The network should be capable of much better, but I don't think it is a network problem.
We have a Cisco 10Gb switch and the workstations are all using Intel interface cards on Windows 10.

I have been trying to figure this out on my own, but I am asking for help...
Can anyone offer suggestions on what might be changed to improve the situation?
Is AutoTune any good under the latest version of TrueNAS? Would that help or hurt?

I am able to provide more information on the system configuration, but I wasn't sure what information would be helpful and didn't want to make the first post horribly long.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
What about workstation<->workstation performance?
Untested. Storage is on the NAS, so I have only looked at communication between the NAS and workstations. Are you thinking it might be a problem in the switch?
 

Touche

Explorer
Joined
Nov 26, 2016
Messages
55
If workstation-to-workstation performance is unaffected, the issue is probably with the NAS. If it's also slow, the issue is probably not with the NAS but could be either in the switch or the workstations (probably the switch if the issue is present on all the workstations accessing the NAS).

Are all the interfaces reporting a 10Gb connection? Is the switch reporting anything unusual on the ports in question?
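
One quick way to check the negotiated speed from the TrueNAS side is ifconfig (a minimal sketch; the interface name ix0 is only an assumption, substitute whatever your SFP+ NIC shows up as):

Code:
# Show the negotiated media line for the 10Gb interface
ifconfig ix0
# Look for something like: media: Ethernet autoselect (10Gbase-SR <full-duplex>)
# Per-interface error and drop counters should stay at (or very near) zero
netstat -I ix0 -d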
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
but I don't think it is a network problem.
Can you confirm it with a cp to /dev/null, or something else local to the server, first to see if you can get I/O off the disks at a reasonable rate?
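
Something like this would do it (a minimal sketch; the dataset path and file name are placeholders, point it at any large file already on the pool):

Code:
# Read a large file off the pool and discard it, taking the network out of the picture
dd if=/mnt/tank/data/bigfile.bin of=/dev/null bs=1M status=progress

If that sustains more than the roughly 1.2 GB/s a 10Gb link can carry, the pool itself probably isn't the bottleneck.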
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
No idea whether this is relevant, but relative to the other specs, the 256 GB of RAM doesn't seem very "generous".
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
I presume you've already confirmed via Iperf that the network pipes work?

Have you tried out workstation-to-workstation test file transfers (in case this is a switch/workstation issue)?

Has the NAS been tested with multiple workstations?

You mention this happens with NAS reads and writes, suggesting it's not something as simple as a bad SLOG. However, I wonder if the fiber connection from the NAS to the switch may be bad - either a transceiver overheating or some similar heat-related failure mode in the switch. Easy enough to test by swapping stuff out.
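
For the iperf check, a minimal sketch (assuming iperf3 is available on both ends; the IP address is a placeholder for the NAS):

Code:
# On the TrueNAS box
iperf3 -s
# On a workstation: 30-second test toward the NAS, then the reverse direction
iperf3 -c 192.168.10.10 -t 30
iperf3 -c 192.168.10.10 -t 30 -R

A clean 10Gb path should report somewhere around 9.4 Gbit/s in both directions; anything wildly lower points at the network rather than the pool.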
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So you've got 720TB and 65% occupancy, with only 256GB RAM.

What's the workload? Is fragmentation expected? Are the disks maxing out (check "gstat")?
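
For a quick look at fragmentation, zpool list shows it directly (the pool name "tank" is just a placeholder here):

Code:
# CAP and FRAG columns for the pool as a whole
zpool list tank
# Same, broken out per vdev
zpool list -v tank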

Does iperf3 max out the NAS-side ethernet interface when run from multiple clients? Write *and* read problems feel like you should check the network stuff, but it could also easily be ZFS not having sufficient ARC in a stressed environment, so checking the ARC stats would be good.
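
For the ARC stats, the raw kernel counters are always available via sysctl (a minimal sketch; arc_summary gives a friendlier report if it's present on your build):

Code:
# Current ARC size and maximum target
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
# Hit/miss counters - sample them before and after a transfer
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

A low hit rate while a transfer is running would support the "not enough ARC" theory.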

If this is an SMB server serving Win10 clients, 720TB seems like a lot of space, so it feels like the use case might play a factor here.

If the use case is merely storing or retrieving large sequential file images ("8TB") and only one client is doing this at a time, something seems off.
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
I would check several things.

- Switch statistics for the network port, and network statistics on the TrueNAS server: for example, ifconfig for network errors, or netstat -s -p tcp for retransmissions, which would point to a network and/or host congestion problem.

- As has been suggested, iperf is your friend. Of course, check for errors. What kind of SFP+ media are you using? Make sure the media (either fiber or twinax) is in good condition.

- Interrupt activity. Try systat -vmstat 1. Maybe too many interrupts? There is a sysctl that I believe could help: hw.intr_storm_threshold.

- You can use gstat to check for disk I/O problems. Start with gstat -p -I 1s -o and check for abnormal latency. As an example of what you can find out: I have a lazy disk in a server at home whose latency is double that of the others in the pool. A combined sketch of these checks follows below.
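
All of the above run from the TrueNAS shell; watch the output for a minute or two while a transfer is in progress:

Code:
# TCP retransmission counters - rising numbers suggest a network or congestion problem
netstat -s -p tcp | grep -i retrans
# Interrupt rate and overall system activity, refreshed every second
systat -vmstat 1
# Per-disk latency and %busy, 1-second samples, physical providers only
gstat -p -I 1s -o
# Current interrupt storm threshold, in case it needs raising
sysctl hw.intr_storm_threshold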
 