Communication speed over 10Gb SFP+ connection

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
I have a performance problem...
The NAS in question is not my personal system (from my signature) so here are some details:
The system chassis is a Supermicro "SSG-6049P-E1CR60L" with sixty 12TB drives.
The system is running TrueNAS-12.0-U2.1 and has been upgraded to 256GB of RAM.
CPU is a pair of Xeon Gold 5222s, but CPU utilization never goes over 50%, and only for a few seconds at a time.
The storage pool is configured into 10 RAIDz2 vdevs (6 drives each) and is about 65% filled.
When transferring a large (8 TB) volume of data, read or write, the system slows to a crawl: Mbps speeds instead of Gbps.
The network should be capable of much better, but I don't think it is a network problem.
We have a Cisco 10Gb switch and the workstations are all using Intel interface cards on Windows 10.

I have been trying to figure this out on my own, but I am asking for help...
Can anyone offer suggestions on what might be changed to improve the situation?
Is AutoTune any good under the latest version of TrueNAS? Would that help or hurt?

I am able to provide more information on the system configuration, but I wasn't sure what information would be helpful and didn't want to make the first post horribly long.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
What about workstation<->workstation performance?
Untested. Storage is on the NAS, so I have only looked at communication between the NAS and workstations. Are you thinking it might be a problem in the switch?
 

Touche

Explorer
Joined
Nov 26, 2016
Messages
55
If workstation-to-workstation performance is unaffected, the issue is probably with the NAS. If it's also slow, the issue is probably not with the NAS but could be either in the switch or the workstations (probably the switch if the issue is present on all the workstations accessing the NAS).

Are all the interfaces reporting a 10Gb connection? Is the switch reporting anything unusual on the ports in question?
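
One quick way to check the negotiated speed from the TrueNAS side is ifconfig (a minimal sketch; the interface name ix0 is only an assumption, substitute whatever your SFP+ NIC shows up as):

Code:
# Show the negotiated media line for the 10Gb interface
ifconfig ix0
# Look for something like: media: Ethernet autoselect (10Gbase-SR <full-duplex>)
# Per-interface error and drop counters should stay at (or very near) zero
netstat -I ix0 -d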
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
but I don't think it is a network problem.
Can you confirm it with a cp to /dev/null, or something else local to the server, first to see if you can get I/O off the disks at a reasonable rate?
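
Something like this would do it (a minimal sketch; the dataset path and file name are placeholders, point it at any large file already on the pool):

Code:
# Read a large file off the pool and discard it, taking the network out of the picture
dd if=/mnt/tank/data/bigfile.bin of=/dev/null bs=1M status=progress

If that sustains more than the roughly 1.2 GB/s a 10Gb link can carry, the pool itself probably isn't the bottleneck.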
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
No idea whether this is relevant, but relative to the other specs, the 256 GB of RAM doesn't seem very "generous".
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
I presume you've already confirmed via Iperf that the network pipes work?

Have you tried out workstation-to-workstation test file transfers (in case this is a switch/workstation issue)?

Has the NAS been tested with multiple workstations?

You mention this happens with NAS reads and writes, suggesting it's not something as simple as a bad SLOG. However, I wonder if the fiber connection from the NAS to the switch may be bad - either a transceiver overheating or some similar heat-related failure mode in the switch. Easy enough to test by swapping stuff out.
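
For the iperf check, a minimal sketch (assuming iperf3 is available on both ends; the IP address is a placeholder for the NAS):

Code:
# On the TrueNAS box
iperf3 -s
# On a workstation: 30-second test toward the NAS, then the reverse direction
iperf3 -c 192.168.10.10 -t 30
iperf3 -c 192.168.10.10 -t 30 -R

A clean 10Gb path should report somewhere around 9.4 Gbit/s in both directions; anything wildly lower points at the network rather than the pool.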
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So you've got 720TB and 65% occupancy, with only 256GB RAM.

What's the workload? Is fragmentation expected? Are the disks maxing out (check "gstat")?
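
For a quick look at fragmentation, zpool list shows it directly (the pool name "tank" is just a placeholder here):

Code:
# CAP and FRAG columns for the pool as a whole
zpool list tank
# Same, broken out per vdev
zpool list -v tank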

Does iperf3 max out the NAS-side ethernet interface when run from multiple clients? Write *and* read problems feel like you should check the network stuff, but it could also easily be ZFS not having sufficient ARC in a stressed environment, so checking the ARC stats would be good.
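
For the ARC stats, the raw kernel counters are always available via sysctl (a minimal sketch; arc_summary gives a friendlier report if it's present on your build):

Code:
# Current ARC size and maximum target
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
# Hit/miss counters - sample them before and after a transfer
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

A low hit rate while a transfer is running would support the "not enough ARC" theory.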

If this is an SMB server serving Win10 clients, 720TB seems like a lot of space, so it feels like the use case might play a factor here.

If the use case is merely storing or retrieving large sequential file images ("8TB") and only one client is doing this at a time, something seems off.
 

Borja Marcos

Contributor
Joined
Nov 24, 2014
Messages
125
I would check several things.

- Switch statistics for the network port, and network statistics on the TrueNAS server: for example, ifconfig for network errors, or netstat -s -p tcp for retransmissions, which would point to a network and/or host congestion problem.

- As has been suggested, iperf is your friend. Of course, check for errors. What kind of SFP+ media are you using? Make sure the media (either fiber or twinax) is in good condition.

- Interrupt activity. Try systat -vmstat 1. Maybe too many interrupts? There is a sysctl that I believe could help: hw.intr_storm_threshold.

- You can use gstat to check for disk I/O problems. Start with gstat -p -I 1s -o and check for abnormal latency. As an example of what you can find out: I have a lazy disk in a server at home whose latency is double that of the others in the pool. A combined sketch of these checks follows below.
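
All of the above run from the TrueNAS shell; watch the output for a minute or two while a transfer is in progress:

Code:
# TCP retransmission counters - rising numbers suggest a network or congestion problem
netstat -s -p tcp | grep -i retrans
# Interrupt rate and overall system activity, refreshed every second
systat -vmstat 1
# Per-disk latency and %busy, 1-second samples, physical providers only
gstat -p -I 1s -o
# Current interrupt storm threshold, in case it needs raising
sysctl hw.intr_storm_threshold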
 