Slow write speed over network (reboot solves it for a couple of days)

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
Hi all,

I have a problem that I am not able to solve and hope someone can help me :) I am running the following server with TrueNAS-13.0-U3.1 installed:
  • Motherboard make and model: Fujitsu D3644-B
  • CPU make and model: Pentium G5600
  • RAM quantity: 32 GB ECC
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives: boot drive is an SSD mirror, ZFS pool: 6x 4 TB WD Red (EFRX model, CMR type)
  • Hard disk controllers:
  • Network cards: onboard
The problem is that after a couple of days the write speed over the SMB share drops to ~700 KB/s, while reading remains OK. At first I thought something was wrong with the ZFS pool.
But the weekly scrubs and long SMART tests are all OK. Furthermore, I ran a fio test in the "good" state after a reboot and in the "bad" state after a couple of days. Performance is the same:

Result of testing single 1MiB random write process:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

result: (after server reboot, with good network speed ~110 MB/s)
run status group 0 (all jobs): WRITE: bw=345MiB/s (361MB/s), 345MiB/s-345MiB/s (361MB/s-361MB/s), io=22.7GiB (24.4GB), run=67540-67540msec

result: (after a couple of days, with bad network speed ~1 MB/s)
run status group 0 (all jobs): WRITE: bw=375MiB/s (393MB/s), 375MiB/s-375MiB/s (393MB/s-393MB/s), io=24.4GiB (26.2GB), run=66563-66563msec

So I'm personally ruling out my pool. The speed also seems, in my opinion, to be what you would expect from such drives.
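To put the fio figures next to the network figures, a quick unit conversion helps. The sketch below is illustrative arithmetic only, using the two bandwidth values quoted above; the helper name is made up for this example.

```python
# Illustrative arithmetic only: compare the fio results with gigabit line rate.
# The two bandwidth figures are copied from the post; the helper is hypothetical.

MIB = 1024 * 1024        # bytes in one MiB
GIGABIT_MBIT = 1000.0    # nominal 1 Gbit/s link, expressed in Mbit/s


def mib_per_s_to_mbit_per_s(mib_per_s: float) -> float:
    """Convert MiB/s (as printed by fio) to Mbit/s (as printed by iperf3)."""
    return mib_per_s * MIB * 8 / 1_000_000


good_state = mib_per_s_to_mbit_per_s(345)  # fio result right after a reboot
bad_state = mib_per_s_to_mbit_per_s(375)   # fio result after a couple of days

print(f"fio, good state: {good_state:.0f} Mbit/s")
print(f"fio, bad state:  {bad_state:.0f} Mbit/s")
print(f"pool vs. 1 GbE line rate: {good_state / GIGABIT_MBIT:.1f}x")
```

In both states the pool writes at several times what a 1 Gbit/s link can carry, which supports looking at the network rather than the disks.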

As a second troubleshooting step I'm now focusing on my network connection.
Running iperf3 from a linux box on the same network gives me the following results:

After reboot (looks good):

iperf3 -c 192.168.10.11 -bidir
Connecting to host 192.168.10.11, port 5201
[ 5] local 192.168.10.158 port 43080 connected to 192.168.10.11 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 757 KBytes
[ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
[ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 757 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 936 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.09 GBytes 933 Mbits/sec receiver

Then, after a couple of days, speed drops by a factor of about 20:

Connecting to host 192.168.10.11, port 5201
[ 5] local 192.168.10.158 port 54886 connected to 192.168.10.11 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 6.15 MBytes 51.6 Mbits/sec 1464 4.24 KBytes
[ 5] 1.00-2.00 sec 6.02 MBytes 50.5 Mbits/sec 1440 1.41 KBytes
[ 5] 2.00-3.00 sec 6.32 MBytes 53.0 Mbits/sec 1510 4.24 KBytes
[ 5] 3.00-4.00 sec 5.68 MBytes 47.7 Mbits/sec 1284 1.41 KBytes
[ 5] 4.00-5.00 sec 6.20 MBytes 52.0 Mbits/sec 1504 2.83 KBytes
[ 5] 5.00-6.00 sec 6.41 MBytes 53.8 Mbits/sec 1488 2.83 KBytes
[ 5] 6.00-7.00 sec 6.26 MBytes 52.5 Mbits/sec 1508 2.83 KBytes
[ 5] 7.00-8.00 sec 6.32 MBytes 53.0 Mbits/sec 1516 2.83 KBytes
[ 5] 8.00-9.00 sec 5.86 MBytes 49.2 Mbits/sec 1401 2.83 KBytes
[ 5] 9.00-10.00 sec 6.11 MBytes 51.2 Mbits/sec 1457 2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 61.3 MBytes 51.4 Mbits/sec 14572 sender
[ 5] 0.00-10.00 sec 61.2 MBytes 51.3 Mbits/sec receiver


My attention is on the high Retr (retransmit) count, which does not seem right. Are there any further steps I could take?
I'm using the onboard Intel NIC. Changing the cable or the port on the switch did not solve it either.
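For anyone who wants to compare runs like these over time, here is a minimal sketch that pulls the bitrate and retransmit count out of the iperf3 "sender" summary lines. The two input strings are copied from the runs above; the regular expression and function name are assumptions made for this example, not part of iperf3.

```python
import re

# Summary ("sender") lines copied from the two iperf3 runs in this post.
GOOD = "[ 5] 0.00-10.00 sec 1.09 GBytes 936 Mbits/sec 0 sender"
BAD = "[ 5] 0.00-10.00 sec 61.3 MBytes 51.4 Mbits/sec 14572 sender"

# Matches: interval, transfer, bitrate in Mbits/sec, retransmits, "sender".
SENDER_RE = re.compile(
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+"
    r"(?P<mbits>[\d.]+)\s+Mbits/sec\s+"
    r"(?P<retr>\d+)\s+sender"
)


def parse_sender(line: str) -> dict:
    """Extract bitrate and retransmits from an iperf3 sender summary line."""
    m = SENDER_RE.search(line)
    if m is None:
        raise ValueError(f"not an iperf3 sender summary line: {line!r}")
    duration = float(m["end"]) - float(m["start"])
    return {
        "mbits": float(m["mbits"]),
        "retr": int(m["retr"]),
        "retr_per_sec": int(m["retr"]) / duration,
    }


good = parse_sender(GOOD)
bad = parse_sender(BAD)
slowdown = good["mbits"] / bad["mbits"]

print(f"slowdown factor: {slowdown:.1f}")
print(f"retransmits/s (bad state): {bad['retr_per_sec']:.0f}")
```

The good run has zero retransmits, so any sustained non-zero rate after a few days would be a simple signal to watch for.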

Thanks in advance for the help. Much appreciated.

regards,
Whizzle
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You seem to have identified a clear issue with the network.

Lots of retries could be related to a duplex mismatch...

How have you set the port on the switch?

What happens if you unplug and re-connect the cable to the switch without changing anything else?
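On the TrueNAS (FreeBSD) side, the negotiated speed and duplex normally appear on the media line of the ifconfig output. Below is a minimal sketch of checking that line, assuming the usual FreeBSD format; the sample strings are illustrative only and are not output from the system in this thread, and the function name is hypothetical.

```python
import re

# Illustrative media lines in the usual FreeBSD ifconfig format.
# They are examples only, not output from the system in this thread.
SAMPLE_OK = "\tmedia: Ethernet autoselect (1000baseT <full-duplex>)"
SAMPLE_BAD = "\tmedia: Ethernet autoselect (100baseTX <half-duplex>)"

# Captures the link speed in Mbit/s and the comma-separated media options.
MEDIA_RE = re.compile(r"media:.*\((?P<speed>\d+)base\S*\s*<(?P<opts>[^>]*)>\)")


def link_looks_healthy(ifconfig_text: str, want_mbit: int = 1000) -> bool:
    """Return True if the media line reports the wanted speed at full duplex."""
    m = MEDIA_RE.search(ifconfig_text)
    if m is None:
        return False  # no recognisable media line; treat as suspicious
    full_duplex = "full-duplex" in m["opts"].split(",")
    return int(m["speed"]) == want_mbit and full_duplex


print(link_looks_healthy(SAMPLE_OK))
print(link_looks_healthy(SAMPLE_BAD))
```

To use it on a live system, feed it the text printed by ifconfig for the interface in question, captured in both the good and the bad state, and compare the results.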
 

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
I tried a couple of things:
Unplug and replug the same cable.
Use a different port on the switch
Swap out the cable

All without effect. The switch in use is a UniFi 24-port Gen 2.
The port has the default config, with no VLAN configured:

[Attached image: 1673960423830.png]
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I had seen you had changed the cable, but was more curious if you just forced a renegotiation by re-connecting while still powered on, would that bring back the full speed?
 

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
Ah OK. What I did was indeed reconnect the cable while still powered on. Unfortunately, that did not solve the issue.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK... so somehow the network stack on the TrueNAS side getting a reset (via reboot) is the only thing that makes it normal again.

I would report a bug and see if sharing some debugs can get some progress on it.

With an Intel NIC, I don't normally expect to see this kind of bad behavior.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
Davvo said:
In the meantime you can set a cron job to reboot once a day.
Indeed, that is exactly how I am working around it for now.

Does it make sense to test a different NIC? I do have a spare Intel 1 Gbit card lying around.

regards,
Yori
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
whizzle said:
Does it make sense to test a different NIC? I do have a spare Intel 1 Gbit card lying around.

It wouldn't hurt to eliminate another variable.
 

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
All right, thanks for the help, sretalla. Appreciated.
I will monitor it for a while and report a bug if this keeps happening.
 

whizzle

Cadet
Joined
Oct 25, 2021
Messages
7
Today I found out that a reboot of the switch also triggers the issue. Not sure if that is a good lead. Rebooting the server solves it. So it does seem to be some sort of negotiation issue?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you have access to the console directly, you may be able to test ifconfig [interface name] down and up on the interface and see if that does anything to force the renegotiation.

If you don't have access to the console directly, it's a bit more dangerous to do that, but you can run both commands on one line: ifconfig em0 down && ifconfig em0 up (make sure to replace em0 with your actual interface name).
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
whizzle said:
Today I found out that a reboot of the switch also triggers the issue. Not sure if that is a good lead. Rebooting the server solves it. So it does seem to be some sort of negotiation issue?

Could you temporarily use a different switch? Just to narrow things down further.
 