SOLVED: 10Gbit when receiving but not when sending


silbro (Dabbler, joined Sep 7, 2014):
Hi all

I have a strange problem that I just noticed. I have 1 FreeNAS and 3 Proxmox servers. All have a Mellanox ConnectX MT26448 card (10Gbit). I have a switch with 4 SFP+ ports, and all ports on the switch show 10Gbit. All cards are, as far as I can tell, installed correctly. I ran iperf tests between all 3 Proxmox hosts, and each one can both send and receive at 10Gbit.

When I connect from a Proxmox host to the FreeNAS I also get 10Gbit/s:
root@proxmox03:~# iperf -c 192.168.111.5
------------------------------------------------------------
Client connecting to 192.168.111.5, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.111.12 port 60618 connected with 192.168.111.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 10.9 GBytes 9.37 Gbits/sec



but whenever I connect from the FreeNAS to a Proxmox host I only get 1Gbit:
root@freenas:~ # iperf -c 192.168.111.12
------------------------------------------------------------
Client connecting to 192.168.111.12, TCP port 5001
TCP window size: 32.8 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.111.5 port 21453 connected with 192.168.111.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 943 Mbits/sec


This is what I get when I run ifconfig on the FreeNAS:
mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:02:c9:54:f0:c8
hwaddr 00:02:c9:54:f0:c8
inet 192.168.111.5 netmask 0xffffff00 broadcast 192.168.111.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
status: active


I have a 1GbE card in the FreeNAS which is on a separate subnet. I did a traceroute to confirm the traffic is taking the right path:
root@freenas:~ # traceroute 192.168.111.12
traceroute to 192.168.111.12 (192.168.111.12), 64 hops max, 40 byte packets
1 192.168.111.12 (192.168.111.12) 0.157 ms 0.110 ms 0.076 ms


I'm really confused. What could the problem be? Could this actually be a faulty card, port or cable? If so, wouldn't it either not work at all, run at 1Gbit in both directions, or show up on the switch as something other than 10Gbit? I don't know where I should start debugging.

Thanks for your tips!

Edit:
When I pull a file from the FreeNAS from all 3 Proxmox hosts at the same time, I get about 110MB/s for each host. So the card is capable of transferring more than 1Gbit/s in aggregate; it just seems to be "capped" at 1Gbit/s per host. I haven't found such a setting in FreeNAS.

My FreeNAS version, by the way, is:
FreeNAS-11.1-RELEASE

and when I monitor the traffic I can see it all going through the Mellanox card. The monitor shows up to 3Gbit.
 
Elliot Dierksen (joined Dec 29, 2014):
It is likely the TCP window size. That is the amount of data that the host will leave unacknowledged on a TCP connection before pausing transmission. Compare the two sessions: the slower one has a TCP window roughly a third the size of the faster one's.
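As a rough sanity check, a single TCP stream can never move more than one window per round trip, so throughput is bounded by window size / RTT. A minimal sketch using the two window sizes from the iperf runs and the ~0.157 ms RTT reported by the traceroute above (assuming a single stream and no packet loss):

```shell
# Upper bound on single-stream TCP throughput: window / RTT.
# RTT of ~0.157 ms taken from the traceroute output; assumes no loss.
awk 'BEGIN {
    rtt = 0.000157                                    # seconds
    printf "32.8 KB window: %.2f Gbit/s\n", 32.8e3 * 8 / rtt / 1e9
    printf "85.0 KB window: %.2f Gbit/s\n", 85.0e3 * 8 / rtt / 1e9
}'
# → 32.8 KB window: 1.67 Gbit/s
# → 85.0 KB window: 4.33 Gbit/s
```

At sub-millisecond LAN RTTs even these modest windows allow multi-gigabit rates, so the window size is a suspect here rather than a proven cause.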
 

silbro:
Thanks for your reply!

I set the window size to 85KB and still get the same speed results. What is also strange is the transfer speed of a big file, which is limited to 1Gbit/s. I'm writing to a local SSD that can handle up to 400MB/s, and I do get that rate when reading from one Proxmox node and writing to another. Here is the result with the 85KB TCP window:

root@freenas:~ # iperf -w 85KB -c 192.168.111.12
------------------------------------------------------------
Client connecting to 192.168.111.12, TCP port 5001
TCP window size: 85.5 KByte (WARNING: requested 85.0 KByte)
------------------------------------------------------------
[ 3] local 192.168.111.5 port 42227 connected with 192.168.111.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 931 Mbits/sec
 

silbro:
Ok, so I agree with you that the iperf test isn't meaningful in this case. But the read speeds for a large file are. It is very strange that I hit exactly the 1Gbit network limit when pulling files from the FreeNAS but get more than 1Gbit/s when writing to it. Since I can pull a file from the FreeNAS at exactly 1Gbit/s multiple times in parallel, that proves to me that each connection is still somehow capped at 1Gbit/s. How would I go about finding the error in this case, without iperf?

Here I pulled the same ISO file 3 times at the exact same time (/temp is the mounted CIFS NAS share; I also tried rsync directly, without the CIFS share):
root@proxmox01:~# rsync -ah --progress /temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO /tmp/
sending incremental file list
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4.38G 100% 108.72MB/s 0:00:38 (xfr#1, to-chk=0/1)


root@proxmox03:~# rsync -ah --progress /temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO /data_ssd/
sending incremental file list
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4.38G 100% 109.64MB/s 0:00:38 (xfr#1, to-chk=0/1)


root@proxmox02:/data_ssd# rsync -ah --progress /temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO /data_ssd/
sending incremental file list
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4.38G 100% 81.37MB/s 0:00:51 (xfr#1, to-chk=0/1)


The total read speed in this case is around 300MB/s, but if I run the same test on only 1 host I don't get any speed advantage; 109MB/s is the most.

This was a write to the freenas:
root@proxmox03:~# rsync --progress /data_ssd/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO root@192.168.111.5:/mnt/data1/testwrite
root@192.168.111.5's password:
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4,380,387,328 100% 135.88MB/s 0:00:30 (xfr#1, to-chk=0/1)


Yes, I know the write isn't really fast, but it's over 1Gbit/s, and the HDDs in the FreeNAS are probably the limiting factor here. (When I write the same file from 2 hosts at the same time I get 120MB/s each, hmm, strange...)
 
Elliot Dierksen:
What do you get trying to read the file from the CLI of FreeNAS? Something like this:
dd bs=1m if=/mnt/ZPOOL_Path/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null

Edit: you can play with the block size (bs=32k or something like that). I just lobbed in a tactical nuke with the 1m.
 

silbro:
So on the freenas itself:
root@freenas:~ # dd bs=1m if=/mnt/data1/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null
4177+1 records in
4177+1 records out
4380387328 bytes transferred in 14.844323 secs (295088393 bytes/sec)


root@freenas:~ # dd bs=32K if=/mnt/data1/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null
133678+1 records in
133678+1 records out
4380387328 bytes transferred in 15.700741 secs (278992387 bytes/sec)



and on the CIFS share from a proxmox node:
root@proxmox03:~# dd bs=1M if=/temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null
4177+1 records in
4177+1 records out
4380387328 bytes (4.4 GB, 4.1 GiB) copied, 37.4211 s, 117 MB/s


root@proxmox03:~# dd bs=32K if=/temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null
133678+1 records in
133678+1 records out
4380387328 bytes (4.4 GB, 4.1 GiB) copied, 37.5097 s, 117 MB/s
 
Elliot Dierksen:
If my math is correct, the 1M block size yielded ~295MB/s. Your network reads from FreeNAS can't possibly be any faster than reading the file off the disk locally. I would say that something in your hardware or pool configuration is constraining the read rates, which corresponds to the slow reads from network shares. You would be able to get more help from the group if you provide some detailed info about your FreeNAS setup. It is all in the forum guidelines: motherboard, CPU, RAM, drive controller, etc. As far as the pool goes, something like the output of zpool list -v, ideally inside a CODE block.

Edit: it really does appear that your reads from the Proxmox nodes are being constrained to gigabit speed, 125MB/s being the theoretical maximum of a gigabit link.
 

silbro:
Sure, here is some additional information on my systems:

Freenas Hardware
  • Supermicro X10SLL-F
  • Xeon E3-1200v3
  • Kingston DDR3 ECC 16GB
  • 6x 2TB HDDs (WD Se and Black, both intended for 24/7 use) -> all HDDs connected directly to the motherboard
  • SanDisk Cruzer 16GB for the FreeNAS operating system
  • Mellanox MNPA19-XTR 10GbE

Proxmox Hardware
  • Supermicro X10SLL-F
  • Intel dual-core i3
  • Kingston DDR3 ECC 16GB
  • Mellanox MNPA19-XTR 10GbE

Freenas zpool:
Code:
root@freenas:~ # zpool list -v
NAME									 SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
data1								   5.44T   483G  4.97T		 -	 4%	 8%  1.00x  ONLINE  /mnt
  mirror								1.81T   158G  1.66T		 -	 4%	 8%
	gptid/130b9b8c-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
	gptid/13c3fdd5-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
  mirror								1.81T   162G  1.65T		 -	 4%	 8%
	gptid/14932ae6-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
	gptid/1562a284-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
  mirror								1.81T   163G  1.65T		 -	 4%	 8%
	gptid/162cc16f-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
	gptid/16dda63b-aec3-11e8-8468-0025904657ff	  -	  -	  -		 -	  -	  -
freenas-boot							14.9G   847M  14.0G		 -	  -	 5%  1.00x  ONLINE  -
  da0p2								 14.9G   847M  14.0G		 -	  -	 5%
 

silbro:
I just got a mail from my FreeNAS system with many of these entries:

Code:
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0
arp: 192.168.111.10 moved from 00:25:90:46:5c:6d to 00:02:c9:54:9d:bc on mlxen0
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.11 moved from 00:02:c9:54:fc:68 to 00:25:90:44:10:e1 on mlxen0
arp: 192.168.111.10 moved from 00:25:90:46:5c:6d to 00:02:c9:54:9d:bc on mlxen0
arp: 192.168.111.12 moved from 00:25:90:46:53:81 to 00:02:c9:54:f9:e4 on mlxen0
arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0
arp: 192.168.111.10 moved from 00:25:90:46:5c:6d to 00:02:c9:54:9d:bc on mlxen0
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0
arp: 192.168.111.10 moved from 00:25:90:46:5c:6d to 00:02:c9:54:9d:bc on mlxen0
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0


What does this mean? Could this be the cause of the issue?
 
Elliot Dierksen:
> What does this mean? Could this be the cause of the issue?

I would bet you a non-trivial sum of money that it means you have a workstation that is bridging between its 1G and 10G network interfaces. Look up the manufacturers those MAC addresses are assigned to.

Results
00:02:C9 Mellanox Mellanox Technologies, Inc.
00:25:90 SuperMic Super Micro Computer, Inc.

A Supermicro motherboard's built-in NIC port, perhaps?
 

silbro:
So yes, I have an onboard NIC for IPMI and one for management, plus the PCIe card with the 10GbE. In total there are 3 onboard ports and 1 PCIe card, the same on all servers. I only configured the IPMI, 1 NIC for managing the servers (192.168.10.0/24), and the PCIe card for accessing the FreeNAS storage because of its 10GbE (192.168.111.0/24). I made a simple picture so anyone reading this thread understands what I mean:
[attached diagram: setup.png]


So yes, all network cards go to the same switch, and I wanted to separate the traffic by using 2 subnets. The MACs I posted above really are the MAC of the onboard NIC and the one of the 10Gbit card, switching back and forth. I really am confused about how this is possible. When I look at the ARP cache it looks fine.

Bridging is set up like this on the nodes (enp2s0 being the PCIe 10GbE card):

Code:
auto lo
iface lo inet loopback

iface eno2 inet manual

iface eno1 inet manual

auto enp2s0
iface enp2s0 inet static
		address  192.168.111.10
		netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
		address  192.168.10.10
		netmask  255.255.255.0
		gateway  192.168.10.1
		bridge_ports eno2
		bridge_stp off
		bridge_fd 0
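A quick way to double-check which interfaces each Proxmox bridge is actually enslaving (a sketch assuming standard iproute2; `brctl show` from bridge-utils paints the same picture):

```shell
# With the config above, vmbr0 should list only eno2 as a member;
# the 10G card enp2s0 should not appear under any bridge.
ip -br link show type bridge
ip -br link show type bridge_slave
```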
 
Elliot Dierksen:
It isn't complicated. You have 3 IP addresses (192.168.111.10, 192.168.111.11, 192.168.111.12) that are each configured on multiple hosts or NICs. Check this out.
Code:
   4 arp: 192.168.111.10 moved from 00:25:90:46:5c:6d to 00:02:c9:54:9d:bc on mlxen0
   1 arp: 192.168.111.11 moved from 00:02:c9:54:fc:68 to 00:25:90:44:10:e1 on mlxen0
   4 arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0
   4 arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
   1 arp: 192.168.111.12 moved from 00:25:90:46:53:81 to 00:02:c9:54:f9:e4 on mlxen0

The first column is the number of times that line appears in your output.
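Those counts come from a standard sort | uniq -c pipeline. A minimal sketch against a small sample file (on the FreeNAS side you would feed it the console log instead, e.g. /var/log/messages; that path is an assumption):

```shell
# Build a small sample of the log lines, then count the repeats.
cat > /tmp/arp_sample.log <<'EOF'
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.12 moved from 00:02:c9:54:f9:e4 to 00:25:90:46:53:81 on mlxen0
arp: 192.168.111.11 moved from 00:25:90:44:10:e1 to 00:02:c9:54:fc:68 on mlxen0
EOF
grep 'arp: .* moved' /tmp/arp_sample.log | sort | uniq -c | sort -rn
# → the 192.168.111.12 line first with a count of 2, the .11 line with 1
```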

Another really useful tool is looking up which manufacturer owns the MAC address in question: https://www.wireshark.org/tools/oui-lookup.html
It can be really helpful when you have a mystery device squatting on an IP address and you are trying to figure out what it is.

You have IP conflicts. Period. Things will be wonky until you resolve that.

Edit: You are bridging the NICs together. The question becomes where the root bridge is. It could be switching around, and that could be why the ARP entries keep changing. IMHO, this is a configuration that is going to keep causing you pain. Get rid of bridging on the hosts. If a host has a 10G connection to your switch, it doesn't also need a 1G connection. There does not appear to be any VLAN configuration going on. Perhaps you could try a controlled test: disconnect all the 1G ports/cables from your switch and see if your results change.
 
silbro:
Well, I just double- and triple-checked all servers for duplicate IPs and didn't find any. Plus, the network is isolated with just these 4 machines. It just seems that, e.g., server1 jumps from its PCIe card to its onboard NIC and then switches back again; that is happening on all 3 nodes (at least from the FreeNAS log's point of view). I'm guessing the way I configured the cards (as posted above) must have trouble with FreeNAS or is simply wrong (even though it's standard according to the Proxmox manual). But I think this is for sure causing the issue somehow. Thanks for all the support so far @Elliot Dierksen!!
 
Elliot Dierksen:
If you want actual separation between the 10G and 1G networks, then you are doing it wrong. It is hard for me to say why the bridging code is switching to the 1G interface, but I would encourage you in the strongest possible terms to configure this a different way. What kind of switch do you have, and is it capable of building separate VLANs?
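For reference, the end state being suggested could look like this on a node, sketched under the assumption that the switch puts the SFP+ ports and the 1G ports into two separate VLANs (interface names taken from the interfaces file posted earlier):

```
# /etc/network/interfaces sketch for one Proxmox node.
# enp2s0 (10G) stays unbridged on the storage VLAN; vmbr0 bridges only
# the 1G NIC eno2 on the management VLAN. The switch enforces the split.
auto enp2s0
iface enp2s0 inet static
        address 192.168.111.10
        netmask 255.255.255.0

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.10
        netmask 255.255.255.0
        gateway 192.168.10.1
        bridge_ports eno2
        bridge_stp off
        bridge_fd 0
```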
 
silbro:
I have this option on my switch, and I just set up the VLANs. Surprisingly, I now get the full 10Gbit when I run iperf (yes, I know we talked about it not being that relevant for measuring ;) ). Also, when I mount a CIFS share and run the dd command, I get much better performance:

root@proxmox01:/temp# dd bs=1M if=/temp/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO of=/dev/null
4177+1 records in
4177+1 records out
4380387328 bytes (4.4 GB, 4.1 GiB) copied, 14.5806 s, 300 MB/s


When I rsync from the FreeNAS I still only get 1Gbit performance:
root@freenas:/mnt/data1/ # rsync -W --progress /mnt/data1/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO root@192.168.111.12:/dev/null
root@192.168.111.12's password:
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4,380,387,328 100% 101.48MB/s 0:00:41 (xfr#1, to-chk=0/1)


Doing the same command on a node I get better performance:
root@proxmox03:/data_ssd# rsync --progress /data_ssd/SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO root@192.168.111.5:/mnt/data1/testwrite
root@192.168.111.5's password:
SW_DVD5_NTRL_Win_10_1607_64Bit_English_Home_Pro_X21-05999.ISO
4,380,387,328 100% 135.06MB/s 0:00:30 (xfr#1, to-chk=0/1)


Maybe this is just an rsync issue on the FreeNAS side. The errors also have not reappeared in the log.

Is there another command or test, apart from rsync, that I can use to write from the FreeNAS to a node? Does this even matter? I mean, isn't "pulling" a file from the FreeNAS the same as "pushing" it from the FreeNAS to another client?

Thanks man, you already helped a bunch and things are starting to look really good! :D:cool:
 
Elliot Dierksen:
When I said VLANs shouldn't be relevant, I didn't realize that the hosts were bridging their network interfaces together. I am glad to hear that it is working better for you.

Regarding rsync: it has many fine attributes, but getting every ounce of performance out of the network isn't one of them. dd, iperf, a CIFS copy, or FTP are more relevant benchmarks. I barely got 1G performance out of rsync between two systems that can sustain 8G reads and 4G writes.
 

silbro:
Just wanted to say thanks to you, @Elliot Dierksen! This problem has been resolved :)

In short for others reading this thread:

If you want to separate traffic and have only 1 switch with multiple NICs connected to it, use VLANs to separate the traffic! Separate subnets alone won't suffice, it seems.
 