Performance issue with 10Gbps network

Status
Not open for further replies.

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
It sure seems that way.

So I did a clean install of Windows 8.1. Exact same issue(s).

Did another clean install of Windows 10 (no fast ring this time). No change.

Switched to the Eth0 SFP+ port on the workstation's X520. No change.

Swapped out Intel SFP+ module in Eth0. No change.

Cleaned the tips of the fiber cable ends with alcohol. No change.

I guess the next logical step would be to drag my workstation down to the basement where the FreeNAS server is and connect them back to back using a twinax SFP+ cable, to see if that makes a difference.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
I picked up a single port X520 card and am borrowing a twinax copper SFP+ cable from a friend of mine. I'm determined to get to the bottom of what the issue is. I plan to connect 2 workstations back to back using the twinax cable, both booted into FreeNAS and go from there. This way I won't impact my "production" network and make all my main FreeNAS users unhappy.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Single port X520 card showed up today. First test I did was to swap the dual port card on the workstation with the new card and retest in both directions:

Code:
[root@freenas] ~# iperf -s -p 5001 -w 512k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  4] local 10.0.1.50 port 5001 connected with 10.0.1.229 port 49544
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  5.95 GBytes  5.10 Gbits/sec
^C[root@freenas] ~# iperf -c 10.0.1.229 -p 5001 -w 512k
------------------------------------------------------------
Client connecting to 10.0.1.229, TCP port 5001
TCP window size:  513 KByte (WARNING: requested  512 KByte)
------------------------------------------------------------
[  3] local 10.0.1.50 port 44384 connected with 10.0.1.229 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   646 MBytes   542 Mbits/sec


This is the first time I have seen more than 5 Gbps single-threaded going from the Win10 machine to the FreeNAS server. The other direction remains terrible. So at this point I believe I can rule out the dual-port X520 NIC in the workstation as the issue.
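As a sanity check on those numbers: iperf reports the transfer column in binary GBytes (2^30 bytes) but the bandwidth column in decimal Gbits/sec, so the two columns can be cross-checked with a one-liner (the 5.95 GBytes / 10 sec figures are from the run above):

```shell
# 5.95 binary GBytes moved in 10 seconds, expressed in decimal Gbits/sec:
awk 'BEGIN { printf "%.2f Gbits/sec\n", 5.95 * 1073741824 * 8 / 10 / 1e9 }'
# prints 5.11 Gbits/sec, matching iperf's reported 5.10 within rounding
```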

Still waiting on another twinax copper SFP+ cable so I'm not quite ready to do back to back testing between workstations, although patience might get the better of me and I'll end up converting the FreeNAS server back to GigE temporarily to free up that twinax cable.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Just for grins, I installed the dual port X520 in a 2nd workstation also running win10 and ran the exact same test. My results were as follows (again viewed from the FreeNAS session):

Code:
[root@freenas] ~# iperf -s -p 5001 -w 512k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  4] local 10.0.1.50 port 5001 connected with 10.0.1.53 port 49649
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  5.14 GBytes  4.41 Gbits/sec
^C[root@freenas] ~# iperf -c 10.0.1.53 -p 5001 -w 512k
------------------------------------------------------------
Client connecting to 10.0.1.53, TCP port 5001
TCP window size:  513 KByte (WARNING: requested  512 KByte)
------------------------------------------------------------
[  3] local 10.0.1.50 port 62522 connected with 10.0.1.53 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.11 GBytes   953 Mbits/sec


This other machine is a Celeron G1840 (Haswell) on an Asus H87-based motherboard. So copying to the FreeNAS server was slightly slower, but copying to the workstation was almost twice as fast.

Now I almost have to go convert the server back to GigE and grab that SFP+ twinax cable to get to the bottom of what the heck is going on. I'm starting to suspect that 15 meter fiber cable...
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Results from win10 to win10 workstations using 2 meter twinax cable:

win10towin10.PNG


So I'm getting almost 9 Gbps single-threaded going from the Celeron to the i7, and around 6 Gbps going to the Celeron, with its CPU at almost 90%.

So my conclusion so far is that the win10 drivers are fine, the NICs are fine, and the only reason I'm not getting 9 Gbps in one direction is the crappy Celeron CPU.

So next step will be to do another test between these 2 win10 machines, only this time, instead of using the 2 meter twinax cable, to do it across the 15 meter fiber cable with the SFP+ transceivers in place.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Win 10 box has Intel i7-4770K Haswell CPU and Z87 motherboard (Asus MAXIMUS VI HERO) and 2x DDR3-2400 4Gig memory sticks. X520 card is sitting in PCIe x16 slot.
Just thinking outside the box here and wondering if this isn't the PCIe bus???
Which slot is the NIC installed in? The two I have circled below are X8 (physical)
maximus-hero.jpg
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
The NIC is installed in the red PCIe slot that you don't have circled (the top one in the pic).

There are no other cards installed in that workstation. Here's a pic:

10g4770K.JPG


And here's the Celeron machine I'm using to test against. It only has a single 16x slot:

10gceleron.JPG


Based on the most recent test I conducted, I think both workstations are OK, other than the Celeron being a little weak to act as the server side of a 10G iperf test.

Speaking of placing the 10G NICs in the correct PCIe slots, below is how I have things installed on my X10 motherboard. I have used up all the 16x/8x slots at this point with three 8x LSI HBAs and the 8x X520.

10gx10.JPG


As stated above, I think my next test is going to be to ensure the 15 meter fiber cable between the workstation in my office and the server room in the basement is ok, along with the transceivers on each end. I might also boot each workstation into FreeBSD and test that way. I bet my numbers will be higher.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
I'm paying very close attention to all this.
The only piece I'm missing to start my experiments, is my switch.
I'm involved in a couple of ebay auctions and hoping to end
up with an awesome bargain.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I took the win10 test box to the basement and ran the test again, this time over the fiber link with the Intel transceivers. Pretty much the same results as before using the twinax cable.

win10towin10overfiber.PNG


Next I booted both computers into FreeNAS, and here are the results going in both directions with a single thread:

freenastofreenasfiber.JPG


That is the performance I was expecting. So at this point we can rule out the Intel transceivers, the X520 NICs, and the 15 meter fiber cable run as the issue.

Next I inserted the switch between the freenas servers, one side connected via the 2 meter twinax cable and the other side via the 15 meter fiber cable.

Now we're back to poor performance when the machine connected to the switch via twinax copper is the iperf host, and terrible performance when the machine connected to the switch via the fiber cable is the host.

freenastofreenasswitch01.JPG


freenastofreenasswitch02.JPG


The fiber cable has an Intel transceiver on the computer end, which has already been validated as OK. The transceiver on the switch end is this Dell unit:

dellsfp+1.JPG


A look down the barrel:

dellsfp+2.JPG


So the issue appears to be either with the Dell transceiver or with the switch.

Once I get another twinax cable, I'll be able to remove the Dell from the equation and determine if the switch is the culprit.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is the transceiver validated for your switch?
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Based on my research, these appear to be the only ones that are supposed to work:

Code:
Dell FTLX8571D3BCL    $38.79 from California (one I'm using currently)
Dell FTLX1371D3BCL    $96.95 from Connecticut
MikroTik S+85DLC03D    $50.95 + $3.95 shipping from Latvia


I think I'll give the MikroTik a shot, given that it costs half as much as the other Dell. Only downside is it will ship from Latvia.

I found the one I'm currently using for $23 with shipping. I guess the old saying "you get what you pay for" holds, even for SFP+ modules... Given how much time I have wasted on this already, the MikroTik would have more than paid for itself, assuming of course that it actually works. I'm going to hold off until I have tested with 2 twinax cables before ordering another 10G transceiver, however.

Too bad there isn't a standard for SFP+ modules, or at least it appears one isn't being followed.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Too bad there isn't a standard for SFP+ modules, or at least it appears one isn't being followed.
There is, apparently, but it's ignored in the name of vendor lock-in customer experience. Yeah, that's it...
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I got my 2nd twinax cable today and was able to test with both servers connected to the switch via copper cables.

twinaxtotwinax1024k.PNG


So a nice, consistent 9.4 Gbps in both directions now, single-threaded. So yeah, the switch is fine; it's that damn "customer experience" crap with the Dell transceiver that likely caused the issue.

Turns out Newegg had the MikroTik S+85DLC03D in stock, so I got one on the way. Fingers crossed swapping the Dell for the MikroTik will take care of the issue for good!
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I got the MikroTik transceiver in today. This guy:

mikrotik.JPG


I ran my usual iperf tests:

freenaswin10.PNG


So slightly over 5 Gbps towards the FreeNAS server single-threaded, and slightly over 6 Gbps with two threads. In the other direction, out from the FreeNAS server, I'm getting 8 Gbps single-threaded and "full speed" 9.4 Gbps with two threads.
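For what it's worth, that 9.4 Gbps "full speed" ceiling is exactly what the math predicts: with a standard 1500-byte MTU and TCP timestamps, each frame carries 1448 payload bytes out of 1538 bytes on the wire. A quick back-of-the-envelope check (these header sizes are the usual defaults, not something measured on this setup):

```shell
# 1448 payload bytes per 1538 on-wire bytes (20 IP + 20 TCP + 12 TCP options,
# plus 14 ethernet header + 4 FCS + 8 preamble + 12 inter-frame gap):
awk 'BEGIN { printf "%.2f Gbits/sec\n", 10 * 1448 / 1538 }'
# prints 9.41 Gbits/sec -- so 9.4 Gbps really is wire speed for 10GbE
```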

Doing actual file copies, the numbers get worse. About 1.5 Gbps towards the server:

filecopytofreenas.PNG


And about 7 Gbps from the server:

filecopytowin10.PNG


Not sure why C: shows 100% during the file copy to the server. I'll look into that, but there still seem to be some issues with either the new MikroTik transceiver, the switch, or perhaps something related to the server being connected via twinax while the workstation is connected via fiber...
 

cfgmgr

Cadet
Joined
Jan 9, 2015
Messages
9
You can check the logs in Windows for something along the lines of "Unqualified SFP module detected" shortly after boot. I've seen this verbiage in Windows/Linux/VMware, and it could certainly be the cause of grumpy/misbehaving systems. In the same breath, I've also seen systems with that message that seem to work perfectly... Regardless, it's another thing to check for. If the card does not like your SFP, it will typically let you know.
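On the FreeNAS/FreeBSD side, a rough equivalent is to grep the boot messages. The exact wording is driver-dependent (the Intel ix driver has its own variant of the unsupported-module complaint), so match loosely:

```shell
# Any SFP-related grumbling from the NIC driver usually ends up in dmesg:
dmesg | grep -i -E 'sfp|unsupported|unqualified' || echo "no SFP complaints logged"
```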
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So the issue, I think, is with the SFP+ modules in the switch, not with the SFP+ modules in the X520 cards. I say this because the modules in the cards are the ones the cards shipped with. In fact, from what I understand, you are not supposed to use anything but the Intel FTLX8571D3BCV transceivers in the X520 NIC.

When I get the time, I plan to connect my Win10 workstations directly to the FreeNAS server and do some file transfers to see how it goes. Assuming I get close to 10 Gbps throughput copying in both directions, which I suspect I will, I can then take those results to Ubiquiti's support website and see if they can explain why my numbers are much lower when I go through their switch, which is supposedly rated at 70 Gbps non-blocking throughput.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I got a 2nd MikroTik transceiver (same as the first one) and a 2 meter OM3 fiber cable. So I now have identical connections from my workstation to the switch as well as from my FreeNAS server to the switch. They are both as follows:

Intel X520 <-> Intel FTLX8571D3BCV <-> OM3 fiber <-> MikroTik S+85DLC03D <-> UniFi Switch

Testing from workstation to FreeNAS:

Code:
[root@argon] ~# iperf -s -p 5001 -w 1024k
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 3.65 GBytes 3.13 Gbits/sec

[ 4] 0.0-10.0 sec 2.22 GBytes 1.90 Gbits/sec
[ 5] 0.0-10.0 sec 2.21 GBytes 1.89 Gbits/sec
[SUM] 0.0-10.0 sec 4.42 GBytes 3.79 Gbits/sec

[ 6] 0.0-10.0 sec 2.04 GBytes 1.75 Gbits/sec
[ 4] 0.0-10.0 sec 2.05 GBytes 1.76 Gbits/sec
[ 7] 0.0-10.0 sec 2.06 GBytes 1.77 Gbits/sec
[ 5] 0.0-10.0 sec 2.03 GBytes 1.73 Gbits/sec
[SUM] 0.0-10.0 sec 8.18 GBytes 7.00 Gbits/sec

[ 8] 0.0-10.0 sec 1.53 GBytes 1.31 Gbits/sec
[ 4] 0.0-10.0 sec 1.52 GBytes 1.31 Gbits/sec
[ 5] 0.0-10.0 sec 1.54 GBytes 1.32 Gbits/sec
[ 6] 0.0-10.0 sec 959 MBytes 803 Mbits/sec
[ 7] 0.0-10.0 sec 1.64 GBytes 1.40 Gbits/sec
[ 10] 0.0-10.0 sec 1.53 GBytes 1.31 Gbits/sec
[ 9] 0.0-10.0 sec 1.66 GBytes 1.42 Gbits/sec
[ 11] 0.0-10.0 sec 696 MBytes 582 Mbits/sec
[SUM] 0.0-10.0 sec 11.0 GBytes 9.45 Gbits/sec


Actual file copy performance:

copytoserverswitch.PNG


Testing from FreeNAS to workstation:

Code:
[root@argon] ~# iperf -c 10.0.1.53 -p 5001 -w 1024k
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 5.64 GBytes 4.85 Gbits/sec

[ 4] 0.0-10.0 sec 4.02 GBytes 3.45 Gbits/sec
[ 3] 0.0-10.0 sec 3.98 GBytes 3.42 Gbits/sec
[SUM] 0.0-10.0 sec 8.00 GBytes 6.87 Gbits/sec

[ 6] 0.0-10.0 sec 2.74 GBytes 2.35 Gbits/sec
[ 4] 0.0-10.0 sec 2.69 GBytes 2.31 Gbits/sec
[ 3] 0.0-10.0 sec 2.73 GBytes 2.34 Gbits/sec
[ 5] 0.0-10.0 sec 2.72 GBytes 2.34 Gbits/sec
[SUM] 0.0-10.0 sec 10.9 GBytes 9.34 Gbits/sec

[ 3] 0.0-10.0 sec 936 MBytes 785 Mbits/sec
[ 5] 0.0-10.0 sec 1.83 GBytes 1.57 Gbits/sec
[ 6] 0.0-10.0 sec 842 MBytes 707 Mbits/sec
[ 10] 0.0-10.0 sec 1.83 GBytes 1.57 Gbits/sec
[ 4] 0.0-10.0 sec 1.83 GBytes 1.57 Gbits/sec
[ 7] 0.0-10.0 sec 936 MBytes 785 Mbits/sec
[ 9] 0.0-10.0 sec 1.83 GBytes 1.57 Gbits/sec
[ 8] 0.0-10.0 sec 1.00 GBytes 861 Mbits/sec
[SUM] 0.0-10.0 sec 11.0 GBytes 9.41 Gbits/sec


Actual file copy performance:

copytoworkstationserver.PNG


Next I decided to drag my workstation down to the basement and connect it directly to the FreeNAS server (I used my now-available 2 meter twinax cable for this test).

iperfw10tofreenastwinax.PNG


So not bad, especially going towards the FreeNAS server, as I'm now getting wire speed single-threaded.

So now some real world file copy tests (using ftp):

Copy to workstation:

copytoworkstationtwinax.PNG


Copy to FreeNAS:

copytoservertwinax.PNG


So copying files to the FreeNAS server appears to be limited to about 1.6 Gbps, regardless of whether I go through the switch or not. I do see C: at 100% during both tests, which I don't think is my workstation running out of steam, since that same C: can be written to at 6.2 Gbps with C: only at 54%. Perhaps the FreeNAS server itself is not capable of receiving a file at greater than 1.6 Gbps? Is there a test I can perform on my FreeNAS server to see what write speed my pool should be capable of?
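To put that 1.6 Gbps figure in disk terms (a quick unit conversion, nothing measured here):

```shell
# 1.6 Gbits/sec expressed as MBytes/sec:
awk 'BEGIN { printf "%.0f MB/s\n", 1.6e9 / 8 / 1e6 }'
# prints 200 MB/s -- roughly what a single spinning disk can sustain
```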

Copying to the workstation is limited to 3.9 Gbps through the switch and 6.2 Gbps via a direct twinax connection.

I guess I need to reset my expectations of what is possible over a 10 Gbps network?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
First off ... Please set system tunable "hw.ix.enable_aim" to zero and see if it helps.

Perhaps the FreeNAS server itself is not capable of receiving a file at greater than 1.6 Gbps?

Possible but unlikely.

Is there a test I can perform on my FreeNAS server to see what write speed my pool should be capable of?

Turn off compression, then "dd if=/dev/zero of=/mnt/poolname/foofile bs=1048576" and then let it run for awhile, use control-T to see speeds while running or control-C for the final summary. Don't forget to turn compression back on.
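Spelled out as a sketch of the full sequence ("tank" is a placeholder pool name; the runnable dd below targets /tmp so nothing here touches a real pool):

```shell
# zfs set compression=off tank     # 1) stop /dev/zero data compressing away
dd if=/dev/zero of=/tmp/foofile bs=1048576 count=64 2>&1 | tail -n 1
#   on the server the target would be /mnt/tank/foofile, with a much larger
#   count (or no count at all, stopping it with Ctrl-C as described above)
# zfs set compression=lz4 tank     # 2) turn compression back on afterwards
rm -f /tmp/foofile
```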

I guess I need to reset my expectations of what is possible over a 10 Gbps network?

Possibly. Try to remember that NAS is a layering of complex technologies on top of each other. Even if your pool can write at 10Gbps and your network can actually communicate at 10Gbps, the layering of the two things together introduces a slight bottleneck.

We've been fortunate in the 1Gbps arena here because it *used* to be that you'd have to design a fileserver very carefully in order to be able to saturate 1Gbps. I have several units from the mid-2000's here which were designed with very specific hardware to be able to act as kickin' fast Gbps-capable fileservers. Most people back then were simply not seeing that sort of performance, but with the correct hardware and some careful tuning, it was possible to achieve. Hardware and OS improvements over the last 10 years made that go from "very difficult to get 1Gbps out of a filer" to "you have to make really stupid choices to not be able to get 1Gbps out of a filer."

In the 10Gbps arena, however, it's just like it was ten years ago. There's no real guarantee that a given platform can necessarily hit 10Gbps. There are ways you can make sure it WON'T, but even if you address all of those, you may still end up needing to do tuning and tinkering and swapping out of stuff in order to find some magic combination of things that actually works.

Anyways check out the enable_aim thing please.
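For reference, a sketch of how that tunable would be applied. The variable name comes from the post above; the mechanics (FreeNAS GUI tunable of type loader, i.e. a loader.conf entry, assuming the FreeBSD ixgbe "ix" driver) are my assumption:

```shell
# Persistent route: in the FreeNAS GUI add a Tunable with
#   Variable: hw.ix.enable_aim    Value: 0    Type: loader
# (equivalent to adding hw.ix.enable_aim=0 to /boot/loader.conf), then reboot.
# Whether it took effect can be checked afterwards with:
sysctl hw.ix.enable_aim
```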
 