Poor network performance Chelsio t520-cr (FreeNAS <-> ESXi 6.7)

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
Hello!

I have been trying to add 10g network connections between my primary FreeNAS and my ESXi box. I am trying to provide iSCSI storage to ESXi according to these instructions:
http://johnkeen.tech/freenas-11-iscsi-esxi-6-5-lab-setup/

The main difference is that I'm using dual 10G links instead of quad 1G. Also, to eliminate MTU as a variable, I've left all interfaces at 1500 instead of 9000. The cards are recognized and communicating, but performance is terrible, and iSCSI and other services like SSH will not function.

I purchased two Chelsio T520-CR cards and connected the two ports on both servers directly with two Chelsio twinax cables. As expected, the card was immediately recognized in FreeNAS. I struggled to get the NIC to load in ESXi, but eventually got it recognized. In ESXi, I'm using the driver from Chelsio (cxl-2.2.0.1-1OEM.650.0.0.4598673.x86_64.zip). This is the third driver I've tried. The stable Chelsio driver would load, but the NICs were never created in ESXi. I uninstalled that one and installed the drivers from the VMware hardware compatibility list. With those drivers, the card would load and the NICs were present, but performance was terrible (pings in the 2000-5000 ms range, and no services would actually work). I removed those drivers and installed the current alpha driver (referenced above), but performance is still horrible. Pings are better, but still no services (SSH, iSCSI, etc.) will function.
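
For anyone following along, this is roughly the sequence I used to swap drivers from the ESXi shell (the datastore path and the old VIB name below are placeholders, not my exact values):

Code:
# list what's currently installed and remove the old Chelsio VIB by name
esxcli software vib list | grep -i cxl
esxcli software vib remove -n <old-vib-name>
# install the downloaded offline bundle (a full path to the zip is required)
esxcli software vib install -d /vmfs/volumes/datastore1/cxl-2.2.0.1-1OEM.650.0.0.4598673.x86_64.zip
reboot
# after the reboot, confirm the ports show up as vmnics
esxcli network nic list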

I'm a little stuck at this point on what to try next. I've never encountered a situation where systems could ping each other but the connection was this bad. I've configured each NIC with a static IP on its own separate subnet (per the instructions in the link above). I'm out of my depth on how to troubleshoot this, so any suggestions would be greatly appreciated!

Thanks,
monte
 

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
I'm in a similar boat right now, but with different cards.

ESXi 6.5: 10Gtek 10Gb NIC, Intel 82599ES chipset (X520-DA2 CNA)
FreeNAS: NetApp/Chelsio dual-port SFP+ 10GbE PCIe (111-00603+A0, CC2-S320E-SR, 100-1082-00)

What speeds are you getting with iperf?
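
Something like this is what I'd run (iperf server on FreeNAS, client on the ESXi host; the 10.0.0.1 address and the client binary path are just placeholders):

Code:
# on FreeNAS (server side)
iperf -s
# on the ESXi host (client side), aimed at the FreeNAS 10Gb address
/path/to/iperf -c 10.0.0.1 -t 10 -i 1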
 

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
Pliqui - thanks for your response. Sorry it took me a little while; it looks like they removed iperf from ESXi 6.7, so I compiled a statically linked binary from source for this test.
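
For reference, this is roughly how I built it - on a separate Linux box, since ESXi has no toolchain. The exact iperf2 version and download URL are from memory, so treat them as assumptions:

Code:
# grab and unpack the iperf 2.x source on a Linux build host
curl -LO https://downloads.sourceforge.net/project/iperf2/iperf-2.0.13.tar.gz
tar xf iperf-2.0.13.tar.gz && cd iperf-2.0.13
# link statically so the binary has no library dependencies on ESXi
./configure LDFLAGS=-static
make
# then copy src/iperf to a datastore on the ESXi host and run it from there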

Here are the outputs from my run on each of my ports (1g, and then 2x 10g):

Code:
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools] ./iperf -c 192.168.1.35
------------------------------------------------------------
Client connecting to 192.168.1.35, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.22 port 45336 connected with 192.168.1.35 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   940 Mbits/sec
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools] ./iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 10337 connected with 10.0.0.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec   334 MBytes   280 Mbits/sec
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools] ./iperf -c 10.0.1.1
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 58809 connected with 10.0.1.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec  2.98 GBytes  2.56 Gbits/sec
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools]

 

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
The results are really all over the place. For instance, the interface at 10.0.1.2 went from 2.79 Gbits/sec (still under-performing, but faster than 1G) to 136 Mbits/sec in successive runs. Any idea what could cause this?

Code:
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools] ./iperf -c 10.0.1.1
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 16665 connected with 10.0.1.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec  3.25 GBytes  2.79 Gbits/sec
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools] ./iperf -c 10.0.1.1
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 59133 connected with 10.0.1.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec   162 MBytes   136 Mbits/sec
[root@orion:/vmfs/volumes/5aff1136-dfc79715-eb9e-ac1f6b4689d0/tools]
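
Next I'll probably try a longer run with per-second reporting and a few parallel streams, to see whether the throughput starts low or collapses partway through (standard iperf2 flags):

Code:
# 30-second run, reporting every second
./iperf -c 10.0.1.1 -t 30 -i 1
# same test with 4 parallel streams
./iperf -c 10.0.1.1 -t 30 -i 1 -P 4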

 

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
I'm in the same boat and at this point I'm still looking.

When I ran iperf with the defaults, I got these values:
Code:
[root@esxi01:/usr/lib/vmware/vsan/bin] ./iperf.copy -c 10.0.1.1
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 35.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 47836 connected with 10.0.1.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec  2.49 GBytes  2.14 Gbits/sec


If I put a little more juice into it, I get:
Code:
[root@esxi01:/usr/lib/vmware/vsan/bin] ./iperf.copy -c 10.0.1.1 -R -P 2 -w 128k
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size:  131 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  4] local 10.0.1.2 port 28828 connected with 10.0.1.1 port 5001
[  3] local 10.0.1.2 port 61403 connected with 10.0.1.1 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  4]  0.0-10.0 sec  2.71 GBytes  2.33 Gbits/sec
[  3]  0.0-10.0 sec  3.21 GBytes  2.75 Gbits/sec
[SUM]  0.0-10.0 sec  5.92 GBytes  5.08 Gbits/sec


Try the second command and run esxtop to see whether you're getting any dropped packets on the network.
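
To be clear on the esxtop part (the keystroke and column names are from memory):

Code:
# on the ESXi host
esxtop
# press 'n' for the network view, then watch the %DRPTX / %DRPRX
# columns for the vmnic backed by the 10Gb port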

EDIT: Iperf from FreeNAS to ESXi
Code:
[root@freenas /mnt/VMWARE/TEst]# iperf -c 10.0.1.3 -m -i1 -fg
------------------------------------------------------------
Client connecting to 10.0.1.3, TCP port 5001
TCP window size: 0.00 GByte (default)
------------------------------------------------------------
[  3] local 10.0.1.2 port 46277 connected with 10.0.1.3 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0- 1.0 sec  0.83 GBytes  7.13 Gbits/sec
[  3]  1.0- 2.0 sec  0.84 GBytes  7.19 Gbits/sec
[  3]  2.0- 3.0 sec  0.84 GBytes  7.20 Gbits/sec
[  3]  3.0- 4.0 sec  0.84 GBytes  7.20 Gbits/sec
[  3]  4.0- 5.0 sec  0.84 GBytes  7.19 Gbits/sec
[  3]  5.0- 6.0 sec  0.84 GBytes  7.19 Gbits/sec
[  3]  6.0- 7.0 sec  0.84 GBytes  7.20 Gbits/sec
[  3]  7.0- 8.0 sec  0.84 GBytes  7.20 Gbits/sec
[  3]  8.0- 9.0 sec  0.84 GBytes  7.19 Gbits/sec
[  3]  9.0-10.0 sec  0.84 GBytes  7.19 Gbits/sec
[  3]  0.0-10.0 sec  8.37 GBytes  7.19 Gbits/sec
[  3] MSS size 8960 bytes (MTU 9000 bytes, unknown interface)
 
Last edited:

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
My struggle continues as well. I had a third card, so I installed it in a Windows 10 workstation, and directly connected it to FreeNAS. Both ports got at least 6 Gbits/sec - good enough without tuning. At this point, I am pretty sure it is something to do with the ESXi host.

I reconnected the ESXi host to the FreeNAS server on both ports. Just for fun, I turned the twinax cables around (took the end out of the ESXi box and put that end into the FreeNAS box). After I had them reconnected and they could ping each other, I re-ran iperf with the default options. Both links get 9.4 Gbits/sec!

Great, I think - let me re-enable iSCSI and see if ESXi will see the storage now. I re-scan the iSCSI adapter, and... no luck.

I re-run iperf and one link is still at 9.4 Gbits/sec, while the other drops to around 100 Mbits/sec.

While I've been typing this response, vSphere is now showing my 2 paths and 1 device:

upload_2018-6-21_14-37-55.png


However, as you can see, for both paths, vSphere is reporting they are dead.

Re-seating the cables seems to have stabilized things a bit, but I am still not getting a steady connection to my iSCSI share. I think I'll try swapping cables with my Windows 10 workstation to see if that helps (although it's tough to know which of my two ports is performing well right now). Such a pain!

I'm open to any other suggestions. For now, though, I would pull your cables, re-seat them, and try your test again.
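
If you want to see whether the flaky port is actually throwing errors, the FreeNAS side can be checked from the shell after a test run (standard FreeBSD commands; cxl0/cxl1 are just my interface names):

Code:
# negotiated media and link state for each 10Gb port
ifconfig cxl0
ifconfig cxl1
# cumulative error/drop counters per interface
netstat -I cxl0
netstat -I cxl1
# live per-second counters while an iperf run is in progress
netstat -w 1 -I cxl0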
 

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
Last edited:

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
Ok, here's what I have:

FreeNAS (helios):
Supermicro X10SRM-F
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
64 GB 2400 MHz DDR4 ECC
8x Seagate 8TB IronWolf NAS
RAIDZ2
Onboard C612 SATA Controller
Intel® i350 Dual GbE LAN ports
Chelsio t520-cr 10g
FreeNAS-11.1-U5
Standard cxgbe(4) driver (I didn't do anything to FreeNAS - the card was automatically recognized)

ESXi (orion):
Supermicro X10SRM-F
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
64 GB 2400 MHz DDR4 ECC
2x Samsung 850 EVO SSD
Intel® i350 Dual GbE LAN ports
Chelsio t520-cr 10g
ESXi 6.7
Chelsio cxl driver 2.2.0.1-1OEM.650.0.0.4598673 (I installed this driver manually)
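
As a sanity check of what each side actually loaded, I verify the driver on both boxes like this (standard commands; output formats vary between versions):

Code:
# FreeNAS / FreeBSD side: confirm the card is seen and the driver attached
pciconf -lv | grep -B 3 -i chelsio
dmesg | grep -i chelsio

# ESXi side: confirm the Chelsio VIB and the resulting vmnics
esxcli software vib list | grep -i cxl
esxcli network nic list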

Network:
Both FreeNAS and ESXi have onboard dual 1G Ethernet. FreeNAS is configured in a LAGG and has been working with this configuration for over a year. The following tunables are in effect on FreeNAS:
upload_2018-6-21_15-27-36.png


I implemented these about 6 months ago to try to get everything I could out of my 1G network. SMB transfers between clients on my network and FreeNAS are around 120 MB/s.

The 1G Ethernet for my home network uses these settings:
192.168.1.0/24
DNS: 192.168.1.19
All clients are on DHCP (some have static reservations by MAC address via isc-dhcp)

On both servers, I added a Chelsio T520-CR dual-port adapter. I connected both ports on both servers to each other using two Chelsio twinax cables, and gave both NICs on both servers static IPs as follows:

FreeNAS <-----> ESXi
cxl0: 10.0.0.1 10.0.0.2
cxl1: 10.0.1.1 10.0.1.2
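
As a sanity check, each path can be pinged independently from the ESXi shell by forcing the outgoing VMkernel interface (the vmk1/vmk2 names are assumptions - check yours with esxcli network ip interface list):

Code:
# path 1: out the VMkernel port on the 10.0.0.0/24 vSwitch
vmkping -I vmk1 10.0.0.1
# path 2: out the VMkernel port on the 10.0.1.0/24 vSwitch
vmkping -I vmk2 10.0.1.1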

In FreeNAS, I created an iSCSI portal as follows:
upload_2018-6-21_15-34-12.png


an initiator group that accepted all:
upload_2018-6-21_15-34-56.png


an iSCSI target:
upload_2018-6-21_15-35-46.png


and an Extent as follows:
upload_2018-6-21_15-36-31.png


and lastly, the associated target:
upload_2018-6-21_15-37-20.png





At this point, the iSCSI share should be available and ready to use (yes, I enabled the iSCSI service :) )


Here's what I have on the ESXi side:

Physical NICs (you can see the two cxl interfaces)
upload_2018-6-21_15-41-28.png


I created two virtual switches (one for each interface):
upload_2018-6-21_15-43-12.png


upload_2018-6-21_15-44-7.png


Then I created 2 VMkernel NICs (one for each interface)
upload_2018-6-21_15-45-17.png
 

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
Here's the last VMkernel NIC (it appears I'm only allowed 10 images per post):

upload_2018-6-21_15-46-35.png



With all of these in place, I enabled software iSCSI with the following settings:

upload_2018-6-21_15-49-5.png
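
For reference, the rough command-line equivalent of that setup looks like this (the vmhba64 adapter name and vmk numbers are placeholders - yours will differ):

Code:
# enable the software iSCSI adapter
esxcli iscsi software set --enabled=true
# bind both VMkernel ports to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk2
# add the FreeNAS portal as a dynamic (send targets) discovery address
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba64 --address=10.0.0.1:3260
# rescan the adapter for the new device
esxcli storage core adapter rescan --adapter=vmhba64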



That's it - that's the whole config. I cannot create a new datastore using this iSCSI config because of all of the network errors I described in earlier posts. Hopefully someone spots something in this configuration that could be causing the problem!

Thanks for your help!
 

monte1299

Dabbler
Joined
Jun 4, 2017
Messages
23
Pliqui - I took a look at the links you provided and updated my tunables. They are now as follows:

upload_2018-6-21_16-30-50.png
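
For anyone curious, the applied values can be confirmed from the FreeNAS shell (these are the usual 10Gb-related sysctls, not necessarily an exact list of what's in my screenshot):

Code:
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace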


iperf is still all over the place. Sometimes the interfaces perform at 9.4 Gbits/sec; other times they're down in the 100 Mbits/sec range.

The debugging fun continues...
 

Josif

Dabbler
Joined
Aug 15, 2015
Messages
12
monte1299 - did you manage to fix the problem?
 