Another Intel X520-DA2 speed issue


jlewis

Cadet
Joined
Aug 1, 2015
Messages
6
My system specs:

Supermicro:
SuperServer 5018A-AR12L
32G RAM
12x4TB disks
2x1TB Samsung 540 Pro, with only a very tiny portion (I know, way overkill even for over-provisioning) configured for SLOG and L2ARC
FreeNAS 11.1-U3

As the subject indicates, I'm having issues with my 10g controller...

Throughput is 40-60MB/s (that's megabytes, not megabits; in bits I'm seeing peaks as high as 750Mb/s)...

I've run some basic IO tests on the system, which indicate local throughput as high as 1.4GB/s, considerably more than what I'm seeing over the network...
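
For a rough sense of what such a basic local test looks like, something along these lines is typical (a sketch only; the dataset path is illustrative, and ZFS compression or ARC caching can inflate the numbers when writing zeros):

Code:
# local sequential write, then read, against the pool (path illustrative)
dd if=/dev/zero of=/mnt/tank/ddtest bs=1m count=16384
dd if=/mnt/tank/ddtest of=/dev/null bs=1m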

I'm under no illusions that I should see anywhere near 20G of throughput; neither the PCIe slot nor the underlying storage can support that...

ifconfig is reporting(IPs redacted):
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 90:e2:ba:88:18:84
hwaddr 90:e2:ba:88:18:84
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)

ix1 - reports the same

I have enabled/disabled some of the hw offloading; while this did improve performance, it's still nowhere near what it should be, less than a 1G connection's worth...
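
The offload features can be toggled per interface roughly like this from the shell (a sketch; which flags to flip depends on the test):

Code:
# disable TSO, LRO, and checksum offload on ix0 (pick the features to test)
ifconfig ix0 -tso -lro -txcsum -rxcsum
# re-enable them afterwards
ifconfig ix0 tso lro txcsum rxcsum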

No apparent logs indicating issues...

tcpdump doesn't indicate any issues (other than when hw offload is enabled)...

Switch reports 10G connectivity as would be expected with no errors being reported...
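
For completeness, the host-side error counters can be checked too (a sketch):

Code:
# interface error/drop counters and ix driver statistics
netstat -i -d -I ix0
netstat -i -d -I ix1
sysctl dev.ix.0 | grep -Ei 'err|drop|missed'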

Any ideas or information that may help?
 

jlewis

Cadet
Joined
Aug 1, 2015
Messages
6
I have another duplicate thread, not sure how I did that...

Couple key pieces of info...

The traffic in question is iSCSI...

Clients are ESXi 6.5 hosts. I was initially unsure whether this was a hypervisor or FreeNAS issue, but then I checked throughput during a host vMotion: the hypervisors can saturate 10G between themselves, so I'm fairly certain this isn't a hypervisor issue...

The only other thing of note I can think of: the NAS originally had a 10GBase-T card, which I swapped out when I couldn't find copper SFPs that cost less than 2 or 3 10G cards with SFP slots, but I wouldn't think that would cause an issue...
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
I have another duplicate thread, not sure how I did that...

Couple key pieces of info...

The traffic in question is iSCSI...

Clients are ESXi 6.5 hosts. I was initially unsure whether this was a hypervisor or FreeNAS issue, but then I checked throughput during a host vMotion: the hypervisors can saturate 10G between themselves, so I'm fairly certain this isn't a hypervisor issue...

The only other thing of note I can think of: the NAS originally had a 10GBase-T card, which I swapped out when I couldn't find copper SFPs that cost less than 2 or 3 10G cards with SFP slots, but I wouldn't think that would cause an issue...

The PHY doesn't matter. Both SFP+ and RJ45 can pass 10Gb/s.

SLOG device type? Intel 540? I'm not familiar with a Samsung 540 model number...

iSCSI or NFS?

Have you skimmed the other ESXi performance threads? They all typically start out with a similar observation of a delta between raw performance and what an ESXi host experiences. Your SLOG should address the sync writes, but if that disk is an Intel 540, I'm not sure it has the IOPS performance you need for an SLOG.
 

jlewis

Cadet
Joined
Aug 1, 2015
Messages
6
RJ45 mattered to me: copper SFPs for my switches cost significantly more than just replacing the NIC with one that accepts SFPs, and I already had both 10G optics and twinax lying around... ;-)

Samsung 850 Pro, my apologies, typo on my part. I'm not very concerned about the underlying IO capabilities; I checked and was able to push over 1GB/sec locally, more than I would expect over a single 10Gb interface. I did go down the rabbit hole of testing the various tunables and the sync settings (disabled, standard, and always) before I realized I hadn't checked the underlying capabilities...

iSCSI

Again, another item I forgot to mention: I did peruse the forums for the tunables recommended for 10G connectivity. While configuring those tunables I came to another realization: I should be seeing significantly more performance even without tuning, and I saw nothing to indicate others were seeing such a drastic lack of it. Most of the low numbers reported were around 5-6Gb/sec per link, not the sub-Gb/sec I'm seeing across 2 links. I also started seeing indications that Intel 10G adapters were/are not the first choice of adapter due to performance issues. I wanted to kick myself for not checking this before I purchased the NIC, but normally Intel is a safe bet across various OSes; apparently not for 10G. It sounds like the first/recommended choice is the Chelsio NICs.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
iSCSI

Again, another item I forgot to mention: I did peruse the forums for the tunables recommended for 10G connectivity. While configuring those tunables I came to another realization: I should be seeing significantly more performance even without tuning, and I saw nothing to indicate others were seeing such a drastic lack of it. Most of the low numbers reported were around 5-6Gb/sec per link, not the sub-Gb/sec I'm seeing across 2 links. I also started seeing indications that Intel 10G adapters were/are not the first choice of adapter due to performance issues. I wanted to kick myself for not checking this before I purchased the NIC, but normally Intel is a safe bet across various OSes; apparently not for 10G. It sounds like the first/recommended choice is the Chelsio NICs.

RJ45 10G is still pretty power hungry... We tend to stick to twinax for the same cost reasons.

Performance is a somewhat subjective thing. We had some bumps with the Intel Gig cards, but they've been pretty reasonable lately.

This is from one of our production boxes. It's basically "stock" with almost no network tuning. The other end is a Linux VM on a busy ESXi server. The network is active, so I'm not surprised by the deltas. If the network was quiescent I would expect to see 9ish Gb/s in both directions.

Code:
root@nas1:~ # ifconfig ix0
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: connected to 2960-1 (Te1/0/2)
options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:1b:21:6e:d0:dc
	hwaddr 00:1b:21:6e:d0:dc
	inet 192.168.1.51 netmask 0xffffff00 broadcast 192.168.1.255 
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
	status: active
root@nas1:~ # iperf -c 192.168.1.18
------------------------------------------------------------
Client connecting to 192.168.1.18, TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.51 port 29940 connected with 192.168.1.18 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  3]  0.0-10.0 sec  5.72 GBytes  4.91 Gbits/sec
root@nas1:~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  4] local 192.168.1.51 port 5001 connected with 192.168.1.18 port 47484
[ ID] Interval	   Transfer	 Bandwidth
[  4]  0.0-10.0 sec  10.1 GBytes  8.69 Gbits/sec
^Croot@nas1:~ #


That's an Intel X520. The system is an X9-SRL-F with 64G of RAM and an E5-1620v2.

You'd probably want to start poking at your network: use iperf to test pieces. I'd also confirm the entire path from NAS to client is 10G.
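
For instance, a hedged sketch of that kind of piecewise test, reusing the address from the output above (stream count and duration are arbitrary):

Code:
# on the FreeNAS box: run an iperf server
iperf -s
# from a client on the ESXi side: single stream, then several parallel streams
iperf -c 192.168.1.51 -t 30
iperf -c 192.168.1.51 -t 30 -P 4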

I will say if I was looking for an SLOG disk, I would look at IOPS, not throughput. SLOG does a lot of little writes, so the disk's IOPS are way more important than raw read or write bandwidth.
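
If you want to put a number on that for a candidate SLOG device, a sync-write test of roughly this shape gives an idea of the relevant IOPS. This assumes fio is available (it isn't installed by default), and the target path is purely illustrative:

Code:
# 4k synchronous writes at queue depth 1 -- the access pattern an SLOG sees
fio --name=slogtest --filename=/mnt/tank/fio.test --size=1g \
    --rw=write --bs=4k --sync=1 --iodepth=1 --runtime=30 --time_based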
 

jlewis

Cadet
Joined
Aug 1, 2015
Messages
6
Yeah, I wouldn't expect the performance I'm experiencing even without tuning. To be honest, I'd be OK with anything over 4Gb/s and happy with 6-8Gb/s, but a peak of only 750Mb/s is unacceptable...

I know the client ESXi hosts (on twinax as well) have no obvious issues; I see them push 6-8Gb/sec during a vMotion. I suspect this would be higher, but the vMotion finishes too fast...

The FreeNAS host reports both interfaces at 10G and the switch reports the same. I wouldn't be surprised by a single bad port on a switch, but not the same port across 2 switches, and a port that's bad but partially functional usually shows errors, which I'm not seeing either...

I definitely agree, IOPS are more important in the vast majority of situations, including my own. These SSDs are rated in the thousands of IOPS at a queue depth of 1 and nearly 100K at a queue depth of 32, and they'll saturate the SATA links before the drives themselves become the bottleneck...

I'm definitely at a loss as to the reason for my poor performance. As unlikely as it is for me to have 2 bad ports or bad twinax cables, I'm going to try moving the connections to alternate ports and then swapping out the twinax cables if I continue to have issues...

Please let me know if there is anything else you think I may have missed...

Thanks...
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Please let me know if there is anything else you think I may have missed...

Thanks...

Can you do things that might have data loss risk associated with them?

If so, I'd remove the SLOG from the volume and retest, both locally and via the network.
I'd also disable sync writes on the volume and test again.
Then I'd add the SLOG back in and test with sync enabled and disabled. (Roughly the steps sketched below.)
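
A hedged sketch of those steps from the shell; the pool name "tank", the dataset behind the iSCSI extent, and the log device's gptid are illustrative, not taken from this system:

Code:
# remove the log device from the pool (device name illustrative)
zpool remove tank gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# disable sync writes on the dataset/zvol behind the iSCSI extent, then retest
zfs set sync=disabled tank/iscsi
# restore the default and re-add the log device when done
zfs set sync=standard tank/iscsi
zpool add tank log gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx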

When I test the full stack, I mount a datastore on an ESXi host, spawn a small Linux VM, and run dd and bonnie++ in the Linux VM while watching "zpool iostat -v volxx 1" in one window and "systat -ifstat" in another. That's my preferred method for knowing what FreeNAS thinks is going on with IO at the disk layer and the network layer in real time.
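
A minimal sketch of that workflow, assuming the pool is named volxx and the test file in the VM sits on the datastore-backed disk (all names illustrative):

Code:
# on FreeNAS, one command per shell window
zpool iostat -v volxx 1
systat -ifstat

# inside the Linux VM: sequential write, then read, on the datastore-backed disk
dd if=/dev/zero of=/root/ddtest bs=1M count=8192 conv=fdatasync
dd if=/root/ddtest of=/dev/null bs=1M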

Using that method on one of our production NASs, I can get dd to give me 800MB/s to a 7-disk raidz2 SSD array with no SLOG, for a VM sitting on an NFS mount (with sync enabled). With an Intel NVMe as an SLOG I can saturate the 10G link at around 1100MB/s.
We pretty much use a standard FreeNAS setup. The only things that are tweaked are the number of NFS servers (24 instead of 4) and the usual 10G performance tuning:

[Attached screenshot of the tunables: Screen Shot 2018-03-23 at 00.48.32.png]

No guarantees on whether these tweaks are suitable in general. They work for us, but we've done a lot of testing.
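
The screenshot doesn't reproduce here, but the tunables in question are of roughly this flavor: common FreeBSD/FreeNAS 10G sysctls, with illustrative values rather than the values in the screenshot:

Code:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvspace=262144
net.inet.tcp.sendspace=262144
net.inet.tcp.mssdflt=1460
net.inet.tcp.cc.algorithm=htcp   # needs the cc_htcp module loaded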
 

peacepipe

Dabbler
Joined
Dec 17, 2017
Messages
36
I had a similar issue this week. My problem was that I had configured a management interface and a data interface in the same subnet, and the throughput was halved. Maybe it helps.
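
A quick, hedged way to check for that kind of overlap (interface names are illustrative):

Code:
# each interface should sit in its own subnet; look for overlapping networks
ifconfig | grep -E 'flags=|inet '
netstat -rn -f inet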
 