No connectivity over Chelsio 10gbe

Status
Not open for further replies.

TheCowGod

Dabbler
Joined
Apr 16, 2015
Messages
14
Hi, guys. I've got a FreeNAS 9.10 server (named keats) running on a Supermicro X10 motherboard, and a Proxmox 4.4 (Debian-based) server (named ummon) running on a Supermicro X9 motherboard. Each server has a Chelsio S320E-SR dual-port 10GbE card in it, and they're directly connected to each other using SFP+ transceivers and OM3 fiber. Both ports are connected (port 1 to port 1, port 2 to port 2), but I've also tested with one of the ports disconnected.

Each interface is configured with a static IP in a different subnet. My main gigabit network is 172.16.0.0/24; port 1 of each card has an IP address in the 10.10.1.0/24 network, and port 2 of each card has one in the 10.10.2.0/24 network. As far as I can tell, the routing table looks correct on each server:

FreeNAS:
Code:
[cowgod@keats ~]$ ifconfig cxgb0
cxgb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
		options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
		ether 00:07:43:07:d5:51
		inet 10.10.1.60 netmask 0xffffff00 broadcast 10.10.1.255
		nd6 options=9<PERFORMNUD,IFDISABLED>
		media: Ethernet 10Gbase-SR <full-duplex>
		status: active

[cowgod@keats ~]$ netstat -nr
Routing tables

Internet:
Destination		Gateway			Flags	  Netif Expire
default			172.16.0.1		 UGS		igb0
10.10.1.0/24	   link#1			 U		 cxgb0
10.10.1.60		 link#1			 UHS		 lo0
10.10.2.0/24	   link#2			 U		 cxgb1
10.10.2.60		 link#2			 UHS		 lo0
127.0.0.1		  link#5			 UH		  lo0
172.16.0.0/24	  link#3			 U		  igb0
172.16.0.60		link#3			 UHS		 lo0

Internet6:
Destination					   Gateway					   Flags	  Netif Expire
::/96							 ::1						   UGRS		lo0
::1							   link#5						UH		  lo0
::ffff:0.0.0.0/96				 ::1						   UGRS		lo0
fe80::/10						 ::1						   UGRS		lo0
fe80::%lo0/64					 link#5						U		   lo0
fe80::1%lo0					   link#5						UHS		 lo0
ff01::%lo0/32					 ::1						   U		   lo0
ff02::/16						 ::1						   UGRS		lo0
ff02::%lo0/32					 ::1						   U		   lo0




Proxmox:
Code:
root@ummon:~# ifconfig vmbr0
vmbr0	 Link encap:Ethernet  HWaddr 0c:c4:7a:9d:26:ba
		  inet addr:172.16.0.65  Bcast:172.16.0.255  Mask:255.255.255.0
		  inet6 addr: fe80::ec4:7aff:fe9d:26ba/64 Scope:Link
		  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
		  RX packets:57224 errors:0 dropped:0 overruns:0 frame:0
		  TX packets:36557 errors:0 dropped:0 overruns:0 carrier:0
		  collisions:0 txqueuelen:1000
		  RX bytes:53737737 (51.2 MiB)  TX bytes:3409453 (3.2 MiB)

root@ummon:~# route -n
Kernel IP routing table
Destination	 Gateway		 Genmask		 Flags Metric Ref	Use Iface
0.0.0.0		 172.16.0.1	  0.0.0.0		 UG	0	  0		0 vmbr0
10.10.1.0	   0.0.0.0		 255.255.255.0   U	 0	  0		0 vmbr1
10.10.2.0	   0.0.0.0		 255.255.255.0   U	 0	  0		0 vmbr2
172.16.0.0	  0.0.0.0		 255.255.255.0   U	 0	  0		0 vmbr0

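As a quick cross-check of the addressing above, here is a minimal Python sketch that confirms the two port-1 addresses sit in the same /24 and that the three networks don't overlap, so neither host should need a gateway to reach the other over the 10GbE link. The addresses are taken from the output in this post (ummon's 10.10.1.65 comes from the ping output further down); the helper names are just for illustration.

```python
import ipaddress
from itertools import combinations

# Networks and addresses taken from the ifconfig/route/ping output in this post.
networks = {
    "gigabit": ipaddress.ip_network("172.16.0.0/24"),
    "10gbe port 1": ipaddress.ip_network("10.10.1.0/24"),
    "10gbe port 2": ipaddress.ip_network("10.10.2.0/24"),
}
keats_port1 = ipaddress.ip_address("10.10.1.60")
ummon_port1 = ipaddress.ip_address("10.10.1.65")


def same_subnet(a, b, net):
    """True if both addresses are on-link for the given network."""
    return a in net and b in net


def any_overlap(nets):
    """True if any pair of networks overlaps (which would make routing ambiguous)."""
    return any(x.overlaps(y) for x, y in combinations(nets, 2))


on_link = same_subnet(keats_port1, ummon_port1, networks["10gbe port 1"])
overlapping = any_overlap(networks.values())
print("port 1 peers on-link:", on_link)
print("networks overlap:", overlapping)
```

If both checks come out as expected, the layer-3 configuration is unlikely to be the cause, which points the investigation at layer 2 or below.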


For these tests, I'm using port 1 of each card. The two servers can talk to each other just fine on the 1Gb network, just as they always have, but I have no connectivity on the 10GbE interface. If I ping from the Debian server (ummon) to the 10GbE IP of the FreeNAS box (keats) and run tcpdump on the Debian server, I see the ARP requests go out, but no replies coming back:


Proxmox:
Code:
root@ummon:~# ping 10.10.1.60
PING 10.10.1.60 (10.10.1.60) 56(84) bytes of data.
From 10.10.1.65 icmp_seq=1 Destination Host Unreachable
From 10.10.1.65 icmp_seq=2 Destination Host Unreachable
From 10.10.1.65 icmp_seq=3 Destination Host Unreachable
^C
--- 10.10.1.60 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3016ms
pipe 3




root@ummon:~# tcpdump -vv -nn -s 0 -X -i vmbr1
tcpdump: listening on vmbr1, link-type EN10MB (Ethernet), capture size 262144 bytes
19:42:22.554468 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 28
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c			.A.........<
19:42:23.553223 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 28
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c			.A.........<
19:42:24.553209 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 28
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c			.A.........<



Meanwhile, if I do a tcpdump on the FreeNAS box (keats), I see the ARP requests coming in, and the replies going back out:


FreeNAS:
Code:
[cowgod@keats ~]$ sudo tcpdump -vv -nn -s 0 -X -i cxgb0 arp
tcpdump: listening on cxgb0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:47:35.009897 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 46
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c 0000 0000  .A.........<....
		0x0020:  0000 0000 0000 0000 0000 0000 0000	   ..............
19:47:35.009903 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.60 is-at 00:07:43:07:d5:51, length 28
		0x0000:  0001 0800 0604 0002 0007 4307 d551 0a0a  ..........C..Q..
		0x0010:  013c 0007 4307 d535 0a0a 0141			.<..C..5...A
19:47:36.006782 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 46
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c 0000 0000  .A.........<....
		0x0020:  0000 0000 0000 0000 0000 0000 0000	   ..............
19:47:36.006789 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.60 is-at 00:07:43:07:d5:51, length 28
		0x0000:  0001 0800 0604 0002 0007 4307 d551 0a0a  ..........C..Q..
		0x0010:  013c 0007 4307 d535 0a0a 0141			.<..C..5...A
19:47:37.006777 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.60 tell 10.10.1.65, length 46
		0x0000:  0001 0800 0604 0001 0007 4307 d535 0a0a  ..........C..5..
		0x0010:  0141 0000 0000 0000 0a0a 013c 0000 0000  .A.........<....
		0x0020:  0000 0000 0000 0000 0000 0000 0000	   ..............
19:47:37.006783 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.60 is-at 00:07:43:07:d5:51, length 28
		0x0000:  0001 0800 0604 0002 0007 4307 d551 0a0a  ..........C..Q..
		0x0010:  013c 0007 4307 d535 0a0a 0141			.<..C..5...A

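To double-check that the replies leaving cxgb0 are well-formed and addressed to the right peer, the first reply's payload can be decoded by hand. This is a minimal Python sketch, assuming the hex dump above starts at the ARP header (the leading 0001 0800 0604 suggests it does); `parse_arp` is a hypothetical helper, not part of any tool used in this thread.

```python
import socket
import struct

# ARP payload of the first reply captured on keats (cxgb0), copied from the hex dump above.
REPLY_HEX = (
    "0001 0800 0604 0002 0007 4307 d551 0a0a "
    "013c 0007 4307 d535 0a0a 0141"
)


def parse_arp(payload: bytes) -> dict:
    """Unpack a 28-byte Ethernet/IPv4 ARP payload into readable fields."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", payload[:28]
    )

    def fmt_mac(raw: bytes) -> str:
        return ":".join(f"{b:02x}" for b in raw)

    return {
        "operation": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender_mac": fmt_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": fmt_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


arp = parse_arp(bytes.fromhex(REPLY_HEX))
for field, value in arp.items():
    print(f"{field}: {value}")
```

The target MAC decoded here is the same one that appears as the sender in the requests, so keats is answering the right peer; the replies just never show up in the capture on ummon.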


At first I assumed that, since I was seeing the ARP replies going back out, the problem had to be with the Proxmox machine. However, I tested booting the Proxmox server with an Ubuntu LiveCD, and I see the same behavior. I then booted the Proxmox server back up into Proxmox and rebooted the FreeNAS box using the Ubuntu LiveCD, and in that case I was able to establish normal communication over the 10gbe ports. I was able to ping, SSH, etc.

So the problem does seem to be related to FreeNAS/FreeBSD specifically, but I have no idea where to look next. As I said, the routing tables (posted above) look correct. As far as I can tell, FreeNAS doesn't configure a software firewall like ipfw or pf (please correct me if I'm wrong). I've never worked with FreeBSD before FreeNAS, so I may be missing something obvious. Can someone suggest some other troubleshooting step I should try, or some other configuration variable I need to look at? Thanks!
 

TheCowGod

Dabbler
Joined
Apr 16, 2015
Messages
14
Looks like I managed to sort this one out, with the help of this thread: https://forums.freenas.org/index.ph...on-freenas-workstation-stopped-working.28741/

I happen to have the exact same motherboard, the X10SL7-F. In that thread, he mentions that his problem was solved by navigating to the "Advanced -> Chipset Configuration -> System Agent (SA) Configuration -> PCIe Configuration" screen in the BIOS and changing the setting for "PCI-E slot6" from "Auto" to "Gen2". I checked it out on my own machine and saw that it indicated that the card in the slot was Gen1. So I tried changing the setting from Auto to Gen1, and sure enough, when I booted it back up it worked fine.

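One side effect worth keeping in mind: pinning the slot to Gen1 also caps the bus bandwidth available to the card. Below is a rough Python estimate using the standard per-lane rates and encodings for PCIe Gen1 through Gen3. The x8 lane count is an assumption for illustration only and isn't confirmed anywhere in this thread, and the figures ignore protocol overhead.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    "Gen1": (2.5, 8 / 10),     # 8b/10b encoding
    "Gen2": (5.0, 8 / 10),     # 8b/10b encoding
    "Gen3": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gbps(generation: str, lanes: int) -> float:
    """Usable link bandwidth in Gbit/s per direction, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


LANES = 8              # assumption: the card trains at x8; check your own slot
PORTS = 2              # dual-port card, as described in the first post
NEEDED = PORTS * 10.0  # Gbit/s if both 10GbE ports ran at line rate at once

for gen in PCIE_GENERATIONS:
    bw = link_gbps(gen, LANES)
    print(f"{gen} x{LANES}: {bw:.1f} Gbit/s, covers both ports: {bw >= NEEDED}")
```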
It seems to be an issue specific to the X10SL7-F motherboard. Thank god for that post! This forum is a great resource :)
 

TheCowGod

Dabbler
Joined
Apr 16, 2015
Messages
14
Interestingly enough, though, it worked fine in Ubuntu 17.04 x64, on the same hardware, with the BIOS at the default setting of "Auto". So it seems to have something to do with the FreeBSD driver specifically, and the way it interacts with the Auto setting on the X10SL7-F motherboard. I don't know close to enough about the workings of PCI-E to speculate further, but I'm glad it got resolved :)
 