Upgraded from 11.1U2 to U4, then U5. 10GbE SMB and NFS no longer work. How to diagnose?

Status
Not open for further replies.

Markj

Dabbler
Joined
Dec 20, 2013
Messages
11
Can anyone suggest how I can find the root cause of my problem? I hate taking shots in the dark. Thanks in advance!

Symptom:
From the time I set this server up about 6 months ago until last week I was able to access SMB and NFS shares over 10GbE connections from all my other machines.
Everything worked fine until the system was updated to U4. I then updated to U5. No other changes to hardware or software configuration.
Windows now gives the error message "Windows cannot access \\10.0.0.nn\sharename" with no further diagnostic information. ESXi datastore browser returns a message "An error occurred, please try again".
Everything still works fine over the 1GbE NICs.
I can ping the 10GbE NIC on my FreeNAS box from my ESXi machine, my Windows Server 2016 machine and my CentOS 7 machine, and vice versa, but I can no longer access the NFS and SMB file shares. In each case the ping stats look like this:
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.214/0.282/0.331/0.043 ms

Config:
I'm using a Mikrotik CSR317 10GbE switch and a mix of Finisar and Cisco 10G-SR transceivers, all of which worked fine previously. All lights on the NICs and switch are green.
The FreeNAS box is a Supermicro X9XRL-F with a Chelsio 10GbE dual-port NIC and 2 onboard 1GbE NICs. The other boxes have either Chelsio or Mellanox NICs.
There is no physical connection between the 10GbE network which is addressed 10.0.0.0/24, and the 1GbE network which is addressed 192.168.0.0/24. The 10.0.0.0/24 network is dedicated to storage traffic.
My router is at 192.168.0.1 which is the default gateway. Because the 10GbE NICs can't connect to 192.168.0.1, I defined 10.0.0.1 as the gateway for those NICs but there is no device at that address.
Jumbo frames are enabled on all NICs with MTU=9000. The MTU size on the router is 10218, which had been working. I reset it to 9000, but that made no difference, so I reverted to the original value.
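
The ping statistics above do not show a packet size. If they came from default-sized pings, they would not exercise the 9000-byte path that SMB and NFS transfers use. A minimal sketch, assuming IPv4 with no IP options, of how to work out the largest ICMP payload that fits in one unfragmented packet and build a "don't fragment" ping for each OS mentioned in this post. The function names are hypothetical, and the target 10.0.0.41 is the lagg1 address from the ifconfig output below.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_commands(target, mtu=9000):
    """Build per-OS ping commands that set the don't-fragment bit."""
    size = max_icmp_payload(mtu)
    return {
        "freebsd": f"ping -D -s {size} {target}",      # FreeNAS shell
        "linux": f"ping -M do -s {size} {target}",     # CentOS 7
        "windows": f"ping -f -l {size} {target}",      # Windows Server 2016
        "esxi": f"vmkping -d -s {size} {target}",      # ESXi shell
    }


if __name__ == "__main__":
    for os_name, cmd in df_ping_commands("10.0.0.41").items():
        print(f"{os_name:8} {cmd}")
```

If the small pings succeed but these full-size pings fail, that would point at an MTU mismatch somewhere on the 10GbE path rather than at the SMB or NFS services themselves.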

Output of ifconfig -a on the FreeNAS box

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:25:90:c1:bf:ac
hwaddr 00:25:90:c1:bf:ac
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:25:90:c1:bf:ac
hwaddr 00:25:90:c1:bf:ad
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
cxgb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:07:43:06:bd:69
hwaddr 00:07:43:06:bd:69
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet 10Gbase-SR <full-duplex>
status: no carrier
cxgb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:07:43:06:bd:69
hwaddr 00:07:43:06:bd:6a
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet 10Gbase-SR <full-duplex>
status: active
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: connected to 3 peers
options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
ether 00:0e:04:29:58:23
hwaddr 00:0e:04:29:58:23
inet 192.168.0.40 netmask 0xffffff00 broadcast 192.168.0.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:25:90:c1:bf:ac
inet 192.168.0.41 netmask 0xffffff00 broadcast 192.168.0.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect
status: active
groups: lagg
laggproto loadbalance lagghash l2,l3,l4
laggport: igb0 flags=4<ACTIVE>
laggport: igb1 flags=4<ACTIVE>
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:07:43:06:bd:69
inet 10.0.0.41 netmask 0xffffff00 broadcast 10.0.0.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect
status: active
groups: lagg
laggproto failover lagghash l2,l3,l4
laggport: cxgb0 flags=1<MASTER>
laggport: cxgb1 flags=4<ACTIVE>
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Markj said:
"Can anyone suggest how I can find the root cause of my problem? I hate taking shots in the dark. Thanks in advance!"

One of your 10GbE ports (cxgb0) is showing "no carrier". If you're seeing blinking lights, maybe some component in the path is confused about link state.
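
To make that check repeatable, here is a rough sketch that scans saved `ifconfig -a` text and lists every lagg member port that reports "no carrier". The function names are hypothetical, the sample is trimmed from the output in the first post, and the parser assumes the FreeBSD-style layout shown there.

```python
import re

# Trimmed from the ifconfig output posted above.
SAMPLE = """\
cxgb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
status: no carrier
cxgb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
status: active
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
status: active
laggproto failover lagghash l2,l3,l4
laggport: cxgb0 flags=1<MASTER>
laggport: cxgb1 flags=4<ACTIVE>
"""


def parse_ifconfig(text):
    """Return {interface: {"status": str or None, "laggports": [names]}}."""
    interfaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # tolerate output with or without indentation
        header = re.match(r"^([a-z]+\d+): flags=", line)
        if header:
            current = header.group(1)
            interfaces[current] = {"status": None, "laggports": []}
        elif current is None:
            continue
        elif line.startswith("status:"):
            interfaces[current]["status"] = line.split(":", 1)[1].strip()
        elif line.startswith("laggport:"):
            interfaces[current]["laggports"].append(line.split()[1])
    return interfaces


def lagg_members_without_carrier(text):
    """List (lagg, port) pairs where the member port has no carrier."""
    info = parse_ifconfig(text)
    problems = []
    for name, data in info.items():
        for port in data["laggports"]:
            if info.get(port, {}).get("status") == "no carrier":
                problems.append((name, port))
    return problems


if __name__ == "__main__":
    for lagg, port in lagg_members_without_carrier(SAMPLE):
        print(f"{lagg}: member {port} has no carrier")
```

On the sample this reports cxgb0 under lagg1. In the posted output that port carries the MASTER flag in the failover lagg, while cxgb1 is the one marked ACTIVE.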

Also, you might consider simplifying your 10G setup to a single link, instead of a link aggregate. It'll make troubleshooting easier in the short term.
 