Hi everyone.
I've been lurking around the forums for years, but now it's time to actually post something as well. (First post ever, yay!! :)
I recently replaced our switches with a couple of new ones that have extra 10G ports I could attach our FreeNAS system to. Since the new switches were introduced in our network, we have seen a lot of dropouts in connectivity. We're using the FreeNAS box as an iSCSI target with a couple of ESXi servers as initiators. This worked flawlessly before.
I have been debugging the problem for days now, using these forums as a great source of ideas and best practices. At first I thought it was related to the use of jumbo frames in a 10G network, so I reconfigured our entire network to MTU 1500. Then I suspected iSCSI port binding, which I had configured but which turned out to be of no use since our target IPs are in different subnets. But the problem still persists. The ESXi logs show a path as being down, and FreeNAS shows a log line similar to: (iqn.1998-01.com.vmware:esx01-3e9ab750): no ping reply (NOP-Out) after 5 seconds; dropping connection.
But today I boiled the problem down to having something to do with ARP! Or at least something very suspiciously connected to ARP. When checking with arp -an on the FreeNAS machine, I can now predict which host will fail next, as the dropout happens exactly when that host's ARP entry expires! Sometimes re-fetching the ARP entry takes 1-2 seconds and sometimes it takes way more, stalling the iSCSI path. I have attached some screenshots showing the problem.
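To make the "predict which host fails next" check less manual, here is a minimal sketch of how the arp -an output could be filtered for entries that are about to expire. It assumes the FreeBSD-style line format shown in the comment; the interface names (ix0, ix1), the MAC addresses, the 30-second threshold, and the function name expiring_soon are all made up for illustration and not taken from my actual system.

```python
import re

# Assumed line shape of FreeBSD `arp -an` output, for example:
#   ? (10.0.21.11) at 00:50:56:aa:bb:01 on ix0 expires in 12 seconds [ethernet]
# Permanent or incomplete entries do not match and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+) "
    r"expires in (?P<secs>\d+) seconds"
)


def expiring_soon(arp_output, threshold=30):
    """Return (seconds_left, ip, iface) for entries expiring within
    `threshold` seconds, soonest first."""
    hits = []
    for line in arp_output.splitlines():
        m = ARP_LINE.search(line)
        if m and int(m.group("secs")) <= threshold:
            hits.append((int(m.group("secs")), m.group("ip"), m.group("iface")))
    return sorted(hits)


# Hypothetical sample data, not real output from the FreeNAS box.
SAMPLE = """\
? (10.0.21.11) at 00:50:56:aa:bb:01 on ix0 expires in 12 seconds [ethernet]
? (10.0.22.11) at 00:50:56:aa:bb:02 on ix1 expires in 1150 seconds [ethernet]
? (10.0.21.1) at 0c:c4:7a:00:00:01 on ix0 permanent [ethernet]
"""

if __name__ == "__main__":
    for secs, ip, iface in expiring_soon(SAMPLE):
        print(f"{ip} on {iface} expires in {secs}s")
```

On the real machine the text of arp -an would be fed in instead of SAMPLE, and the reported hosts could then be compared against the timestamps of the "no ping reply (NOP-Out)" log lines.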
The switches are stacked Cisco SG550X, and the FreeNAS machine is connected to two separate ports on each physical switch, with 10GBASE-T and short CAT6a cables. There are no other signs of network problems or latency in any way, everything is running smooth and fast - except when ARP entries expire. The funny thing is that the problem occurs 99% of the time on only one of the physical links/adapters (10.0.21.0/24). The paths on the other physical link (10.0.22.0/24) are mostly unaffected. This tells me it may be an STP-related issue, but I don't have the knowledge to confirm it.
I have looked through the entire configuration on the switches for something related to ARP, but nothing I found really makes sense.
So.. has anyone experienced similar issues before? Does anyone have an idea of how introducing newer/faster switches could cause this problem? Is there any tweaking that can be done to refresh an ARP entry before it expires?
In general, Cisco equipment can be a bit "slow" to let new devices access the network. For example, when I test stuff with my laptop and plug the cable into a port, it can take up to 10 seconds before a ping is answered. I'm thinking this is a variation of that. Maybe every time an ARP entry expires on the FreeNAS machine, it has to "plug in" again, and the switches are slow to respond and provide access?
Hardware info:
FreeNAS-9.10.2 (a476f16)
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
262079MB Memory
Supermicro X9DRW-3TF+ motherboard
Intel X540 Dual port 10GBase-T onboard
Network:
2x Cisco SG550X-24.
FreeNAS server connected in 10GBASE-T ports with CAT 6a cables.
Screenshots:
