So, the basic details of my system are in my signature.
I've had a chronic problem with FreeNAS crashing during heavy file transfers over the network, and I'm not sure where to look to debug it. I've suspected various things over time. After each round of reconfiguring and rebooting, things seem stable, and then, out of the blue, FreeNAS completely locks up during a network file transfer and requires a reboot to recover.
FreeNAS provides many NFS shares to my physical Linux hosts, CIFS shares to Windows clients, and iSCSI storage for the VMs on an ESXi host.
My initial build had the two Intel I210 NICs teamed together to my Cisco SG200-26 switch. Things worked fine, but data transfers weren't as fast as we expected, so we enabled jumbo frames on the switch and set MTU=9000 on FreeNAS. An ESXi 6 host was then added to the network, also with two NICs teamed together to the switch.
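Jumbo frames only work end to end if FreeNAS, the switch, and every client agree on the MTU. One common way to check a path is a ping with the Don't Fragment bit set, sized to fill exactly one frame. The sketch below only does the size arithmetic, assuming IPv4 with no IP options (20-byte header) and an 8-byte ICMP header; `<host>` in the printed commands is a placeholder, not a real address.

```python
# Sketch: largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumption: IPv4 header without options (20 bytes) + ICMP header (8 bytes).
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Return the ICMP payload size that exactly fills a packet of size `mtu`."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for IPv4 + ICMP headers")
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 9000):
        size = max_ping_payload(mtu)
        # FreeBSD/FreeNAS: -D sets Don't Fragment; Linux: -M do does the same.
        print(f"MTU {mtu}: FreeBSD: ping -D -s {size} <host>   Linux: ping -M do -s {size} <host>")
```

If the 9000-byte variant fails while the 1500-byte one succeeds, some hop on that path is not passing jumbo frames.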
During a big data write to an NFS mount, FreeNAS locked up completely. Things had been stable for several months, but usage had been very light; this was the first really heavy use. I rebooted, and it was fine for a month, and then it happened again. This time, I suspected the LACP NIC teaming and went down to one NIC. Things seemed stable, but a week later FreeNAS crashed again. Next I suspected the MTU=9000 and set it back to the default, and again things seem stable. (Oddly, the ifconfig output below shows "JUMBO_MTU" in the options, and I'm not sure where that comes from.)
Most recently, we reconfigured the second NIC in both FreeNAS and the ESXi host as a dedicated iSCSI link, to move the VM disk traffic off the primary NIC so it wouldn't interfere with NFS traffic to the Linux hosts. This all seemed to be working well, but the iSCSI writes were slow. So yesterday I was experimenting and built another zvol for a second iSCSI target, and while testing a data transfer from a VM that used this new iSCSI target, FreeNAS locked up again.
A complete reboot is always required to get it back. I haven't lost any data in the process yet, but I have to get to the bottom of this.
I believe my hardware is a well-supported configuration, and the I210 NICs should be stable.
I'm not very familiar with FreeBSD, and I'm not sure where to look on FreeNAS after a reboot to figure out what it was struggling with before it crashed. If you have any ideas after reading this and can help steer me toward what could be going wrong, I'd really appreciate it!
Here is the ifconfig output from FreeNAS:
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
	ether d0:50:99:c0:11:30
	inet 192.168.100.99 netmask 0xffffff00 broadcast 192.168.100.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
	ether d0:50:99:c0:11:31
	inet 10.10.10.1 netmask 0xffffff00 broadcast 10.10.10.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
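In FreeBSD's ifconfig output, the `mtu` value on each interface's first line is the MTU in use, while the `options=` line lists enabled interface capabilities. So `JUMBO_MTU` there most likely means the igb driver supports jumbo frames, not that a 9000-byte MTU is active. A minimal sketch that pulls both fields apart, assuming the line layout shown above (the `parse_ifconfig` helper is hypothetical, written only for illustration):

```python
import re

# Header and options lines copied from the ifconfig output above.
IFCONFIG = """\
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\toptions=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\toptions=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
"""

# "<name>: flags=<...> metric N mtu M" starts a new interface block.
HEADER_RE = re.compile(r"^(\w+): flags=\S+ metric \d+ mtu (\d+)")
# "options=<hex><FLAG,FLAG,...>" lists the enabled capabilities.
OPTIONS_RE = re.compile(r"^\s*options=[0-9a-f]+<([^>]*)>")


def parse_ifconfig(text: str) -> dict:
    """Hypothetical helper: map interface name -> {'mtu': int, 'options': set}."""
    result = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            result[current] = {"mtu": int(header.group(2)), "options": set()}
            continue
        opts = OPTIONS_RE.match(line)
        if opts and current is not None:
            result[current]["options"] = set(opts.group(1).split(","))
    return result


if __name__ == "__main__":
    for name, info in parse_ifconfig(IFCONFIG).items():
        print(name, "mtu", info["mtu"], "| JUMBO_MTU in options:", "JUMBO_MTU" in info["options"])
```

For both interfaces this reports an MTU of 1500 alongside the `JUMBO_MTU` capability flag.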