DATAstrm (Dabbler; joined Nov 24, 2021; 14 messages)
Hi All. I'm looking for some troubleshooting tips for the below issue.
My jails all lose connectivity after multiple days of uptime (most recently after about 10 days). The issue is difficult to troubleshoot because the jails work flawlessly in the interim. I have searched the forum for similar issues and checked the logs (/var/log) on the host machine as well as in the jails, but I have not found anything relevant.
Some information:
Jails running:
1. Reverse proxy (for #2)
2. Nextcloud
3. Unifi controller
4. Wireguard
5. Cloud backup to Backblaze (runs rclone every 5 min)
6. Cloud backup (downloads a file every day)
Because I have a cloud backup running every 5 min (Jail #5) with verbose logging, I can pinpoint when connectivity drops to within ~5 min. The rclone backup simply stalls and never completes. When that happens, I cannot access any of the 6 jails. I cannot ping them either (allow raw sockets is on).
Restarting the jails (iocage restart ALL) does NOT restore connectivity. The jails ONLY come back online after I reboot the entire system. After a reboot, the system, including all the jails, works fine for multiple days. I am then able to ping each jail with no problem, both from inside each jail and from an external system.
Some Questions I have:
- Is there any other troubleshooting I can perform? I have already scoured the logs for all the jails and the host system. I've also looked at the logs in my router. Because I know when the connection loss happens to within ~5 min, I am able to look at the relevant part of the logs.
- Is there any way to restart the network stack so I don't have to reboot the entire system? I tried restarting the jails, with no success. I also tried making a nominal change to the network configuration to see if the jails would come back up, but they don't. Only a reboot seems to work. If there's a command that can restart/refresh the network stack, then maybe I can script that and avoid rebooting.
- As a last resort, I could set up a monitoring script that pings the jails and lets me know when they go down. Because the failure is silent, this would at least tell me when to reboot the system.
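For the last-resort option, a minimal sketch of such a monitoring script might look like the following. It is only an illustration under assumptions: the jail addresses in `JAIL_IPS` are hypothetical placeholders (the post does not list them), the function names and the `jailwatch` log tag are made up, and `ping -c 1 -t 2` relies on the FreeBSD meaning of `-t` (overall timeout in seconds).

```shell
#!/bin/sh
# Sketch of a jail reachability check. All names and addresses below are
# illustrative assumptions, not taken from the original post.

# HYPOTHETICAL jail addresses; replace with the real ones.
JAIL_IPS="${JAIL_IPS:-192.168.1.201 192.168.1.202}"

# FreeBSD ping: -c 1 sends one packet, -t 2 gives up after 2 seconds.
# Overridable so the logic can be exercised without a network.
PING="${PING:-ping -c 1 -t 2}"

# Succeeds if the given address answers a single ping.
jail_is_up() {
    $PING "$1" >/dev/null 2>&1
}

# Prints the space-separated list of addresses that did not answer.
down_jails() {
    down=""
    for ip in "$@"; do
        jail_is_up "$ip" || down="$down $ip"
    done
    echo "${down# }"
}

# Only act when invoked with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    # Unquoted on purpose: JAIL_IPS is split into one argument per address.
    unreachable=$(down_jails $JAIL_IPS)
    if [ -n "$unreachable" ]; then
        # Record the failure in syslog under a made-up tag.
        logger -t jailwatch "jails unreachable: $unreachable"
    fi
fi
```

Run from cron every few minutes with the `--run` argument, this would leave a timestamped syslog entry close to the moment the jails drop off, which could then be extended to send a notification.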
Any help would be welcome!
Here's my ifconfig output. I doubt the issue is there, since the jails work for multiple days before failing silently, which suggests to me that the network is configured and working correctly.
Code:
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: member of lagg0
	options=a100b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:24
	hwaddr b8:ca:3a:70:b3:20
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: member of lagg0
	options=a100b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:24
	hwaddr b8:ca:3a:70:b3:22
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: member of lagg0
	options=a100b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:24
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: Access Vlan 20 - Port 4
	options=a500b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:25
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=0<> metric 0 mtu 33160
	groups: pflog
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: Mediaserver 2 Main interface (LAGG)
	options=a100b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:24
	inet 192.168.1.111 netmask 0xffffff00 broadcast 192.168.1.255
	laggproto lacp lagghash l2,l3,l4
	laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	groups: lagg
	media: Ethernet autoselect
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
vlan20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: Camera Vlan
	options=200001<RXCSUM,RXCSUM_IPV6>
	ether b8:ca:3a:70:b3:25
	groups: vlan
	vlan: 20 vlanpcp: 4 parent interface: igb1
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>
bridge20: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: vlan 20 bridge
	ether 02:ab:61:57:66:14
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: vlan20 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 8 priority 128 path cost 55
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
bridge111: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: Main Lagg Bridge
	ether 02:ab:61:57:66:6f
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: vnet0.6 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 19 priority 128 path cost 2000
	member: vnet0.4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 17 priority 128 path cost 2000
	member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 16 priority 128 path cost 2000
	member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 15 priority 128 path cost 2000
	member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 14 priority 128 path cost 2000
	member: vnet1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 13 priority 128 path cost 2000000
	member: lagg0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 7 priority 128 path cost 2000000
	groups: bridge
	nd6 options=9<PERFORMNUD,IFDISABLED>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:ab:61:57:66:00
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: vnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 12 priority 128 path cost 2000000
	member: igb1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 4 priority 128 path cost 20000
	groups: bridge
	nd6 options=1<PERFORMNUD>
vnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=80000<LINKSTATE>
	ether fe:a0:98:59:68:7e
	hwaddr 58:9c:fc:10:ff:9d
	groups: tap
	media: Ethernet autoselect
	status: active
	nd6 options=1<PERFORMNUD>
	Opened by PID 2384
vnet1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=80000<LINKSTATE>
	ether fe:a0:98:03:50:7e
	hwaddr 58:9c:fc:10:2f:7d
	groups: tap
	media: Ethernet autoselect
	status: active
	nd6 options=1<PERFORMNUD>
	Opened by PID 2384
vnet0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: CloudBackup as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:f3:e1:77
	hwaddr 02:e8:a0:a9:fb:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
vnet0.2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: Nextcloud as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:27:9a:bf
	hwaddr 02:21:56:f3:88:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
vnet0.3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: Security-Backup as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:bd:c3:3f
	hwaddr 02:21:63:5f:00:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
vnet0.4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: Unifi_Controller as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:81:ad:c7
	hwaddr 02:5e:97:7c:89:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
vnet0.5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: WireGuardJail as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:cf:07:1b
	hwaddr 02:1e:e4:93:49:0a
	inet 172.16.0.1 netmask 0xfffffffc broadcast 172.16.0.3
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
vnet0.6: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: associated with jail: reverse-proxy as nic: epair0b
	options=8<VLAN_MTU>
	ether ba:ca:3a:b5:7c:2a
	hwaddr 02:7f:dc:69:1b:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=1<PERFORMNUD>
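Since the output above was captured while everything was working, one possible troubleshooting step is to snapshot the bridge membership periodically and compare a "good" snapshot with one taken after the jails drop off. The sketch below is an assumption-laden illustration, not a known fix: the function names and the snapshot directory are hypothetical, and it only extracts the `member:` lines that `ifconfig` prints for a bridge such as `bridge111`.

```shell
#!/bin/sh
# Sketch: record which interfaces are members of a bridge so that a
# working state can be diffed against a failed one.
# Function names and file layout are illustrative assumptions.

# Reads ifconfig output on stdin and prints one member interface per line.
bridge_members() {
    awk '$1 == "member:" { print $2 }'
}

# Writes a sorted, timestamped member list for one bridge.
#   $1 = bridge name (e.g. bridge111)
#   $2 = existing directory to store snapshots in (hypothetical location)
snapshot_bridge() {
    ts=$(date +%Y%m%d-%H%M%S)
    ifconfig "$1" | bridge_members | sort > "$2/$1-members-$ts.txt"
}
```

Calling `snapshot_bridge bridge111 /root/bridge-snapshots` from cron and running `diff` on two of the resulting files would show whether any jail interface was added to or dropped from the bridge between the working and failed states.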