SOLVED Jails losing access to network

chravis

Contributor
Joined
Jan 27, 2019
Messages
104
I am hoping that someone else has experienced (and been able to resolve) this issue, or at least can help me figure out where to look to pinpoint the problem. I have found similar threads on this forum and on Reddit, but I didn't find anything that resolved my problem (and honestly I couldn't understand much of what they were saying).

About a week and a half ago, all of my jails lost networking. By that I mean they quit working, presumably because they couldn't talk to the outside anymore. From a shell inside the jail, I cannot ping the default router (192.168.1.1) like I used to.
I have a pretty simple setup that was chugging along without issue for quite some time, so this issue has surprised me. I only have 4 jails (one is created by a plugin):
1. The MineOS plugin/jail
2. A jail for Plex
3. A jail that runs a MySQL server
4. A jail that runs a Tomcat server (that communicates with the MySQL server)

I do not use DHCP for any of my jails. Each of the 4 has a static IP. I have set my router (Netgear Nighthawk R6900) to only assign IPs starting above the range that my static IPs are in. I also have each jail IP as a reserved IP in the router, and I can (when things are working) see each of these IPs as "attached devices" inside the router GUI.
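
The static-address layout described above can be sanity-checked with a few lines of Python. This is only a sketch: the jail addresses and the DHCP pool boundaries below are hypothetical placeholders (the post does not list them), and `addresses_in_pool` is a made-up helper name. Substitute the real values from the jail configuration and the router's DHCP settings.

```python
import ipaddress


def addresses_in_pool(static_ips, pool_start, pool_end):
    """Return the static addresses that fall inside the DHCP pool."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]


# Hypothetical values, not taken from the post.
JAIL_IPS = ["192.168.1.30", "192.168.1.31", "192.168.1.32", "192.168.1.33"]
POOL_START, POOL_END = "192.168.1.100", "192.168.1.254"

conflicts = addresses_in_pool(JAIL_IPS, POOL_START, POOL_END)
print("static addresses inside the DHCP pool:", conflicts or "none")
```

An empty result only shows that the router should not hand out a jail's address; it does not rule out another device on the LAN being configured by hand with the same IP.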

The first time things quit working, I was able to resolve it by restarting the TrueNAS server. Everything was fine for a few hours, then it happened again. This time a TrueNAS server reboot did not resolve the problem. Instead, I shut down the server, my modem, and router, then started each one up in order. This resolved the problem for about a day and a half until last night, when it happened again. Currently things are down.

From the threads I have already read, there is a lot of talk about vnet, NICs, and bridges, which honestly is above my paygrade. The fact that it was working fine, but now it is not, makes me think that something has changed. OK, so what changed? Yes, I did do some things before the jails went haywire. This is what I can remember doing:
1. I upgraded TrueNAS from TrueNAS-12.0-U2 to TrueNAS-12.0-U4.
2. I needed to upgrade the Java version inside the MineOS jail so that I could use the latest Minecraft server jar. I had a heck of a time doing this upgrade and ended up just blowing away the plugin and reinstalling it, then upgrading Java. This appeared to be working.
3. I attempted to install the GitLab plugin a few times, but it never would complete.
4. I did not change any kind of network settings anywhere.

Part of me wants to think it has something to do with the MineOS upgrade. It seems like when the issue happens, it's while my two kids are playing Minecraft. But I haven't made that conclusive connection, and it's hard to believe that a problem with one plugin can mess up all the others.
With that said, because all the jails are messed up, it makes me think it's a higher level server or networking issue. I just don't understand how it can work for a while and then stop working.
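
One way to narrow down when an outage like this begins is to log reachability of the default router from inside a jail and compare the timestamps with other activity, such as the start of a Minecraft session. The following is a minimal sketch rather than a tested diagnostic. It assumes Python 3 is available in the jail and that the local `ping` accepts `-c`; the names `ping_once`, `transitions`, and `watch` are made up for illustration.

```python
import subprocess
import time

GATEWAY = "192.168.1.1"  # default router mentioned in the post


def ping_once(host):
    """Return True if a single ICMP echo request to host gets a reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # give up if ping itself hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def transitions(samples):
    """Reduce (timestamp, reachable) pairs to the points where the state changes."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, reachable))
            previous = reachable
    return changes


def watch(host=GATEWAY, interval=30, probe=ping_once, rounds=None):
    """Poll host and print a timestamped line whenever reachability changes."""
    previous = None
    count = 0
    while rounds is None or count < rounds:
        reachable = probe(host)
        if reachable != previous:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, host, "up" if reachable else "DOWN")
            previous = reachable
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Calling `watch()` from a shell inside the jail and leaving it running prints one line per state change, which gives a precise time to compare against logs on the host and in the other jails.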

Here is the ifconfig from the host:
Code:
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=812099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER>
        ether 6c:2b:59:d9:89:1d
        inet 192.168.1.26 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=0<> metric 0 mtu 33160
        groups: pflog
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:83:6e:a1:24:00
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet0.4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 8 priority 128 path cost 2000
        member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000
        member: vnet0.2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000
        member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 20000
        groups: bridge
        nd6 options=1<PERFORMNUD>
vnet0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: mineos as nic: epair0b
        options=8<VLAN_MTU>
        ether 62:2b:59:c5:1d:5f
        hwaddr 02:9d:8b:66:16:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=1<PERFORMNUD>
vnet0.2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: mysql as nic: epair0b
        options=8<VLAN_MTU>
        ether 6e:2b:59:9d:e6:cc
        hwaddr 02:15:c7:05:56:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=1<PERFORMNUD>
vnet0.3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: pms as nic: epair0b
        options=8<VLAN_MTU>
        ether 6e:2b:59:57:7e:c0
        hwaddr 02:98:39:79:54:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=1<PERFORMNUD>
vnet0.4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: tomcat as nic: epair0b
        options=8<VLAN_MTU>
        ether 6e:2b:59:32:e3:88
        hwaddr 02:e5:89:0e:15:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=1<PERFORMNUD>
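
Output like the listing above is easier to compare between a working and a broken state when it is reduced to a few facts: which interfaces are members of the bridge, and whether their MTUs agree with the bridge's. The sketch below does that for a trimmed copy of the listing. The function names `parse_ifconfig` and `mtu_mismatches` are illustrative, and the parser only understands the two line shapes it needs. The MTU comparison rests on the assumption that FreeBSD's bridge expects all members to share one MTU.

```python
import re

# Trimmed copy of the host output above (two of the four jail interfaces).
SAMPLE = """\
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        inet 192.168.1.26 netmask 0xffffff00 broadcast 192.168.1.255
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: vnet0.4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
        member: vnet0.1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
vnet0.1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: mineos as nic: epair0b
vnet0.4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: tomcat as nic: epair0b
"""

HEADER = re.compile(r"^(\S+): flags=\S+ metric \d+ mtu (\d+)")
MEMBER = re.compile(r"^\s+member: (\S+)")


def parse_ifconfig(text):
    """Return ({interface: mtu}, {bridge: [member, ...]}) from ifconfig text."""
    mtus = {}
    members = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            mtus[current] = int(header.group(2))
            continue
        member = MEMBER.match(line)
        if member and current:
            members.setdefault(current, []).append(member.group(1))
    return mtus, members


def mtu_mismatches(mtus, members):
    """List (bridge, member, member_mtu) where a member's MTU differs from the bridge's."""
    problems = []
    for bridge, names in members.items():
        for name in names:
            if mtus.get(name) != mtus[bridge]:
                problems.append((bridge, name, mtus.get(name)))
    return problems


mtus, members = parse_ifconfig(SAMPLE)
print("bridge members:", members)
print("MTU mismatches:", mtu_mismatches(mtus, members) or "none")
```

Saving the host's `ifconfig` output once while the jails are reachable and once while they are not, then running both through the same parser, shows whether a member dropped out of `bridge0` between the two states.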


Here's the TrueNAS Network Summary screen:
NetworkSummary.PNG


Here's the Network Global Config screen:

NetworkGlobalConfig.PNG


Anyone have any thoughts or suggestions?
 


chravis

This is mostly solved but I still have a question about jails.
I was able to do some testing of different scenarios, and this issue is directly related to my MineOS plugin, specifically after upgrading Java to OpenJDK 16 and creating a 1.17 Minecraft server.
If my kids play on a 1.16.5 server, all is well. If I upgrade the Java version and upgrade the server profile to 1.17, the kids are able to play Minecraft for about 10 minutes and then all of a sudden all my jails lose networking.
My guess is that either Java 16 or the Minecraft 1.17 profile requires more memory than I had allocated for the Minecraft server, so as my kids played for a few minutes, all the available (allocated) memory was used up and then things just crashed.

My bigger question is - is it expected that a memory crash in a jail would affect networking in all other jails? This just seems a little strange since I was under the impression that jails were not supposed to be able to "mess with" other jails.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
chravis said:
My bigger question is - is it expected that a memory crash in a jail would affect networking in all other jails? This just seems a little strange since I was under the impression that jails were not supposed to be able to "mess with" other jails.

Jails isolate filesystems and processes, not memory.

They share memory with the host system, so one jail can consume enough of it to cause system-wide problems if there is not enough memory to sustain all the jails and the host.
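
Because the jails and the host draw on one pool of RAM, a rough budget can show whether the planned allocations leave any slack. The sketch below is plain arithmetic under stated assumptions: every number in it is hypothetical (the thread does not give the host's RAM or per-jail usage), and `memory_headroom_mib` is an illustrative name, not a TrueNAS or iocage function.

```python
def memory_headroom_mib(host_ram_mib, reserved_for_host_mib, jail_budgets_mib):
    """Return the MiB left after the host reserve and all jail budgets.

    A negative result means the budgets add up to more than the host has.
    """
    return host_ram_mib - reserved_for_host_mib - sum(jail_budgets_mib.values())


# Hypothetical figures for illustration only.
HOST_RAM_MIB = 16 * 1024          # total RAM in the server
RESERVED_FOR_HOST_MIB = 6 * 1024  # left for the OS, ZFS cache, and services
JAIL_BUDGETS_MIB = {"mineos": 4096, "pms": 2048, "mysql": 1024, "tomcat": 1024}

headroom = memory_headroom_mib(HOST_RAM_MIB, RESERVED_FOR_HOST_MIB, JAIL_BUDGETS_MIB)
print("headroom (MiB):", headroom)
```

A budget like this is only a planning aid. Nothing in it enforces the limits, so a jail can still exceed its line in the table unless resource limits are configured separately on the host.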
 

chravis

I'm going to go ahead and mark this as solved. After bumping up the memory allocated to the Minecraft server to 4 GB, I haven't had any of the previous issues.
 

bal0an

Explorer
Joined
Mar 2, 2012
Messages
72
I just lost networking on a mineos jail on my TrueNAS-12.0-U3.1.
Restarting the jail didn't help. Apparently the virtual switch was broken and the DHCP lease failed.
When restarting the jail with iocage start mineos it cycled through some vnet-n.0 adapters without success.
The Minecraft server version is 1.16.5. The jail has been running for more than a year without issues.
A TrueNAS server restart solved the issue.
 

bal0an

I am not convinced that increasing memory solves the issue (IMHO it is a workaround that buys some time), and I am concerned that there is an interaction between the jail and the host network stack.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The TrueNAS-specific autoconfiguration of bridge interfaces sometimes messes up your network if your topology is more complex than a single LAN/broadcast domain. This must change in the future, but I have not yet managed to find the spare time to properly document everything and file a ticket in Jira.

You will find lots of threads where I try, often successfully, to help people with that. Search for VLAN and bridge with me as the author of the post. As I said, "all it needs" is a proper write-up :wink:
 

chravis

@bal0an - I have no doubt you could be correct about the increase in memory just being a temporary solution. But honestly, words like vnet and bridge and so forth just confuse me, so I'll take the temporary solution for now :smile:
With that said, my case was slightly different from yours: all of my jails (I only have 4) have IPs assigned to them. I do not use DHCP at all. My IPs are below my router's assignable range and I've never seen any conflict or issue there. A TrueNAS server reboot did fix my problem too, but only until my kids started playing Minecraft again, about 10 minutes in. And all of this only started when I attempted to upgrade things so they could play MC v1.17 (but in full disclosure, I had a heck of a time upgrading, so there could very well be multiple issues at play here).
So, I agree that adding memory is probably not the ultimate solution, but it's about all I can do with what I know.

To @Patrick M. Hausen's point, I feel like my topology is very simple, so I'd be surprised if TrueNAS was tripping up. But then again, I don't understand all the network stuff anyway.
 