Installing jails brick server after system restart

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
Hostname nasverse.rrealms.com
Build FreeNAS-11.1-U7
Platform AMD Ryzen 5 2600 Six-Core Processor
Memory 16267MB
NIC: Intel X520 10G, connected via DAC

I need some help here. I have lost hours and hours troubleshooting an issue where I install a jail (Resilio Sync is the one I tried last), though this has also happened with three others (DNSMasq, Plex, and Madsonic). The plugins install fine and work with either DHCP or a static IP assignment. However, if I restart the server, things go to hell. I discovered this the hard way when Plex wanted a server restart.

So, after much fiddling around I've come to the conclusion it has something to do with the bridge / epair setup. Here's what happens...

Start the server. Messages scroll, lights on the switch all green.

When the jails start, I see bridge0 come up, then the epairs come up, and only pair A gets promiscuous mode turned on. I see a message about ARP being incomplete. Then I see the switch connection light wink out. All the interfaces still show up, but networking is completely dead. ifconfig up/down... no command I have found will bring the networking back to life.

If I go into the jail and turn off autostart, the server starts and operates fine. I can then start the jail from the GUI, and it may (or may not) hang the server.

WTH? I have googled for hours and I can't find any reference to anyone else having a similar problem. Anyone have any ideas what's going on?

I am not doing anything strange. I'm just clicking the install plugin button and letting FreeNAS do its thing. Before I figured out the jail was doing it, I spent hours digging around and blew away my configuration twice, thinking I was doing something wrong. This is the third time. The pattern can't be accidental.
 

dlavigne

Guest
What is the full output of ifconfig when this issue occurs?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What is your networking layout? Do you have multiple NICs on the server plugged in?
 

wgreenway

dlavigne said: What is the full output of ifconfig when this issue occurs?
That's the weird thing: all the interfaces (including bridge0, epair0a, and epair0b) show as UP. Nothing is working, though. It can ping itself, but no other devices can ping it from the outside. I would have to crash it again to see the exact output, but I grepped through /var/log/messages and didn't find anything.

Note: it works fine if I start up a jail. It will continue to operate until I shut down the system; on the next reboot... dead. If I tell the jails not to start and reboot... fine. Whatever the GUI configures to run when the machine boots is apparently going sideways.
 

wgreenway

SweetAndLow said: What is your networking layout? Do you have multiple NICs on the server plugged in?
There is a 10GTek X520-DA1 clone board in an x8 PCIe slot (ix0) and the onboard GbE controller (igb0). The machine is hooked up by a DAC cable to a MikroTik 4-port SFP+ switch.
 

SweetAndLow

wgreenway said: There is a 10GTek X520-DA1 clone board in an x8 PCIe slot (ix0) and the onboard GbE controller (igb0). The machine is hooked up by a DAC cable to a MikroTik 4-port SFP+ switch.
Unplug every cable except for 1 and retest.
 

wgreenway

SweetAndLow said: Unplug every cable except for 1 and retest.
Oh, I miswrote: igb0 isn't hooked up and hasn't been since I reconfigured the box. Just the 10G. The last two times it bricked itself, the 10G was the only thing hooked up.

When it starts, I see it probe igb0 and determine it's not plugged in. Even when it fails, I see it probe ix0 and change the link state to UP. I see bridge0 go up. Then the epairs... that's when I see the link light go out. It goes by very fast, and the stupidly annoying thing is that THOSE messages aren't in the dang log. Whatever goes wrong DOES NOT GET RECORDED IN ANY LOG.
 

wgreenway

dlavigne said: What is the full output of ifconfig when this issue occurs?
Went back and reproduced.

FAILING IFCONFIG:
igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 04:92:26:d9:38:ef
	hwaddr 04:92:26:d9:38:ef
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect
	status: no carrier
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=a400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
	ether 00:1b:21:be:cd:48
	hwaddr 00:1b:21:be:cd:48
	inet 192.168.0.88 netmask 0xffffff00 broadcast 192.168.0.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (Unknown <rxpause,txpause>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:d4:6c:23:3e:00
	nd6 options=9<PERFORMNUD,IFDISABLED>
	groups: bridge
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 5 priority 128 path cost 2000
	member: ix0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 2 priority 128 path cost 2000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:49:50:00:05:0a
	hwaddr 02:49:50:00:05:0a
	nd6 options=1<PERFORMNUD>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	groups: epair

Messages:

Apr 1 10:22:41 nasverse kernel: ix0: link state changed to UP
Apr 1 10:22:41 nasverse kernel: ix0: link state changed to UP

Apr 1 10:22:41 nasverse uhid0 on uhub2
Apr 1 10:22:41 nasverse uhid0: <CHICONY USB Keyboard, class 0/0, rev 2.00/2.30, addr 1> on usbus2
Apr 1 10:22:41 nasverse ums0 on uhub2
Apr 1 10:22:41 nasverse ums0: <PixArt Lenovo USB Optical Mouse, class 0/0, rev 2.00/1.00, addr 2> on usbus2
Apr 1 10:22:41 nasverse ums0: 3 buttons and [XYZ] coordinates ID=0
Apr 1 10:22:41 nasverse ntpd[2137]: ntpd 4.2.8p10-a (1): Starting
Apr 1 17:22:41 nasverse python3.6: dnssd_clientstub ConnectToServer: connect()-> No of tries: 2
Apr 1 17:22:41 nasverse python3.6: dnssd_clientstub ConnectToServer: connect()-> No of tries: 2
Apr 1 10:22:42 nasverse proftpd[2287]: 127.0.0.1 - ProFTPD 1.3.6 (stable) (built Wed Jan 23 2019 17:17:27 UTC) standalone mode STARTUP

Apr 1 10:22:49 nasverse bridge0: Ethernet address: 02:d4:6c:23:3e:00
Apr 1 10:22:49 nasverse kernel: ix0: promiscuous mode enabled
Apr 1 10:22:49 nasverse kernel: bridge0: link state changed to UP
Apr 1 10:22:49 nasverse kernel: bridge0: link state changed to UP
Apr 1 10:22:49 nasverse epair0a: Ethernet address: 02:49:50:00:05:0a
Apr 1 10:22:49 nasverse epair0b: Ethernet address: 02:49:a0:00:06:0b
Apr 1 10:22:49 nasverse kernel: epair0a: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0a: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0b: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0b: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0a: promiscuous mode enabled

NOTE: epair0b doesn't get promiscuous mode enabled. Could this be part of the problem?
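To check the note against the log rather than by eye, here is a minimal Python sketch that tallies the distinct kernel events per interface. It assumes the kernel lines are pasted in verbatim from the excerpt above; `LOG` and `events_by_interface` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Kernel lines copied from the "Messages:" excerpt above
# (ix0, bridge0, epair0a, epair0b only; duplicate lines dropped).
LOG = """\
Apr 1 10:22:41 nasverse kernel: ix0: link state changed to UP
Apr 1 10:22:49 nasverse kernel: ix0: promiscuous mode enabled
Apr 1 10:22:49 nasverse kernel: bridge0: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0a: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0b: link state changed to UP
Apr 1 10:22:49 nasverse kernel: epair0a: promiscuous mode enabled
"""

# Captures the interface name and the event text from "... kernel: <ifname>: <event>".
LINE_RE = re.compile(r"kernel: (\w+): (.+)$")


def events_by_interface(log: str) -> dict[str, set[str]]:
    """Map each interface name to the set of distinct kernel events it logged."""
    events = defaultdict(set)
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            events[m.group(1)].add(m.group(2))
    return dict(events)


events = events_by_interface(LOG)
for ifname in sorted(events):
    evs = events[ifname]
    up = "link state changed to UP" in evs
    promisc = "promiscuous mode enabled" in evs
    print(f"{ifname}: link_up={up} promisc={promisc}")
```

Run against this excerpt, it reports promiscuous mode only for ix0 and epair0a. Those are exactly the two interfaces listed as `member:` entries of bridge0 in the failing ifconfig output; epair0b does not appear in the host's ifconfig at all.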


SUCCESSFUL IFCONFIG (jail autostart turned off):
igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 04:92:26:d9:38:ef
	hwaddr 04:92:26:d9:38:ef
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect
	status: no carrier
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:1b:21:be:cd:48
	hwaddr 00:1b:21:be:cd:48
	inet 192.168.0.88 netmask 0xffffff00 broadcast 192.168.0.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo
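Besides the PROMISC flag, the ix0 `options=` lines differ between the two outputs. A small Python sketch can diff them, both as capability names and as the hex bitmask. The two strings are copied from the outputs above; `parse_options` is a hypothetical helper written for this comparison, not part of any FreeNAS tool.

```python
import re

# ix0 "options=" lines copied from the failing and working ifconfig outputs.
FAILING = "options=a400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>"
WORKING = "options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>"


def parse_options(line: str) -> tuple[int, set[str]]:
    """Split an ifconfig 'options=HEX<NAME,...>' line into its bitmask and name set."""
    m = re.fullmatch(r"options=([0-9a-f]+)<([A-Z0-9_,]*)>", line.strip())
    if m is None:
        raise ValueError(f"unrecognised options line: {line!r}")
    mask = int(m.group(1), 16)
    names = set(filter(None, m.group(2).split(",")))
    return mask, names


fail_mask, fail_names = parse_options(FAILING)
work_mask, work_names = parse_options(WORKING)

# Capabilities present only in the working state, and the bits that were cleared.
print("only when working:", sorted(work_names - fail_names))
print("only when failing:", sorted(fail_names - work_names))
print("bits cleared:", hex(work_mask & ~fail_mask))
```

For these two lines the failing set is a strict subset of the working one: TXCSUM, TSO4, TSO6, LRO, and TXCSUM_IPV6 are off in the failing state (mask difference 0x400702), and nothing is gained. The sketch only shows what changed on ix0 once the jail's bridge came up; it does not by itself explain the dead link.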
 

Attachments

  • bootlogs.zip (11.2 KB)

wgreenway

Looking back at my own post -- in the failing scenario:

Ethernet autoselect (Unknown <rxpause,txpause>)

When it's working:

Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)


What's causing that?
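In ifconfig's `media:` line, `autoselect` is only the configured selection; the part in parentheses is the subtype and options the driver currently reports as active. The Python sketch below pulls that active part out of both lines so the comparison is explicit. The constant names and `active_media` are hypothetical helpers, and the regex is a hedged approximation of the line format seen in this thread, not a full parser.

```python
import re
from typing import Optional

# The two ix0 "media:" lines quoted above.
FAILING = "media: Ethernet autoselect (Unknown <rxpause,txpause>)"
WORKING = "media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)"

# "media: <type> <selection> (<active subtype> <options>)"; the parenthesised
# part is optional (e.g. igb0 with no carrier shows just "Ethernet autoselect").
MEDIA_RE = re.compile(r"media: (\S+) (\S+)(?: \((\S+)(?: <([^>]*)>)?\))?")


def active_media(line: str) -> tuple[Optional[str], list[str]]:
    """Return (active subtype, active media options) from an ifconfig media line."""
    m = MEDIA_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised media line: {line!r}")
    subtype = m.group(3)
    opts = m.group(4).split(",") if m.group(4) else []
    return subtype, opts


for label, line in (("failing", FAILING), ("working", WORKING)):
    subtype, opts = active_media(line)
    print(f"{label}: subtype={subtype} full-duplex={'full-duplex' in opts}")
```

In the failing case the driver reports an `Unknown` subtype with no `full-duplex` option, even though `status:` still reads `active`. The sketch surfaces that mismatch; it does not identify what causes it.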
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,450
You should update to 11.2; lots of iocage issues were present in 11.1, and it seems fairly stable now.
 

wgreenway

Dabbler
Joined
Mar 19, 2019
Messages
26
Apollo said: You should update to 11.2; lots of iocage issues were present in 11.1, and it seems fairly stable now.

So... I did the upgrade from the GUI, rebooted a couple of times, turned the jail on, and... it works.

Freaking A. Hallelujah! Thanks.

Holy crap the interface changes a lot between those versions.

It does yell about the jail having been created in the earlier version. Is there a way to fix that, or should I leave it alone?
 