SOLVED RancherVM pauses unexpectedly; it resumes when connecting via the serial CLI

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
I'm seeing behavior similar to what is described at this link on my server (Xeon E5-2620 v4 @ 2.10GHz, 16 cores):
https://www.reddit.com/r/freenas/comments/8zkcva/112_why_are_my_vms_stopping/

I have an entirely new Rancher VM created in FreeNAS-11.2-U4.1. It runs and I've been able to add Docker containers, including an OwnCloud website for file-hosting.

After some period of time, the OwnCloud instance kept disappearing: http://xx.xx.xx.210:80 returned nothing, and the Rancher UI (http://xx.xx.xx.210:8080) also returned nothing. Investigating further, I found that the RancherVM (xx.xx.xx.210) was unreachable by ping. From the Docker web interface, I then opened the virtual machine's Serial option (it invokes cu -l /dev/nmdm11B) to access the Rancher VM. The CLI popped up quickly, pings to the RancherVM (xx.xx.xx.210) started succeeding immediately, and the OwnCloud website (http://xx.xx.xx.210:80) content reappeared. This looks like a pause rather than a restart.

Why does the RancherVM pause on its own? A parallel UbuntuVM hosting a gitlab instance (xx.xx.xx.211) created on the same machine doesn't pause like this, ever.

A clue from the logs could be the following:
May 27 22:09:54 freenas kernel: tap1: link state changed to DOWN
May 27 22:09:54 freenas kernel: tap1: link state changed to DOWN
May 27 22:11:02 freenas tap1: Ethernet address: 00:xx:xx:xx:07:01
May 27 22:11:02 freenas kernel: tap1: promiscuous mode enabled
May 27 22:11:04 freenas kernel: tap1: link state changed to UP
May 27 22:11:04 freenas kernel: tap1: link state changed to UP
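
To line these events up against the times the VM becomes unreachable, the DOWN-to-UP windows can be pulled out of the log with a short script. This is only a sketch: the function name link_outages is made up for illustration, and the year is a placeholder because syslog timestamps do not include one. It also assumes an English locale for the month names.

```python
from datetime import datetime

# The log excerpt quoted above, verbatim.
LOG = """\
May 27 22:09:54 freenas kernel: tap1: link state changed to DOWN
May 27 22:09:54 freenas kernel: tap1: link state changed to DOWN
May 27 22:11:02 freenas tap1: Ethernet address: 00:xx:xx:xx:07:01
May 27 22:11:02 freenas kernel: tap1: promiscuous mode enabled
May 27 22:11:04 freenas kernel: tap1: link state changed to UP
May 27 22:11:04 freenas kernel: tap1: link state changed to UP
"""


def link_outages(log_text, iface, year=2019):
    """Return (down_time, up_time, seconds) for each DOWN -> UP window of iface.

    year is an ASSUMPTION: syslog lines carry no year, so one must be supplied.
    """
    marker = f"{iface}: link state changed to "
    outages = []
    down_at = None
    for line in log_text.splitlines():
        if marker not in line:
            continue
        # The first 15 characters are the syslog timestamp, e.g. "May 27 22:09:54".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        state = line.split(marker, 1)[1].strip()
        if state == "DOWN" and down_at is None:
            down_at = stamp  # repeated DOWN lines keep the first timestamp
        elif state == "UP" and down_at is not None:
            outages.append((down_at, stamp, (stamp - down_at).total_seconds()))
            down_at = None  # repeated UP lines are ignored
    return outages


for down, up, secs in link_outages(LOG, "tap1"):
    print(f"tap1 down {down:%H:%M:%S} -> up {up:%H:%M:%S} ({secs:.0f} s)")
```

For the excerpt above this reports a single window from 22:09:54 to 22:11:04, i.e. 70 seconds. Running it over a longer stretch of /var/log/messages would show whether the windows match the times the VM stopped answering pings.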

The tap1 interface MAC doesn't match the RancherVM NIC MAC (00:xx:xx:xx:E7:39), but maybe the VM's NIC talks to the tap1 interface on the host?
 

apwiggins

Thanks for the reply.

Not sure what you mean exactly by 'within code tags'.

ifconfig for the tap interface on the FreeNAS host:
tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: Attached to RancherUIVM3
options=80000<LINKSTATE>
ether 00:xx:xx:xx:07:01
hwaddr 00:xx:xx:xx:07:01
inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
nd6 options=1<PERFORMNUD>
media: Ethernet autoselect
status: active
groups: tap
Opened by PID 8185



ifconfig from the Rancher guest (container interfaces omitted):
[rancher@rancher ~]$ ifconfig
docker-sys Link encap:Ethernet HWaddr xx:xx:xx:xx:EC:A7
inet addr:172.18.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:8fff:fe4f:eca7/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:426 (426.0 B)

docker0 Link encap:Ethernet HWaddr xx:xx:xx:xx:78:4F
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:d0ff:fe8e:784f/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:665301 errors:0 dropped:0 overruns:0 frame:0
TX packets:490898 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:178419837 (170.1 MiB) TX bytes:178977004 (170.6 MiB)

eth0 Link encap:Ethernet HWaddr xx:xx:xx:xx:E7:39
inet addr:xx.xx.xx.210 Bcast:xx.xx.xx.255 Mask:255.255.255.0
inet6 addr: fe80::2a0:xxxx:xxxx:e739/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:334730 errors:0 dropped:0 overruns:0 frame:0
TX packets:216680 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:54120787 (51.6 MiB) TX bytes:29911715 (28.5 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
 
dlavigne

Guest
Can you post the full output of ifconfig from the FreeNAS host? That will tell us which driver you're using.
 

apwiggins

From the FreeNAS host. The tap1 entry at the end of the output is the Rancher VM interface.

@freenas:~ # ifconfig
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
ether xx:xx:xx:xx:ba:67
hwaddr xx:xx:xx:xx:ba:67
inet xx.xx.xx.200 netmask 0xffffff00 broadcast 192.168.0.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether xx:xx:xx:xx:ba:68
hwaddr xx:xx:xx:xx:ba:68
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect
status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
groups: lo
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 02:45:89:7a:3a:00
nd6 options=1<PERFORMNUD>
groups: bridge
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: epair2a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 7 priority 128 path cost 2000
member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 6 priority 128 path cost 2000
member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 5 priority 128 path cost 2000
member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
ifmaxaddr 0 port 1 priority 128 path cost 20000

epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
ether 02:2d:d0:00:05:0a
hwaddr 02:2d:d0:00:05:0a
nd6 options=1<PERFORMNUD>
media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
status: active
groups: epair
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
ether 02:2d:d0:00:06:0a
hwaddr 02:2d:d0:00:06:0a
nd6 options=1<PERFORMNUD>
media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
status: active
groups: epair
epair2a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
ether 02:2d:d0:00:07:0a
hwaddr 02:2d:d0:00:07:0a
nd6 options=1<PERFORMNUD>
media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
status: active
groups: epair
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: Attached to Gitlab_VM2
options=80000<LINKSTATE>
ether 00:bd:59:c7:f8:00
hwaddr 00:bd:59:c7:f8:00
nd6 options=1<PERFORMNUD>
media: Ethernet autoselect
status: active
groups: tap
Opened by PID 9248
vnet0:4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: associated with jail: radarr as nic: epair0b
options=8<VLAN_MTU>
ether 02:ff:60:c1:df:ca
hwaddr 02:2d:d0:00:09:0a
nd6 options=1<PERFORMNUD>
media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
status: active
groups: epair

tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: Attached to RancherUIVM3
options=80000<LINKSTATE>
ether 00:bd:02:d5:f8:01
hwaddr 00:bd:02:d5:f8:01
nd6 options=1<PERFORMNUD>
media: Ethernet autoselect
status: active
groups: tap
Opened by PID 9683
 

apwiggins

Hi dlavigne...any more thoughts on this before the trail goes cold? Thanks again for taking the time to have a look.
 
dlavigne

Guest
If the issue is persisting for you, go ahead and make a ticket at bugs.ixsystems.com so a dev can take a look.
 

apwiggins

So, RancherOS is being removed in FreeNAS 11.3.

https://jira.ixsystems.com/browse/N...issuetabpanels:comment-tabpanel#comment-93610
===
For anyone still using 11.2: RancherOS has a suspend function, which is the likely offender. I'm speculating that an ACPI event is triggering this script:
https://github.com/rancher/os/blob/master/images/02-acpid/etc/acpi/suspend.sh

suspend.sh can probably be removed with a write_files entry:
https://github.com/rancher/os/issues/1388#issuecomment-260218914
https://rancher.com/docs/os/v1.2/en/configuration/write-files/
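
As a rough, untested sketch of what such a write_files entry might look like, the snippet below renders a cloud-config fragment that overwrites suspend.sh with a no-op script instead of deleting it. The container name (acpid) and the path (/etc/acpi/suspend.sh) are assumptions inferred from the repository layout linked above, and render_cloud_config is a hypothetical helper, not part of any Rancher tooling. Check both values against the linked docs and issue before using it.

```python
# ASSUMPTION: system service name, inferred from images/02-acpid in the repo.
ACPID_CONTAINER = "acpid"
# ASSUMPTION: path inside that container, inferred from the repo layout.
SUSPEND_PATH = "/etc/acpi/suspend.sh"


def render_cloud_config(container=ACPID_CONTAINER, path=SUSPEND_PATH):
    """Render a cloud-config fragment replacing suspend.sh with a no-op (hypothetical helper)."""
    lines = [
        "#cloud-config",
        "write_files:",
        f"  - container: {container}",
        f"    path: {path}",
        '    permissions: "0755"',
        "    owner: root",
        "    content: |",
        "      #!/bin/sh",
        "      # no-op replacement: ignore ACPI suspend requests",
        "      exit 0",
    ]
    return "\n".join(lines) + "\n"


# Print the fragment so it can be reviewed before merging it into the VM's cloud-config.
print(render_cloud_config(), end="")
```

Overwriting the script with one that exits immediately keeps the file in place, so whatever calls it on an ACPI event should still find something to run; whether that is enough to stop the pauses is not confirmed here.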
 