K3s failure in TNS

cyrus104

Explorer
Joined
Feb 7, 2021
Messages
70
I can't access my Apps in the TNS interface. I'm running the few commands that I can. I also can't generate a debug file: it never seems to finish, and about an hour later I get a long Python traceback.

Here are a few console errors I get (the command returns an error about 70% of the time). I'm also not sure why all of the kube-system pods are Terminating:
root@truenas[~]# k3s kubectl get pods -A
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
root@truenas[~]# k3s kubectl get pods -A
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
root@truenas[~]# k3s kubectl get pods -A
NAMESPACE               NAME                                  READY   STATUS        RESTARTS   AGE
ix-truenas-scale-chia   truenas-scale-chia-5c59b65f79-j6wqw   1/1     Running       0          35h
ix-truenas-scale-plex   truenas-scale-plex-648b9b44f9-wbdfg   1/1     Running       0          4h3m
kube-system             openebs-zfs-controller-0              4/5     Terminating   22         4h9m
kube-system             coredns-7448499f4d-m2542              1/1     Terminating   0          4h9m
kube-system             openebs-zfs-node-mr76b                2/2     Terminating   0          4h9m


Part of the k3s.service status output that I was able to copy off:
Dec 01 01:51:44 truenas.local k3s[100008]: time="2021-12-01T01:51:44.734007070+07:00" level=info msg="Stopped tunnel to 127.0.0.1:6443"
Dec 01 01:51:44 truenas.local k3s[100008]: time="2021-12-01T01:51:44.734035030+07:00" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
Dec 01 01:51:44 truenas.local k3s[100008]: time="2021-12-01T01:51:44.734039690+07:00" level=info msg="Connecting to proxy" url="wss://10.100.10.4:6443/v1-k3s/connect"
Dec 01 01:51:44 truenas.local k3s[100008]: time="2021-12-01T01:51:44.734127160+07:00" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
Dec 01 01:51:44 truenas.local k3s[100008]: time="2021-12-01T01:51:44.739834459+07:00" level=info msg="Handling backend connection request [ix-truenas]"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.780662 100008 shared_informer.go:240] Waiting for caches to sync for tokens
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.785476 100008 controllermanager.go:574] Started "podgc"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.785594 100008 gc_controller.go:89] Starting GC controller
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.785614 100008 shared_informer.go:240] Waiting for caches to sync for GC
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.811740 100008 controllermanager.go:574] Started "namespace"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.811815 100008 namespace_controller.go:200] Starting namespace controller
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.811840 100008 shared_informer.go:240] Waiting for caches to sync for namespace
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.816138 100008 controllermanager.go:574] Started "deployment"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.816229 100008 deployment_controller.go:153] "Starting controller" controller="deployment"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.816250 100008 shared_informer.go:240] Waiting for caches to sync for deployment
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.820216 100008 node_ipam_controller.go:91] Sending events to api server.
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.871625 100008 server.go:660] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.871865 100008 container_manager_linux.go:291] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.871956 100008 container_manager_linux.go:296] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgrou>
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.871985 100008 topology_manager.go:120] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.872001 100008 container_manager_linux.go:327] "Initializing Topology Manager" policy="none" scope="container"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.872013 100008 container_manager_linux.go:332] "Creating device plugin manager" devicePluginEnabled=true
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.872091 100008 kubelet.go:310] "Using dockershim is deprecated, please consider using a full-fledged CRI implementation"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.872115 100008 client.go:78] "Connecting to docker on the dockerEndpoint" endpoint="unix:///var/run/docker.sock"
Dec 01 01:51:44 truenas.local k3s[100008]: I1201 01:51:44.872132 100008 client.go:97] "Start docker client with request timeout" timeout="2m0s"
Dec 01 01:51:44 truenas.local k3s[100008]: E1201 01:51:44.872363 100008 server.go:288] "Failed to run kubelet" err="failed to run Kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/
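That last kubelet line suggests the Docker socket isn't reachable. A quick way to check that without going through kubectl at all (a sketch; it assumes the standard socket path /var/run/docker.sock):

```shell
#!/bin/bash
# Quick check: is the Docker daemon socket present?
# (Assumes the standard path /var/run/docker.sock.)
SOCK=/var/run/docker.sock
if [ -S "$SOCK" ]; then
    echo "docker socket: present"
else
    echo "docker socket: missing"
fi

# If the socket exists, ask the daemon directly, bypassing Kubernetes entirely.
docker info >/dev/null 2>&1 && echo "docker daemon: responding" || echo "docker daemon: not responding"
```

If the socket is missing or the daemon doesn't respond, the kubelet failure above is just a symptom of the Docker side being down.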
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
You are running openEBS ... could you confirm whether the system is stable without these?
 

cyrus104

Explorer
Joined
Feb 7, 2021
Messages
70
I have no idea what openEBS is. This is a fresh install of TNS RC.2 with an uploaded config that just has the stock chia and plex apps installed.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Did this happen immediately on installing the apps, or after they were running?
Did either app start successfully?
 

cyrus104

Explorer
Joined
Feb 7, 2021
Messages
70
I reinstalled TNS on a new SSD, then imported my configuration from a backup including seeds (.tar). After a reboot I could see that the two apps I had installed were there, but shortly thereafter the App menu stopped loading.

This command never times out or gives me any results: k3s kubectl get nodes

Code:
root@truenas[~]# systemctl status k3s
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; vendor preset: disabled)
     Active: active (running) since Wed 2021-12-01 07:10:16 +07; 24h ago
       Docs: https://k3s.io
    Process: 18018 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 18021 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 18024 (k3s-server)
      Tasks: 60
     Memory: 656.1M
     CGroup: /system.slice/k3s.service
             └─18024 /usr/local/bin/k3s server


Dec 02 07:14:29 truenas.local k3s[18024]: E1202 07:14:29.178163   18024 kubelet_node_status.go:457] "Unable to update node status" err="update node status exceeds retry count"
Dec 02 07:14:35 truenas.local k3s[18024]: E1202 07:14:35.158842   18024 cni.go:380] "Error deleting pod from network" err="Multus: [kube-system/openebs-zfs-controller-0]: error getting pod: Get \"https://127.0.0.1:6443/api/v1>
Dec 02 07:14:35 truenas.local k3s[18024]: E1202 07:14:35.159370   18024 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"open>
Dec 02 07:14:35 truenas.local k3s[18024]: E1202 07:14:35.159391   18024 kuberuntime_gc.go:176] "Failed to stop sandbox before removing" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"openebs->
Dec 02 07:14:35 truenas.local k3s[18024]: E1202 07:14:35.966051   18024 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-l>
Dec 02 07:14:45 truenas.local k3s[18024]: time="2021-12-02T07:14:45.294416362+07:00" level=error msg="Failed to connect to proxy" error="dial tcp 10.100.10.4:6443: connect: connection timed out"
Dec 02 07:14:45 truenas.local k3s[18024]: time="2021-12-02T07:14:45.294465492+07:00" level=error msg="Remotedialer proxy error" error="dial tcp 10.100.10.4:6443: connect: connection timed out"
Dec 02 07:14:49 truenas.local k3s[18024]: E1202 07:14:49.178441   18024 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"ix-truenas\": Get \"https://127.0.0.1:6443/api/v1/nodes/ix>
Dec 02 07:14:50 truenas.local k3s[18024]: time="2021-12-02T07:14:50.294574626+07:00" level=info msg="Connecting to proxy" url="wss://10.100.10.4:6443/v1-k3s/connect"
Dec 02 07:14:52 truenas.local k3s[18024]: E1202 07:14:52.967134   18024 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-l>
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
You are running openEBS ... could you confirm whether the system is stable without these?
OpenEBS is part of TrueNAS SCALE, it's required for the PVC storage to work.
 

cyrus104

Explorer
Joined
Feb 7, 2021
Messages
70
Not sure how to continue with this, as it happens whenever I restore from my backup. Since every k3s kubectl status command times out, what are some other commands that could help troubleshoot the issue?
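When kubectl itself hangs, a few checks that go around the Kubernetes API can still tell you something (a sketch; the port and service name below are the standard k3s defaults):

```shell
#!/bin/bash
# Is anything actually listening on the k3s API port? Uses bash's /dev/tcp
# so it works even if netstat/ss aren't installed.
if (exec 3<>/dev/tcp/127.0.0.1/6443) 2>/dev/null; then
    echo "port 6443: accepting connections"
    exec 3<&- 3>&-
else
    echo "port 6443: refused or unreachable"
fi

# Pull the most recent k3s errors straight from the journal,
# since 'systemctl status' truncates long lines.
journalctl -u k3s --no-pager -n 100 2>/dev/null | grep -iE 'error|failed' | tail -n 20
```

Running journalctl -u k3s -f in a second terminal also lets you watch the service live while the Apps page tries to load.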
 

waqarahmed

iXsystems
iXsystems
Joined
Aug 28, 2019
Messages
136
@cyrus104 Can you please email me a debug of your system at waqar at ixsystems.com and we can go from there? Thanks!
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
@truecharts Is this solved? K3S is not starting for me with similar errors.

We're neither the creator nor the maintainer of TrueNAS and its App system.

There are multiple cases in which these errors appear. If you hit them, please file a bug report on the iX Systems Jira bug tracker; that way @waqarahmed gets them in his inbox.

No one can conclude anything about your issue based on this thread alone.
 

radomirpolach

Explorer
Joined
Feb 13, 2022
Messages
71
We're neither the creator nor the maintainer of TrueNAS and its App system.

There are multiple cases in which these errors appear. If you hit them, please file a bug report on the iX Systems Jira bug tracker; that way @waqarahmed gets them in his inbox.

No one can conclude anything about your issue based on this thread alone.
Yes, I will file an issue on the tracker. I downgraded in the meantime, but now I get this:
Ready False Tue, 11 Oct 2022 00:18:54 +0200 Tue, 11 Oct 2022 00:18:25 +0200 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
So far k3s on TrueNAS has been super unreliable.
 