Apps Service won't start

hunterjm · Jan 10, 2024

Today after a restart, none of my apps will start. k3s says the node is ready, but there is a `not-ready` taint, and the logs look like containers are trying to start but can not be accessed. I've browsed these forums for a few hours and couldn't find any relevant posts, but here is what I'm seeing:

Code:

Failed to start kubernetes cluster for Applications: [EFAULT] Kube-router routes not applied as timed out waiting for pods to execute

Code:

root@truenas[~]# k3s kubectl get nodes
NAME         STATUS   ROLES                  AGE    VERSION
ix-truenas   Ready    control-plane,master   406d   v1.26.6+k3s-e18037a7-dirty

Code:

root@truenas[~]# k3s kubectl describe node ix-truenas
Name:               ix-truenas
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ix-truenas
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    openebs.io/nodeid=ix-truenas
                    openebs.io/nodename=ix-truenas
Annotations:        csi.volume.kubernetes.io/nodeid: {"zfs.csi.openebs.io":"ix-truenas"}
                    k3s.io/node-args:
                      ["server","--cluster-cidr","172.16.0.0/16","--cluster-dns","172.17.0.10","--data-dir","/mnt/tank/ix-applications/k3s","--disable","metrics...
                    k3s.io/node-config-hash: 5VMKXMDJNBI2D5KDF52SDV37V4ZY2EIOJXZDRLUULQXDPJX5RA4Q====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/mnt/tank/ix-applications/k3s/data/6c243f7cbf543e01911aa24f7651922820ca56e79179e8fd215a3e4381aceecf"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 29 Nov 2022 22:25:48 -0500
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ix-truenas
  AcquireTime:     <unset>
  RenewTime:       Wed, 10 Jan 2024 22:08:12 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 10 Jan 2024 22:08:13 -0500   Wed, 10 Jan 2024 22:08:13 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.30
  Hostname:    ix-truenas
Capacity:
  cpu:                12
  ephemeral-storage:  27854154Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65761356Ki
  nvidia.com/gpu:     0
  pods:               250
Allocatable:
  cpu:                12
  ephemeral-storage:  27746837493708
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65761356Ki
  nvidia.com/gpu:     0
  pods:               250
System Info:
  Machine ID:                 3226bac618d148519c61c31b083dc929
  System UUID:                af59a1a8-6f8d-0000-0000-000000000000
  Boot ID:                    e4291094-7048-4ac4-8d8c-595cf703dcc2
  Kernel Version:             6.1.63-production+truenas
  OS Image:                   Debian GNU/Linux 12 (bookworm)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://Unknown
  Kubelet Version:            v1.26.6+k3s-e18037a7-dirty
  Kube-Proxy Version:         v1.26.6+k3s-e18037a7-dirty
PodCIDR:                      172.16.0.0/16
PodCIDRs:                     172.16.0.0/16
Non-terminated Pods:          (26 in total)
  Namespace                   Name                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                               ------------  ----------  ---------------  -------------  ---
  kube-system                 nvidia-device-plugin-daemonset-7skx6               0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 csi-nfs-controller-7b74694749-c2dwh                40m (0%)      0 (0%)      80Mi (0%)        900Mi (1%)     34m
  kube-system                 openebs-zfs-node-74wn8                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cert-manager                cert-manager-8444f6f86b-bxfww                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  ix-cloudflared              cloudflared-5d8bc8d5cd-cnjlg                       10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  ix-requestrr                requestrr-5b94d84495-7s9ql                         10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  metallb-system              speaker-cc9v8                                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cnpg-system                 cnpg-controller-manager-5d74bc79fb-rtq5z           100m (0%)     100m (0%)   100Mi (0%)       200Mi (0%)     34m
  kube-system                 openebs-zfs-controller-0                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  prometheus-operator         prometheus-operator-5dcffb7cb8-vvtdw               100m (0%)     200m (1%)   100Mi (0%)       200Mi (0%)     34m
  ix-jackett                  jackett-bd7f48b58-zcc2q                            20m (0%)      8 (66%)     100Mi (0%)       16Gi (25%)     34m
  ix-radarr                   radarr-74588c7f96-nxd96                            10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  kube-system                 coredns-59b4f5bbd5-9t7td                           100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     34m
  cert-manager                cert-manager-webhook-545bd5d7d8-zlcf7              0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  ix-qbittorrent              qbittorrent-b9686749d-mds8f                        20m (0%)      8 (66%)     100Mi (0%)       16Gi (25%)     34m
  kube-system                 csi-nfs-node-xr5r8                                 30m (0%)      0 (0%)      60Mi (0%)        500Mi (0%)     17m
  kube-system                 csi-smb-controller-7fbbb8fb6f-dvwxb                30m (0%)      2 (16%)     60Mi (0%)        600Mi (0%)     34m
  ix-wyoming-piper            wyoming-piper-custom-app-7fbbc78649-qbk45          10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  kube-system                 snapshot-controller-546868dfb4-fngtf               10m (0%)      0 (0%)      20Mi (0%)        300Mi (0%)     34m
  ix-plex                     plex-897c9965b-8gz4b                               10m (0%)      12 (100%)   50Mi (0%)        8Gi (12%)      34m
  kube-system                 svclb-dizquetv-bb5710f6-56xls                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 svclb-wyoming-whisper-custom-app-c1cb0c8d-v28lp    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cert-manager                cert-manager-cainjector-ffb4747bb-hbgcn            0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 svclb-frigate-12-custom-app-3a50a40b-2b8bl         0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 svclb-plex-955ab32e-z57fq                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system                 svclb-wyoming-piper-custom-app-6374b442-7xgdf      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                500m (4%)   46300m (385%)
  memory             940Mi (1%)  76598Mi (119%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     0           0
Events:
  Type     Reason                          Age                   From             Message
  ----     ------                          ----                  ----             -------
  Normal   NodeNotReady                    159m                  kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   Starting                        159m                  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             159m                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         159m                  kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         159m                  kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           159m                  kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            159m                  kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodePasswordValidationComplete  159m                  k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  159m                  node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        44m (x675 over 159m)  kubelet          Node ix-truenas has been rebooted, boot id: e4f4f164-f984-4dbe-9073-dfc8c6f74123
  Normal   NodeHasSufficientPID            34m                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   Starting                        34m                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             34m                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         34m                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         34m                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           34m                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodePasswordValidationComplete  34m                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  34m                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        24m (x87 over 34m)    kubelet          Node ix-truenas has been rebooted, boot id: 0b8e8bd5-0216-4fe3-a25a-c9b6987c96ee
  Normal   NodeHasNoDiskPressure           17m                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            17m                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   Starting                        17m                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             17m                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         17m                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         17m                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeNotReady                    17m                   kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   NodePasswordValidationComplete  17m                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  17m                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        7m44s (x60 over 17m)  kubelet          Node ix-truenas has been rebooted, boot id: 42e259d9-909a-4479-87c9-d007ab5c42a2
  Normal   Starting                        95s                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             95s                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         95s                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         95s                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           95s                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            95s                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodeReady                       94s                   kubelet          Node ix-truenas status is now: NodeReady
  Normal   NodePasswordValidationComplete  91s                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  85s                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        85s (x18 over 95s)    kubelet          Node ix-truenas has been rebooted, boot id: e4291094-7048-4ac4-8d8c-595cf703dcc2

I have tried restarting multiple times, restoring from a recent config backup, unsetting and re-setting the app pool, and nothing seems to work. I'm at a loss for what to try next. Sometimes it will show App Services started, but spam logs with "failed to connect" when doing the container health checks. It's not super consistent.

hunterjm · Jan 10, 2024

When I do a journelctl -u k3s -f when unsetting app pool, restart, set app pool, I get the following errors related to PVCs:

Code:

Jan 10 23:22:27 truenas k3s[7690]: E0110 23:22:27.932400    7690 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/zfs.csi.openebs.io^pvc-5e394424-c7af-442d-a010-48a87fbece00 podName:8ce5cae5-940b-4833-b75e-f5fca00f0bbc nodeName:}" failed. No retries permitted until 2024-01-10 23:22:28.432363856 -0500 EST m=+421.617054329 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "pvc-5e394424-c7af-442d-a010-48a87fbece00" (UniqueName: "kubernetes.io/csi/zfs.csi.openebs.io^pvc-5e394424-c7af-442d-a010-48a87fbece00") pod "8ce5cae5-940b-4833-b75e-f5fca00f0bbc" (UID: "8ce5cae5-940b-4833-b75e-f5fca00f0bbc") : kubernetes.io/csi: Unmounter.TearDownAt failed to get CSI client: driver name zfs.csi.openebs.io not found in the list of registered CSI drivers

hunterjm · Jan 11, 2024

I couldn't figure it out and ended up unsetting the pool, deleting ix-applications, and starting from scratch :(

TheThirdReindeer · Jan 12, 2024

Hi did this end up working for you? I am running into the same issue but unsetting the pool, deleting ix-applications is not working for me.

hunterjm · Jan 14, 2024

TheThirdReindeer said:
Hi did this end up working for you? I am running into the same issue but unsetting the pool, deleting ix-applications is not working for me.

Yeah, it worked for me.

Steps:
1) Unset Pool (this shuts down kubernetes)
2) Restart
3) Delete ix-applications
4) Select Pool

LongCircle · Feb 13, 2024

Hi, I got the very same kind of problem also, on a fresh (10 days) install of TrueNAS-SCALE-23.10.1.3. I have never been able to run any app, they just get stuck in Deployement state, and via ssh I can see that the kube-system pods are CrashLoopBackOff.
Unsetting the pool / deleting ix-applications does not help at all, repeated several times to no avail.

Thanks for any lead / advice / trick that could help me find my way in this apps section!

Code:

sudo k3s kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS             RESTARTS         AGE
kube-system   csi-nfs-node-8ctks                     0/3     CrashLoopBackOff   84 (5m3s ago)    117m
kube-system   coredns-59b4f5bbd5-hfphg               0/1     CrashLoopBackOff   13 (4m52s ago)   77m
kube-system   csi-smb-controller-7fbbb8fb6f-bmxff    0/3     CrashLoopBackOff   39 (4m39s ago)   77m
kube-system   openebs-zfs-node-pqgpz                 0/2     CrashLoopBackOff   23 (4m11s ago)   49m
kube-system   csi-nfs-controller-7b74694749-95fsl    0/4     CrashLoopBackOff   52 (3m39s ago)   77m
kube-system   openebs-zfs-controller-0               0/5     CrashLoopBackOff   59 (3m15s ago)   76m
kube-system   snapshot-controller-546868dfb4-t942t   0/1     CrashLoopBackOff   14 (59s ago)     77m
kube-system   snapshot-controller-546868dfb4-7r6vk   0/1     CrashLoopBackOff   14 (32s ago)     77m
kube-system   csi-smb-node-w5ct5                     0/3     Error              87               117m

Code:

sudo journalctl -u k3s -f
Feb 13 21:28:22 babylon k3s[104940]: I0213 21:28:22.405568  104940 scope.go:115] "RemoveContainer" containerID="1dffe151e2615b7500fdd91ff569f98a4623e17ee24e01bcc130e5914badd909"
Feb 13 21:28:22 babylon k3s[104940]: I0213 21:28:22.405597  104940 scope.go:115] "RemoveContainer" containerID="1532f94f1ac5ebbff1105a0f38cb0b42be6cf51f4ebb4bf87595a30670028793"
Feb 13 21:28:22 babylon k3s[104940]: I0213 21:28:22.405611  104940 scope.go:115] "RemoveContainer" containerID="2e270a94feba403f934b9820a8061c012395270c32a807e8870ffbe487509c9e"
Feb 13 21:28:22 babylon k3s[104940]: E0213 21:28:22.406585  104940 pod_workers.go:965] "Error syncing pod, skipping" err="[failed to \"StartContainer\" for \"liveness-probe\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=liveness-probe pod=csi-smb-node-sc4qs_kube-system(be4931ec-2e2d-4dfe-9290-6e6756061655)\", failed to \"StartContainer\" for \"node-driver-registrar\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=node-driver-registrar pod=csi-smb-node-sc4qs_kube-system(be4931ec-2e2d-4dfe-9290-6e6756061655)\", failed to \"StartContainer\" for \"smb\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=smb pod=csi-smb-node-sc4qs_kube-system(be4931ec-2e2d-4dfe-9290-6e6756061655)\"]" pod="kube-system/csi-smb-node-sc4qs" podUID=be4931ec-2e2d-4dfe-9290-6e6756061655
Feb 13 21:28:24 babylon k3s[104940]: I0213 21:28:24.418921  104940 scope.go:115] "RemoveContainer" containerID="c4c3b4b4db8feb3af97c6f9d3a583be4369667720d6609be5022b587394f4947"
Feb 13 21:28:24 babylon k3s[104940]: I0213 21:28:24.418973  104940 scope.go:115] "RemoveContainer" containerID="a4cefd43780b8bdeb43a6241a00186d46bdaf8fef8d62f18093ae809ca31e845"
Feb 13 21:28:24 babylon k3s[104940]: I0213 21:28:24.418997  104940 scope.go:115] "RemoveContainer" containerID="791f87d734bfbdbb98ff27a3cb809010e3212f6478c1e5c710749ac93e95a12b"
Feb 13 21:28:24 babylon k3s[104940]: E0213 21:28:24.420610  104940 pod_workers.go:965] "Error syncing pod, skipping" err="[failed to \"StartContainer\" for \"liveness-probe\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=liveness-probe pod=csi-nfs-node-wg22k_kube-system(785d1d2f-00e3-4055-87bb-f8b11e65ea9a)\", failed to \"StartContainer\" for \"node-driver-registrar\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=node-driver-registrar pod=csi-nfs-node-wg22k_kube-system(785d1d2f-00e3-4055-87bb-f8b11e65ea9a)\", failed to \"StartContainer\" for \"nfs\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nfs pod=csi-nfs-node-wg22k_kube-system(785d1d2f-00e3-4055-87bb-f8b11e65ea9a)\"]" pod="kube-system/csi-nfs-node-wg22k" podUID=785d1d2f-00e3-4055-87bb-f8b11e65ea9a

Code:

sudo k3s kubectl describe node ix-truenas
Name:               ix-truenas
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ix-truenas
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    openebs.io/nodeid=ix-truenas
                    openebs.io/nodename=ix-truenas
Annotations:        k3s.io/node-args:
                      ["server","--cluster-cidr","172.16.0.0/16","--cluster-dns","172.17.0.10","--data-dir","/mnt/SSD/ix-applications/k3s","--disable","servicel...
                    k3s.io/node-config-hash: SND67SWBONI7PZLWKX7CW5B7KUCU3EGDROUJ56ERKQXKMFV4DISQ====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/mnt/SSD/ix-applications/k3s/data/38f8ae2f4b5f36a8da71d0524cbf39c437e8bb487753ff5a514da2b836b8e1d6"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 13 Feb 2024 19:03:12 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ix-truenas
  AcquireTime:     <unset>
  RenewTime:       Tue, 13 Feb 2024 21:01:42 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 13 Feb 2024 20:57:01 -0500   Tue, 13 Feb 2024 19:03:11 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 13 Feb 2024 20:57:01 -0500   Tue, 13 Feb 2024 19:03:11 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 13 Feb 2024 20:57:01 -0500   Tue, 13 Feb 2024 19:03:11 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 13 Feb 2024 20:57:01 -0500   Tue, 13 Feb 2024 20:11:02 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.0.200
  Hostname:    ix-truenas
Capacity:
  cpu:                12
  ephemeral-storage:  228169600Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16269388Ki
  pods:               250
Allocatable:
  cpu:                12
  ephemeral-storage:  221963386706
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16269388Ki
  pods:               250
System Info:
  Machine ID:                 96d89c44618e4dafb0830387beee6271
  System UUID:                aed69294-69e8-7414-a4c8-d8bbc19ae5c1
  Boot ID:                    2a1c09de-9659-4652-9259-de0f11262b77
  Kernel Version:             6.1.63-debug+truenas
  OS Image:                   Debian GNU/Linux 12 (bookworm)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://Unknown
  Kubelet Version:            v1.26.6+k3s-e18037a7-dirty
  Kube-Proxy Version:         v1.26.6+k3s-e18037a7-dirty
PodCIDR:                      172.16.0.0/16
PodCIDRs:                     172.16.0.0/16
Non-terminated Pods:          (9 in total)
  Namespace                   Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                    ------------  ----------  ---------------  -------------  ---
  kube-system                 csi-nfs-controller-7b74694749-95fsl     40m (0%)      0 (0%)      80Mi (0%)        900Mi (5%)     77m
  kube-system                 snapshot-controller-546868dfb4-t942t    10m (0%)      0 (0%)      20Mi (0%)        300Mi (1%)     77m
  kube-system                 snapshot-controller-546868dfb4-7r6vk    10m (0%)      0 (0%)      20Mi (0%)        300Mi (1%)     77m
  kube-system                 csi-smb-node-w5ct5                      30m (0%)      0 (0%)      60Mi (0%)        400Mi (2%)     118m
  kube-system                 coredns-59b4f5bbd5-hfphg                100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     77m
  kube-system                 csi-nfs-node-8ctks                      30m (0%)      0 (0%)      60Mi (0%)        500Mi (3%)     118m
  kube-system                 openebs-zfs-controller-0                0 (0%)        0 (0%)      0 (0%)           0 (0%)         77m
  kube-system                 openebs-zfs-node-pqgpz                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         50m
  kube-system                 csi-smb-controller-7fbbb8fb6f-bmxff     30m (0%)      2 (16%)     60Mi (0%)        600Mi (3%)     77m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                250m (2%)   2 (16%)
  memory             370Mi (2%)  3170Mi (19%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                          Age                  From             Message
  ----     ------                          ----                 ----             -------
  Normal   Starting                        118m                 kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             118m                 kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory         118m (x2 over 118m)  kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           118m (x2 over 118m)  kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            118m (x2 over 118m)  kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodePasswordValidationComplete  118m                 k3s-supervisor   Deferred node password secret validation complete
  Normal   NodeAllocatableEnforced         118m                 kubelet          Updated Node Allocatable limit across pods
  Normal   NodeReady                       118m                 kubelet          Node ix-truenas status is now: NodeReady
  Normal   RegisteredNode                  118m                 node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Normal   NodeHasNoDiskPressure           95m                  kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   Starting                        95m                  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             95m                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory         95m                  kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID            95m                  kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodeNotReady                    95m                  kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   NodeAllocatableEnforced         95m                  kubelet          Updated Node Allocatable limit across pods
  Normal   NodePasswordValidationComplete  95m                  k3s-supervisor   Deferred node password secret validation complete
  Normal   NodeReady                       95m                  kubelet          Node ix-truenas status is now: NodeReady
  Normal   RegisteredNode                  95m                  node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Normal   NodePasswordValidationComplete  50m                  k3s-supervisor   Deferred node password secret validation complete
  Normal   Starting                        50m                  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             50m                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory         50m                  kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           50m                  kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            50m                  kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Warning  Rebooted                        50m                  kubelet          Node ix-truenas has been rebooted, boot id: 2a1c09de-9659-4652-9259-de0f11262b77
  Normal   NodeNotReady                    50m                  kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   NodeAllocatableEnforced         50m                  kubelet          Updated Node Allocatable limit across pods
  Normal   RegisteredNode                  50m                  node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Normal   NodeReady                       50m                  kubelet          Node ix-truenas status is now: NodeReady

LongCircle · Feb 13, 2024

In the console, I can also see a lot of veth deconnections. Message repeat every few seconds, with a different veth identifier:

Code:

Feb 13 22:06:46 babylon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth9f34d111: link becomes ready
Feb 13 22:06:46 babylon kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 13 22:06:46 babylon kernel: kube-bridge: port 3(veth9f34d111) entered blocking state
Feb 13 22:06:46 babylon kernel: kube-bridge: port 3(veth9f34d111) entered disabled state
Feb 13 22:06:46 babylon kernel: device veth9f34d111 entered promiscuous mode
Feb 13 22:06:46 babylon kernel: kube-bridge: port 3(veth9f34d111) entered blocking state
Feb 13 22:06:46 babylon kernel: kube-bridge: port 3(veth9f34d111) entered forwarding state

Mocoso · Mar 26, 2024

hunterjm said:
Yeah, it worked for me.

Steps:
1) Unset Pool (this shuts down kubernetes)
2) Restart
3) Delete ix-applications
4) Select Pool

I was NOT able to resolve with a simple "reboot" as a restart... I tried killing the nodes etc and that didnt work.. I finally was able to get it to work by doing a full cold boot as a restart (not just a warm boot) and selecting the latest truenas+debug in boot options... the fact a DNS issue causes it to get permanently hosed is a rather nasty bug... production level stuff will for sure go in VMs moving forward...

Nitrof · Mar 31, 2024

it does not help here... It was ok until I try to add the pool again. It get stuck at initialize apps service....

[EDIT]
Finally it worked by unset pool, dele ix-apps and reset pool... My server CPU is showing some age; it take 2 hours but it resolve it. Now everything is ok. all apps running.

Notice the problem here appear just after instaling truecharts...

Regards.

Nitrof

pangratt12345 · Apr 3, 2024

Hello,

In case someone stumbles upon this problem too, I have some information.
I had similar problems with Apps service. Apps were stuck in deploying state. Pods stuck in Pending state. Node not ready, etc.
The problems were resolved after I correctly configured the network configuration of TrueNAS and the settings of Apps/Kubernetes and after several restarts.

1. Network settings
Reference:

Apps stuck deploying

I've just migrated from Core to Scale 2 days ago. I thought this would just be a case of re-installing my plugins and copying over the config but every app I install is stuck in Deploying. I've waited on them for several hours. I've tried to delete the apps and start again to get something...

www.truenas.com

Apps stuck deploying

Hi, I've recently moved over to TrueNAS scale (TrueNAS-SCALE-22.02.3), main reason being I wanted access to all that container goodness and my better knowledge of linux vs FreeBSD. I have some level of experience working with Docker, writing my own compose files, using portainer etc. I have a...

www.truenas.com

There seem to be 2 ways to configure the network for TrueNAS.
1st - with DHCP turned on. This is the default after the installation of TrueNAS.
2nd - with a static LAN IP assigned to the TrueNAS device with turned off DHCP, like in the below link:

TrueNAS Scale: Basic Network Configuration - Tech Addressed

If you've setup a TrueNAS Scale server and intend to run applications, such as Pi-hole for example, on it, you'll need to ensure you have basic networking configured properly before you'll be able to do much in the way of installing apps. Here I've prepared a quick tutorial of how to modify the...

www.techaddressed.com

I attached screenshots at the bottom of this post showing correct network settings.

In network settings, it is important to set the correct Default Route (Gateway), which is the IP address of the router in the local network (LAN).
Default Route (router's IP) and network interface name and exact IP address of TrueNAS device can be obtained with this command:
ip route

Also, set the correct IP addresses of DNS servers.
for me 62.233.233.233 or 87.204.204.204 works fine for 1st option
or 1.1.1.1, 8.8.8.8, 9.9.9.9 works fine for 2nd option.

In Apps->Settings->Advanced Settings
Route interface (v4) and Route Gateway (v4) should be the same as in Network settings.
The Node IP probably should be 0.0.0.0. You can try typing the exact IP address of TrueNAS server here and see if that helps.

2. Simple restarting Kubernetes/Apps service in WebGUI

You can also try this in Apps->Settings like mentioned above but skipping restarting. This restarts Kubernetes/Apps service without restarting the TrueNAS server.
1) Unset Pool
2) Choose Pool

After restart You may need to wait some time after important pods get into the Running state. They often pull docker images at startup.

sudo k3s kubectl get events --all-namespaces
sudo k3s kubectl get pods --all-namespaces
sudo k3s kubectl describe pod Your_pod_name_here --namespace=namespace_name_here
sudo k3s kubectl logs Your_pod_name_here --namespace=namespace_name_here

Useful command to launch in another CLI window to observe in that window what is happening to pods
sudo k3s kubectl get pods --all-namespaces --watch

Also there is an important folder /etc/cni/net.d which has kubeconfig file for kube-router. It seems that this folder is automatically filled with important files after startup of TrueNAS apps service. Sometimes, due to some error this folder doesn't get filled and is empty which can be a cause of tainted ix-truenas node and stuck apps.

node.kubernetes.io/not-ready:NoSchedule
node.kubernetes.io/unschedulable:NoSchedule

ix-truenas node should be ready and registered and should have no taints, also /etc/cni/net.d folder should contain important files for apps like kubeconfig for kube-router. Apps won't run correctly if these prerequisites aren't fulfilled.
Taints of kubernetes nodes can with checked by describe node command

sudo k3s kubectl get node ix-truenas
sudo k3s kubectl describe node ix-truenas

The node description should be like below

Taints:             <none>
Unschedulable:      false

You can keep restarting TrueNAS server until this folder /etc/cni/net.d gets filled with important files by TrueNAS services and until everything else is okay.

3. Manual restart in CLI
I don't guarantee that this will help

All services installed in TrueNAS OS can be listed with this command.
sudo systemctl --type=service

You can check if there are any services that failed to run.
I think that there are 3 services that are important for Kubernetes/Apps to work properly. All these 3 services should be in a running state.
k3s, kube-router, cni-dhcp

sudo service cni-dhcp restart
sudo service k3s restart
sudo service kube-router restart
sudo systemctl daemon-reload
sudo systemctl restart middlewared

middlewared service is responsible for the WebGUI. So if WebGUI is stuck on some process it can be restarted with the above command.

sudo service kube-router status
sudo service k3s status
sudo journalctl -u kube-router
sudo journalctl -u k3s
sudo journalctl

End button to scroll down to the bottom of journalctl logs to see most recent events

Optionally, You can also start systemd-machined service.
sudo service systemd-machined start

You can restart all kubernetes deployments in all namespaces with this script, which can also be copied/pasted directly into CLI

Code:

for namespace in $(k3s kubectl get namespaces -o jsonpath="{.items[*].metadata.name}"); \
do \
    echo "Namespace: $namespace"; \
    echo "    restarting deployments in this namespace:"; \
    for deployment in $(k3s kubectl get deployments --namespace $namespace -o jsonpath="{.items[*].metadata.name}"); \
    do \
        echo "        deployment: $deployment"; \
        sudo k3s kubectl rollout restart deployment $deployment --namespace $namespace; \
    done \
done

You can delete all pods stuck in some ridiculous states with this script, which can also be copied/pasted directly into CLI

Code:

for namespace in $(k3s kubectl get namespaces -o jsonpath="{.items[*].metadata.name}"); \
do \
    echo "Deleting pods from namespace: $namespace"; \
    sudo k3s kubectl delete --all pods --force --namespace=$namespace --grace-period=0; \
done

After deleting all pods, new pods are automatically created by deployments. They sometimes pull new images, which can consume space when old images are no longer used. So to erase all old, unnecessary images You can run this command:
sudo k3s ctr images prune --all

Optionally You can can remove ix-truenas node before restarting kubernetes service for clean start. This node is recreated after k3s service restart.

sudo k3s kubectl cordon ix-truenas
sudo k3s kubectl drain ix-truenas --ignore-daemonsets --delete-emptydir-data
sudo k3s kubectl delete node ix-truenas

4. Reinstallation
If the problem still persists then there is also an option to wipe out all disks and reinstall TrueNAS. Can also try to install it on other computer or disks.

Important Announcement for the TrueNAS Community.

Apps Service won't start

hunterjm

Cadet

hunterjm

Cadet

hunterjm

Cadet

TheThirdReindeer

Cadet

hunterjm

Cadet

LongCircle

Cadet

LongCircle

Cadet

Mocoso

Cadet

Nitrof

Dabbler

pangratt12345

Cadet

Apps stuck deploying

Apps stuck deploying

TrueNAS Scale: Basic Network Configuration - Tech Addressed

Attachments

Similar threads