All apps stuck deploying/terminating

Tywele

Cadet
Joined
Jul 28, 2023
Messages
3
Hello,

I've already asked the TrueCharts Discord for help, but after some chatting they sent me here, since to them it didn't look like a problem with their apps.

Yesterday I noticed that one of my pools (Pool 1, see end of post) had a checksum error in one file, which made the pool unhealthy. I ran SMART tests (short and long) on the drives, but nothing turned up, so I assumed the SATA cables might be faulty and swapped them out (I had used really cheap ones, so I guess it's likely they are at fault). I'm not sure whether the unhealthy pool is related, but my reporting view shows that a few days ago CPU usage started climbing steadily until it sat at almost 100% all the time.

1694279984316.png


I saw that all my apps were in the deploying state, so I tried stopping them. That didn't quite work: the CPU usage didn't really go down afterwards, and some of the apps seemingly can't be stopped.

1694280071662.png


I ran
Code:
k3s kubectl get pods -A
in the shell and it returned:
Code:
root@truenas[~]# k3s kubectl get pods -A
NAMESPACE        NAME                               READY   STATUS        RESTARTS   AGE
ix-nextcloud     nextcloud-nginx-7759fc8999-7lqw7   0/1     Terminating   0          25d
ix-nextcloud     nextcloud-redis-0                  0/1     Terminating   0          25d
ix-blocky        blocky-redis-0                     0/1     Terminating   0          24d
kube-system      openebs-zfs-controller-0           0/5     Pending       0          3h36m
kube-system      coredns-75fc8f8fff-zv7b9           0/1     Pending       0          3h49m
ix-cloudflared   cloudflared-b899bc5d7-vdrhd        0/1     Pending       0          3h44m
ix-traefik       traefik-97dcf4c59-f2f98            0/1     Pending       0          3h26m
ix-nextcloud     nextcloud-notify-877c89bcb-lhnpl   0/1     Terminating   0          25d


To me this looks like some of the apps are still stuck terminating, but even though I have rebooted the system multiple times now, the state of the apps is not changing.
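For anyone comparing their own output against the listing above, here is a minimal sketch that tallies pods by STATUS. The helper name `count_by_status` is made up for illustration and is not part of k3s or kubectl; it assumes the column layout shown above, where `-A` adds the NAMESPACE column and STATUS is the fourth field.

```shell
#!/bin/sh
# Tally pods by STATUS from `kubectl get pods -A` output read on stdin.
# count_by_status is a hypothetical helper name, not a k3s/kubectl command.
count_by_status() {
  # Field 4 is STATUS when -A adds the NAMESPACE column; skip the header row.
  awk '$1 != "NAMESPACE" && NF >= 4 { n[$4]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Sample input: the listing quoted in this post.
count_by_status <<'EOF'
NAMESPACE        NAME                               READY   STATUS        RESTARTS   AGE
ix-nextcloud     nextcloud-nginx-7759fc8999-7lqw7   0/1     Terminating   0          25d
ix-nextcloud     nextcloud-redis-0                  0/1     Terminating   0          25d
ix-blocky        blocky-redis-0                     0/1     Terminating   0          24d
kube-system      openebs-zfs-controller-0           0/5     Pending       0          3h36m
kube-system      coredns-75fc8f8fff-zv7b9           0/1     Pending       0          3h49m
ix-cloudflared   cloudflared-b899bc5d7-vdrhd        0/1     Pending       0          3h44m
ix-traefik       traefik-97dcf4c59-f2f98            0/1     Pending       0          3h26m
ix-nextcloud     nextcloud-notify-877c89bcb-lhnpl   0/1     Terminating   0          25d
EOF
# Prints:
# Pending 4
# Terminating 4
```

On a live system the same helper could be fed directly with `k3s kubectl get pods -A | count_by_status`. Since even the kube-system pods are Pending here, the tally suggests the problem sits below the individual apps.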

My TrueNAS version is TrueNAS-SCALE-22.12.3.3 and my system has the following components:
  • CPU: Intel i3-7100
  • Motherboard: ASRock Z270M-ITX/ac
  • Case: Fractal Node 304
  • RAM: 2x8GB DDR4 Corsair Vengeance LPX RAM
  • Boot Drive: 128 GB Samsung SSD 830 Evo
  • Pool 1 (Storage): 2x 4TB Seagate Ironwolf 3.5" HDDs
  • Pool 2 (App Pool): Crucial P3 NVMe SSD 500GB
  • PSU: beQuiet Pure Power 11 600W
  • CPU Cooler: Noctua NH L9i
Can someone tell me how to get my apps running again?
And as a small extra, how can I get my pool healthy again after clearing the error with `zpool clear`?
 

uberthoth

Dabbler
Joined
Mar 15, 2022
Messages
11
I am seeing the exact same behavior here. There are some older issues that are very similar, e.g. https://www.truenas.com/community/t...eady-everything-is-stuck-or-something.102239/

I have noticed that I also have the same warning in my node description:

Ready False Mon, 11 Sep 2023 10:26:11 -0500 Mon, 11 Sep 2023 06:35:56 -0500 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
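In case it helps others check for the same condition, here is a rough sketch that filters node description output for a Ready condition that is not True. The helper name `not_ready_conditions` is made up for illustration; it assumes the condition rows start with the type and status fields, as in the line quoted above.

```shell
#!/bin/sh
# Print Ready conditions whose status is False or Unknown, reading
# `kubectl describe node` output on stdin.
# not_ready_conditions is a hypothetical helper name, not a k3s/kubectl command.
not_ready_conditions() {
  # Field 1 is the condition type, field 2 its status; matching rows are printed unchanged.
  awk '$1 == "Ready" && ($2 == "False" || $2 == "Unknown")'
}

# Sample input: the condition row quoted in this post.
not_ready_conditions <<'EOF'
Ready False Mon, 11 Sep 2023 10:26:11 -0500 Mon, 11 Sep 2023 06:35:56 -0500 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
EOF
```

On a live system this could be run as `k3s kubectl describe node | not_ready_conditions`. An empty result would mean the node reports Ready, so the stuck pods would need a different explanation.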
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I don't know if this applies to TrueCharts apps only or to the iX ones as well, but in my case (only TrueCharts) this worked wonders:

 