Tywele
Hello,
I've already asked on the TrueCharts Discord, but after some chatting they sent me here, since to them it didn't look like a problem with their apps.
Yesterday I noticed that one of my pools (Pool 1, see end of post) had a checksum error in one file, which made the pool unhealthy. I ran short and long SMART tests on the drives, but nothing turned up, so I assumed the SATA cables might be faulty and swapped them out (I used really cheap ones, so it seems likely that they are at fault). I'm not sure whether the unhealthy pool is related, but my reporting view shows that a few days ago CPU usage started climbing steadily until it sat at almost 100% all the time.
I saw that all my apps were in the deploying state, so I tried stopping them. That didn't quite work: CPU usage didn't really go down afterwards, and some of the apps seemingly can't be stopped.
I ran the following in the shell:
Code:
k3s kubectl get pods -A
and it returned:
Code:
root@truenas[~]# k3s kubectl get pods -A
NAMESPACE        NAME                               READY   STATUS        RESTARTS   AGE
ix-nextcloud     nextcloud-nginx-7759fc8999-7lqw7   0/1     Terminating   0          25d
ix-nextcloud     nextcloud-redis-0                  0/1     Terminating   0          25d
ix-blocky        blocky-redis-0                     0/1     Terminating   0          24d
kube-system      openebs-zfs-controller-0           0/5     Pending       0          3h36m
kube-system      coredns-75fc8f8fff-zv7b9           0/1     Pending       0          3h49m
ix-cloudflared   cloudflared-b899bc5d7-vdrhd        0/1     Pending       0          3h44m
ix-traefik       traefik-97dcf4c59-f2f98            0/1     Pending       0          3h26m
ix-nextcloud     nextcloud-notify-877c89bcb-lhnpl   0/1     Terminating   0          25d
To me this looks like some of the apps are still stuck terminating, but even though I have rebooted the system multiple times, the state of the apps does not change.
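To get a quick per-status summary of a pod listing like this, something along the following lines might help. It is only a minimal sketch: `count_pod_status` is a hypothetical helper name (not part of k3s or kubectl), and it assumes the default column layout of `get pods -A`, where STATUS is the fourth column.

```shell
# Count pods per STATUS from `k3s kubectl get pods -A` output read on stdin.
# count_pod_status is a hypothetical helper, not part of k3s or kubectl.
count_pod_status() {
    # Skip the header row; with -A the STATUS value is the 4th column.
    awk '$1 != "NAMESPACE" && NF >= 4 { count[$4]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live system (run as root on TrueNAS SCALE):
#   k3s kubectl get pods -A | count_pod_status
```

For the eight pods listed in this post, that would report four pods as Pending and four as Terminating.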
My TrueNAS version is TrueNAS-SCALE-22.12.3.3 and my system has the following components:
- CPU: Intel i3-7100
- Motherboard: ASRock Z270M-ITX/ac
- Case: Fractal Node 304
- RAM: 2x8GB DDR4 Corsair Vengeance LPX RAM
- Boot Drive: 128 GB Samsung SSD 830 Evo
- Pool 1 (Storage): 2x 4TB Seagate Ironwolf 3.5" HDDs
- Pool 2 (App Pool): Crucial P3 NVMe SSD 500GB
- PSU: beQuiet Pure Power 11 600W
- CPU Cooler: Noctua NH L9i
And as a small extra question: how can I get my pool healthy again after clearing the error with `zpool clear`?