TrueNAS Scale App subsystem stopped working after botched upgrade and rollback.

etrigan63

Dabbler
Joined
Jul 6, 2021
Messages
18
The upgrade to 22.02.3 presented itself this morning and, being a diligent admin, I applied it. The server ended up in a boot loop and I had to roll back to 22.02.2.1. The App subsystem will no longer launch any apps. Running k3s kubectl get pods -A results in:
Code:
exodar% sudo k3s kubectl get pods -A
[sudo] password for guru: 
NAMESPACE            NAME                                        READY   STATUS              RESTARTS        AGE
ix-nzbget-movies     svclb-nzbget-movies-wkrcz                   0/1     ContainerCreating   0               4h19m
kube-system          coredns-d76bd69b-fgslm                      0/1     Error               0               4h20m
ix-minecraft         svclb-minecraft-minecraft-java-rw4q8        0/1     Error               0               4h20m
kube-system          openebs-zfs-controller-0                    0/5     Error               0               4h20m
ix-jellyfin          svclb-jellyfin-p4mmz                        0/1     Error               0               4h20m
ix-radarr            svclb-radarr-k2b85                          0/1     Error               0               4h20m
ix-nzbget-tv         svclb-nzbget-tv-hlnhl                       0/1     Error               0               4h20m
ix-heimdall          svclb-heimdall-k7lkl                        0/1     Error               0               4h20m
ix-minecraft         svclb-minecraft-minecraft-java-rcon-zg7b5   0/1     Error               0               4h19m
ix-ghost-echenique   svclb-ghost-echenique-2mcrt                 0/1     Error               0               4h19m
ix-filebrowser       svclb-filebrowser-fln77                     0/1     ExitCode:0          0               4h19m
ix-sonarr            svclb-sonarr-w7l97                          0/1     Error               0               4h19m
kube-system          amdgpu-device-plugin-daemonset-24f77        0/1     Error               0               4h19m
kube-system          nvidia-device-plugin-daemonset-nsgkz        0/1     ExitCode:0          0               4h19m
ix-traefik           svclb-traefik-xszkf                         0/1     Terminating         0               4h19m
ix-traefik           svclb-traefik-tcp-p92s8                     0/2     Terminating         0               4h20m
kube-system          openebs-zfs-node-shgj5                      1/2     CrashLoopBackOff    53 (4m9s ago)   4h19m
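
For anyone triaging a similar listing, here is a minimal sketch for filtering it down to the pods that are not healthy. The function name unhealthy_pods is hypothetical, and it assumes the listing comes from get pods -A, so that STATUS is the fourth whitespace-separated column, and that awk is available on the host.

```shell
# Hypothetical helper (not part of TrueNAS or k3s): reads `kubectl get pods -A`
# output on stdin and prints namespace/name and status for every pod whose
# STATUS is neither Running nor Completed.
# Assumption: with -A the columns are NAMESPACE NAME READY STATUS ..., so
# STATUS is field 4. The header row is skipped via NR > 1.
unhealthy_pods() {
  awk 'NR > 1 && NF >= 4 && $4 != "Running" && $4 != "Completed" {
         print $1 "/" $2 ": " $4
       }'
}
```

It could be used as sudo k3s kubectl get pods -A | unhealthy_pods. Filtering on the text column is a blunt approach, but it works on pasted output as well as on a live cluster.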


The apps report that:
Code:
network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized


I tried to reinstall traefik, but there are two hung pods that won't go away, preventing the install.

Am I hosed?
 

etrigan63

UPDATE: I have rebooted the server and the two troublesome pods went away. I reinstalled traefik from TrueCharts and the apps now seem to be coming up normally. However, I am still hesitant about the 22.02.3 upgrade, as that caused a boot loop.
 

etrigan63

Now I have this problem:
Code:
kube-system   nvidia-device-plugin-daemonset-7zm4x   0/1   CrashLoopBackOff   8 (93s ago)   17m

which is preventing the jellyfin app from coming up.
 

etrigan63

I reapplied the upgrade and isolated the GPU (since I cannot use it yet). Everything is operational now.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for reporting the issues and sorry to hear of your problems.

The primary issue is the boot loop... we'll see if anyone else experiences it.
Can you report your hardware set-up?
The combination of hardware and App set-up is likely to be the most distinctive thing about your system.
Our internal testing covers only a subset of potential hardware.
 

etrigan63

The boot loop was caused by the traefik app. I switched to the TrueCharts version (which was newer) after a reboot to eliminate some stuck containers. After that it came right up. I was then able to apply the upgrade. There was an issue with my RTX 3050 GPU, but I isolated that for the time being (since you don't have the drivers for it installed).
 

morganL

Could it have been the combination of GPU and Traefik?
 

etrigan63

Most likely. Any plans for improved Nvidia GPU support in Bluefin?

Also, my system is extremely slow during GRUB. The "Welcome to GRUB" message remains on the screen for at least 90 seconds before the boot options are displayed. Is there a way to check whether there is a problem there? My boot drive is a 512GB NVMe drive.
 

morganL

Looking for more community involvement in GPU testing (and development)... Bluefin nightlies are available. This is the time to identify missing drivers etc.

We don't currently sell GPU-capable systems at iX, so there isn't an extensive "library" of systems in the standard QA process. We will improve, but community support is appreciated.

In particular, any assistance identifying/resolving Debian issues vs TrueNAS issues would be appreciated.
 