[22.02.4 upgrade] Breaking k3s again

gbysec

Cadet
Joined
Oct 3, 2022
Messages
5
Hi !
I'm fairly new to the community. I built my own NAS last summer and went straight to TrueNAS SCALE. So far I'm extremely happy with k3s and TrueCharts!

The only big problem I've got, to be honest, is the upgrade system. I've lost count, but probably 4 or 5 times already my whole k3s cluster has ended up corrupted.
Orphaned pods, completely lost configurations (volume mounts mostly)... it's getting a bit annoying, and there seems to be no "simple" way of at least dumping all my app configurations to a backup, in case I need to redeploy all of them every time.

Either a forced reboot (because of middlewared's memory leak, which I think was fixed a version or two ago) or an upgrade will always corrupt the cluster.
Sometimes openebs-zfs-controller-0 is also completely broken, and I have no clue what to do apart from wiping everything, which includes Plex, Sonarr, Radarr, Qbittorrent, Flood, Jackett, a Valheim server, my Home Assistant instance... you see where I'm going.

Jun 05 23:07:04 truenas k3s[390079]: E0605 23:07:04.934764 390079 kubelet.go:1431] "Failed to start ContainerManager" err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"

Sometimes it was also

kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name zfs.csi.openebs.io not found in the list of registered CSI drivers
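
For anyone hitting the first error above, here is a minimal Python sketch that pulls the flag name and the expected/actual values out of a kubelet log line and reads the live value from /proc/sys. The function names are made up for illustration, and the regex only assumes the message format shown in the log line quoted in this post. It only diagnoses the mismatch; it does not change anything on the system.

```python
import re
from pathlib import Path

# Matches the kubelet error format seen in the log line quoted above.
KERNEL_FLAG_RE = re.compile(
    r"invalid kernel flag: (?P<flag>[\w/]+), "
    r"expected value: (?P<expected>-?\d+), actual value: (?P<actual>-?\d+)"
)


def parse_kernel_flag_error(line):
    """Return (flag, expected, actual) from a kubelet log line, or None."""
    match = KERNEL_FLAG_RE.search(line)
    if match is None:
        return None
    return match["flag"], int(match["expected"]), int(match["actual"])


def read_kernel_flag(flag, proc_sys=Path("/proc/sys")):
    """Read the live value of a sysctl such as 'vm/overcommit_memory' (Linux only)."""
    return int((proc_sys / flag).read_text().strip())


if __name__ == "__main__":
    sample = (
        'kubelet.go:1431] "Failed to start ContainerManager" '
        'err="invalid kernel flag: vm/overcommit_memory, '
        'expected value: 1, actual value: 0"'
    )
    flag, expected, actual = parse_kernel_flag_error(sample)
    print(f"{flag}: kubelet expected {expected}, log reported {actual}")
```

On the NAS itself, comparing read_kernel_flag("vm/overcommit_memory") with the expected value from the log would show whether the setting is still wrong after a reboot.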



Another issue I've experienced is storage slowly filling up because of snapshots.

 

gbysec

Cadet
Joined
Oct 3, 2022
Messages
5
To maybe add a hint about the issue: I noticed that the apps have now deprecated their old service type, which forces you to use a new load balancer service type (not sure what mine was previously).

The problem seems to be that my services keep running even when I stop the apps, so I suspect that services are kept from the previous app configuration and not migrated over. I noticed this because my Sonarr couldn't connect to Qbittorrent, which is not usually the case: normally I just ping qbittorrent.ix-qbittorrent:10095 and it works.

Maybe it's yet another broken deployment because of the previous services, I don't know, but I'd rather have all the pods and services removed, just save my app configs and redeploy them from scratch. They will have the same folders in their host volumes anyway.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Thanks for reporting your experiences...

First, the snapshot issue is well known and is intended to be fixed in Bluefin. The fix is not complete yet.

On the K3s issues, there can be several sources of problems:

1) Applications themselves
2) Race conditions at boot-up... these should be resolved by software, but sometimes they are only evident in specific setups
3) Software update bugs... again, these might be specific to a config

Assuming you can power down and reboot reliably... is there any chance you can roll back and replicate the issue?
If so, please report-a-bug.
 

stavros-k

Patron
Joined
Dec 26, 2020
Messages
231
gbysec said:
"I noticed that the apps have now deprecated their old service type, which forces you to use a new load balancer service type (not sure what mine was previously)"

The deprecated service type is just a label change. You can switch to LoadBalancer and it will keep functioning as before.
It's just a change that will consolidate things in the UI. Nothing changes in how it works.

Since you are seeing the Deprecated label, you were previously using the Simple type, which was in fact LoadBalancer.

Service pods (svclb) always stay there, even if the app is stopped.

You can view the details about this change in our announcements, on Discord, on Twitter, and on the other platforms where we post them.
 

gbysec

Cadet
Joined
Oct 3, 2022
Messages
5
Yes, I figured it was the same service type, but still, a lot of orphans came out of nowhere and broke all the apps, unfortunately. I wiped everything clean, unset the pool and set it back, did a docker system prune, etc... I hope this won't come back again, I have no clue what causes it...

About the LoadBalancer service type, did it change the hostnames for each pod/app? I'm using the NAS IP to connect Radarr to Qbittorrent now, but previously I used <app_name>.ix-<app_name> as a hostname and it was reachable. Not anymore.
 

stavros-k

Patron
Joined
Dec 26, 2020
Messages
231
gbysec said:
"About the LoadBalancer service type, did it change the hostnames for each pod/app? I'm using the NAS IP to connect Radarr to Qbittorrent now, but previously I used <app_name>.ix-<app_name> as a hostname and it was reachable. Not anymore."

No, nothing changed, it's only a label change in the UI.
The FQDN you mentioned is missing a part (svc.cluster.local). While it sometimes works without it, that's not guaranteed.
You can use the generator at the bottom of the page.
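
As a rough illustration of what the full name looks like, here is a small Python sketch that builds a Kubernetes service FQDN in the standard <service>.<namespace>.svc.<cluster domain> form. The helper names are made up, and it assumes the app's namespace is ix-<app_name>, the main service is named after the app (as in the qbittorrent.ix-qbittorrent example from this thread), and the cluster domain is the default cluster.local. An app's real service name may differ, so treat the output as a starting point.

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the full in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def app_service_fqdn(app_name, service=None):
    # Assumption: the app lives in a namespace named "ix-<app_name>" and its
    # main service is named after the app, as in the thread's example.
    return service_fqdn(service or app_name, f"ix-{app_name}")


if __name__ == "__main__":
    print(app_service_fqdn("qbittorrent"))
    # prints: qbittorrent.ix-qbittorrent.svc.cluster.local
```

The port is appended as usual, so the address from earlier in the thread would become qbittorrent.ix-qbittorrent.svc.cluster.local:10095.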

 

gbysec

Cadet
Joined
Oct 3, 2022
Messages
5
Thanks a lot for the tool! I think I just noticed something, I couldn't believe my eyes... check Qbittorrent in the charts you have... I have TrueCharts and the official ones, and the official one has a typo, missing an "r" in qbittorRent!
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
gbysec said:
"I noticed that the apps have now deprecated their old service type, which forces you to use a new load balancer service type (not sure what mine was previously)"

This is TrueCharts specific and not related to iXsystems or TrueNAS.
We're unrelated and do not actively provide support here for issues like these.
But just to be clear: as far as we're aware, and as stated in our announcement, the deprecated service types change is non-destructive at this time.
At the very least, it is not related to your issue.

gbysec said:
"The problem seems to be that my services keep running even when I stop the apps, so I suspect that services are kept from the previous app configuration and not migrated over."

Stopping apps does not remove or stop Kubernetes services. "Stopping" does not even exist in Kubernetes; it's an iX invention that means "scaling pods to 0".

If you do not want the svclb pods to always be present, you could use MetalLB, which is our recommended alternative load balancer. The services themselves still will not get deleted, though.
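
To make "scaling pods to 0" a bit more concrete, here is a hedged Python sketch that only builds (and prints) the kubectl command one could run by hand for a single workload. It assumes the app runs as a Deployment inside the ix-<app_name> namespace and that the Deployment name is known; both are assumptions for illustration, and this is not necessarily what the SCALE middleware does internally. Note that the Service object is untouched by this command, which matches the behaviour described above.

```python
import shlex


def scale_command(app_name, workload, replicas):
    """Build a kubectl command that scales one Deployment in an app's namespace."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return [
        "k3s", "kubectl", "scale", "deployment", workload,
        "--namespace", f"ix-{app_name}",  # assumed namespace convention
        f"--replicas={replicas}",
    ]


if __name__ == "__main__":
    # "qbittorrent" as the Deployment name is a hypothetical example.
    print(shlex.join(scale_command("qbittorrent", "qbittorrent", 0)))
```

Scaling back to 1 with the same helper corresponds to "starting" the app again.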
 