k3s not starting after upgrade

zyrex

Dabbler
Joined
Nov 20, 2019
Messages
17
After upgrading to Bluefin, my Apps weren't running.
When going to Apps -> Installed Apps, it's blank and says:

Applications are not running

[View Catalog]

My pool is still set and should be fine.

After a bit I get this error in the alert area:
Failed to configure PV/PVCs support: Cannot connect to host 127.0.0.1:6443 ssl:default [Connect call failed ('127.0.0.1', 6443)]

Checking from the CLI, k3s.service is not running, and k3s_daemon.log is looping with errors.
For example: k3s[371072]: E1219 14:18:17.994622 371072 kubelet.go:1397] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
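For reference, this is roughly how I checked from the shell (plain systemd and iproute2 commands, nothing TrueNAS-specific):
Code:
# Is the k3s unit actually running?
systemctl status k3s
# Is anything listening on the Kubernetes API port from the alert (6443)?
ss -tlnp | grep 6443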

But I do not know Kubernetes, so I shouldn't be digging too much here.

Everything else TrueNAS-wise seems fine, though, so it's not too bad.
 

browntiger

Explorer
Joined
Oct 18, 2022
Messages
58
Go to your Apps tab, hit Settings > Advanced Settings, and post an image of your Kubernetes settings.
 

zyrex

Dabbler
Joined
Nov 20, 2019
Messages
17
[Screenshot: Kubernetes Advanced Settings]


I have tried changing the Node IP to my local IP, but that didn't change much in terms of getting it working :)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Same problem here, with the same alert. Back to Angelfish I go. Silly me for trying a new iX release within a month of its release.
 

mraw435

Cadet
Joined
Sep 18, 2022
Messages
8
Fill in Route v4 Interface (whichever interface is listed on your Network tab) and then add your router IP as the Route v4 Gateway. Save and then reboot to apply the changes. These are the steps that worked for me!
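If you're not sure which values those should be, the default route from a shell shows both (assuming a single default route):
Code:
# "dev" is the interface to pick for Route v4 Interface, "via" is the gateway IP
ip route show default
# example output (your values will differ): default via 192.168.1.1 dev enp3s0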
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
If you say so. It's killing apps for me
I had literally zero issues upgrading, but I spent time reading prior to the upgrade. If you just upgrade directly, as most users do, that's where issues surface. The thing is, Bluefin aligned with most Linux standards, some of which were broken in Angelfish. It's a very good step forward, but users need to get re-aligned with the changes. Just to give you a quick example, local user UIDs now start at 3000 instead of 1000. I created a long thread with all the fixable issues; you should go through it when you have time.
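If you want to see what UID an existing local account got, id shows it (the username below is just a placeholder):
Code:
# Print the UID/GID of a local user created through the SCALE UI
id youruser
# example output: uid=3000(youruser) gid=3000(youruser) groups=3000(youruser)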
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Just to give you a quick example, local user UIDs now start at 3000 instead of 1000.
That, um, doesn't align with most Linux standards. To the extent there is a standard, it's to start UIDs at 1000 or 1001.
I had literally zero issues upgrading, but I spent time reading prior to the upgrade.
I also spent time reading, both your post (which I note has been significantly revised since you posted it) and the release notes. There's nothing in the release notes that says you ought to blow away all your local users and recreate them with UIDs >= 3000. There's nothing in there that says you should create shares at the root of your pool (a recommendation you've now removed). There's nothing in there that says you ought to run the web UI as a non-root user (there's mention that you can, but not that you should). There's nothing in there that says you ought to have apps on a separate pool. Etc., etc., etc. In any event, as you note in the thread I linked earlier, your checklist doesn't appear to address the error that's the subject of this (and that) thread.

But as this seems to involve a bug, I've filed a ticket here:

...but even though I checked the box in the UI to attach a debug file, it didn't attach one. Looks like another bug.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
That, um, doesn't align with most Linux standards.
I totally agree, it should be 1000. However, iX decided to change that, so we wonder why, because it's not explained anywhere, as you said. What if iX decides to use UID 1000 for something else internally? I found the UID issue yesterday, by pure accident, while helping a user fix his apps. It was a clean Bluefin install, and I was like, "Why is the non-root user UID 3000??" But they did fix the Debian .profile, which was broken in Angelfish. I don't have answers to the whys, but at least people are aware of these undocumented changes. My thread has a fix for the OP; see the Kubernetes Service section. Let me know if that fixes the issue. I've been revising it every other day with new findings, like you said. Just trying to help.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
My thread has a fix for the OP; see the Kubernetes Service section.
Ah, though I read that post, I'd overlooked that section. But unset pool/reboot/choose pool doesn't resolve the issue.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
But unset pool/reboot/choose pool doesn't resolve the issue
describe node <nodename> will tell you exactly where the issue is:
Code:
# k3s kubectl get nodes
NAME         STATUS   ROLES                  AGE    VERSION
ix-truenas   Ready    control-plane,master   165d   v1.25.3+k3s-9afcd6b9-dirty
# k3s kubectl describe node ix-truenas

About the non-root local user, iX shared some details in this thread, at the end. I think they plan to announce that gradually; they probably just wanted Bluefin out to respect the internal release schedule.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Code:
root@freenas2[~]# k3s kubectl get nodes
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you reboot the server, I presume you have the same error, Kubernetes service not running.
Correct.
Do you have two pools?
I do. Migrating the stuff back and forth sounds awfully hacky, but I guess it's possible. Is there a gui-fied way to do it, or is it all zfs snapshot -r tank/ix-applications@foo; zfs send -R tank/ix-applications@foo | zfs recv bar/ix-applications@...?
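(For reference, the manual route would presumably look something like this; just a sketch with the placeholder pool names from above, not something I've actually run:)
Code:
# Recursive snapshot of the apps dataset on the current pool
zfs snapshot -r tank/ix-applications@migrate
# Replicate the whole tree, with properties, to the second pool
zfs send -R tank/ix-applications@migrate | zfs recv bar/ix-applications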
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Is there a gui-fied way to do it
Yes, that's how I did it, through the Bluefin UI. Apps > Settings > New Pool:

[Screenshot: Apps > Settings menu]


Once migrated to the new pool, you need to reboot the server; your Kubernetes service should then start without issues. Next, you move everything back to the previous pool the same way. Are you running your software pool on dual SSDs?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Apps > Settings > New Pool:
That's Choose Pool, right? But that requires the Kubernetes service to be running:
[Screenshot: Choose Pool dialog error]

Are you running your software pool on dual SSDs?
I don't have a software pool; everything but boot in this system is on spinners. But I have a second pool of spinners I could move this to, and I'll probably be setting up a mirrored SSD pool for this purpose soon.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
But I have a second pool of spinners I could move this to.
That is fine. Since you already tried the Unset Pool / Reboot (it's important to reboot) / Select Pool procedure without success, try these commands and let me know the result:
Code:
# systemctl restart k3s
# systemctl status k3s

As a last resort, I suggest deleting the ix-applications dataset and rebooting. I know you're not going to like it because it will destroy all your apps, but I guess you are at a point of no return anyway.
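If you do go that route, unset the pool in the UI first, then destroy the dataset from a shell. A sketch only, assuming your pool is named tank:
Code:
# DESTRUCTIVE: permanently removes all installed apps and their data
zfs destroy -r tank/ix-applications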
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
That looks promising:
Code:
root@freenas2[~]# systemctl restart k3s
root@freenas2[~]# systemctl status k3s 
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; vendor preset: disabled)
     Active: activating (start) since Mon 2022-12-19 21:43:05 EST; 3s ago
       Docs: https://k3s.io
    Process: 4075256 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 4075718 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 4076084 (k3s-server)
      Tasks: 114
     Memory: 551.6M
        CPU: 10.872s
     CGroup: /system.slice/k3s.service
             └─4076084 /usr/local/bin/k3s server

Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.506047 4076084 genericapiserver.go:656] Skipping API flowcontrol.apiserver.k8>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.513070 4076084 genericapiserver.go:656] Skipping API apps/v1beta2 because it >
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.513089 4076084 genericapiserver.go:656] Skipping API apps/v1beta1 because it >
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.516608 4076084 genericapiserver.go:656] Skipping API admissionregistration.k8>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.518986 4076084 genericapiserver.go:656] Skipping API events.k8s.io/v1beta1 be>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.520177 4076084 plugins.go:158] Loaded 12 mutating admission controller(s) suc>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.520197 4076084 plugins.go:161] Loaded 11 validating admission controller(s) s>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.541155 4076084 genericapiserver.go:656] Skipping API apiregistration.k8s.io/v>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.865386 4076084 trace.go:205] Trace[1761432602]: "List(recursive=true) etcd3" >
Dec 19 21:43:08 freenas2 k3s[4076084]: Trace[1761432602]: [805.863607ms] [805.863607ms] END

...but the migration still fails, with the same error. Oh, I see it's "activating", not actually "active." And when I wait a bit and check status again, I get:
Code:
root@freenas2[~]# systemctl status k3s
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; vendor preset: disabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2022-12-19 21:46:46 EST; 2s ago
       Docs: https://k3s.io
    Process: 966630 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 966809 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 966832 ExecStart=/usr/local/bin/k3s server --flannel-backend=none --disable=traefik,metrics-server,local-storage --dis>
   Main PID: 966832 (code=exited, status=1/FAILURE)
        CPU: 22.162s

Dec 19 21:46:46 freenas2 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 21:46:46 freenas2 systemd[1]: k3s.service: Consumed 22.162s CPU time.

...and the relevant error seems to be: Dec 19 21:50:16 freenas2 k3s[2133812]: E1219 21:50:16.469286 2133812 kubelet.go:1397] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
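For anyone following along, this is how I'm watching the failure loop (standard journalctl on the k3s unit, nothing SCALE-specific):
Code:
# Follow the k3s unit logs and filter for the cgroup/ContainerManager errors
journalctl -u k3s -f | grep -Ei 'containermanager|kubepods'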
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
root container [kubepods] doesn't exist
You can either open a Jira ticket and wait for a response, or delete the ix-applications dataset. I never had to delete the dataset, so we are entering unknown territory. Are you sure that unsetting the pool in the UI and rebooting does not fix the issue? If anyone else has more input, please update us.

 