k3s not starting after upgrade

zyrex

Dabbler
Joined
Nov 20, 2019
Messages
17
After upgrading to Bluefin, my Apps weren't running.
When going to Apps -> Installed Apps, it's blank and says:

Applications are not running

[View Catalog]

My pool is still set and should be fine.

After a bit I get this error in the alert area:
Failed to configure PV/PVCs support: Cannot connect to host 127.0.0.1:6443 ssl:default [Connect call failed ('127.0.0.1', 6443)]

Checking from the CLI, k3s.service is not running, and k3s_daemon.log is looping with errors.
For example: k3s[371072]: E1219 14:18:17.994622 371072 kubelet.go:1397] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
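For reference, this is roughly how I checked from the shell (plain systemd and iproute2 commands, nothing TrueNAS-specific):
Code:
# Is the k3s unit actually running?
systemctl status k3s
# Is anything listening on the Kubernetes API port from the alert (6443)?
ss -tlnp | grep 6443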

But I do not know Kubernetes, so I shouldn't be digging too much here.

Everything else TrueNAS-wise seems fine, though, so it's not too bad.
 

browntiger

Explorer
Joined
Oct 18, 2022
Messages
58
Go to your Apps tab, hit Settings > Advanced Settings, and post an image of your Kubernetes settings.
 

zyrex

Dabbler
Joined
Nov 20, 2019
Messages
17
[Screenshot: Kubernetes Advanced Settings]


I have tried changing the Node IP to my local IP, but that didn't change much in terms of getting it working :)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Same problem here, with the same alert. Back to Angelfish I go. Silly me for trying a new iX release within a month of its release.
 

mraw435

Cadet
Joined
Sep 18, 2022
Messages
8
Fill in Route v4 Interface (whichever interface is listed on your Network tab) and then add your router IP as the Route v4 Gateway. Save and then reboot to apply the changes. These are the steps that worked for me!
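If you're not sure which values those should be, the default route from a shell shows both (assuming a single default route):
Code:
# "dev" is the interface to pick for Route v4 Interface, "via" is the gateway IP
ip route show default
# example output (your values will differ): default via 192.168.1.1 dev enp3s0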
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
If you say so. It's killing apps for me
I had literally zero issues upgrading, but I spent time reading prior to the upgrade. If you just upgrade directly, as most users do, that's where issues surface. The thing is, Bluefin aligned with most Linux standards, some of which were broken in Angelfish. It's a very good step forward, but users need to get re-aligned with the changes. Just to give you a quick example, local user UIDs now start at 3000 instead of 1000. I created a long thread with all the fixable issues; you should go through it when you have time.
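If you want to see what UID an existing local account got, id shows it (the username below is just a placeholder):
Code:
# Print the UID/GID of a local user created through the SCALE UI
id youruser
# example output: uid=3000(youruser) gid=3000(youruser) groups=3000(youruser)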
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Just to give you a quick example, local user UIDs now start at 3000 instead of 1000.
That, um, doesn't align with most Linux standards. To the extent there is a standard, it's to start UIDs at 1000 or 1001.
I had literally zero issues upgrading, but I spent time reading prior to the upgrade.
I also spent time reading, both your post (which I note has been significantly revised since you posted it) and the release notes. There's nothing in the release notes that says you ought to blow away all your local users and recreate them with UIDs >= 3000. There's nothing in there that says you should create shares at the root of your pool (a recommendation you've now removed). There's nothing in there that says you ought to run the web UI as a non-root user (there's mention that you can, but not that you should). There's nothing in there that says you ought to have apps on a separate pool. Etc., etc., etc. In any event, as you note in the thread I linked earlier, your checklist doesn't appear to address the error that's the subject of this (and that) thread.

But as this seems to involve a bug, I've filed a ticket here:

...but even though I checked the box in the UI to attach a debug file, it didn't attach one. Looks like another bug.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
That, um, doesn't align with most Linux standards.
I totally agree, it should be 1000. However, iX decided to change that, so we wonder why, because it's not explained anywhere, as you said. What if iX decides to use UID 1000 for something else internally? I found the UID issue yesterday, by pure accident, while helping a user fix his apps. It was a clean Bluefin install, and I was like, "Why is the non-root user UID 3000??" But they did fix the Debian .profile, which was broken in Angelfish. I don't have answers to the whys, but at least people are aware of these undocumented changes. My thread has a fix for the OP; see the Kubernetes Service section. Let me know if that fixes the issue. I've been revising it every other day with new findings, like you said. Just trying to help.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
My thread has a fix for the OP; see the Kubernetes Service section.
Ah, though I read that post, I'd overlooked that section. But unset pool/reboot/choose pool doesn't resolve the issue.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
But unset pool/reboot/choose pool doesn't resolve the issue
describe node <nodename> will tell you exactly where the issue is:
Code:
# k3s kubectl get nodes
NAME         STATUS   ROLES                  AGE    VERSION
ix-truenas   Ready    control-plane,master   165d   v1.25.3+k3s-9afcd6b9-dirty
# k3s kubectl describe node ix-truenas

About the non-root local user, iX shared some details in this thread, at the end. I think they plan to announce that gradually; they probably just wanted Bluefin out to respect the internal release schedule.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Code:
root@freenas2[~]# k3s kubectl get nodes
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you reboot the server, I presume you have the same error, Kubernetes service not running.
Correct.
Do you have two pools?
I do. Migrating the stuff back and forth sounds awfully hacky, but I guess it's possible. Is there a gui-fied way to do it, or is it all zfs snapshot -r tank/ix-applications@foo; zfs send -R tank/ix-applications@foo | zfs recv bar/ix-applications@...?
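(For reference, the manual route would presumably look something like this; just a sketch with the placeholder pool names from above, not something I've actually run:)
Code:
# Recursive snapshot of the apps dataset on the current pool
zfs snapshot -r tank/ix-applications@migrate
# Replicate the whole tree, with properties, to the second pool
zfs send -R tank/ix-applications@migrate | zfs recv bar/ix-applications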
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Is there a gui-fied way to do it
Yes, that's how I did it, through the Bluefin UI. Apps > Settings > New Pool:

[Screenshot: Apps > Settings menu]


Once migrated to the new pool, you need to reboot the server; your Kubernetes service should then start without issues. Next, you move everything back to the previous pool the same way. Are you running your software pool on dual SSDs?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Apps > Settings > New Pool:
That's Choose Pool, right? But that requires the Kubernetes service to be running:
[Screenshot: Choose Pool dialog error]

Are you running your software pool on dual SSDs?
I don't have a software pool; everything but boot in this system is on spinners. But I have a second pool of spinners I could move this to, and I'll probably be setting up a mirrored SSD pool for this purpose soon.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
But I have a second pool of spinners I could move this to.
That is fine. Since you already tried the Unset Pool / Reboot (it's important to reboot) / Select Pool procedure without success, try these commands and let me know the result:
Code:
# systemctl restart k3s
# systemctl status k3s

As a last resort, I suggest deleting the ix-applications dataset and rebooting. I know you're not going to like it because it will destroy all your apps, but I guess you are at a point of no return anyway.
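If you do go that route, unset the pool in the UI first, then destroy the dataset from a shell. A sketch only, assuming your pool is named tank:
Code:
# DESTRUCTIVE: permanently removes all installed apps and their data
zfs destroy -r tank/ix-applications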
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
That looks promising:
Code:
root@freenas2[~]# systemctl restart k3s
root@freenas2[~]# systemctl status k3s 
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; vendor preset: disabled)
     Active: activating (start) since Mon 2022-12-19 21:43:05 EST; 3s ago
       Docs: https://k3s.io
    Process: 4075256 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 4075718 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 4076084 (k3s-server)
      Tasks: 114
     Memory: 551.6M
        CPU: 10.872s
     CGroup: /system.slice/k3s.service
             └─4076084 /usr/local/bin/k3s server

Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.506047 4076084 genericapiserver.go:656] Skipping API flowcontrol.apiserver.k8>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.513070 4076084 genericapiserver.go:656] Skipping API apps/v1beta2 because it >
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.513089 4076084 genericapiserver.go:656] Skipping API apps/v1beta1 because it >
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.516608 4076084 genericapiserver.go:656] Skipping API admissionregistration.k8>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.518986 4076084 genericapiserver.go:656] Skipping API events.k8s.io/v1beta1 be>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.520177 4076084 plugins.go:158] Loaded 12 mutating admission controller(s) suc>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.520197 4076084 plugins.go:161] Loaded 11 validating admission controller(s) s>
Dec 19 21:43:08 freenas2 k3s[4076084]: W1219 21:43:08.541155 4076084 genericapiserver.go:656] Skipping API apiregistration.k8s.io/v>
Dec 19 21:43:08 freenas2 k3s[4076084]: I1219 21:43:08.865386 4076084 trace.go:205] Trace[1761432602]: "List(recursive=true) etcd3" >
Dec 19 21:43:08 freenas2 k3s[4076084]: Trace[1761432602]: [805.863607ms] [805.863607ms] END

...but the migration still fails, with the same error. Oh, I see it's "activating", not actually "active." And when I wait a bit and check status again, I get:
Code:
root@freenas2[~]# systemctl status k3s
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; vendor preset: disabled)
     Active: activating (auto-restart) (Result: exit-code) since Mon 2022-12-19 21:46:46 EST; 2s ago
       Docs: https://k3s.io
    Process: 966630 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 966809 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 966832 ExecStart=/usr/local/bin/k3s server --flannel-backend=none --disable=traefik,metrics-server,local-storage --dis>
   Main PID: 966832 (code=exited, status=1/FAILURE)
        CPU: 22.162s

Dec 19 21:46:46 freenas2 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 21:46:46 freenas2 systemd[1]: k3s.service: Consumed 22.162s CPU time.

...and the relevant error seems to be: Dec 19 21:50:16 freenas2 k3s[2133812]: E1219 21:50:16.469286 2133812 kubelet.go:1397] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
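For anyone following along, this is how I'm watching the failure loop (standard journalctl on the k3s unit, nothing SCALE-specific):
Code:
# Follow the k3s unit logs and filter for the cgroup/ContainerManager errors
journalctl -u k3s -f | grep -Ei 'containermanager|kubepods'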
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
root container [kubepods] doesn't exist
You can either open a Jira ticket and wait for a response, or delete the ix-applications dataset. I never had to delete the dataset, so we are entering unknown territory. Are you sure that unsetting the pool in the UI and rebooting does not fix the issue? If anyone else has more input, please update us.

 