App Service fails to lookup bridge interface during boot after upgrade to TNS 23.10.1

RARgames · Mar 15, 2024

Here is my quick fix for it:
1) Access shell
2) crontab -e
3) Add
@Reboot sleep 60 && cli -c 'app kubernetes update configure_gpus=false' && cli -c 'app kubernetes update configure_gpus=true
at the end of the file, save and reboot (apps service will start correctly now).

Also I tried system settings/advanced:
- init scripts - not working for some reason
- cron jobs - cannot configure reboot cron job using a menu

RARgames · Mar 15, 2024

Correct command:

@reboot sleep 60 && cli -c 'app kubernetes update configure_gpus=false' && cli -c 'app kubernetes update configure_gpus=true'
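
For anyone who prefers a script over the one-liner, the same toggle can be sketched as a small shell script. This is only a sketch, not an official fix. The two `cli -c 'app kubernetes update ...'` calls are the ones from the command above; the `DRY_RUN` switch, the function names, and the `--run` argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: toggle configure_gpus off and on to force a restart of the apps service.
# DRY_RUN, run, toggle_gpu_setting and --run are hypothetical names, not part of TrueNAS.

DRY_RUN="${DRY_RUN:-0}"

run() {
    # In dry-run mode only print the command, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

toggle_gpu_setting() {
    # Same two commands as the cron line; the second runs only if the first succeeds.
    run cli -c 'app kubernetes update configure_gpus=false' &&
    run cli -c 'app kubernetes update configure_gpus=true'
}

# Do the real work only when called with --run, after the same 60 second delay.
if [ "${1:-}" = "--run" ]; then
    sleep 60
    toggle_gpu_setting
fi
```

The cron entry would then be `@reboot` followed by the script's path and `--run`. Setting `DRY_RUN=1` prints the commands instead of running them, which is handy for checking the script before a reboot.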

MrCaspan · Mar 15, 2024

I am having the same issue

Just as an FYI for everyone: the GPU setting has nothing to do with the issue at hand. The issue seems to be that the bridge interface is not ready when the Kubernetes engine starts up. The reason people suggest unchecking the GPU setting is that unchecking GPU support restarts the app engine (Kubernetes), and by the time you do this your bridge interface has had enough time to come up, so Kubernetes can start successfully. BUT you have to check it again to put your settings back to their original state, so you are basically restarting the Kubernetes service twice. There is no real harm in that, but it takes some time. This is why the cron job RARgames mentions in the post above works: it does this for you 60 seconds after a reboot.

So I would argue that, if you are using a cron job, you should just run the command that starts the service. I know nothing about the Kubernetes service, so I am not sure what that command would be, but it seems the better thing to do here. OR you can do this faster in the GUI: go to `Apps > Settings > Unset Pool`, which stops the Kubernetes engine (almost instant, since it is not running anyway), then select `Choose Pool` and select your app pool again, which starts the Kubernetes engine.

I am awaiting a fix as well! I will post this thread on the Discord for other users to read!
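
If the root cause really is the bridge not being ready, a fixed `sleep 60` could in principle be replaced by polling the interface state. The following is a minimal sketch under that assumption. It reads the kernel's `operstate` file under `/sys/class/net`; the interface name `br0`, the function names, and the `SYSFS_NET` override are illustrative only, and an interface reporting "up" is no guarantee that Kubernetes will start.

```shell
#!/bin/sh
# Sketch: poll until a network interface reports "up".
# SYSFS_NET is overridable only so the functions can be tried without real hardware.

SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

iface_is_up() {
    # $1 = interface name (br0 is an assumed example)
    state_file="$SYSFS_NET/$1/operstate"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "up" ]
}

wait_for_iface() {
    # $1 = interface, $2 = maximum number of checks, $3 = seconds between checks
    attempts=0
    while [ "$attempts" -lt "$2" ]; do
        if iface_is_up "$1"; then
            return 0
        fi
        attempts=$((attempts + 1))
        sleep "$3"
    done
    # Interface never reported "up" within the allowed number of checks.
    return 1
}
```

A cron job could call something like `wait_for_iface br0 60 1` and only run the GPU toggle commands from the earlier post if it returns success.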

PyCoder · Mar 15, 2024

You didn't read it well.

The GPU "thing" is a workaround to restart kubernetes services, which failed at start if a bridge is present.

MrCaspan · Mar 15, 2024

There is no need to be rude to people helping. I am just putting it in plainer English and explaining it exactly, so others can understand the issue and why everyone keeps mentioning the GPU thing. If you want to be rude, then maybe support forums are not for you.

MrCaspan · Mar 15, 2024

I have created a new ticket for this on Jira as the other cases do seem to be dead. I will report back what they say!

https://ixsystems.atlassian.net/browse/NAS-127870

PyCoder · Mar 15, 2024

How is that rude?

The whole thread is about the same issue you have, and the GPU thing is a workaround. We know it has nothing to do with the GPU itself, but there isn't any better solution for the moment.

MrCaspan · Mar 18, 2024

My ticket is now under review; the engineering team is looking at the issue.

help! · Mar 18, 2024

Same issue, I think:

Failed to start kubernetes cluster for Applications: [EFAULT] Unable to configure node: Containerd socket is not available

2024-03-18 20:32:10 (Europe/Londo

MrCaspan · Mar 18, 2024

Not sure it is the same, sorry. Please open a ticket with TrueNAS on Jira.

MrCaspan · Mar 21, 2024

Hey, I'm curious if anybody else can test this. I'm not sure if it's just my instance or if everyone gets the same result. If I restart my server, this problem happens, but if I shut it down and manually start it back up again, it doesn't. I was doing some hardware upgrades yesterday, and when I turned the server back on I realized I didn't get the email saying that the service failed; I looked, and it came up as expected. My first thought was that they had fixed it, but then I realized I don't have any auto updates turned on, so it's still the same OS. The only thing I did differently was turn the machine off and then back on. I've updated the ticket with debug information from both a shutdown and a restart, so hopefully this narrows the problem down, but I'm curious if others see the same symptom.

jambono5 · Mar 22, 2024

Happy to test this; I seem to be suffering from the same issue.

MrCaspan · Mar 22, 2024

Thanks... this makes me wonder what is different (OS software wise) between a reboot vs a hard shut down then power up?

thedestroyer · Mar 23, 2024

@MrCaspan - I've also duplicated your hypothesis... this issue occurs on a reboot, but not after a hard shutdown and then a restart.

jambono5 · Mar 24, 2024

Tried a few things here:

- Shutdown / manual power on
- Adding the systemd config mentioned previously to k3s. I also tried it with the kube-router service (which also seemed to struggle to start) just in case that helped. No luck.

Code:

[Unit]
After=network.target
Requires=br0.device
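
For reference, a drop-in like the one above is normally placed in a `<unit>.service.d` directory and picked up after `systemctl daemon-reload`. The sketch below writes such a file. As far as I know, systemd names network device units `sys-subsystem-net-devices-<iface>.device`, so a plain `br0.device` may not match anything; that is an assumption worth checking with `systemctl list-units --type=device`. The unit name `k3s.service`, the bridge name `br0`, the file name, and the function name are illustrative, and nothing here is confirmed to fix the problem in this thread.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that orders a unit after a bridge device unit.
# write_bridge_dropin and wait-for-bridge.conf are hypothetical names.

write_bridge_dropin() {
    # $1 = drop-in directory, $2 = bridge interface name (br0 is an assumed example)
    dropin_dir="$1"
    device_unit="sys-subsystem-net-devices-$2.device"
    mkdir -p "$dropin_dir" || return 1
    # Same [Unit] keys as the snippet above, but pointing at the full device unit name.
    cat > "$dropin_dir/wait-for-bridge.conf" <<EOF
[Unit]
After=network.target $device_unit
Requires=$device_unit
EOF
}
```

It would be called along the lines of `write_bridge_dropin /etc/systemd/system/k3s.service.d br0`, followed by `systemctl daemon-reload`. Changes like this alter the base install, so they may not survive an update.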

I tried re-creating the bridge with a different name, given someone mentioned their database seemed to have some confusion with the bridge name.

I can manually reconfigure the network settings for Kubernetes once the system is booted. But as soon as the system reboots, the networking fails and my app completely disappears, which isn't ideal.

I'm not savvy enough with Kubernetes to know how the system modules fit together and the correct order they should load in. I wonder if the bridge is reporting as up but isn't quite resolving addresses yet, which then causes Kubernetes' networking to fail.
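
One way to test the "up but no address yet" idea is to check whether the bridge actually has an IPv4 address, not just whether it reports as up. This is a rough sketch using the standard `ip` tool from iproute2. The function names and the interface name `br0` are illustrative, and whether a missing address is really what trips Kubernetes here is unconfirmed.

```shell
#!/bin/sh
# Sketch: report whether an interface has an IPv4 address assigned.
# has_ipv4 and iface_has_ipv4 are hypothetical helper names.

has_ipv4() {
    # Reads one-line-per-address output (as printed by `ip -o -4 addr show`) on stdin
    # and succeeds if any line carries an "inet" address.
    grep -q ' inet [0-9]'
}

iface_has_ipv4() {
    # $1 = interface name (br0 is an assumed example)
    ip -o -4 addr show dev "$1" 2>/dev/null | has_ipv4
}
```

Running `iface_has_ipv4 br0` shortly after boot, and again once the apps service has failed, would show whether the address arrives late.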

Black_Duck · Mar 29, 2024

This appears to be (finally) fixed in Dragonfish (24.04 RC1). I've only run it in a test VM, but it appears to work every time - on a hard boot or restart.

jambono5 · Apr 2, 2024

Fingers crossed for the release date...

Anticipated: 24.04.0 Stable 23 April 2024

Source: 24.04 (Dragonfish) Version Notes, www.truenas.com

Black_Duck · Apr 4, 2024

So disappointing. I upgraded my TrueNAS to Dragonfish 24.04 RC1 to solve this problem, but it's still occurring. It works fine in a VM, but I suspect that's because of different timings between network startup and Kubernetes startup.
I haven't raised a ticket, as I had to revert to Cobia because my Time Machine backup was also failing.
I'll take another look at both issues later when I have some time, but for now, I can report that the problem is not fixed in Dragonfish, at least for me.

MrCaspan · Apr 5, 2024

Looks like we finally have a resolution. This forum is read-only, so I have edited my last comment to post this change.

Please follow the instructions below:
https://ixsystems.atlassian.net/browse/NAS-127870
Go to the last post, and be aware that you are altering the base install. Once you update to 24.04.1 the fix will be part of the stock image, so either wait for that release or, if you need the hotfix, apply it now!
