Looking for Bluefin systems where Apps don't start

SeiyaGame

Cadet
Joined
Feb 19, 2023
Messages
4
Hello, I am running into the problem where Apps fail to start.
My current setup is:
  • Ryzen 5 3100
  • Motherboard B550 AORUS ELITE V2
  • 32 GB DDR4 3200 MHz
  • PCIe LSI 9200-8E
  • PCIe SFP+ Chelsio T520-CR (in bond mode)
I tried many things to solve the problem, but nothing worked:
  • Unset the pool, removed ix-applications and restarted
  • Moved ix-applications to a different pool
  • Changed the advanced Kubernetes settings (Node IP, Route Interface/Gateway)
  • Did not use TrueCharts
  • Disabled IPv6
  • Checked that my date and time are correct (NTP servers, etc.)
It is worth knowing that I was previously on TrueNAS CORE. To migrate to TrueNAS SCALE, I did a clean installation and imported my saved configuration.
I hope this can help you solve the problem!
 

Attachments

  • debug-srv-nas-20230219223959.tgz
    16.4 MB · Views: 109

efalsken

Cadet
Joined
Feb 17, 2023
Messages
8
I have encountered these problems on different servers and have reinstalled several times. Every time it was a new installation, a new pool (no import) and a new dataset (ix-applications).

The process is as follows: install Apps (including TrueCharts apps), disable Host Path Safety Checks, add an SMB path and ACL to share the apps' mount data, then change the system dataset to boot-pool.
After a reboot the problem comes back. Here are the alerts:
Failed to start kubernetes cluster for Applications: [EFAULT] Failed to configure PV/PVCs support: Cannot connect to host 127.0.0.1:6443 ssl:default [Connect call failed ('127.0.0.1', 6443)]
* Glusterd work directory dataset is not mounted.
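
For anyone hitting the same alert: it means nothing accepted a TCP connection on 127.0.0.1:6443 (6443 is the default Kubernetes API server port). A minimal sketch of a probe in Python, assuming a Python 3 interpreter is available on the NAS. `port_is_open` is a name made up for this example and is not part of TrueNAS. It only tells you whether something is listening, not why k3s failed to come up.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("127.0.0.1", 6443)` on the NAS should return False while the alert is active and True once the API server is accepting connections.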
I had a similar problem and just fixed it. Here are my steps:

  1. Apps > Unset Pool
  2. Reboot
  3. Apps > Set Pool
  4. WAIT UNTIL ALL JOBS COMPLETE
  5. Apps > Advanced Settings > Unset "Route v4 Interface" and "Route v4 Gateway"
  6. WAIT WAIT WAIT It took about an hour, but it finally deployed my app.
 

Big-T

Cadet
Joined
Feb 20, 2023
Messages
5
I have an interesting home setup where I have Bluefin installed on bare metal with the Plex app installed, and I'm using a VM within TrueNAS to host OPNsense for internet connectivity. For performance reasons, I have two NICs passed through to the VM via PCIe (they are not accessible to TrueNAS) as well as a dedicated TrueNAS NIC. The LAN interface in OPNsense is connected to the TrueNAS NIC via an external switch. TrueNAS is using DHCP to get its IP from OPNsense, though it's a reserved IP so it gets the same one every time.

I also have the Plex app working fine... until I reboot the TrueNAS machine. I think the important thing to note is that when TrueNAS tries to init the app, the VM hasn't booted yet and so TrueNAS doesn't yet have an internet connection. This results in errors about the catalog sync failing and a Plex app that is stuck in deploying. Once the VM is up, I can unset, and then set, the app pool and the app starts right up.

I don't think internet access should be a prerequisite for app init, or if it is I'm fairly certain that would be documented somewhere, and I haven't seen anything about it.
I haven't yet experimented with the netwait feature; I presume it will cause the VM boot to wait as well and therefore not work for my purpose, but I have not tried it. I also have not tried giving TrueNAS a true static IP; perhaps the app fails to deploy simply because it lacks an IP address (since DHCP is unavailable).

When I get around to testing either of those possible solutions, I'll reply here in case they work, but in the meantime I wanted to leave this post in case anyone knows the answer already, or in case anyone else is having similar issues.
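
The boot-order problem described above boils down to "wait for a dependency, but give up after a deadline". A minimal sketch of that pattern in Python follows. `wait_until` and `can_resolve` are hypothetical helper names, not TrueNAS features, and DNS resolution is only one assumed way of deciding that the network is up. What to do once the network is reachable (in this thread, unsetting and setting the pool by hand) is deliberately left out.

```python
import socket
import time
from typing import Callable


def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves, as a rough 'network is up' signal."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # give up instead of hanging forever
        sleep(interval)
```

A call such as `wait_until(lambda: can_resolve("example.com"), timeout=600, interval=10)` would poll for up to ten minutes; the hostname and the numbers are placeholders.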
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I've got one. I had Bluefin running just fine for months on a bonded Ethernet connection. I moved the server to a new location and deleted the bond (same primary IP), and now none of the apps start and k3s is failing. Combinations of rebooting and unsetting/resetting the pool did not help.
Thanks. If you can confirm that all networking is otherwise fine, please start a new thread and we can report the bug. If you want to test with 22.12.1, it will be out this week.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Stable networking including Internet access is really a prerequisite for Apps. Once they are up and running, they can run while there is an outage.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I had a similar problem and just fixed it. Here are my steps:

  1. Apps > Unset Pool
  2. Reboot
  3. Apps > Set Pool
  4. WAIT UNTIL ALL JOBS COMPLETE
  5. Apps > Advanced Settings > Unset "Route v4 Interface" and "Route v4 Gateway"
  6. WAIT WAIT WAIT It took about an hour, but it finally deployed my app.

Thanks, and congrats on resolving the issue.

This sequence is more complex than it should be.

I would have thought Step 5 (the Kubernetes settings screen) should be done prior to Step 3, so that Steps 4 and 6 happen together.

If that didn't work, it's a problem.
 

Big-T

Cadet
Joined
Feb 20, 2023
Messages
5
Stable networking including Internet access is really a prerequisite for Apps. Once they are up and running, they can run while there is an outage.
Even if this is the case, it still seems like there should be a way to timeout on an app deployment and recover from such a situation without unsetting and resetting the pool.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Even if this is the case, it still seems like there should be a way to timeout on an app deployment and recover from such a situation without unsetting and resetting the pool.

I don't disagree. If we are still seeing issues with 22.12.1, please make a suggestion.
 

patrick339

Cadet
Joined
Feb 19, 2023
Messages
2
I still have the same problem on 22.12.1.
But sometimes the alert does not show up, and the app gets stuck during deployment.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Are you still looking? I have apps that show up as active and seem to work according to the logs but have no GUI (fixed by removing the Kubernetes v4 address that I probably added myself at some point during fault tracing). Some apps refused to start (something strange with app storage permissions, which has been fixed). The Nvidia driver refuses to work, and I can't roll back to 22.02 because Kubernetes won't run at all when I try. There are so many faults from this release that I don't really know where to start fixing things. I can't get VNC to connect, or maybe the VM doesn't start; I can't really tell.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
It could be. I don't have any Chelsio cards at least, but there are other cards in the server: some PLX devices, one HBA, and the built-in Intel X722 NIC.
 

bbarnhill

Cadet
Joined
Jun 26, 2019
Messages
1
None of my apps showed up at first after the update. I had to try saving the Advanced Settings without the v4 and gateway addresses set, get the error message, then set them and save again. Then they all showed up as "deploying". I'm still waiting to see if they deploy.
 

SAINT

Dabbler
Joined
Jun 20, 2015
Messages
16
I had a similar problem and just fixed it. Here are my steps:

  1. Apps > Unset Pool
  2. Reboot
  3. Apps > Set Pool
  4. WAIT UNTIL ALL JOBS COMPLETE
  5. Apps > Advanced Settings > Unset "Route v4 Interface" and "Route v4 Gateway"
  6. WAIT WAIT WAIT It took about an hour, but it finally deployed my app.
This worked for most of my apps. However, I noticed an issue with any container that used a host path volume inside an SMB share. Disabling SMB shares (the share itself, not the SMB service) fixed the issue and allowed it to run, but reenabling the share causes intermittent issues. Sometimes the apps will crash, other times they just can't actually access the files in the host path volume. Sometimes they'll run just fine. Not sure if this is a bug to report or if I have some weird SMB configuration though.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
That's not a bug. That's the host path validation that was introduced with Bluefin. That topic has been discussed numerous times.
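
To illustrate the kind of configuration that trips that validation: the case reported above was a host path volume located inside an SMB share. A minimal sketch of such an overlap test in Python follows. It is an illustration only, not the actual TrueNAS check. `paths_overlap` is a hypothetical helper, and treating both directions (host path inside the share, share inside the host path) as an overlap is an assumption.

```python
from pathlib import PurePosixPath


def paths_overlap(host_path: str, share_path: str) -> bool:
    """Return True if either path equals the other or lies inside it."""
    a = PurePosixPath(host_path)
    b = PurePosixPath(share_path)
    # Compare by path components, so /mnt/tank/media2 is not "inside" /mnt/tank/media.
    return a == b or a in b.parents or b in a.parents
```

With made-up paths, `paths_overlap("/mnt/tank/media/movies", "/mnt/tank/media")` is true, while `/mnt/tank/media2` against `/mnt/tank/media` is not, because the comparison is by path component rather than by string prefix.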
 

SAINT

Dabbler
Joined
Jun 20, 2015
Messages
16
That's not a bug. That's the host path validation that was introduced with Bluefin. That topic has been discussed numerous times.
Ah, I've not seen that. Just had a quick read though and that's probably a good idea, yeah.
 

SeiyaGame

Cadet
Joined
Feb 19, 2023
Messages
4
Stop the SMB service, let the apps start, then start the SMB service again.
This is a workaround for applications that do not start.
 

PyCoder

Dabbler
Joined
Nov 5, 2019
Messages
30
I have issues with, for example, Ferdi, Ferdium, or even Lychee.

According to the logs the ports are blocked?!
Even on a freshly installed SCALE without anything else I get the same message!

See:


32 GB ECC RAM with a 5600.

I had no issues with Ferdi, Lychee or Ferdium under Unraid or Fedora.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I replied on the other thread. This does not seem to be a case of Apps not starting. It's a specific app (Ferdi) using a specific TCP port, 3333, that is being blocked.

If anyone has run Ferdi, perhaps they can provide the solution.
 

derWalter

Explorer
Joined
Dec 5, 2020
Messages
88
Thanks for asking; luckily I found the thread :) I updated just yesterday. I had wanted to wait for a later version
so as NOT to get stuck in such a mess :P

Two apps aren't starting and are stuck in deploying, the more important one being Traefik (hence none of my services are reachable at the moment).

Application events:

2023-04-16 19:16:30
Updated LoadBalancer with new IPs: [] -> [192.168.0.112]
2023-04-16 19:16:30
Container image "ghcr.io/truecharts/alpine:v3.14.2@sha256:4095394abbae907e94b1f2fd2e2de6c4f201a5b9704573243ca8eb16db8cdb7c" already present on machine
2023-04-16 19:16:30
Error: Error response from daemon: invalid volume specification: '/usr/bin:/host/usr/bin': '/usr/bin' 'path' not allowed to be mounted
2023-04-16 19:16:29
Updated LoadBalancer with new IPs: [] -> [192.168.0.112]
2023-04-16 19:16:29
Created container autopermissions
2023-04-16 19:16:29
Started container autopermissions
2023-04-16 19:16:28
Add eth0 [172.16.1.115/16] from ix-net
2023-04-16 19:16:28
Container image "ghcr.io/truecharts/alpine:v3.14.2@sha256:4095394abbae907e94b1f2fd2e2de6c4f201a5b9704573243ca8eb16db8cdb7c" already present on machine
2023-04-16 19:16:27
Ensuring load balancer
2023-04-16 19:16:27
Applied LoadBalancer DaemonSet kube-system/svclb-traefik-3861cbce
2023-04-16 19:16:27
Ensuring load balancer
2023-04-16 19:16:27
Applied LoadBalancer DaemonSet kube-system/svclb-traefik-tcp-ed4c2336
2023-04-16 19:16:27
Scaled up replica set traefik-759745fd85 to 1
2023-04-16 19:16:27
Created pod: traefik-759745fd85-v2twt
2023-04-16 19:16:27
Successfully assigned ix-traefik/traefik-759745fd85-v2twt to ix-truenas
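
The one error line in the events above is easy to miss among the routine entries. As a minimal sketch, assuming the events are pasted as plain text in the same alternating layout (a timestamp line followed by a one-line message), the Python below pairs them up and keeps only messages that start with "Error". `parse_events` and `errors_only` are names invented for this example, and the sample is a three-entry excerpt of the log above.

```python
import re
from typing import List, Tuple

# Matches lines such as "2023-04-16 19:16:30".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")

SAMPLE = """\
2023-04-16 19:16:30
Error: Error response from daemon: invalid volume specification: '/usr/bin:/host/usr/bin': '/usr/bin' 'path' not allowed to be mounted
2023-04-16 19:16:29
Created container autopermissions
2023-04-16 19:16:29
Started container autopermissions
"""


def parse_events(text: str) -> List[Tuple[str, str]]:
    """Pair each timestamp line with the single message line that follows it."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line):
            current = line
        elif current is not None:
            events.append((current, line))
            current = None  # assumption: one message line per timestamp
    return events


def errors_only(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only events whose message starts with 'Error'."""
    return [(ts, msg) for ts, msg in events if msg.startswith("Error")]


if __name__ == "__main__":
    for ts, msg in errors_only(parse_events(SAMPLE)):
        print(ts, msg)
```

On the excerpt, the only entry that survives the filter is the "invalid volume specification" message about '/usr/bin' not being allowed to be mounted.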


If someone has a hint, I'd appreciate it; so far I haven't stumbled upon anything :|
 