Why do my apps restart without any good reason?

mehran

Cadet
Joined
Apr 24, 2023
Messages
6
I have TrueNAS SCALE (TrueNAS-SCALE-23.10.2) set up at home. I use it for data storage and for a few home lab services. For example, I run an MLflow service (a Docker image) that logs the metrics of my machine learning experiments. Simply put, it's a web application whose API I call to save data into a database. To be clear, the ML experiments do not run on TrueNAS SCALE; only the results are saved there.

If this web server is unavailable, the machine running the actual ML experiment fails with an "API inaccessible" error. This happens from time to time and I don't know why. I can't find a good reason for it. If it were a power outage (I don't have a UPS), my other machine would be affected too (it is not), and TrueNAS would not come back on by itself afterwards (it is still running). That has happened before, but not in this case.

One other observation: I have an "uptime-kuma" app running on the same TrueNAS box, and each time it restarts it sends me a notification. Each time the MLflow API fails, I also receive a notification from uptime-kuma (uptime-kuma is not monitoring MLflow; it monitors some other apps). So it seems that the whole of Kubernetes, or TrueNAS itself, is being reset.

I checked the logs and this is all I see:

Code:
$ tail -n 500 /var/log/messages | grep "Mar 11"

Mar 11 04:02:41 truenas kernel: kube-bridge: port 51(veth43d2ef41) entered blocking state
Mar 11 04:02:41 truenas kernel: kube-bridge: port 51(veth43d2ef41) entered forwarding state
Mar 11 04:02:51 truenas kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth8aca59cb: link becomes ready
Mar 11 04:02:51 truenas kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 11 04:02:51 truenas kernel: kube-bridge: port 55(veth8aca59cb) entered blocking state
Mar 11 04:02:51 truenas kernel: kube-bridge: port 55(veth8aca59cb) entered disabled state
Mar 11 04:02:51 truenas kernel: device veth8aca59cb entered promiscuous mode
Mar 11 04:02:51 truenas kernel: kube-bridge: port 55(veth8aca59cb) entered blocking state
Mar 11 04:02:51 truenas kernel: kube-bridge: port 55(veth8aca59cb) entered forwarding state
Mar 11 04:03:19 truenas kernel: kube-bridge: port 56(vethf71e63c4) entered disabled state
Mar 11 04:03:19 truenas kernel: device vethf71e63c4 left promiscuous mode
Mar 11 04:03:19 truenas kernel: kube-bridge: port 56(vethf71e63c4) entered disabled state
Mar 11 20:59:11 truenas systemd-journald[473]: Data hash table of /var/log/journal/a7d8b70ff4f9462d8d4f33d50337384c/system.journal has a fill level at 75.0 (8533 of 11377 items, 6553600 file size, 768 bytes per hash table item), suggesting rotation.
Mar 11 20:59:11 truenas systemd-journald[473]: /var/log/journal/a7d8b70ff4f9462d8d4f33d50337384c/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 11 20:59:11 truenas systemd-journald[473]: Failed to set ACL on /var/log/journal/a7d8b70ff4f9462d8d4f33d50337384c/user-3000.journal, ignoring: Operation not supported


These entries are close to the time I suspect the problem occurred, but they do not match it exactly.

Any idea how I can figure out the problem?
 

mehran

Cadet
Joined
Apr 24, 2023
Messages
6
Looking into the different apps, I realized that they had all been restarted around 4 am. So I went to Cron Jobs to see whether I had a job scheduled around that time. And I do!

Code:
bash /root/heavy_script/heavy_script.sh --self-update -b 10 -rsp -u 10


As I recall, this is from the TrueCharts team, for backing up their apps. I've disabled it for now. I'll dig further into why it is restarting my apps.
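
To check whether the restarts really line up with the cron run, it may help to narrow the system log to a short window around 4 am instead of grepping for the whole day. Below is a minimal sketch, assuming the traditional syslog line format shown in the first post ("Mar 11 04:02:41 ..."). The helper name log_window and the chosen window are made up for illustration.

```shell
# Hypothetical helper: print syslog-style lines from one day that fall inside a
# time window. Reads the log on stdin.
#   $1 = day as it appears in the log, e.g. "Mar 11"
#   $2 = window start (HH:MM:SS), $3 = window end (HH:MM:SS)
log_window() {
    awk -v day="$1" -v start="$2" -v end="$3" '
        # Fields 1 and 2 are month and day of month, field 3 is HH:MM:SS.
        # Fixed-width times compare correctly as plain strings.
        ($1 " " $2) == day && $3 >= start && $3 <= end
    '
}

# Example (run on the NAS): everything logged between 03:55 and 04:10 on Mar 11
#   log_window "Mar 11" 03:55:00 04:10:00 < /var/log/messages
```

If the veth and kube-bridge messages cluster inside the window right after the cron job's start time, that supports the cron job as the trigger.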
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
This command tells the script to update itself, back up your apps and keep 10 copies, refresh the catalogs, prune old junk, and, the key part, update your apps and roll back if an update fails. Updating an app means restarting it. If you are using TrueCharts (I used to), they sometimes push updates daily when they are making major changes. And if one of your app updates keeps failing, the script will retry it every single day.

You can tune that command as you wish. See: https://github.com/Heavybullets8/heavy_script
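
As one possible tuning, here is a sketch of the same cron command with the update step removed, so the nightly run no longer restarts apps. It assumes that -u is the flag that updates apps and -r the one that rolls back failed updates, which matches the description above but should be checked against the heavy_script README for your installed version. The variable names are only for illustration.

```shell
# Assumed variant of the cron command from this thread: -u 10 (update apps) and
# -r (roll back failed updates) are dropped, so the run should only self-update,
# back up (keeping 10 copies), refresh the catalogs and prune.
HEAVY_SCRIPT=/root/heavy_script/heavy_script.sh   # path taken from the existing cron job
CRON_CMD="bash $HEAVY_SCRIPT --self-update -b 10 -sp"

# Print the command so it can be pasted into the Cron Jobs command field.
echo "$CRON_CMD"
```

App updates can then be run by hand, or from a separate cron job scheduled when a restart will not interrupt a running experiment.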
 