NextCloud TrueChart cronjob causing error

aekt

Dabbler
Joined
Jul 22, 2022
Messages
13
I have a brand new setup that has been running the latest TrueNAS Scale as of today, testing nextcloud stability.

Over the past few days I have received 2 core dump notifications:
Core files for the following executables were found: /usr/bin/mount. Please create a ticket at https://jira.ixsystems.com/ and attach the relevant core files along with a system debug. Once the core files have been archived and attached to the ticket, they may be removed by running the following command in shell: 'rm /var/db/system/cores/*'.
with kernel log showing:
kernel: mount[PID]: segfault at 6240 ip 0000000000006240 sp 00007ffea49df700 error 14
kernel: Code: Unable to access opcode bytes at RIP 0x6216.
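error 14 is the x86 page-fault error code, printed in hex (0x14): a user-mode instruction fetch from a page that is not mapped. That matches the fault address being the same as the instruction pointer.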
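For anyone who wants to decode the value themselves, here is a minimal sketch. It assumes the kernel prints the x86 page-fault error code in hexadecimal in the segfault line, so "error 14" is read as 0x14; the function name `decode_pf_error` and the bit table are my own illustration, not something from TrueNAS.

```python
# x86 page-fault error code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
]

def decode_pf_error(code_hex):
    """Decode the 'error NN' field of a kernel segfault line (assumed hex)."""
    code = int(code_hex, 16)
    meanings = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings

print(decode_pf_error("14"))
```

Read this way, the value describes the process jumping to an unmapped address, which fits the "Unable to access opcode bytes" line in the same log.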

Instead of filing a ticket, I decided to analyze the core dump by executing:
Code:
gdb /usr/bin/mount <dump_file>

Both dumps show the same crash:
[New LWP PID]
Core was generated by `mount -t tmpfs -o size=8589934592 tmpfs /var/lib/kubelet/pods/UUID'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000006240 in ?? ()
Only the pod UUID and PID differ between the two.
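To compare several dumps without reading each gdb session by hand, something like the following could work. It is only a sketch: it assumes gdb prints the "Core was generated by" and "Program terminated with signal" lines in the form shown above, and `summarize_core` is a hypothetical helper name.

```python
import re

def summarize_core(gdb_output):
    """Return (command, signal) from gdb's core-file banner, or None."""
    cmd = re.search(r"Core was generated by `(.*)'\.", gdb_output)
    sig = re.search(r"Program terminated with signal (\w+)", gdb_output)
    if not cmd or not sig:
        return None
    # Replace the pod-specific path component so dumps from different pods compare equal.
    command = re.sub(r"(/var/lib/kubelet/pods/)\S+", r"\1<pod>", cmd.group(1))
    return command, sig.group(1)

sample = """Core was generated by `mount -t tmpfs -o size=8589934592 tmpfs /var/lib/kubelet/pods/UUID'.
Program terminated with signal SIGSEGV, Segmentation fault."""
print(summarize_core(sample))
```

If two dumps produce the same tuple, they came from the same mount invocation and died with the same signal.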

Then I poked around a bit and found a bunch of these in /var/log/error:
ntpd[4945]: bind(58) AF_INET6 fe80::8c77:30ff:fe07:97eb%1149#123 flags 0x11 failed: Cannot assign requested address
ntpd[4945]: unable to create socket on veth9b804dd8 (1820) for fe80::8c77:30ff:fe07:97eb%1149#123
with the IPv6 address and interface name differing each time.
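A rough way to see how many distinct interfaces are involved is to count the "unable to create socket" lines per interface name. This sketch assumes the log lines keep the format shown above; `failed_interfaces` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the ntpd "unable to create socket on <iface> (<index>) for ..." line.
SOCKET_ERR = re.compile(r"ntpd\[\d+\]: unable to create socket on (\S+) \(\d+\) for ")

def failed_interfaces(lines):
    """Count ntpd socket-creation failures per interface name."""
    counts = Counter()
    for line in lines:
        match = SOCKET_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample_lines = [
    "ntpd[4945]: bind(58) AF_INET6 fe80::8c77:30ff:fe07:97eb%1149#123 flags 0x11 failed: Cannot assign requested address",
    "ntpd[4945]: unable to create socket on veth9b804dd8 (1820) for fe80::8c77:30ff:fe07:97eb%1149#123",
]
print(failed_interfaces(sample_lines))
```

On a real system the input would be the lines of /var/log/error rather than the two sample lines.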

Further debugging revealed that this is triggered by the Nextcloud app's cronjob: while the cronjob is running, the veth interface shows up, and when the cronjob completes, the interface vanishes.
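The appear/vanish behaviour can be watched with a small polling loop. This is a sketch under the assumption that the pod interfaces are named veth* and are listed under /sys/class/net, as on a typical Linux system; the function names are hypothetical.

```python
import os
import time

def list_veth(sysfs="/sys/class/net"):
    """Names of veth* interfaces currently present (empty set if sysfs is missing)."""
    try:
        return {name for name in os.listdir(sysfs) if name.startswith("veth")}
    except FileNotFoundError:
        return set()

def diff(before, after):
    """Return (appeared, vanished) between two snapshots, each sorted."""
    return sorted(after - before), sorted(before - after)

def watch(seconds=60, interval=1.0):
    """Print veth interfaces as they appear (+) and vanish (-)."""
    seen = list_veth()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        time.sleep(interval)
        now = list_veth()
        appeared, vanished = diff(seen, now)
        for name in appeared:
            print(time.strftime("%H:%M:%S"), "+", name)
        for name in vanished:
            print(time.strftime("%H:%M:%S"), "-", name)
        seen = now
```

Calling `watch()` around the time the cronjob is scheduled should print a "+" line when the job starts and a "-" line when it ends, which can then be lined up against the ntpd timestamps.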

My current high-confidence assumption is that the cronjob terminates before ntpd can finish binding to the new interface, causing the bind error above.
My low-confidence hypothesis is that the constant creation and removal of the interface eventually leads to some sort of kernel memory issue, causing the segfault.

I believe this can be reproduced on any TrueNAS Scale system with the current TrueCharts Nextcloud 24.0.3_15.2.28 app deployed.
Can anyone else confirm whether my assumptions hit the mark, or is there more to this?
 

truecharts

Guru
Joined
Aug 19, 2021
Messages
788
This is a bug in the TrueNAS SCALE operating system, not something inherent to our cronjob.

However, iX Systems does not take bug reports on the forums, so if you want this solved, it's highly advisable to file a bug report with iX Systems on their Jira bug tracker instead.
 