I have a brand-new setup running the latest TrueNAS Scale (as of today), which I am using to test Nextcloud stability.
Over the past few days I have received 2 core dump notifications:
Core files for the following executables were found: /usr/bin/mount. Please create a ticket at https://jira.ixsystems.com/ and attach the relevant core files along with a system debug. Once the core files have been archived and attached to the ticket, they may be removed by running the following command in shell: 'rm /var/db/system/cores/*'.
with the kernel log showing:
Code:
kernel: mount[PID]: segfault at 6240 ip 0000000000006240 sp 00007ffea49df700 error 14
kernel: Code: Unable to access opcode bytes at RIP 0x6216.
The error field is printed in hex, so error 14 (0x14) indicates a user-mode instruction fetch from a page that is not mapped, not a write. That matches the fault address being equal to the instruction pointer (0x6240): mount jumped to an invalid address.
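For reference, the error field in the kernel's segfault line is the x86 page-fault error code, printed in hexadecimal. Below is a minimal sketch of decoding it in Python; the function name and the wording of the labels are my own, only the bit meanings come from the x86 architecture.

```python
# x86 page-fault error code bits. The kernel prints this value in hex
# in its "segfault at ... error NN" message.
# Each entry: (mask, meaning if set, meaning if clear or None to skip).
FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read or fetch (not a write)"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_page_fault(code_hex):
    """Decode the hex error field of a segfault line into readable labels."""
    code = int(code_hex, 16)
    labels = []
    for mask, if_set, if_clear in FLAGS:
        if code & mask:
            labels.append(if_set)
        elif if_clear is not None:
            labels.append(if_clear)
    return labels


# "error 14" from the log above is 0x14, i.e. bits 0x10 and 0x04.
print(decode_page_fault("14"))
```

For the value in the log this reports a user-mode instruction fetch from a page that is not present, which is consistent with the fault address and the instruction pointer both being 0x6240.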
Instead of filing a ticket, I decided to analyze the core dump by executing:
Code:
gdb /usr/bin/mount <dump_file>
Both dumps had the same root cause:
Code:
[New LWP PID]
Core was generated by `mount -t tmpfs -o size=8589934592 tmpfs /var/lib/kubelet/pods/UUID'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000006240 in ?? ()
with only the pod UUID and PID being different.
Then I poked around a bit and found that /var/log/error contains a bunch of entries like:
Code:
ntpd[4945]: bind(58) AF_INET6 fe80::8c77:30ff:fe07:97eb%1149#123 flags 0x11 failed: Cannot assign requested address
ntpd[4945]: unable to create socket on veth9b804dd8 (1820) for fe80::8c77:30ff:fe07:97eb%1149#123
with the IPv6 address and interface being different.
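To see how often this happens and on which interfaces, the ntpd entries can be tallied with a short script. This is only a sketch: the regular expression is inferred from the two sample lines above, so it assumes every entry follows that exact shape, and `count_socket_failures` is a name made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample "unable to create socket" line only;
# other ntpd versions may format this message differently.
SOCKET_FAIL = re.compile(
    r"ntpd\[\d+\]: unable to create socket on (?P<iface>\S+) \(\d+\) "
    r"for (?P<addr>[0-9a-fA-F:]+)%(?P<scope>\d+)#(?P<port>\d+)"
)


def count_socket_failures(lines):
    """Count 'unable to create socket' entries per interface name."""
    counts = Counter()
    for line in lines:
        match = SOCKET_FAIL.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


# The two sample lines from the log above.
sample = [
    "ntpd[4945]: bind(58) AF_INET6 fe80::8c77:30ff:fe07:97eb%1149#123 "
    "flags 0x11 failed: Cannot assign requested address",
    "ntpd[4945]: unable to create socket on veth9b804dd8 (1820) "
    "for fe80::8c77:30ff:fe07:97eb%1149#123",
]
print(count_socket_failures(sample))
```

On a real system the same function could be fed an open file handle for /var/log/error instead of the sample list.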
Further debugging revealed that these errors are caused by the cronjob of the Nextcloud app:
while the cronjob is running, a veth interface shows up, and when the cronjob completes, the interface vanishes.
My current high-confidence assumption is that the cronjob terminates before the network binding can finish, causing the binding error above.
A low-confidence hypothesis is that the constant spawning of the interface eventually leads to some sort of kernel memory issue, causing the segfault.
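One way to check the timing assumption would be to record when veth interfaces appear and vanish and compare those timestamps with the ntpd errors. The sketch below polls /sys/class/net for that purpose; the function names, the polling interval, and the output format are my own choices, and a very short-lived interface could be missed between polls.

```python
import os
import time


def list_veths(sysfs="/sys/class/net"):
    """Return the set of veth* interface names currently present."""
    try:
        return {name for name in os.listdir(sysfs) if name.startswith("veth")}
    except FileNotFoundError:
        # Not a Linux host, or sysfs is not mounted at the given path.
        return set()


def diff_interfaces(before, after):
    """Return (appeared, vanished) interface names as sorted lists."""
    return sorted(after - before), sorted(before - after)


def watch(rounds, interval=1.0):
    """Poll `rounds` times and print a timestamped line per change."""
    seen = list_veths()
    for _ in range(rounds):
        time.sleep(interval)
        now = list_veths()
        appeared, vanished = diff_interfaces(seen, now)
        stamp = time.strftime("%H:%M:%S")
        for name in appeared:
            print(f"{stamp} appeared {name}")
        for name in vanished:
            print(f"{stamp} vanished {name}")
        seen = now


# Example: watch(rounds=600, interval=0.5) covers about five minutes.
```

If the bind errors consistently land within a second or so of an "appeared" line, that would support the assumption that ntpd is racing the interface setup.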
I believe this can be reproduced on any TrueNAS Scale system with the current TrueCharts Nextcloud 24.0.3_15.2.28 app deployed.
Can anyone else confirm whether my assumptions hit the mark? Or is there more to this?