Lidarr.exe process hangs occassionally, process trying to get a lock in kernel space

rmccullough

Patron
Joined
May 17, 2018
Messages
269
I am running Lidarr (like Sonarr/Radarr but for music) in an iocage jail on FreeNAS 11.2 U1. Over the last 2 months I have observed the process hang 2-3 times where the Web UI is unresponsive and the mono/Lidarr.exe process is using 100% of a CPU. Restarting the service is stuck waiting for the PID to end. Performing a "kill -9" does not kill the process. Attempting to reboot the FreeNAS server also hangs with the system unable to end the process. I am forced to hard reset the system.

I opened a case against Lidarr on their Github page: https://github.com/lidarr/Lidarr/issues/602

They asked for a "truss" log against the mono PID when the issue occurs. Since I cannot reproduce this, I don't know how to obtain that. For fun I ran the "truss" command and provided some output here: https://pastebin.com/xnnbrt4S

This is their response:
Yes, that is extremely helpful. The pointer here is a flood of UMTX_OP_WAIT_UINT_PRIVATE going on, indicating the process is spinning uselessly trying to get a lock in kernel space. Most likely related to a file operation or high rate of file operations. This does NOT reproduce on FreeBSD 11.2 or 12.0 with ZFS.

There are complaints about numerous other applications that access files frequently running into the same problems, making it clearly Yet Another FreeNAS Specific Issue

Is this a known problem? Should I go log a ticket on redmine?
 

rmccullough

Patron
Joined
May 17, 2018
Messages
269
FWIW, this happened again. I see a lot more locked processes in my iocage jail when running the 'ps' command:
Code:
[root@lidarr /]# ps auxwwww
USER     PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED       TIME COMMAND
lidarr 51252 100.0  1.3 496644 448348  -  IsJ  Sat14   1196:11.17 /usr/local/bin/mono /usr/local/share/Lidarr/Lidarr.exe --nobrowser --data=/usr/local/lidarr (mono-sgen)
root    5990   0.0  0.0   6544   2356  -  IsJ  Thu12      0:00.06 dhclient: epair0b [priv] (dhclient)
_dhcp   6009   0.0  0.0   6544   2476  -  DCsJ Thu12    117:50.15 dhclient: epair0b (dhclient)
root    6216   0.0  0.0   6412   2456  -  SsJ  Thu12      0:00.69 /usr/sbin/syslogd -c -ss
root    6272   0.0  0.0   6464   2432  -  IsJ  Thu12      0:01.52 /usr/sbin/cron -J 15 -s
root   82953   0.0  0.0   6464   2436  -  IJ   03:01      0:00.00 cron: running job (cron)
root   82957   0.0  0.0   7064   2796  -  IsJ  03:01      0:00.00 /bin/sh - /usr/sbin/periodic daily
root   83823   0.0  0.0   4236   1964  -  IJ   03:01      0:00.00 lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sbin/periodic LOCKED daily
root   83824   0.0  0.0   7064   2796  -  IJ   03:01      0:00.01 /bin/sh /usr/sbin/periodic LOCKED daily
root   83833   0.0  0.0   7064   2804  -  IJ   03:01      0:00.01 /bin/sh /usr/sbin/periodic LOCKED daily
root   83834   0.0  0.0   6388   2184  -  IJ   03:01      0:00.00 mail -E -s lidarr daily run output root
root   83920   0.0  0.0   7064   2800  -  IJ   03:01      0:00.00 /bin/sh /etc/periodic/daily/450.status-security
root   83921   0.0  0.0   7064   2796  -  IJ   03:01      0:00.00 /bin/sh - /usr/sbin/periodic security
root   83923   0.0  0.0   4236   1964  -  IJ   03:01      0:00.00 lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr/sbin/periodic LOCKED security
root   83924   0.0  0.0   7064   2796  -  IJ   03:01      0:00.01 /bin/sh /usr/sbin/periodic LOCKED security
root   83932   0.0  0.0   7064   2804  -  IJ   03:01      0:00.01 /bin/sh /usr/sbin/periodic LOCKED security
root   83933   0.0  0.0   6388   2152  -  IJ   03:01      0:00.00 mail -E -s lidarr daily security run output root
root   88428   0.0  0.0   7064   2820  -  IJ   03:13      0:00.00 /bin/sh - /usr/local/etc/periodic/security/460.pkg-checksum
root   88431   0.0  0.0   7064   2820  -  IJ   03:13      0:00.00 /bin/sh - /usr/local/etc/periodic/security/460.pkg-checksum
root   88432   0.0  0.0  13572   8312  -  IJ   03:13      0:00.00 /usr/local/sbin/pkg check -qsa
root   88433   0.0  0.0   6324   2096  -  IJ   03:13      0:00.00 sed -e s/ checksum mismatch for//
root   88434   0.0  0.0   6288   2000  -  ICJ  03:13      0:00.00 tee /dev/stderr
root   88435   0.0  0.0   6508   2244  -  IJ   03:13      0:00.00 wc -l
root   88436   0.0  0.0  13572   9152  -  DJ   03:13     98:42.15 /usr/local/sbin/pkg check -qsa
root   80857   0.0  0.0   7872   4024  0  SJ   09:05      0:00.01 bash
root   80885   0.0  0.0   6992   2908  0  R+J  09:05      0:00.00 ps auxwwww


And 'truss' does not produce any output:
Code:
[root@lidarr /]# truss -f -p 51252
^Ctruss: Unexpect stop in waitpid: Interrupted system call
[root@lidarr /]#
 
Top