Restart of 'collectd' is stuck. Affecting FreeNAS operation.

Arik Yavilevich (Dabbler, joined Sep 22, 2015, 12 messages)

Hi,

I have an issue where the collectd service gets stuck when it is restarted. While it is stuck, I can't complete certain operations in the GUI, such as auto-import.

To reproduce, I reboot the server and run:
'/usr/sbin/service collectd restart'
The shell then gets stuck at:
Code:
Stopping collectd.
Waiting for PIDS: [pid]

If I run 'pkill -9 collectd', the issue is resolved and I can restart with 'service collectd restart' normally until the next reboot.
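
For anyone hitting the same thing, this is roughly how I confirm it is hung and clear it. The pid file location is the FreeBSD port default and is an assumption on my part:
Code:
# Check whether the rc script still thinks collectd is running
/usr/sbin/service collectd status

# See what state the leftover process is in (pid file path assumed)
ps -o pid,state,wchan,command -p `cat /var/run/collectd.pid`

# Last resort: force-kill it so later restarts work again
pkill -9 collectd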

The machine is for testing, not production. It is a VM in ESXi with 4 vCPUs and 8 GB of RAM.

Is this a known issue? Why would something like this happen?

Please advise,
Arik.

+--------------------------------------------------------------------------------+
+ FreeNAS-9.3-STABLE-201509160044 +
+--------------------------------------------------------------------------------+
Operating system type: FreeBSD
Operating system release: 9.3-RELEASE-p25
Operating system revision: 199506
Kernel version: FreeBSD 9.3-RELEASE-p25 #0 r281084+d3a5bf7: Tue Sep 15 17:52:04 PDT 2015
root@build3.ixsystems.com:/tank/home/jkh/build/FN/objs/os-base/amd64/tank/home/jkh/build/FN/FreeBSD/src/sys/FREENAS.amd64
Hostname: freenas.local
Name of kernel file booted: /boot/kernel/kernel
 

Arik Yavilevich (Dabbler, joined Sep 22, 2015, 12 messages)

"Why are you restarting collectd manually?"

Just as a means to reproduce a flow from the GUI that doesn't work. In the current condition, any operation in which the notifier tries to restart collectd gets stuck. If it is part of an atomic transaction, the system database stays locked as well.

For example, the log will be
Code:
Sep 22 11:54:05 freenas manage.py: [middleware.notifier:228] Calling: restart(collectd)
Sep 22 11:54:05 freenas manage.py: [middleware.notifier:175] Executing: /usr/sbin/service ix-collectd quietstart
Sep 22 11:54:05 freenas manage.py: [middleware.notifier:189] Executed: /usr/sbin/service ix-collectd quietstart -> 0
Sep 22 11:54:05 freenas manage.py: [middleware.notifier:175] Executing: /usr/sbin/service collectd restart

and it ends there. Note that there is no "Executed" line for the restart.
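
Next time it hangs I want to see what the old collectd process is actually blocked on while the restart waits. Something along these lines (standard FreeBSD tools, nothing FreeNAS-specific assumed):
Code:
# List the collectd processes the restart is waiting for
pgrep -lf collectd

# Dump kernel stacks to see where the exiting process is stuck
procstat -kk `pgrep collectd`

# Or trace the syscalls of the newest collectd process while it hangs
truss -p `pgrep -n collectd`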
 

Arik Yavilevich (Dabbler, joined Sep 22, 2015, 12 messages)

Hi dlavigne,

Thanks for your suggestions. I tried with 1 vCPU and the issue is still reproducible.
It seems that if I leave the system running for some time, the collectd service changes to a state that does allow a restart, so the issue lasts for several hours after a reboot.

With regard to the number of CPUs, I found conflicting information in the documentation and in the forums. Here are some quotes:

http://www.freenas.org/whats-new/2015/05/yes-you-can-virtualize-freenas.html
"At least two vCPUs"
Josh Paetzel
iXsystems Senior Engineer
May 2015

http://doc.freenas.org/9.3/freenas_install.html#vmware-esxi
originates from: http://olddoc.freenas.org/index.php/FreeNAS®_in_a_Virtual_Environment
"Under "CPUs", make sure that only 1 virtual processor is listed, otherwise you will be unable to start any FreeNAS® services."
History
http://olddoc.freenas.org/index.php...ent&action=historysubmit&diff=4171&oldid=4169
by Dru
Feb 2012

https://forums.freenas.org/index.php?threads/freenas-in-vmware-with-more-than-1-virtual-cpu.8737/
"My virtual machine has 5.5GB of RAM allocated and 1 CPU with 3 cores."
cyberjock, Sep 7, 2012

https://forums.freenas.org/index.ph...ide-to-not-completely-losing-your-data.12714/
"I now use 3 vCPUs for FreeNAS"
cyberjock, May 12, 2013

Is there a way to debug collectd? Does it write some kind of log somewhere?
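
One thing I plan to try, assuming collectd's standard -f/-C flags and the FreeBSD port's default config path (the path FreeNAS actually generates may differ), is running it in the foreground so errors go straight to the console:
Code:
# Stop the service first, then run collectd without forking
/usr/sbin/service collectd stop

# -f = stay in foreground, -C = config file (path is an assumption)
/usr/local/sbin/collectd -f -C /usr/local/etc/collectd.conf

It may also log via syslog to /var/log/messages, depending on whether the syslog plugin is enabled in the generated config.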

Arik.
 

cyberjock (Inactive Account, joined Mar 25, 2012, 19,526 messages)

Just a word of warning: quotes from 2012 and 2013 really aren't indications of how things should/will/do work with current builds. We're running a totally different kernel, drivers, etc. So yeah, not really helpful, unfortunately. :(

Considering this issue only shows up on your system, I tend to think it is because of virtualization and not a bug.
 