Massive System Slowdown

mrMuppet

Contributor
Joined
Mar 14, 2014
Messages
192
I've had the same problems over the last few days. I think I fixed it by stopping all my iocage jails one by one - doing that, I found that one old couchpotato jail was causing the 100% processor load. After stopping that jail, the load dropped to 10-30%. I hope that helps you.
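In case anyone wants to narrow it down the same way, this is roughly what I did (standard iocage commands; the jail name is just my example):
Code:
iocage list                # show all jails and whether they're running
iocage stop couchpotato    # stop one suspect jail, then watch the CPU load in top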
I was wrong! Last night my NAS had the same issues as before. o_O (But without the couchpotato jail that I thought was responsible.)
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
My GUI is frozen up like I predicted it would be. My jails are still working, because I can get into all my apps. I am running
Code:
ps auxw | grep python3.8
and will have to wait until the results come back. It's definitely taking a few minutes to run so far.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I waited 10 minutes and got no results. I killed a few of the processes from what I posted yesterday and was still unable to get the GUI to load. I feel like I'm going to need to schedule reboots every day to avoid this unless a solid fix is released.
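(If it comes to that, a nightly cron job would be the crude stopgap. The 04:00 time is just an example, and on TrueNAS you'd add the command under Tasks > Cron Jobs rather than editing the crontab by hand:)
Code:
# crontab format: reboot every day at 04:00
0 4 * * * /sbin/shutdown -r now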
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Don't know if this will do anything, but after rebooting, here are my results. I'm really not familiar enough with FreeNAS logging to get more info. I've been using FreeNAS since 9.3 and this is the first time I've had any serious problems.

Code:
root@truenas[~]# ps auxw | grep python3.8
root        461    2.5  2.3 1800756 1512676  -  S    07:24     0:41.52 python3.8: middlewared (python3.8)
root        465    0.0  0.0   21532   11936  -  I    07:24     0:00.05 /usr/local/bin/python3.8 -c from multiprocessing.resource_tracker import main;main(11)
root        507    0.0  0.2  205280  164944  -  S    07:24     0:06.94 python3.8: middlewared (worker) (python3.8)
root        508    0.0  0.2  202108  162160  -  S    07:24     0:04.80 python3.8: middlewared (worker) (python3.8)
root        509    0.0  0.2  231732  160448  -  S    07:24     0:04.75 python3.8: middlewared (worker) (python3.8)
root        510    0.0  0.2  207604  162788  -  S    07:24     0:04.89 python3.8: middlewared (worker) (python3.8)
root        511    0.0  0.2  201288  161364  -  S    07:24     0:04.64 python3.8: middlewared (worker) (python3.8)
root        865    0.0  0.1   59524   47632  -  I    07:24     0:00.55 python3.8: /usr/local/bin/python3.8 -c from multiprocessing.spawn import spawn_main; spawn
daemon     1639    0.0  0.1   72336   44660  -  I    07:25     0:00.25 python3 /usr/local/bin/wsdd.py (python3.8)
root       1910    0.0  0.1   49232   35716 v0  Is+  07:25     0:00.30 python3 /etc/netcli (python3.8)
root       7144    0.0  0.0    2624    1788  0  R+   07:45     0:00.00 grep python3.8

 

Prophet4NO1

Dabbler
Joined
Sep 11, 2016
Messages
20
I've been out for work a couple of days. Got home, and the server was stuffed again. I ran the grep searches listed a few posts back. SNMP showed nothing. Python pulls up the following.


Code:
root@freenas:~ # ps auxw | grep python
root        424   0.0  2.7 1053900 886900  -  S    16:50      1:09.71 python3.8: middlewared (python3.8)
root        428   0.0  0.0   21532  11936  -  I    16:50      0:00.03 /usr/local/bin/python3.8 -c from multiprocessing.resource_tracker import main;main(11)
root        470   0.0  0.5  202284 162432  -  I    16:50      0:06.52 python3.8: middlewared (worker) (python3.8)
root        471   0.0  0.5  202412 160596  -  I    16:50      0:06.62 python3.8: middlewared (worker) (python3.8)
root        472   0.0  0.5  226852 164588  -  I    16:50      0:07.02 python3.8: middlewared (worker) (python3.8)
root        473   0.0  0.5  201944 162140  -  I    16:50      0:06.57 python3.8: middlewared (worker) (python3.8)
root        474   0.0  0.5  202520 162568  -  I    16:50      0:06.86 python3.8: middlewared (worker) (python3.8)
root        756   0.0  0.2   79708  56576  -  S    16:50      0:00.88 python3.8: middlewared (zettarepl) (python3.8)
root        828   0.0  0.1   59524  47628  -  I    16:50      0:00.35 python3.8: /usr/local/bin/python3.8 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=12, pipe_handle=45) --multiprocessing-fork (python3.8)
daemon     1711   0.0  0.1   72244  44732  -  I    16:51      0:00.41 python3 /usr/local/bin/wsdd.py (python3.8)
root       1974   0.0  0.1   49232  35772 v0  Is+  16:51      0:00.19 python3 /etc/netcli (python3.8)
root       8312   0.0  0.0   11508   2964  0  S+   21:07      0:00.00 grep python


Nothing stands out to me. No heavy CPU or memory use. So, not really sure what to make of this.
 

mrMuppet

Contributor
Joined
Mar 14, 2014
Messages
192
This morning (during the latest slowdown) I watched the system processes that were taking nearly 100% of the processor: this time it was "sysctl", and there were quite a lot of sysctl processes running. I'm not sure why, because now, on a freshly started system (only up for 2 hours), there isn't a single "sysctl" process when I list processes with "top".
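Next time it happens, I'll count them with a quick loop over SSH (plain sh; the [s] keeps grep from counting its own process):
Code:
while true; do date; ps ax | grep -c '[s]ysctl'; sleep 10; done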

What can I do?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Don't know if this will do anything, but after rebooting, here are my results. I'm really not familiar enough with FreeNAS logging to get more info. I've been using FreeNAS since 9.3 and this is the first time I've had any serious problems.
I saw you're running a UPS... I had some issues with my email config in the setup for that... maybe try running without the UPS service (or at least verify your config in there).

Also, check /var/log/messages for anything.
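For example (watch it live while the system is slow, or scan it after the fact):
Code:
tail -f /var/log/messages
grep -iE 'fail|error|traceback' /var/log/messages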
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What can I do?
Have a look at this thread; it covers your issue more closely than the one you're in, which is more about the SNMP issue...
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I suspect my system can't even make it a full 24 hours, or just barely over that, before freezing. The top and htop commands won't execute over SSH. My only option is to reboot. I will turn off the UPS service and check every other service to make sure SNMP isn't being used anywhere. I know the UPS service has an email notification option, so maybe that's one holdup. TrueNAS needs to release a patch ASAP.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Also, check /var/log/messages for anything.

After rebooting and being able to get back in, I see a lot of these

Code:
Oct 30 08:05:21 truenas 1 2020-10-30T08:05:21.985936-04:00 truenas.local collectd 1790 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out


I see a lot of these as well. I have turned off the service now.
Code:
Oct 30 05:58:02 truenas 1 2020-10-30T05:58:02.020700-04:00 truenas.local upsmon 1707 - - Poll UPS [ups@localhost:3493] failed - Server disconnected
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:02.250753-04:00 truenas.local upsmon 1707 - - Communications with UPS ups@localhost:3493 lost
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:04.067364-04:00 truenas.local collectd 1790 - - nut plugin: nut_read: upscli_list_start (ups) failed: Server disconnected
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928305-04:00 truenas.local upsd 1700 - - Data for UPS [ups] is stale - check driver
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928374-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928420-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.308779-04:00 truenas.local upsmon 1707 - - Can't login to UPS [ups@localhost:3493]: Server disconnected
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.477471-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.849232-04:00 truenas.local collectd 1790 - - nut plugin: nut_read: upscli_list_start (ups) failed: Data stale
Oct 30 05:58:29 truenas 1 2020-10-30T05:58:29.417022-04:00 truenas.local upsmon 1707 - - Poll UPS [ups@localhost:3493] failed - Driver not connected
Oct 30 05:58:29 truenas 1 2020-10-30T05:58:29.471866-04:00 truenas.local upsd 1700 - - UPS [ups] data is no longer stale
Oct 30 05:58:34 truenas 1 2020-10-30T05:58:34.428216-04:00 truenas.local upsmon 1707 - - Communications with UPS ups@localhost:3493 established
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
After rebooting and being able to get back in, I see a lot of these
I think this is something already being looked into and isn't related to the slowdown... seems we all have it.

I see a lot of these as well. I have turned off the service now.
I was seeing the same... it seems something changed about the UPS setup, and I haven't managed to figure out what needs changing to fix it yet (I got distracted by the SNMP problem in between).
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Well, I rebooted around 9:45 am, so I guess I'll see if turning off the UPS service lets it stay unfrozen past tomorrow.
 

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
The sysctl/SNMP issue should be fixed in the next release: https://jira.ixsystems.com/browse/NAS-108050

edit: If possible, try to disable anything that calls
Code:
sysctl -a
as a few of the sysctl -a calls seem to take a very long time and thus cause the slowdowns.
I think even generating a debug log will trigger this behavior.
You can take a look at the likely problematic ones in the pull request: https://github.com/freenas/freenas/pull/5939
edit2: I'm going to try making the changes from the pull request and see if that resolves the issue. I like having SNMP and am too impatient to wait for the next release.
edit3:
Code:
sysctl kstat.zfs
is the one that causes the unresponsiveness for me; it's been over a minute and it still hasn't output anything past
Code:
kstat.zfs.misc.dbufstats.cache_count
All of my other services and jails are working fine, but both the console and the GUI are unresponsive.
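(if you want to check your own box, timing each top-level tree makes the hung one stand out; plain sh, and the subtree list is just the usual suspects:)
Code:
for node in kern vm vfs net hw machdep kstat; do
  echo "== $node =="
  /usr/bin/time sysctl $node > /dev/null
done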
edit4:
Code:
 File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read 
once that occurred, it finished up quickly. I manually made the changes to the files from the pull request and turned SNMP back on. I'll see what happens next.
 
Last edited:

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
For me, stopping my 3 jails cleared up the slowness… SSH and the GUI would stall for > 60 seconds at a time.
sysctl was constantly > 95% on one CPU when watching top.
70 GB of 96 GB RAM is free and the ARC is only sitting at 10 GB now, after an uptime of 10 minutes.
The ARC would normally grow until it leaves around 3 GB free once it fills up servicing the 3 pools.

Services running:
SSH
SMB
ZeroTier
Jails are:
Netdata
Hoobs
QBittorrent
I'm happy to delete the jails and rebuild them from scratch (a bit annoying, but not the end of the world) when I have more time.

I noticed I cannot change the update train in the GUI… Is this because there are no nightlies offered post-12.0-RELEASE yet?
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
In another thread, I did this https://www.truenas.com/community/threads/snmp-agent-py-continuously-hi-cpu-use.88262/post-611348

Won't be able to tell if it works or not until tomorrow morning. I turned off the UPS service yesterday and the GUI was still frozen this morning. I wish 12.0 let you change trains to nightly, because having to reboot every morning is super annoying.
You've already said that you have the SNMP service disabled. Changing that script likely won't make a difference for you, unless the script is running without your knowledge.
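Easy enough to rule out, though (the [s] keeps grep from matching its own process, so no output means nothing SNMP-related is running):
Code:
ps auxw | grep '[s]nmp'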
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Wasn't exactly sure, but I'll just reboot every morning until the next update comes out, then. I've learned my lesson and won't do this big of an update again until I've read the forums to see if people are having issues.
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
Wasn't exactly sure, but I'll just reboot every morning until the next update comes out, then. I've learned my lesson and won't do this big of an update again until I've read the forums to see if people are having issues.
This really is uncharacteristic too. I feel your pain. Luckily my issue was solved by disabling SNMP. I specifically waited through all the RCs because I didn't want to see these issues. Apparently none of this was found in the RCs though, which is slightly troubling.

Still, I will say that FreeNAS/TrueNAS has been one of the most stable pieces in my environment over the last 15 years, and the value proposition is excellent.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I've never had this kind of problem. The worst thing I ever experienced was when Corral came out; I rolled back and gladly waited till that dumpster fire was put out. If I hadn't upgraded my zpools, I would go back to 11.3, as it was working great.
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
I've never had this kind of problem. The worst thing I ever experienced was when Corral came out; I rolled back and gladly waited till that dumpster fire was put out. If I hadn't upgraded my zpools, I would go back to 11.3, as it was working great.
Oh dear lord... I had actually forgotten about that abomination... Thankfully though, they saw the error of their ways almost immediately LOL.
 