Massive System Slowdown

mrMuppet

Contributor
Joined
Mar 14, 2014
Messages
192
I've had the same problems over the last few days. I think I fixed it by stopping all my iocage jails one by one - doing that, I found that one old couchpotato jail was causing the 100% processor load. After stopping that jail, the load dropped to 10-30%. I hope that helps you.
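In case anyone wants to narrow it down the same way, this is roughly what I did (standard iocage commands; the jail name is just my example):
Code:
iocage list                # show all jails and whether they're running
iocage stop couchpotato    # stop one suspect jail, then watch the CPU load in top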
I was wrong! Last night my NAS had the same issues as before. o_O (But without the couchpotato jail that I thought was responsible.)
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
My GUI is frozen up like I predicted it would be. My jails are still working, because I can get into all my apps. I am running
Code:
ps auxw | grep python3.8
and will have to wait until the results come back. It's definitely taking a few minutes to run so far.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I waited 10 minutes and got no results. I killed a few of the processes from what I posted yesterday and was still unable to get the GUI to load. I feel like I'm going to need to schedule reboots every day to avoid this unless a solid fix is released.
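(If it comes to that, a nightly cron job would be the crude stopgap. The 04:00 time is just an example, and on TrueNAS you'd add the command under Tasks > Cron Jobs rather than editing the crontab by hand:)
Code:
# crontab format: reboot every day at 04:00
0 4 * * * /sbin/shutdown -r now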
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Don't know if this will do anything, but after rebooting, here are my results. I'm really not familiar enough with FreeNAS logging to get more info. I've been using FreeNAS since 9.3 and this is the first time I've had any serious problems.

Code:
root@truenas[~]# ps auxw | grep python3.8
root        461    2.5  2.3 1800756 1512676  -  S    07:24     0:41.52 python3.8: middlewared (python3.8)
root        465    0.0  0.0   21532   11936  -  I    07:24     0:00.05 /usr/local/bin/python3.8 -c from multiprocessing.resource_tracker import main;main(11)
root        507    0.0  0.2  205280  164944  -  S    07:24     0:06.94 python3.8: middlewared (worker) (python3.8)
root        508    0.0  0.2  202108  162160  -  S    07:24     0:04.80 python3.8: middlewared (worker) (python3.8)
root        509    0.0  0.2  231732  160448  -  S    07:24     0:04.75 python3.8: middlewared (worker) (python3.8)
root        510    0.0  0.2  207604  162788  -  S    07:24     0:04.89 python3.8: middlewared (worker) (python3.8)
root        511    0.0  0.2  201288  161364  -  S    07:24     0:04.64 python3.8: middlewared (worker) (python3.8)
root        865    0.0  0.1   59524   47632  -  I    07:24     0:00.55 python3.8: /usr/local/bin/python3.8 -c from multiprocessing.spawn import spawn_main; spawn
daemon     1639    0.0  0.1   72336   44660  -  I    07:25     0:00.25 python3 /usr/local/bin/wsdd.py (python3.8)
root       1910    0.0  0.1   49232   35716 v0  Is+  07:25     0:00.30 python3 /etc/netcli (python3.8)
root       7144    0.0  0.0    2624    1788  0  R+   07:45     0:00.00 grep python3.8

 

Prophet4NO1

Dabbler
Joined
Sep 11, 2016
Messages
20
I've been out for work a couple of days. Got home, and the server was stuffed again. I ran the grep searches listed a few posts back. SNMP showed nothing. Python pulls up the following.


Code:
root@freenas:~ # ps auxw | grep python
root        424   0.0  2.7 1053900 886900  -  S    16:50      1:09.71 python3.8: middlewared (python3.8)
root        428   0.0  0.0   21532  11936  -  I    16:50      0:00.03 /usr/local/bin/python3.8 -c from multiprocessing.resource_tracker import main;main(11)
root        470   0.0  0.5  202284 162432  -  I    16:50      0:06.52 python3.8: middlewared (worker) (python3.8)
root        471   0.0  0.5  202412 160596  -  I    16:50      0:06.62 python3.8: middlewared (worker) (python3.8)
root        472   0.0  0.5  226852 164588  -  I    16:50      0:07.02 python3.8: middlewared (worker) (python3.8)
root        473   0.0  0.5  201944 162140  -  I    16:50      0:06.57 python3.8: middlewared (worker) (python3.8)
root        474   0.0  0.5  202520 162568  -  I    16:50      0:06.86 python3.8: middlewared (worker) (python3.8)
root        756   0.0  0.2   79708  56576  -  S    16:50      0:00.88 python3.8: middlewared (zettarepl) (python3.8)
root        828   0.0  0.1   59524  47628  -  I    16:50      0:00.35 python3.8: /usr/local/bin/python3.8 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=12, pipe_handle=45) --multiprocessing-fork (python3.8)
daemon     1711   0.0  0.1   72244  44732  -  I    16:51      0:00.41 python3 /usr/local/bin/wsdd.py (python3.8)
root       1974   0.0  0.1   49232  35772 v0  Is+  16:51      0:00.19 python3 /etc/netcli (python3.8)
root       8312   0.0  0.0   11508   2964  0  S+   21:07      0:00.00 grep python


Nothing stands out to me. No heavy CPU or memory use. So, not really sure what to make of this.
 

mrMuppet

Contributor
Joined
Mar 14, 2014
Messages
192
This morning (during the latest slowdown) I watched the system processes that were taking nearly 100% of the processor: this time it was "sysctl", and there were quite a lot of sysctl processes running. I'm not sure why, because now, on a freshly started system (only up for 2 hours), there isn't a single "sysctl" process when I list processes with "top".
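Next time it happens, I'll count them with a quick loop over SSH (plain sh; the [s] keeps grep from counting its own process):
Code:
while true; do date; ps ax | grep -c '[s]ysctl'; sleep 10; done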

What can I do?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Don't know if this will do anything, but after rebooting, here are my results. I'm really not familiar enough with FreeNAS logging to get more info. I've been using FreeNAS since 9.3 and this is the first time I've had any serious problems.
I saw you're running a UPS... I had some issues with my email config in the setup for that... maybe try running without the UPS service (or at least verify your config in there).

Also, check /var/log/messages for anything.
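For example (watch it live while the system is slow, or scan it after the fact):
Code:
tail -f /var/log/messages
grep -iE 'fail|error|traceback' /var/log/messages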
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What can I do?
Have a look at this thread; it covers your issue more closely than the one you're in, which is more about the SNMP issue...
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I suspect my system can't even make it a full 24 hours, or just barely over that, before freezing. The top and htop commands won't execute over SSH. My only option is to reboot. I will turn off the UPS service and check every other service to make sure SNMP isn't being used anywhere. I know the UPS service has an email notification option, so maybe that's one holdup. TrueNAS needs to release a patch ASAP.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Also, check /var/log/messages for anything.

After rebooting and being able to get back in, I see a lot of these

Code:
Oct 30 08:05:21 truenas 1 2020-10-30T08:05:21.985936-04:00 truenas.local collectd 1790 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out


I see a lot of these as well. I have turned off the service now.
Code:
Oct 30 05:58:02 truenas 1 2020-10-30T05:58:02.020700-04:00 truenas.local upsmon 1707 - - Poll UPS [ups@localhost:3493] failed - Server disconnected
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:02.250753-04:00 truenas.local upsmon 1707 - - Communications with UPS ups@localhost:3493 lost
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:04.067364-04:00 truenas.local collectd 1790 - - nut plugin: nut_read: upscli_list_start (ups) failed: Server disconnected
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928305-04:00 truenas.local upsd 1700 - - Data for UPS [ups] is stale - check driver
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928374-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:10 truenas 1 2020-10-30T05:58:07.928420-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.308779-04:00 truenas.local upsmon 1707 - - Can't login to UPS [ups@localhost:3493]: Server disconnected
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.477471-04:00 truenas.local upsd 1700 - - write() failed for 127.0.0.1: Broken pipe
Oct 30 05:58:15 truenas 1 2020-10-30T05:58:15.849232-04:00 truenas.local collectd 1790 - - nut plugin: nut_read: upscli_list_start (ups) failed: Data stale
Oct 30 05:58:29 truenas 1 2020-10-30T05:58:29.417022-04:00 truenas.local upsmon 1707 - - Poll UPS [ups@localhost:3493] failed - Driver not connected
Oct 30 05:58:29 truenas 1 2020-10-30T05:58:29.471866-04:00 truenas.local upsd 1700 - - UPS [ups] data is no longer stale
Oct 30 05:58:34 truenas 1 2020-10-30T05:58:34.428216-04:00 truenas.local upsmon 1707 - - Communications with UPS ups@localhost:3493 established
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
After rebooting and being able to get back in, I see a lot of these
I think this is something already being looked into and isn't related to the slowdown... seems we all have it.

I see a lot of these as well. I have turned off the service now.
I was seeing the same... it seems something changed about the UPS setup, and I haven't managed to figure out what needs changing to fix it yet (I got distracted by the SNMP problem in between).
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Well, I rebooted around 9:45 am, so I guess I'll see if turning off the UPS service lets it stay unfrozen past tomorrow.
 

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
The sysctl/SNMP issue should be fixed in the next release: https://jira.ixsystems.com/browse/NAS-108050

edit: If possible, try to disable anything that calls
Code:
sysctl -a
as a few of the sysctl -a calls seem to take a very long time and thus cause the slowdowns.
I think even generating a debug log will trigger this behavior.
You can take a look at the likely problematic ones in the pull request: https://github.com/freenas/freenas/pull/5939
edit2: I'm going to try making the changes from the pull request and see if that resolves the issue. I like having SNMP and am too impatient to wait for the next release.
edit3:
Code:
sysctl kstat.zfs
is the one that causes the unresponsiveness for me; it's been over a minute and it still hasn't output anything past
Code:
kstat.zfs.misc.dbufstats.cache_count
All of my other services and jails are working fine, but both the console and the GUI are unresponsive.
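(if you want to check your own box, timing each top-level tree makes the hung one stand out; plain sh, and the subtree list is just the usual suspects:)
Code:
for node in kern vm vfs net hw machdep kstat; do
  echo "== $node =="
  /usr/bin/time sysctl $node > /dev/null
done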
edit4:
Code:
 File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read 
once that occurred, it finished up quickly. I manually made the changes to the files from the pull request and turned SNMP back on. I'll see what happens next.
 
Last edited:

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
For me, stopping my 3 jails cleared up the slowness… SSH and the GUI would stall for > 60 seconds at a time.
sysctl was constantly > 95% on one CPU when watching top.
70 GB of 96 GB RAM is free and the ARC is only sitting at 10 GB now, after an uptime of 10 minutes.
The ARC would normally grow until it leaves around 3 GB free once it fills up servicing the 3 pools.

Services running:
SSH
SMB
ZeroTier
Jails are:
Netdata
Hoobs
QBittorrent
I'm happy to delete the jails and rebuild them from scratch (a bit annoying, but not the end of the world) when I have more time.

I noticed I cannot change the update train in the GUI… Is this because there are no nightlies offered post-12.0-RELEASE yet?
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
In another thread, I did this https://www.truenas.com/community/threads/snmp-agent-py-continuously-hi-cpu-use.88262/post-611348

Won't be able to tell if it works or not until tomorrow morning. I turned off the UPS service yesterday and the GUI was still frozen this morning. I wish 12.0 let you change trains to nightly, because having to reboot every morning is super annoying.
You've already said that you have the SNMP service disabled. Changing that script likely won't make a difference for you, unless the script is running without your knowledge.
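Easy enough to rule out, though (the [s] keeps grep from matching its own process, so no output means nothing SNMP-related is running):
Code:
ps auxw | grep '[s]nmp'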
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Wasn't exactly sure, but I'll just reboot every morning until the next update comes out, then. I've learned my lesson and won't do this big of an update again until I've read the forums to see if people are having issues.
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
Wasn't exactly sure, but I'll just reboot every morning until the next update comes out, then. I've learned my lesson and won't do this big of an update again until I've read the forums to see if people are having issues.
This really is uncharacteristic too. I feel your pain. Luckily my issue was solved by disabling SNMP. I specifically waited through all the RCs because I didn't want to see these issues. Apparently none of this was found in the RCs though, which is slightly troubling.

Still, I will say that FreeNAS/TrueNAS has been one of the most stable pieces in my environment over the last 15 years, and the value proposition is excellent.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I've never had this kind of problem. The worst thing I ever experienced was when Corral came out; I rolled back and gladly waited till that dumpster fire was put out. If I hadn't upgraded my zpools, I would go back to 11.3, as it was working great.
 

jsylvia007

Explorer
Joined
Oct 4, 2011
Messages
84
I've never had this kind of problem. The worst thing I ever experienced was when Corral came out; I rolled back and gladly waited till that dumpster fire was put out. If I hadn't upgraded my zpools, I would go back to 11.3, as it was working great.
Oh dear lord... I had actually forgotten about that abomination... Thankfully though, they saw the error of their ways almost immediately LOL.
 