Massive System Slowdown

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
It's been about 1 day since I made the changes posted in the pull request, and so far I've had no issues with SNMP causing slowdowns. It is still typically at the top of CPU usage (10-15%), but never near the 100% plus unresponsive console/GUI that existed before.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
For me, stopping my 3 Jails cleared up the slowness… SSH and GUI would stall for > 60 seconds at a time.
sysctl was constantly > 95% on one cpu when using top.
70GB of 96GB RAM free, and ARC only sitting at 10GB now after an uptime of 10 minutes.
ARC would normally grow to leave around 3GB free once it fills up servicing the 3 pools.

Services running:
SSH
SMB
ZeroTier
Jails are:
Netdata
Hoobs
QBittorrent
Am happy to delete jails and rebuild them from scratch (bit annoying but not the end of the world) when I have more time.

I noticed I cannot change the train in the GUI for updates… Is this because there's no nightlies offered post 12.0-release yet?
For me it was the Hoobs jail that caused sysctl -a to consume >95% of one cpu.

I deleted the jail and re-added it and still got the same result without configuring Hoobs at all, so it's not any previous config settings causing the issue for me.

Additionally, I applied the change to line 32 of /usr/local/bin/snmp-agent.py, which didn't help me, so clearly there's a mix of root causes for people. (I've never run the SNMP agent, so I wasn't expecting this to help.)
 

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
The changes from the pull request only fix issues with 1) the SNMP service provided by TrueNAS via snmp-agent.py and 2) debug logs with ZFS config... I think.
Anything else that runs sysctl -a or even sysctl kstat.zfs will have a bad time. There is a CTLFLAG_SKIP flag documented in SYSCTL(9), so I guess you could write a kernel module that skips the problematic entries and load that... assuming that SYSCTL(8) honors that flag, which... um... I can't tell from just a quick search.
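
To check whether a given box is affected, one option is simply to time the calls mentioned above. This is a minimal sketch, not part of the pull request: the helper name time_command and the 120-second timeout are made up for illustration, and it only measures wall-clock time without explaining where the time goes.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv, discard its output, and return elapsed wall-clock seconds.

    Returns None if the command does not finish within `timeout` seconds.
    (Hypothetical helper for illustration only.)
    """
    start = time.perf_counter()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,  # a non-zero exit status is still a valid timing sample
        )
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start
```

On the NAS you could compare time_command(["sysctl", "kstat.zfs"]) with a cheap lookup such as time_command(["sysctl", "kern.ostype"]). A large gap between the two would be consistent with the expensive entries described above.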
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I left htop running all day yesterday so I could capture what is causing the GUI to become unresponsive and mine is definitely sysctl -a. I'm not sure how to fix it or if I just have to wait for TrueNAS to release the next update with the patch.
 

Attachments

  • htop.PNG

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Can you tell me how to actually make those sysctl changes from the pull request? I don't know where to even begin.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
First of all, as FrankNAS pointed out, the pull request may or may not help, depending on what you're running.
What jails/plugins do you have running, and what services do you have enabled?
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
I have Sonarr/Radarr, sabnzbd, plex, organizr, tautulli, transmission, NGINX reverse proxy, and homebridge for jails. For services I have SMART, SMB, and SSH on.
 

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
The pull request I referenced for SNMP only applies to snmp-agent.py, which shows up under that name in top/htop.
Do you have reporting turned on (in the GUI: System -> Reporting)? That is the only other thing I can think of that would call sysctl.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
turn off homebridge.

That's what my money's on… :wink:
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Ok I’ll turn it off and see if GUI is still working in the morning or not. Just odd cause all my apps work, it’s just the GUI that’s completely locked up.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
Sounds the same as me… I'd have 30 seconds of complete unresponsiveness in the GUI and SSH shell until I switched off Hoobs (which is a wrapper for Homebridge). If I enabled Hoobs again, the unresponsiveness instantly returned.

When you run top, do you have any process constantly above 90%? For me it was "sysctl -a", which completely went away when I switched off Hoobs.
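
If watching top by hand is tedious, the same check can be scripted. This is a rough sketch, assuming ps accepts the pid, pcpu and command keywords (true of FreeBSD's ps as far as I know); the function names and the 90% threshold are illustrative only. Note that the CPU figure ps reports is an average, so it can lag behind what top shows.

```python
import subprocess


def parse_busy(ps_output, threshold=90.0):
    """Return (pid, cpu_percent, command) for rows at or above `threshold`.

    Expects text shaped like the output of `ps -axo pid,pcpu,command`.
    """
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            pid = int(parts[0])
            cpu = float(parts[1])
        except ValueError:
            continue  # ignore rows that do not parse as numbers
        if cpu >= threshold:
            busy.append((pid, cpu, parts[2].strip()))
    return busy


def busy_processes(threshold=90.0):
    """Run ps once and report the processes at or above `threshold` percent CPU."""
    result = subprocess.run(
        ["ps", "-axo", "pid,pcpu,command"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_busy(result.stdout, threshold)
```

Calling busy_processes() every few seconds and logging any entry whose command starts with "sysctl" would show whether the symptom matches what is described in this thread.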
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Yeah, mine was at 100% this morning. My GUI doesn’t respond at all though. The only fix I’ve found so far is to reboot daily. I’ll check if stopping my Homebridge jail prevents the lockup tomorrow.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
Don't just wait until tomorrow… I'd immediately look at "Display System Processes", just below Virtual Machines in the left-hand menu, and observe.
If Hoobs makes no difference, you might as well disable all jails and then switch them on one at a time to see if you can identify the one(s) that might be your root cause.
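
The one-at-a-time approach can be written down as a small procedure. This is only a sketch under assumptions: find_culprits and the iocage wrapper are hypothetical names, the wrapper assumes the jails are managed with iocage (iocage start/stop), and is_slow is whatever check you trust, even a manual yes/no prompt after watching the GUI for a minute.

```python
import subprocess


def find_culprits(jails, is_slow, start, stop):
    """Stop every jail, then start them one at a time and record offenders.

    jails   -- iterable of jail names
    is_slow -- callable returning True while the system is sluggish
    start   -- callable(name) that starts a jail
    stop    -- callable(name) that stops a jail

    Returns a list of jail names that made the system slow, or None if the
    system is still slow with every jail stopped (the cause lies elsewhere).
    """
    jails = list(jails)
    for name in jails:
        stop(name)
    if is_slow():
        return None
    culprits = []
    for name in jails:
        start(name)
        if is_slow():
            culprits.append(name)
            stop(name)  # keep the offender off so later jails are tested cleanly
    return culprits


def iocage(action, name):
    """Assumption: jails are managed by iocage; action is "start" or "stop"."""
    subprocess.run(["iocage", action, name], check=True)
```

In practice you would pass start=lambda n: iocage("start", n) and stop=lambda n: iocage("stop", n), and give each jail enough time to settle before is_slow decides.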
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Guess it was HomeBridge because my GUI never froze today. I kinda need that turned on but I suppose I'll have to leave it off until the next TrueNAS update.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
Glad you identified your root cause.
Did you happen to systematically switch back on your other jails to rule out any others? You were running plenty more services than me.
It could be helpful for others should the issue be hiding in jails other than Hoobs/Homebridge.
You could perhaps set up Homebridge in a VM or under Docker in the meantime.
 

tangles

Dabbler
Joined
Jan 12, 2018
Messages
33
Well, am glad you got it sorted.

Cheers.
 

AirborneTrooper

Contributor
Joined
Jun 20, 2014
Messages
148
Now let’s just hope the next update fixes it. IIRC, homebridge runs on an older version of python so maybe that has something to do with it.
 

FrankNAS

Contributor
Joined
Dec 3, 2017
Messages
111
Well, if Homebridge ever calls sysctl -a or sysctl kstat.zfs, it will run into some very expensive procedures.
 