VERY HIGH CPU after upgrade from FreeNAS 11.3

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
System has run great for months on FreeNAS 11.3. Upgraded to 12.0-RELEASE a couple of days ago. It's got dual 6-core Xeon processors and 32GB of RAM. Where do I start to diagnose this problem? The GUI takes forever to load as well.
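For anyone wanting to reproduce the view in the screenshot below, something like this from a shell should show roughly the same thing (stock FreeBSD top/ps; the exact flags are just a suggestion):

    # Live view: system processes, full command lines, sorted by CPU
    top -aSH -o cpu
    # One-shot snapshot of the heaviest processes, sorted by the %CPU column
    ps auxww | sort -rnk 3 | head -20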

[Attachment: Screen Shot 2020-10-28 at 9.28.47 AM.png]
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Do you have the SNMP service running?
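If you'd rather check from a shell than the GUI, something like this should tell you (I'm assuming the service registers as snmpd on 12.0):

    # List enabled rc services and look for the SNMP daemon
    service -e | grep -i snmp
    # See whether an snmpd process is actually running
    ps auxww | grep '[s]nmp'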
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, so that rules out one of the possible causes of those symptoms: SNMP causing high CPU in a python3.8 process.
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
What log files should I be looking through, and what am I looking for? Last night when I rebooted, it was fine until I left. Came in this morning and it's doing the same thing.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Missed this question in my last post. No, I have not upgraded my pools. Will that fix this problem?
No - it will allow you to roll back to FreeNAS 11.3 which doesn't have this bug, and wait for iXsystems to merge the upstream FreeBSD fix. Don't upgrade your pool yet as that's a one-way street.

Worth noting that I compared my service -e output to yours and the only additional thing you have running is rsyncd - have you tried disabling that as a test?
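If you'd rather test from a shell than the GUI Services page, something like this should work (it only stops the daemon until the next reboot or until the middleware restarts it, so it's safe as a test):

    # Save the list of enabled services for later comparison
    service -e > /tmp/services-before.txt
    # Stop the rsync daemon temporarily without changing the saved config
    service rsyncd onestop
    # Confirm it's no longer running
    service rsyncd onestatus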
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
How will that affect my upgraded/new jails?

I stopped rsyncd. I'm guessing I won't know if it helps until tomorrow. That will disable my cloud sync tasks, right?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
How will that affect my upgraded/new jails?
Break them, probably. I believe there was a change in jail handling between 11.3 and 12.0.

I stopped rsyncd. I'm guessing I won't know if it helps until tomorrow. That will disable my cloud sync tasks, right?

Does your top have a bunch of sysctls chewing up CPU right now? And if you're using anything that depends on rsyncd, yes, it will fail - not sure if this would extend to a plugin-based one that may not use rsync, but any FreeNAS replication jobs will break for sure.
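A quick way to check for those with plain FreeBSD tools:

    # Watch system processes with full command lines, refreshing every second
    top -aS -s 1
    # Or take a one-shot listing of any sysctl processes and their CPU usage
    ps auxww | grep '[s]ysctl'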
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
Break them, probably. I believe there was a change in jail handling between 11.3 and 12.0.
I might have to wait then. Hopefully we can find a temporary workaround.

Does your top have a bunch of sysctls chewing up CPU right now? And if you're using anything that depends on rsyncd, yes, it will fail - not sure if this would extend to a plugin-based one that may not use rsync, but any FreeNAS replication jobs will break for sure.
Yes, it did, just as in the first screenshot. I ended up rebooting it. I don't have any replication jobs set up yet, just some file sync for local cloud backup. They can wait a bit since I've stopped rsync. If you think of any other way to prevent this from happening, I'd appreciate it.

Here is a current screenshot of top. This has turned into a semi-mission-critical machine since it runs Elasticsearch & Kibana in jails for monitoring our AWS web servers.

[Attachment: Screen Shot 2020-10-28 at 2.52.29 PM.png]
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
Found this: SNMPTrap was enabled. Just disabled it. Not sure if this could have been a cause, since @sretalla mentioned SNMP being an issue. At the time I could only access the CLI; the GUI was unresponsive before a reboot.


[Attachment: Screen Shot 2020-10-28 at 4.05.29 PM.png]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I might have to wait then. Hopefully we can find a temporary workaround.
Any sysctl's hanging out in top still?

I've turned SNMP on for one of my test rigs; I had a bunch of python PIDs pop up, but nothing hanging on sysctl yet. Manually running the supposedly long-stalling command to pull the dbufs from kstat doesn't choke my machine either. It takes a half-second or so, but it's by no means the five-minute grind to a halt that the FreeBSD bug submitter described.
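For anyone who wants to run the same test, this is roughly it (I'm assuming the sysctl in question is kstat.zfs.misc.dbufs, going by the FreeBSD bug report):

    # Time the kstat dbufs query that's suspected of stalling (output discarded; it can be huge)
    time sysctl kstat.zfs.misc.dbufs > /dev/null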
 

mjt5282

Contributor
Joined
Mar 19, 2013
Messages
139
Before the SNMP workaround patch, I found my TrueNAS Core 12.0-RELEASE machines take 2-3 days before they become laggy and unresponsive. They're generally 64GB machines with Skylake and Kaby Lake Intel CPUs.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hat tip to @bmh.01 for the quickfix in another thread, if this is related to the sysctl stall. Worth trying?

If you change snmp-agent.py line 32 to "kstat.zfs.misc.arcstats" it'll avoid the (presently unneeded) problem dbufs sysctl and work as it did before. Hopefully they'll get the kernel patch in for the next update/release :confused:.

I get the console spam as well as trouble with missing graphite data; it's caused (from my initial troubleshooting) by the sysctl poll hanging a process. With the problem agent script, if you run 'top -aS -s 1' you'll see the output hang for 5-10+ seconds at a time.
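If anyone wants a rough recipe for applying that workaround, something like the following (the script path is an assumption; locate yours first if it differs, and note that a future update will likely put the original file back):

    # Path is an assumption; find the real location if this isn't it:
    #   find / -name snmp-agent.py
    # Back up the agent script before touching it
    cp /usr/local/bin/snmp-agent.py /root/snmp-agent.py.bak
    # Hand-edit line 32 so it queries kstat.zfs.misc.arcstats, per @bmh.01's note above
    vi /usr/local/bin/snmp-agent.py
    # Restart the SNMP service (or toggle it in the GUI) so the change takes effect
    service snmpd restart
    # Then watch for stalls
    top -aS -s 1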
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
Any sysctl's hanging out in top still?
It's been 4 1/2 hours since the reboot. I've seen one sysctl randomly pop up in top. I've had the window open for a while, glancing at it here and there. I also have a few python3.8 processes, but those could be from my jails?
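In case it's useful, a way to tell whether those python3.8 processes belong to a jail (stock FreeBSD ps/jls; a jid of 0 means the host):

    # Show the jail ID alongside each python3.8 process
    ps -axo pid,jid,%cpu,command | grep '[p]ython3.8'
    # Map jail IDs to jail names
    jls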
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
It just hit again. Top is full of sysctl PIDs 5 hours 15 minutes after the reboot. Can I just restart sysctl? Do I need to reboot after making the code change listed above?
 

prophoto

Explorer
Joined
Jul 27, 2015
Messages
61
Here's what I saw when I logged in this morning after 13 hours of uptime. I applied the fix from @bmh.01, but it didn't seem to help. Next suggestion?

[Attachment: Screen Shot 2020-10-29 at 8.23.06 AM.png]
 