snmp-agent.py continuously hi CPU use

Brezlord

Contributor
Joined
Jan 7, 2017
Messages
189
Is anyone else seeing snmp-agent.py continuously using a lot of CPU?

Screenshot from 2020-10-25 11-30-19.png
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
Doing a bit more digging into it it seems to be blocking when the script attempts to attempts to get a filtered list of kstat.zfs sysctls. The problem child seems to be kstat.zfs.misc.dbufs as this will cause sysctl to hang as well so looks like an OS bug. I was surprised at the use of FreeBSD 12.2-RC3 for the release version of 12.0 and it seems to be fixed in the nightly build.
 

Brezlord

Contributor
Joined
Jan 7, 2017
Messages
189
Thanks. I'm also getting the console spammed with;
Code:
Oct 25 15:35:10 nas1.breznet.com syslog-ng[10836]: syslog-ng starting up; version='3.25.1'
Oct 25 15:35:10 nas1.breznet.com kernel: pid 2352 (syslog-ng), jid 0, uid 0: exited on signal 6 (core dumped)
Oct 25 15:44:51 nas1.breznet.com 1 2020-10-25T15:44:51.618802+08:00 nas1.breznet.com collectd 8855 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
Oct 25 16:34:05 nas1.breznet.com 1 2020-10-25T08:34:05.106777+00:00 nas1.breznet.com ctld 12066 - - 192.168.110.201: read: connection lost
Oct 25 16:34:05 nas1.breznet.com 1 2020-10-25T16:34:05.107130+08:00 nas1.breznet.com ctld 12031 - - child process 12066 terminated with exit status 1
Oct 25 16:34:05 nas1.breznet.com 1 2020-10-25T16:34:05.701865+08:00 nas1.breznet.com ctld 12067 - - 192.168.110.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:34:05 nas1.breznet.com 1 2020-10-25T16:34:05.702124+08:00 nas1.breznet.com ctld 12031 - - child process 12067 terminated with exit status 1
Oct 25 16:34:19 nas1.breznet.com 1 2020-10-25T16:34:19.613028+08:00 nas1.breznet.com ctld 12078 - - 192.168.110.201: read: connection lost
Oct 25 16:34:19 nas1.breznet.com 1 2020-10-25T16:34:19.613308+08:00 nas1.breznet.com ctld 12031 - - child process 12078 terminated with exit status 1
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.132292+08:00 nas1.breznet.com ctld 12079 - - 192.168.110.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.132630+08:00 nas1.breznet.com ctld 12031 - - child process 12079 terminated with exit status 1
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.138839+08:00 nas1.breznet.com ctld 12080 - - 192.168.120.201: read: connection lost
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.139100+08:00 nas1.breznet.com ctld 12031 - - child process 12080 terminated with exit status 1
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.329054+08:00 nas1.breznet.com ctld 12082 - - 192.168.110.201: read: connection lost
Oct 25 16:34:20 nas1.breznet.com 1 2020-10-25T16:34:20.329335+08:00 nas1.breznet.com ctld 12031 - - child process 12082 terminated with exit status 1
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.144562+08:00 nas1.breznet.com ctld 12084 - - 192.168.110.201: read: connection lost
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.144844+08:00 nas1.breznet.com ctld 12031 - - child process 12084 terminated with exit status 1
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.692936+08:00 nas1.breznet.com ctld 12085 - - 192.168.110.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.693318+08:00 nas1.breznet.com ctld 12031 - - child process 12085 terminated with exit status 1
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.700095+08:00 nas1.breznet.com ctld 12086 - - 192.168.120.201: read: connection lost
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.700375+08:00 nas1.breznet.com ctld 12031 - - child process 12086 terminated with exit status 1
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.950358+08:00 nas1.breznet.com ctld 12088 - - 192.168.110.201: read: connection lost
Oct 25 16:34:29 nas1.breznet.com 1 2020-10-25T16:34:29.950645+08:00 nas1.breznet.com ctld 12031 - - child process 12088 terminated with exit status 1
Oct 25 16:35:03 nas1.breznet.com 1 2020-10-25T16:35:03.416192+08:00 nas1.breznet.com ctld 12113 - - 192.168.120.201: read: connection lost
Oct 25 16:35:03 nas1.breznet.com 1 2020-10-25T16:35:03.416498+08:00 nas1.breznet.com ctld 12031 - - child process 12113 terminated with exit status 1
Oct 25 16:35:03 nas1.breznet.com 1 2020-10-25T16:35:03.662563+08:00 nas1.breznet.com ctld 12114 - - 192.168.120.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:35:03 nas1.breznet.com 1 2020-10-25T16:35:03.662794+08:00 nas1.breznet.com ctld 12031 - - child process 12114 terminated with exit status 1
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.131250+08:00 nas1.breznet.com ctld 12121 - - 192.168.120.201: read: connection lost
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.131521+08:00 nas1.breznet.com ctld 12031 - - child process 12121 terminated with exit status 1
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.662640+08:00 nas1.breznet.com ctld 12130 - - 192.168.120.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.662933+08:00 nas1.breznet.com ctld 12031 - - child process 12130 terminated with exit status 1
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.668263+08:00 nas1.breznet.com ctld 12133 - - 192.168.120.201: read: connection lost
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.668488+08:00 nas1.breznet.com ctld 12031 - - child process 12133 terminated with exit status 1
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.889613+08:00 nas1.breznet.com ctld 12135 - - 192.168.110.201: read: connection lost
Oct 25 16:35:08 nas1.breznet.com 1 2020-10-25T16:35:08.889839+08:00 nas1.breznet.com ctld 12031 - - child process 12135 terminated with exit status 1
Oct 25 16:35:09 nas1.breznet.com 1 2020-10-25T16:35:09.779898+08:00 nas1.breznet.com smartd 8967 - - Device: /dev/nvme0, Temperature 51 Celsius reached critical limit of 50 Celsius (Min/Max 48/51!)
Oct 25 16:35:09 nas1.breznet.com 1 2020-10-25T16:35:09.780093+08:00 nas1.breznet.com smartd 8967 - - Device: /dev/nvme0, Temperature 51 Celsius reached critical limit of 50 Celsius (Min/Max 48/51!)
Oct 25 16:35:32 nas1.breznet.com 1 2020-10-25T16:35:32.989771+08:00 nas1.breznet.com ctld 12155 - - 192.168.120.201: read: connection lost
Oct 25 16:35:32 nas1.breznet.com 1 2020-10-25T16:35:32.990079+08:00 nas1.breznet.com ctld 12031 - - child process 12155 terminated with exit status 1
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.502262+08:00 nas1.breznet.com ctld 12156 - - 192.168.120.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.502639+08:00 nas1.breznet.com ctld 12031 - - child process 12156 terminated with exit status 1
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.507587+08:00 nas1.breznet.com ctld 12157 - - 192.168.120.201: read: connection lost
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.507859+08:00 nas1.breznet.com ctld 12031 - - child process 12157 terminated with exit status 1
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.709594+08:00 nas1.breznet.com ctld 12159 - - 192.168.110.201: read: connection lost
Oct 25 16:35:33 nas1.breznet.com 1 2020-10-25T16:35:33.709883+08:00 nas1.breznet.com ctld 12031 - - child process 12159 terminated with exit status 1
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.154735+08:00 nas1.breznet.com ctld 12161 - - 192.168.120.201: read: connection lost
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.155054+08:00 nas1.breznet.com ctld 12031 - - child process 12161 terminated with exit status 1
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.602552+08:00 nas1.breznet.com ctld 12162 - - 192.168.120.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.602914+08:00 nas1.breznet.com ctld 12031 - - child process 12162 terminated with exit status 1
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.608107+08:00 nas1.breznet.com ctld 12163 - - 192.168.120.201: read: connection lost
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.608364+08:00 nas1.breznet.com ctld 12031 - - child process 12163 terminated with exit status 1
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.793097+08:00 nas1.breznet.com ctld 12165 - - 192.168.110.201: read: connection lost
Oct 25 16:35:53 nas1.breznet.com 1 2020-10-25T16:35:53.793368+08:00 nas1.breznet.com ctld 12031 - - child process 12165 terminated with exit status 1
Oct 25 16:36:14 nas1.breznet.com 1 2020-10-25T16:36:14.616283+08:00 nas1.breznet.com ctld 12192 - - 192.168.120.201: read: connection lost
Oct 25 16:36:14 nas1.breznet.com 1 2020-10-25T16:36:14.616536+08:00 nas1.breznet.com ctld 12031 - - child process 12192 terminated with exit status 1
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.162472+08:00 nas1.breznet.com ctld 12193 - - 192.168.120.201 (iqn.2020-06.com.example:74500800): read: Connection reset by peer
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.162715+08:00 nas1.breznet.com ctld 12031 - - child process 12193 terminated with exit status 1
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.167273+08:00 nas1.breznet.com ctld 12194 - - 192.168.120.201: read: connection lost
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.167498+08:00 nas1.breznet.com ctld 12031 - - child process 12194 terminated with exit status 1
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.390102+08:00 nas1.breznet.com ctld 12196 - - 192.168.110.201: read: connection lost
Oct 25 16:36:15 nas1.breznet.com 1 2020-10-25T16:36:15.390323+08:00 nas1.breznet.com ctld 12031 - - child process 12196 terminated with exit status 1
Oct 25 17:29:05 nas1.breznet.com 1 2020-10-25T17:29:05.495744+08:00 nas1.breznet.com collectd 12658 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
Oct 25 17:34:05 nas1.breznet.com 1 2020-10-25T17:34:05.494812+08:00 nas1.breznet.com collectd 12658 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
Oct 25 17:49:06 nas1.breznet.com 1 2020-10-25T17:49:06.328622+08:00 nas1.breznet.com collectd 12658 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 284, in __init__
    raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake
Oct 25 20:09:05 nas1.breznet.com 1 2020-10-25T20:09:05.492004+08:00 nas1.breznet.com collectd 12658 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
Oct 25 20:34:05 nas1.breznet.com 1 2020-10-25T20:34:05.491245+08:00 nas1.breznet.com collectd 12658 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
 

Brezlord

Contributor
Joined
Jan 7, 2017
Messages
189
Can also confirm that snmp is not reporting to Zabbix anymore.
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
If you change snmp-agent.py line 32 to "kstat.zfs.misc.arcstats" it'll avoid the (presently unneeded) problem dbufs sysctl and work as it did before. Hopefully they'll get the kernel patch in for the next update/release :confused:.

I get the console spam as well as trouble with missing graphite data, it's caused (from my initial troubleshooting) when the sysctl poll causes a process hang. With the problem agent script if you run 'top -aS -s 1' you'll see the output hang for 5-10+ seconds at a time.
 

bmh.01

Explorer
Joined
Oct 4, 2013
Messages
70
From the bug report you can see collected/graphite data with snmp enabled vs disabled. The longer the system runs the worse the hangs get.

processes-freenas-dynamic-dashboard.png
 

mjt5282

Contributor
Joined
Mar 19, 2013
Messages
139
I am also seeing this snmp agent behavior. it is interfering with system throughput and "snappiness". I never noticed it with the betas or RC's.
Much thanks to Brezlord for documenting the issue *and* providing a patch.

Added: After 2-3 days of running unpatched 12.0-RELEASE, my system became borderline unresponsive and required reboot. I might even recommend a hotfix release as I felt the performance dropped so badly left unpatched.
 
Last edited:
Joined
Jan 13, 2017
Messages
5
Just to chime in with thanks... Was really getting annoyed with SNMP misbehavior, had tracked it down to zfs_dbgmsg making sysctl slow. Finally found this thread, and woohoo, it works! Thank you, @bmh.01!
 

ddrucker

Dabbler
Joined
Feb 24, 2020
Messages
19
I'm still seeing this 3+ years later, on 13.0-U6.1. Any suggestions? The smtp-agent.py file has changed so I can't apply the @bmh.01 fix - I don't know what line to apply it to.
 

t7MevELx0

Cadet
Joined
Feb 8, 2024
Messages
1
I'm seeing this on our TrueNAS running version TrueNAS-13.0-U5.3.

Why was this never fixed? Do the TrueNAS devs monitor this forum? Has anyone opened a ticket with them about this?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Top