TrueNAS-12.0-U8.1 not responding

marahin

Cadet
Joined
Jul 10, 2022
Messages
5
Hi.
I'm running TrueNAS-12.0-U8.1 on an Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz with 8 GB RAM and two pools: an HDD pool (RAIDZ2, 1.64 TiB available) and an SSD pool (RAIDZ1, 848 GiB).
I've noticed that my own monitoring sometimes reports the host as unavailable, and all NFS shares become inaccessible from Kubernetes. It's worth mentioning that exposing storage to applications over NFS is the primary purpose of this TrueNAS instance.
It doesn't seem to happen on a regular schedule. It never happens more often than once every two weeks, and the incidents are usually 2-3 weeks apart.

During these periods the TrueNAS instance is also unresponsive over SSH and the web interface.
It "comes back" after some time (between 20 minutes and 1 h 30 min). This is what the reporting looks like during those periods: https://imgur.com/a/vfkgEUF



What could be causing this? In the past I tried the fix of loading the "if_re.ko" driver through a tunable: https://www.truenas.com/community/t...on-alder-lake-cpu-i5-12600k.98195/post-680526 - and that helped, or so I thought.
But the problem has recently come back.

/var/log/messages does show some interesting errors around the times the server was unresponsive: https://gist.github.com/Marahin/2344d3042193e6823e205981b99de160

Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 17542, size: 65536
Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5972, size: 12288
Jul 16 08:18:45 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 08:20:10 storagemaster 1 2022-07-16T08:20:09.975838+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:19:51 storagemaster 1 2022-07-16T10:13:28.153135+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:22:08 storagemaster 1 2022-07-16T10:21:28.226691+02:00 storagemaster.local collectd 1407 - - Timeout collecting disk temperatures
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 2 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 36489, size: 16384
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
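To line the swap_pager messages up with the outage windows, they could be counted per minute. Below is a minimal Python sketch, not a TrueNAS tool: the function name swap_waits_per_minute and the regular expression are my own assumptions, written against the syslog line format shown above, and may need adjusting for other log layouts.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 17542, size: 65536
SWAP_WAIT = re.compile(
    r"^(?P<minute>\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ "
    r"swap_pager: indefinite wait buffer: "
    r"bufobj: \d+, blkno: (?P<blkno>\d+), size: (?P<size>\d+)"
)


def swap_waits_per_minute(lines):
    """Count 'indefinite wait buffer' messages, keyed by 'Mon DD HH:MM'.

    Lines that do not match (tracebacks, syslog-ng 'Last message ...
    repeated' summaries) are skipped, so suppressed repeats are NOT counted.
    """
    counts = Counter()
    for line in lines:
        match = SWAP_WAIT.match(line)
        if match:
            counts[match.group("minute")] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle for /var/log/messages can be passed in directly. Minutes with many hits can then be compared against the times the host stopped answering.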
 

marahin

While I was writing this, the server "froze" again: inaccessible over SSH and the web interface. After a few minutes of waiting I gave up and did a hard reboot. This is what /var/log/messages showed at the exact moment the server became inaccessible:

Jul 16 12:25:50 storagemaster 1 2022-07-16T12:25:49.777287+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
 