marahin
Cadet
- Joined: Jul 10, 2022
- Messages: 5
Hi.
I'm running TrueNAS-12.0-U8.1 on an Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz with 8 GB RAM and two pools (one an HDD Z2 pool with 1.64 TiB of available space, the other an 848 GiB SSD Z1 pool).
I've noticed that my own reporting sometimes flags the host as unavailable, and all NFS shares become inaccessible from Kubernetes. It's worth mentioning that this is the primary purpose of this TrueNAS instance: exposing storage to applications over NFS.
This doesn't happen on a regular basis; incidents are usually 2-3 weeks apart and never more frequent than every 2 weeks.
During these periods the TrueNAS instance is unresponsive over SSH and through the web interface.
It "comes back" after some time (between 20 minutes and 1hr30m), but this is how reporting looks during those periods: https://imgur.com/a/vfkgEUF
What could this be? In the past I tried the solution of loading "if_re.ko" via a tunable: https://www.truenas.com/community/t...on-alder-lake-cpu-i5-12600k.98195/post-680526 - and that helped, or so I thought.
But the problem has been coming up again recently. A rough sketch of what I had set is below.
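For reference, this is roughly what that fix looked like on my box (from memory, so treat it as a sketch rather than the exact values): a loader-type tunable so the Realtek if_re driver gets loaded at boot, plus a quick check afterwards that the module actually loaded.

# equivalent /boot/loader.conf line (set via System -> Tunables, Type: loader)
if_re_load="YES"

# after a reboot, confirm the module is actually loaded
kldstat | grep if_re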
/var/log/messages indeed shows some interesting errors around the times the server was unresponsive: https://gist.github.com/Marahin/2344d3042193e6823e205981b99de160
Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 17542, size: 65536
Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5972, size: 12288
Jul 16 08:18:45 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 08:20:10 storagemaster 1 2022-07-16T08:20:09.975838+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:19:51 storagemaster 1 2022-07-16T10:13:28.153135+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:22:08 storagemaster 1 2022-07-16T10:21:28.226691+02:00 storagemaster.local collectd 1407 - - Timeout collecting disk temperatures
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 2 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 36489, size: 16384
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
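I haven't traced those swap_pager lines to a specific disk yet. If I remember right, TrueNAS CORE keeps swap on mirrored partitions of the data disks, so something along these lines (not yet run on this box, device names are just placeholders) should show which devices the indefinite waits are sitting on:

# list the active swap devices
swapinfo -h

# show the swap mirrors (swap0, swap1, ...) and their member partitions
gmirror status

# then check SMART on the underlying disks, e.g. (substitute your own disk names)
smartctl -a /dev/ada0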