marahin
Cadet
- Joined: Jul 10, 2022
- Messages: 5
Hi.
I'm running TrueNAS-12.0-U8.1 on an Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz with 8 GB RAM and two pools (one an HDD Z2 pool with 1.64 TiB of available space, the other an 848 GiB SSD Z1 pool).
I've noticed that my own reporting sometimes flags the host as unavailable, and all NFS shares become inaccessible from Kubernetes. It's worth mentioning that this is the primary purpose of this TrueNAS instance: exposing storage to applications over NFS.
This doesn't happen on a regular basis; incidents are usually 2-3 weeks apart and never more frequent than every 2 weeks.
During these periods the TrueNAS instance is unresponsive over SSH and through the web interface.
It "comes back" after some time (between 20 minutes and 1hr30m), but this is how reporting looks during those periods: https://imgur.com/a/vfkgEUF
What could this be? In the past I tried the solution of loading "if_re.ko" via a tunable: https://www.truenas.com/community/t...on-alder-lake-cpu-i5-12600k.98195/post-680526 - and that helped, or so I thought.
But the problem has been coming up again recently. A rough sketch of what I had set is below.
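For reference, this is roughly what that fix looked like on my box (from memory, so treat it as a sketch rather than the exact values): a loader-type tunable so the Realtek if_re driver gets loaded at boot, plus a quick check afterwards that the module actually loaded.

# equivalent /boot/loader.conf line (set via System -> Tunables, Type: loader)
if_re_load="YES"

# after a reboot, confirm the module is actually loaded
kldstat | grep if_re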
/var/log/messages indeed shows some interesting errors around the times the server was unresponsive: https://gist.github.com/Marahin/2344d3042193e6823e205981b99de160
Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 17542, size: 65536
Jul 16 08:16:12 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5972, size: 12288
Jul 16 08:18:45 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 08:20:10 storagemaster 1 2022-07-16T08:20:09.975838+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:19:51 storagemaster 1 2022-07-16T10:13:28.153135+02:00 storagemaster.local collectd 1407 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 283, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
Jul 16 10:22:08 storagemaster 1 2022-07-16T10:21:28.226691+02:00 storagemaster.local collectd 1407 - - Timeout collecting disk temperatures
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 2 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager[1063]: Last message 'indefinite wait buff' repeated 1 times, suppressed by syslog-ng on storagemaster.local
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 32717, size: 8192
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 528217, size: 49152
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 36489, size: 16384
Jul 16 10:35:38 storagemaster swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15640, size: 40960
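I haven't traced those swap_pager lines to a specific disk yet. If I remember right, TrueNAS CORE keeps swap on mirrored partitions of the data disks, so something along these lines (not yet run on this box, device names are just placeholders) should show which devices the indefinite waits are sitting on:

# list the active swap devices
swapinfo -h

# show the swap mirrors (swap0, swap1, ...) and their member partitions
gmirror status

# then check SMART on the underlying disks, e.g. (substitute your own disk names)
smartctl -a /dev/ada0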