Basil Hendroff
Wizard
- Joined
- Jan 4, 2014
- Messages
- 1,644
I have been running a SCALE test server for some time now (specs in my signature block - ref. truenas-s). My attention has been around migrating several single container apps from a Ubuntu VM on a CORE server running Docker and Docker-Compose to SCALE apps.
Every once in a while (not frequently, days/weeks can pass), I lose contact with the SCALE server. I can't access the webUI,nor can I access the server via SSH. Curiously, I can still access the apps that have been set up. This is reminiscent of the Python crashes that were happening with CORE 12.0-U4 Jira ticket NAS-109709 - many users are reporting python38 core dumping. With that ticket, access to the UI was lost, but you could SSH in the box and execute a The issue here is much more sinister now as SSH access to SCALE is also unavailable.
I generally become aware of the situation through the TrueCommand dashboard.
TC alert messages are logged when this happens...
The console also becomes unresponsive. The only way I've found to recover from this situation, and it's not ideal, is to do a hard reset of the server.
JIRA ticket NAS-112912.
Every once in a while (not frequently, days/weeks can pass), I lose contact with the SCALE server. I can't access the webUI,
service middleware restart
to get the UI back. I generally become aware of the situation through the TrueCommand dashboard.
TC alert messages are logged when this happens...
Code:
2021/10/18 - 23:51:57 Failed to check for alert SyslogNg:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/syslog_ng.py", line 15, in check_sync<br> p1 = subprocess.Popen(["/usr/sbin/service", "syslog-ng", "status"], stdout=subprocess.PIPE,<br> File "/usr/lib/python3.9/subprocess.py", line 951, in __init__<br> self._execute_child(args, executable, preexec_fn, close_fds,<br> File "/usr/lib/python3.9/subprocess.py", line 1722, in _execute_child<br> errpipe_read, errpipe_write = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:52:49 Failed to check for alert VolumeVersion:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/volume_version.py", line 24, in check<br> if not await self.middleware.call("pool.is_upgraded", pool["id"]):<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1305, in call<br> return await self._call(<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1262, in _call<br> return await methodobj(*prepared_call.args)<br> File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1182, in nf<br> return await func(*args, **kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1092, in nf<br> res = await f(*args, **kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 420, in is_upgraded<br> proc = await Popen(<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 216, in create_subprocess_shell<br> transport, protocol = await loop.subprocess_shell(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1628, in subprocess_shell<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1605, in _get_handles<br> errread, errwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:53:49 Failed to check for alert SyslogNg:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/syslog_ng.py", line 15, in check_sync<br> p1 = subprocess.Popen(["/usr/sbin/service", "syslog-ng", "status"], stdout=subprocess.PIPE,<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:54:48 Failed to check for alert CoreFilesArePresent:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/cores.py", line 25, in check<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1305, in call<br> return await self._call(<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1262, in _call<br> return await methodobj(*prepared_call.args)<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/system_/coredump.py", line 14, in coredumps<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:54:50 Failed to check for alert IPMISELSpaceLeft:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 179, in check<br> return self._produce_alert_for_ipmitool_output(await ipmitool(["sel", "info"]))<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 24, in __call__<br> result = await run(["ipmitool"] + args, check=False, encoding="utf8", errors="ignore")<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:54:50 Failed to check for alert ZpoolCapacity:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/zpool_capacity.py", line 38, in check_sync<br> proc = subprocess.Popen([<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:54:50 Failed to check for alert IPMISEL:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 116, in check<br> return await self._produce_alerts_for_ipmitool_output(await ipmitool(["-c", "sel", "elist"]))<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 24, in __call__<br> result = await run(["ipmitool"] + args, check=False, encoding="utf8", errors="ignore")<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br> 2021/10/18 - 23:56:21 TrueCommand has lost access to this system unexpectedly. NAS is possibly offline or has lost network connectivity.
The console also becomes unresponsive.
JIRA ticket NAS-112912.
Last edited: