WebUI access on SCALE (and CORE) intermittently lost

Joined
Jan 4, 2014
Messages
1,644
I have been running a SCALE test server for some time now (specs in my signature block - ref. truenas-s). My attention has been around migrating several single container apps from a Ubuntu VM on a CORE server running Docker and Docker-Compose to SCALE apps.

Every once in a while (not frequently, days/weeks can pass), I lose contact with the SCALE server. I can't access the webUI, nor can I access the server via SSH. Curiously, I can still access the apps that have been set up. This is reminiscent of the Python crashes that were happening with CORE 12.0-U4 Jira ticket NAS-109709 - many users are reporting python38 core dumping. With that ticket, access to the UI was lost, but you could SSH in the box and execute a service middleware restart to get the UI back. The issue here is much more sinister now as SSH access to SCALE is also unavailable.

I generally become aware of the situation through the TrueCommand dashboard.

tn77.jpg


TC alert messages are logged when this happens...

Code:
2021/10/18 - 23:51:57 Failed to check for alert SyslogNg:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/syslog_ng.py", line 15, in check_sync<br> p1 = subprocess.Popen(["/usr/sbin/service", "syslog-ng", "status"], stdout=subprocess.PIPE,<br> File "/usr/lib/python3.9/subprocess.py", line 951, in __init__<br> self._execute_child(args, executable, preexec_fn, close_fds,<br> File "/usr/lib/python3.9/subprocess.py", line 1722, in _execute_child<br> errpipe_read, errpipe_write = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:52:49 Failed to check for alert VolumeVersion:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/volume_version.py", line 24, in check<br> if not await self.middleware.call("pool.is_upgraded", pool["id"]):<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1305, in call<br> return await self._call(<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1262, in _call<br> return await methodobj(*prepared_call.args)<br> File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1182, in nf<br> return await func(*args, **kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1092, in nf<br> res = await f(*args, **kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 420, in is_upgraded<br> proc = await Popen(<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 216, in create_subprocess_shell<br> transport, protocol = await loop.subprocess_shell(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1628, in subprocess_shell<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1605, in _get_handles<br> errread, errwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:53:49 Failed to check for alert SyslogNg:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/syslog_ng.py", line 15, in check_sync<br> p1 = subprocess.Popen(["/usr/sbin/service", "syslog-ng", "status"], stdout=subprocess.PIPE,<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:54:48 Failed to check for alert CoreFilesArePresent:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/cores.py", line 25, in check<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1305, in call<br> return await self._call(<br> File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1262, in _call<br> return await methodobj(*prepared_call.args)<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/system_/coredump.py", line 14, in coredumps<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:54:50 Failed to check for alert IPMISELSpaceLeft:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 179, in check<br> return self._produce_alert_for_ipmitool_output(await ipmitool(["sel", "info"]))<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 24, in __call__<br> result = await run(["ipmitool"] + args, check=False, encoding="utf8", errors="ignore")<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:54:50 Failed to check for alert ZpoolCapacity:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/base.py", line 214, in check<br> return await self.middleware.run_in_thread(self.check_sync)<br> File "/usr/lib/python3/dist-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread<br> return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))<br> File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run<br> result = self.fn(*self.args, **self.kwargs)<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/zpool_capacity.py", line 38, in check_sync<br> proc = subprocess.Popen([<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:54:50 Failed to check for alert IPMISEL:<br>Traceback (most recent call last):<br> File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 772, in __run_source<br> alerts = (await alert_source.check()) or []<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 116, in check<br> return await self._produce_alerts_for_ipmitool_output(await ipmitool(["-c", "sel", "elist"]))<br> File "/usr/lib/python3/dist-packages/middlewared/alert/source/ipmi_sel.py", line 24, in __call__<br> result = await run(["ipmitool"] + args, check=False, encoding="utf8", errors="ignore")<br> File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 50, in run<br> proc = await asyncio.create_subprocess_exec(*args, **kwargs)<br> File "/usr/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec<br> transport, protocol = await loop.subprocess_exec(<br> File "/usr/lib/python3.9/asyncio/base_events.py", line 1661, in subprocess_exec<br> transport = await self._make_subprocess_transport(<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 197, in _make_subprocess_transport<br> transp = _UnixSubprocessTransport(self, protocol, args, shell,<br> File "/usr/lib/python3.9/asyncio/base_subprocess.py", line 36, in __init__<br> self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,<br> File "/usr/lib/python3.9/asyncio/unix_events.py", line 789, in _start<br> self._proc = subprocess.Popen(<br> File "/usr/lib/python3.9/subprocess.py", line 829, in __init__<br> errread, errwrite) = self._get_handles(stdin, stdout, stderr)<br> File "/usr/lib/python3.9/subprocess.py", line 1593, in _get_handles<br> c2pread, c2pwrite = os.pipe()<br>OSError: [Errno 24] Too many open files<br>
2021/10/18 - 23:56:21 TrueCommand has lost access to this system unexpectedly. NAS is possibly offline or has lost network connectivity.

The console also becomes unresponsive. The only way I've found to recover from this situation, and it's not ideal, is to do a hard reset of the server.

ts13.jpg


JIRA ticket NAS-112912.
 
Last edited:
Joined
Jan 4, 2014
Messages
1,644
My bad. I thought I had SSH enabled on SCALE :rolleyes: I'll continue monitoring and logging the crashes on CORE and SCALE. I wonder if they share the same middleware code base?
 
Last edited:
Joined
Jun 2, 2019
Messages
591
If it's a test server, try the nightlies which are more stable.
 
Joined
Jan 4, 2014
Messages
1,644
Joined
Jun 2, 2019
Messages
591

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
CORE and SCALE share similar middleware, but the OS manages resources differently and so not all bugs are shared.
My understanding is that the 12.0-U4 python crashing bug.. was specific to FreeBSD 12 and a specific version of python .
Bugs that do cause issues after weeks of operation (typically memory leaks) are notoriously hard to find.. please report them.
 
Top