Python 3.9 crashes - A TrueCommand perspective

Joined
Jan 4, 2014
Messages
1,644
This is not a TC issue, but it does highlight the value of TC in identifying patterns. Here's a TC dashboard view of a number of servers.

tc76.jpg


One of the systems is offline. I'm also not able to connect to the server UI.

tn04.jpg


This is symptomatic of the Python crash that has been reported in many guises on the forum. The temporary fix is to SSH into the server and restart the middleware. A server restart isn't necessary, but the issue reappears every couple of days, which is annoying.
Code:
root@truenas-l2[/]# service middlewared restart
Stopping middlewared.
root@truenas-l2[/]#

Magically, the server UI is now accessible and the server is no longer offline in TC. Sure enough, the python issue is logged on the TC Alerts card for the server.

tc58.jpg
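The manual check-and-restart routine described above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, port 443 is an assumption about how the UI is served, and a plain TCP connect is a rough proxy for "the UI is reachable". If the web server keeps listening while the middleware is down, this check would not notice the crash.

```python
import socket
import subprocess


def ui_is_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the web UI port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def restart_middleware() -> int:
    """Run the same restart command as above; needs root on the NAS itself."""
    result = subprocess.run(["service", "middlewared", "restart"], check=False)
    return result.returncode
```

Run on the server itself, something like `if not ui_is_reachable("127.0.0.1"): restart_middleware()` would mirror the manual steps. It only papers over the crash and does nothing about the cause.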


Patterns have emerged. Returning to the first image of the TC dashboard:
  1. I've only ever seen python crashes on the two servers with Intel processors and never on the servers with AMD processors.
  2. The middleware crashes more frequently on the Intel server running 12.0-U4 and less on the Intel server running 12.0-U3.1.
There is already a ticket open for the Python crash issue, NAS-109709. The ticket refers to an image that supposedly fixes the core dump issue, but as the Intel servers are production servers, I'm unable to test the fix and will instead await the release of 12.0-U5. I'll add this thread to the ticket though.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
I only have a sample of one server running on Intel. On U3.1 I had middleware crashes every few days. On U4 the issues are gone: it has been running for 18 days now without a middleware crash. I am running regular U4, not the latest modded version posted by Caleb in the epic ticket. So I cannot back up your second point.
 
Joined
Jan 4, 2014
Messages
1,644
Thanks for the feedback. It's curious that I see the reverse behaviour on point 2. What makes this more curious is that both servers are on identical h/w apart from slightly different Intel CPUs. What I might try is switching back to a 12.0-U3.1 boot environment on the server currently running 12.0-U4 to see if the behaviour changes.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
@Basil Hendroff Have you tried the modified U4 posted in the epic ticket? This supposedly fixed residual issues with these core dumps. If it does not work for you, then we may receive a U5 in August that is still not fully working. So testing this now would be good.
 
Joined
Jan 4, 2014
Messages
1,644
Good thought, but as I indicated in the OP, these are production servers, so I'm not prepared to test the modified U4 on them.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
Got it, understood. I am puzzled why the memory leak associated with the third-party python library, which caused the core dumps, would be processor specific. Then we will keep our fingers crossed that U5 is the fix all have been waiting for.
 
Joined
Jan 4, 2014
Messages
1,644
Well, last Wednesday I rolled the Intel server that was running U4 (an HP Gen 8 MicroServer with an Intel Xeon E3-1220L V2) back down to U3.1, and I've not had a middleware crash since. It was crashing every couple of days with U4. Hoping U4.1 addresses the issue.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
I do not see any middleware crashes addressed in the U4.1 fix list. This seems to be mainly for enclosure related issues, and they threw in the dashboard CPU widget fix. So it seems they are holding further python fixes until U5.
 
Joined
Jan 4, 2014
Messages
1,644
Damn it! I spoke too soon and jinxed myself. The middleware crashed on that server overnight o_O
 
Joined
Jan 4, 2014
Messages
1,644
Since upgrading to U4.1, my middleware crashes seem to have ceased.
 

cleansman

Cadet
Joined
Nov 21, 2017
Messages
6
I also have the issue, and it is not gone with U4.1. I have Intel CPUs as well.

Restarting my middleware is not working:
Code:
root@freenas:~ # service middlewared restart
Stopping middlewared.
Waiting for PIDS: 7629.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 645, in waitready
    with Client(uri=args.uri) as c:
  File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 285, in __init__
    raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake
root@freenas:~ # 


or

Code:
root@freenas:~ # service middlewared restart
Stopping middlewared.
Waiting for PIDS: 7806.
Middleware startup is idle for more than 240 seconds
##############################################################
MIDDLEWARED FAILED TO START, SYSTEM WILL NOT BEHAVE CORRECTLY!
##############################################################
root@freenas:~ #


PIDs 7629 and 7806 are both python3.9 processes.
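For anyone wanting to confirm which process the rc script is waiting on, the check above can be sketched like this. The helper names are hypothetical, and the handling of more than one PID on the "Waiting for PIDS" line (space- or comma-separated) is an assumption, since the output above only ever shows a single PID.

```python
import re
import subprocess
from typing import List


def parse_waiting_pids(line: str) -> List[int]:
    """Extract PIDs from a line such as 'Waiting for PIDS: 7629.'"""
    match = re.search(r"Waiting for PIDS:\s*([\d\s,]+)", line)
    if not match:
        return []
    return [int(pid) for pid in re.findall(r"\d+", match.group(1))]


def command_name(pid: int) -> str:
    """Return the command name for a PID via ps, or '' if it has exited."""
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "comm="],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()
```

Feeding each parsed PID to `command_name` while the restart is still hanging should show whether it is the Python interpreter that is refusing to exit.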
 
Joined
Jan 4, 2014
Messages
1,644
@cleansman Ouch! That doesn't look good at all. What you haven't provided is context. This thread is not appropriate for that though. This thread is about the TC perspective of an issue that's been widely reported.

According to the thread CPU stats not working in 12.0 U4 and specifically this post, U5 is expected to resolve the issue of more recent python crashes. You may wish to review JIRA ticket NAS-109709 as well and add to it if you feel it's the same issue. If you feel it's not, I suggest you start a new thread in the forum and report the issue as you have done in the previous post, but provide a context as well. By context, I mean including a detailed description of the h/w you're running TrueNAS on. The Forum Rules will provide you with some ideas on what to include.

tn06.jpg
 

fohdeesha

Dabbler
Joined
Nov 14, 2016
Messages
14
Had a Python 3.9 crash/core dump on TrueNAS-12.0-U6. Restarting the middleware got my GUI back. It's also an Intel processor, in line with the trend the OP noticed. I suppose I'll attach my core dump to the bug tracker issue.
 
Joined
Jan 4, 2014
Messages
1,644
@fohdeesha The original ticket NAS-109709 is marked completed. I've just started a new one today to cover both SCALE and CORE - JIRA ticket NAS-112912. Intermittent Python crashes are happening on both platforms.
 