Am I regretting upgrading to TrueNAS 12?

phospholipid

Dabbler
Joined
Mar 2, 2024
Messages
15
We have an OWC Jupiter Callisto/Kore system. It faithfully ran FreeNAS 11.1 for years with no issues. I mean, its uptime was *years* with no problems.

We wanted to start running certain tasks that were clunky on FN11. On the advice of OWC support, we upgraded through this path:
11.1 -> 11.2 -> 11.3 -> TrueNAS 12.

It was pretty cool at first. I had a small scare when I first launched TrueNAS Core 12, and that seemed to sort itself out. Things have not sorted out. I think we're on TrueNAS 12.1-U5. I say "I think" because I can't check at the moment, as the GUI keeps failing. We have two problems that keep happening.

Randomly, the GUI service and the SMB service just decide to stop working. Everything will be going great for some hours, maybe even a day or two. Then all of a sudden, the GUI is inaccessible and at the same time SMB fails. Users already connected to SMB are okay about 70% of the time. But if this happens while anybody has gone to lunch or left for the day, when they come back they cannot re-connect. The only solution is rebooting the entire unit.

It does not seem to be a network error. I am convinced of that because 1) like I said everything works fine when it works and 2) even when these services fail, I can still ping the unit.
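A ping only shows that the host answers ICMP, so one way to back this up is to test the service ports directly from a client. The sketch below is a minimal example, not a diagnostic tool: it assumes the web UI listens on the default HTTPS port 443, SMB on 445, and SSH on 22, and the function names are made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def service_report(host, ports, timeout=3.0):
    """Probe each named port and return a {name: reachable} mapping."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


# Assumed default ports; adjust if the web UI or SSH were moved.
DEFAULT_PORTS = {"gui-https": 443, "smb": 445, "ssh": 22}
```

Calling `service_report("<NAS address>", DEFAULT_PORTS)` while the unit still answers ping would show whether the ports refuse connections outright or simply time out, which is useful detail for a bug report.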

So... I have no idea where to start. I don't want to downgrade, but I also don't want to deal with this. Should I upgrade further to TrueNAS SCALE 22.12 (Bluefin)? That is what the OWC team recommends, but I do feel a little apprehensive at this point.

Any ideas are welcome. I can provide more data if needed. Cheers!
 

ABain

Bug Conductor
iXsystems
Joined
Aug 18, 2023
Messages
172
I'm not familiar with the spec of the system you have, so I won't recommend a course of action. However, if you want to go to SCALE, you will need to upgrade to the latest 13.0 U release before you migrate.

There were bug reports against earlier 12.0 U releases of the UI being inaccessible or unresponsive, but I can't be sure they are the same issue you are having. Looking through the history, these were addressed in 12.0-U8 or 13, depending on the issue.

To check which TrueNAS CORE version you have from the command line, you can use cat /etc/version
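If you want to check that output in a script, something like the following could work. It is only a sketch and assumes the file contains a string shaped like "TrueNAS-13.0-U6.1 (build hash)"; the helper names are hypothetical, and it only tests for 13.0 or later, not for the latest U release.

```python
import re

# Assumed shape of /etc/version: "<product>-<major>.<minor>[-U<update>] (<hash>)".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:-U(\d+(?:\.\d+)*))?")


def parse_version(text):
    """Return (major, minor, update), e.g. (13, 0, "6.1"); update is None if absent."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def on_13_0_or_later(text):
    """True if the parsed major.minor is at least 13.0."""
    major, minor, _update = parse_version(text)
    return (major, minor) >= (13, 0)
```

On the NAS itself this could be fed with `open("/etc/version").read()`.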


If you decide to migrate to SCALE, before you migrate, please read the migration guide: https://www.truenas.com/docs/scale/23.10/gettingstarted/migrate/
 

phospholipid

I'm on TrueNAS 12.0-U8.1. I'm going to start by updating to TrueNAS 13. Should I start with 13.0 and work up, or can I go right to U5 or U6?
 

phospholipid

Not familiar with the spec of the system you have,
I can get that, but I'm currently pulling my hair out trying to get any coherent logs.
 

phospholipid

You can go straight to 13.0-U6.1 from 12.0-U8.1

I did this upgrade last night, so I'm now on 13.0-U6.1. It appears to have solved the SMB service crashing. I can freely disconnect and reconnect clients without issue.

I'm still having GUI crashes. It looks very similar to events described in this thread: https://www.truenas.com/community/t...l-machine-failed-to-connect-to-libvirt.86315/

I ran dmes

I tried to run the command
Code:
service middlewared restart


That resulted in a stalled Terminal window. It's stuck on "Waiting for PIDS: 139"
I left that window running and opened another SSH window. It's locked on a different report. It says:

Last login: Wed Mar 6 13:22:30 2024
FreeBSD 13.1-RELEASE-p9 n245429-296d89569Be TRUENAS
TrueNAS (c) 2009-2023, iXsystems, Inc.
All rights reserved.
TrueNAS code is released under the modified BSD license with some
files copyrighted by (c) iXsystems, Inc.

For more information, documentation, help or support, go here:
Welcome to FreeNAS
Traceback (most recent call last):
  File "/usr/local/sbin/hactl", line 171, in <module>
    main(args.command, args.q)
  File "/usr/local/sbin/hactl", line 17, in main
    client = Client()
  File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 286, in __init__

I am 100% certain that if I reboot the unit the GUI will be up and running again... until it's not. I can't do that presently because folks are using it. It looks like at least one person had luck running this instead of "restart"

Code:
service middlewared stop
service middlewared start


If that would work, great. But I don't want to have to do that several times a day or week.

I'm curious what a way out would look like.
 

ABain

The issue linked is against much older releases and I would expect to see many more reports if this had continued to be an issue.
As you are now on the latest release of CORE, I would recommend you file a bug ticket at https://ixsystems.atlassian.net/jira/software/c/projects/NAS/issues
You will be provided a link on the ticket to a private upload for a debug. If you can drop the link to the ticket in this thread that would be great.
 

phospholipid

I did that. Thank you for the support. The ticket is here. It's asking for a debug file. I'll have to wait until the end of the day to generate that, as I would need to gather that from the GUI and folks are working from the unit currently so we can't pause to reboot the machine.
 

phospholipid

I spoke too soon. SMB is not stable either. Folks are dropping off randomly throughout the day. Clients appear able to stay connected if nothing changes, but if they try to reconnect there is an issue. Having said that, sometimes they're randomly booted. The 10G network is an isolated intranet (14.44.44.x) from a router/switch that dishes out almost all reserved IPs, and the 1G network is from an internet-facing router (14.0.0.x) also with almost all reserved IPs. It is highly unlikely there are routing or switching problems.

This problem only started with the upgrade.
 

phospholipid

I wanted to confirm after some testing in-house a behavior pattern that is consistent:

After some time, which could be hours or a full day, the GUI fails, Shell via SSH also fails, and there are SMB disruptions. If a client is connected through SMB *before* the GUI failures happen, and if the client does not change their network status, they will REMAIN connected indefinitely. If a client is connected to SMB when the failures happen and then disconnects (or if they were not connected), they CANNOT reconnect to the share. All issues temporarily resolve upon reboot.
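To put timestamps on that pattern, a small watcher run from another machine could log the moment each service stops accepting new connections. This is a rough sketch under assumptions: it treats "accepts a TCP connection" as "up", the port numbers (443 for the GUI, 445 for SMB, 22 for SSH) are the defaults rather than anything confirmed for this unit, and `watch` and `tcp_probe` are illustrative names.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=3.0):
    """Return True if host:port accepts a new TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, ports, rounds, interval=60.0,
          probe=tcp_probe, sleep=time.sleep, log=print):
    """Probe every port `rounds` times; log and return the initial state and each change."""
    last = {}     # most recent state seen per service name
    events = []   # (timestamp, name, is_up) for every state change
    for i in range(rounds):
        for name, port in ports.items():
            up = probe(host, port)
            if last.get(name) != up:
                stamp = datetime.now().isoformat(timespec="seconds")
                events.append((stamp, name, up))
                log(f"{stamp} {name} is {'UP' if up else 'DOWN'}")
                last[name] = up
        if i < rounds - 1:
            sleep(interval)
    return events
```

For example, `watch("<NAS address>", {"gui": 443, "smb": 445, "ssh": 22}, rounds=1440)` would check once a minute for a day. If all three go DOWN within the same minute, that would support the idea of a single underlying failure rather than separate GUI and SMB problems.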

Thanks.
 