TrueNAS keeps crashing/freezing after upgrade from FreeNAS 11.3

lindenmj

Cadet
Joined
Nov 27, 2020
Messages
5
Long time reader, first time poster. I've been using FreeNAS for about 1.5 years now and would classify myself as a n00b+ when it comes to FreeBSD. I know enough to do basic filesystem and user work, but beyond that DuckDuckGo is my friend.

I recently upgraded from FreeNAS 11.3 (latest patch, whichever number that was) to TrueNAS 12.0. Prior to the software upgrade, I upgraded the hardware from an i5-3570K to a Ryzen 5 2600 (16 GB RAM in each, but DDR3-1600 vs DDR4-2400). Everything went well with the hardware upgrade. The system boots off a 500 GB WD Blue SSD.

During the upgrade to TrueNAS, I ended up wiping my configuration and starting fresh, hoping it would solve this crashing problem. It did not. The pools listed below were kept intact with no issues importing them.

I have 3 pools: pool1 is 6x10TB in raidz1 with 5 datasets and about 40% usage; pool2 and pool3 are drives within a 4-drive USB enclosure used for redundant backups, with 81% and 60% usage respectively.

I have two jails running that use very few resources. One is a Plex/Sonarr/Radarr/Jackett/NZBGet/NZBHydra2/qBittorrent setup. qBittorrent has about 100 torrents loaded, but probably 10+ active at any one time. Plex does not have any streamers 95% of the time. The second jail is currently only for an rsync job that executes once a day.

I also have three SMB shares that are seldom used (one for each pool). I have reset all my permissions for the shares to accommodate the FreeNAS root user -> TrueNAS non-root SMB user change. Everything works fine when the box is up. It is also plugged into a Tripp Lite SMART1500LCDT UPS. Currently it only draws power from the UPS, with the USB cable unhooked and the UPS service disabled. This was working fine on FreeNAS. Currently the only services running are SSH, SMB, and SMART.

The dilemma is that the box freezes/crashes seemingly at random. I figure qBittorrent is the heaviest resource user of everything I'm running, but even then the hardware should handle it well. Opening the Sonarr UI spikes CPU usage, but it goes back to normal after the page loads. I've tried running the system without the USB enclosure connected, but that does not change the problem. Plex does not transcode on the one or two occasions a day it is used (if at all). Right now the box is only hooked up to power and gigabit Ethernet.

I do not know of any logs I can look at to diagnose the problem, but I'd be happy to post them to help with the sleuthing. This feels like a stab in the dark/needle in a haystack situation, so any help with first steps towards a diagnosis is greatly appreciated!
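As a possible first step on the logs question, here is a minimal Python sketch that pulls suspicious lines out of the system logs. The log paths (/var/log/messages and /var/log/middlewared.log) and the keyword list are assumptions for a FreeBSD-based TrueNAS 12 install, not an official list, so adjust both to whatever exists on your box.

```python
from pathlib import Path

# Assumed log locations; adjust to what actually exists on your system.
LOG_PATHS = [Path("/var/log/messages"), Path("/var/log/middlewared.log")]
# Illustrative keywords only, not an official list of failure markers.
KEYWORDS = ("panic", "error", "timed out", "traceback")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword (case-insensitive)."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


def scan(paths=LOG_PATHS, tail=20):
    """Print the last `tail` matching lines from each readable log file."""
    for path in paths:
        try:
            with path.open(errors="replace") as fh:
                hits = suspicious_lines(fh)
        except OSError as exc:
            # Covers missing files and permission problems alike.
            print(f"{path}: cannot read ({exc})")
            continue
        print(f"{path}: {len(hits)} matching lines")
        for line in hits[-tail:]:
            print("  " + line)


if __name__ == "__main__":
    scan()
```

Running it right after a freeze and reboot, and comparing the timestamps of the last matching lines with the time the WebUI stopped responding, might narrow down where to look next.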
 

lindenmj

Cadet
Joined
Nov 27, 2020
Messages
5
I just received this alert email on the latest boot up (from a graceful shutdown). This is the first alert I've received like this:

Code:
New alerts:
* Failed to check for alert VolumeStatus:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/volume_status.py", line 31, in check
    for vdev in await self.middleware.call("pool.flatten_topology", pool["topology"]):
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1233, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1202, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1106, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 431, in flatten_topology
    d = deque(sum(topology.values(), []))
AttributeError: 'NoneType' object has no attribute 'values'


Current alerts:
* Failed to check for alert VolumeStatus:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/volume_status.py", line 31, in check
    for vdev in await self.middleware.call("pool.flatten_topology", pool["topology"]):
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1233, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1202, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1106, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 431, in flatten_topology
    d = deque(sum(topology.values(), []))
AttributeError: 'NoneType' object has no attribute 'values'
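Reading the last frame of that traceback: flatten_topology was handed a pool whose "topology" value was None, and calling .values() on None raises the AttributeError. The sketch below reproduces just that one expression outside the middleware. The sample topology dict and the flatten_topology_safe guard are hypothetical illustrations, not the middleware's real data or its actual fix, and why the topology was None on this boot (for example, a pool not yet readable when the alert check ran) is only a guess.

```python
from collections import deque


def flatten_topology(topology):
    # Same expression as the failing line shown in the traceback.
    return deque(sum(topology.values(), []))


def flatten_topology_safe(topology):
    # Hypothetical guard: treat a missing topology as "no vdevs".
    if topology is None:
        return deque()
    return deque(sum(topology.values(), []))


# Made-up topology for illustration only.
healthy = {"data": [{"name": "raidz1-0"}], "cache": [], "log": []}
print(list(flatten_topology(healthy)))

try:
    flatten_topology(None)
except AttributeError as exc:
    # Same class of error as in the alert email.
    print(exc)

print(list(flatten_topology_safe(None)))
```

So the alert itself looks like a symptom (the pool information was unavailable at check time) rather than the cause of the freezes.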
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Have you disabled C-states in your BIOS?
I had the same issue with my Ryzen 5 1600X: it would stay up for up to 3 days, but then it would crash and only a reboot would fix it.
After disabling that feature my system became completely stable.
 

lindenmj

Cadet
Joined
Nov 27, 2020
Messages
5
Just tried that and it didn't fix the problem.

For the record, I have a Micro-Star B450 Tomahawk (MS-7C02). I used the search function in the BIOS to find and disable "Global C-state Control".
 

douglaing

Cadet
Joined
Nov 16, 2020
Messages
7
I have the same issue with a Dell T330. It was not an upgrade; I started with TrueNAS 12.0. It will run anywhere from 30 minutes to a week and then get the error listed above.
 

lindenmj

Cadet
Joined
Nov 27, 2020
Messages
5
I believe my problem is 100% software based. The OS console is still completely responsive and I am able to gracefully reboot or shut down from it. I'm just not able to access the WebUI or any of the UIs inside my jails.
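One way to back up the "console works, UIs don't" observation is to probe the box from another machine and see which TCP ports still accept connections during a freeze. This is only a sketch: the port list (22 for SSH, 80/443 for the WebUI, 445 for SMB) assumes default settings, and the host address is whatever you pass on the command line.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed-out connections all land here.
        return False


# Assumed default ports; change them if your services listen elsewhere.
SERVICES = (("SSH", 22), ("WebUI HTTP", 80), ("WebUI HTTPS", 443), ("SMB", 445))

if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # e.g. the NAS's LAN address
    for name, port in SERVICES:
        state = "open" if port_open(host, port) else "not reachable"
        print(f"{name:12s} port {port:<4d} {state}")
```

If SSH still answers while 80/443 do not, that points at the web/middleware side; if nothing answers, the network stack or the whole host is the more likely suspect.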
 

douglaing

Cadet
Joined
Nov 16, 2020
Messages
7
My console is still active but very slooooooooow. It will respond to a reboot but takes over an hour.

Here is my configuration:
TrueNAS 12.0

Dell PowerEdge T330

32 GB ECC RAM

Intel Xeon E3-1239 3.5 GHz

Performance profile with C-states disabled

Two Kingston 120 GB SSDs as a mirrored boot pool

Syba 8-port SATA non-RAID controller (dual Marvell 9215)

Eight Western Digital Red 4 TB drives (WD40EFRX)

Three more Western Digital Red 4 TB drives (WD40EFRX) on the built-in controller in AHCI non-RAID mode, intended as hot-swap (I was unable to set this up; expansion to hot swap is grayed out)

Single RAIDZ2 pool from the above eight 4 TB drives, 17% used space

Plugins: Plex Media Server v1.20.5.3600, Syncthing v1.10.0

Three Cloud Sync tasks to Backblaze B2 (currently up to date)

-------------------------------------------------------------------------------------------------------------------------------------------
The last error:
The following alert has been cleared:
* Failed to check for alert BootPoolStatus:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 33, in _call
    with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 223, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/boot_pool.py", line 16, in check
    pool = await self.middleware.call("zfs.pool.query", [["id", "=", boot_pool]])
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1233, in call
    return await self._call(
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1199, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1205, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1132, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1106, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
socket.timeout: timed out
----------------------------------------------------------------------------------------------------------------------------------
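For anyone reading that traceback: the worker process connected to the middleware's internal Unix socket and then gave up waiting in sock.recv(128) because no reply arrived before the socket timeout. That suggests, as an interpretation rather than a confirmed diagnosis, that the main middleware process was too busy or blocked to answer. The sketch below shows the same mechanism with a plain socket pair; the 0.2 second timeout and the helper name recv_with_timeout are made up for the illustration.

```python
import socket


def recv_with_timeout(sock, nbytes=128, timeout=0.2):
    """Wait up to `timeout` seconds for data; return None if nothing arrives."""
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # Same exception class as at the bottom of the traceback.
        return None


# A connected pair of sockets stands in for worker <-> middleware.
client, server = socket.socketpair()

# Case 1: the "server" side is stuck and never answers.
silent = recv_with_timeout(client)

# Case 2: the "server" side answers before the deadline.
server.sendall(b"HTTP/1.1 101 Switching Protocols\r\n")
answered = recv_with_timeout(client)

client.close()
server.close()

print("no reply ->", silent)
print("reply    ->", answered)
```

In other words, the alert is reporting that something else stalled first, which fits the very slow console described above.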

I was able to reboot from the console but it took over an hour. Normally I force a power-down and bring it back up.

When will we see a 12.1 version that might fix this issue? I see some other people reporting issues on Reddit.
 

douglaing

Cadet
Joined
Nov 16, 2020
Messages
7
I tried to disable Syncthing today but the system still crashed with the same error.

Do the TrueNAS people monitor this and respond with help??
 

lindenmj

Cadet
Joined
Nov 27, 2020
Messages
5
Closing the loop on this issue. I could never resolve it so I nuked TrueNAS and installed Ubuntu Server 20.04.1 LTS bare metal. I would recommend doing this if you are not intimidated by the command line as it was surprisingly easy to figure out with light Googling. I installed Webmin first to get a good base set up then set up/imported my ZFS pools manually via the command line. The performance boost I've seen is absolutely amazing and I do not feel that I'm missing a single thing from TrueNAS.
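For anyone following the same route, the manual pool import boils down to the `zpool import` command: with no arguments it lists pools available for import, and with a pool name it imports that pool. Below is a small Python sketch that builds those commands and, by default, only prints them. The pool name pool1 is taken from the first post; the wrapper functions and the dry-run default are illustrative choices. A pool that was not exported from the old system may need the -f flag.

```python
import shutil
import subprocess


def zpool_import_cmd(pool=None, force=False):
    """Build the argv for `zpool import`; with no pool it only lists importable pools."""
    cmd = ["zpool", "import"]
    if force:
        cmd.append("-f")  # force import of a pool that was not cleanly exported
    if pool:
        cmd.append(pool)
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False and zpool exists."""
    print(" ".join(cmd))
    if dry_run or shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False, capture_output=True, text=True)


if __name__ == "__main__":
    run(zpool_import_cmd())          # list pools that can be imported
    run(zpool_import_cmd("pool1"))   # import one pool by name
```

Pass dry_run=False (as root, with the ZFS utilities installed) to actually execute the commands instead of printing them.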
 