How to troubleshoot the cause of unscheduled system reboots? I had a lot of problems after updating to 22.02-RELEASE

Glavo

Cadet
Joined
Mar 20, 2022
Messages
2
I've been using SCALE since 22.02-RC.1-1 and didn't have any issues at that point.

But after I updated to 22.02-RELEASE, my NAS has had an unscheduled system reboot every 3 to 14 days. In addition, I am getting a lot of notification emails about /usr/sbin/apache2 crashes, and TrueNAS occasionally fails to connect to k3s, which only a system restart fixes.

After I updated to 22.02.1, k3s couldn't connect at all, restarting didn't help, and all my Docker tasks were lost.

In desperation, I rolled back TrueNAS to 22.02-RC2, and it has now run continuously for 21 days without any reboots or alerts. This convinced me that the unscheduled reboots are a problem that started in 22.02-RELEASE.

While it's working fine for now, I'd still like to update TrueNAS in the future to get the RAID-Z expansion feature. So, how should I troubleshoot the cause of the unscheduled reboots? I couldn't find any logs that might tell me why. (The apache2 crashes only started a few weeks before the release of 22.02.1, and the system was already rebooting before that, so the reboots should not be related to apache2.)

Here is my hardware:
  • Motherboard: Super Micro X11SRA-F
  • CPU: Xeon W-2175
  • Memory: 4x Samsung 32GB 2Rx4 2666V ECC REG
  • Boot Disks: SanDisk X300s 128GB and SanDisk X400 256GB
  • Data Disks: 7x Western Digital WD42EJRX 4TB (RAIDZ3 array)
  • SATA expansion card: ASM1064
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
I'd want to rule out a hardware problem first, since that's usually the cause of these kinds of crashes. I know you stopped experiencing the problem when you rolled back, but that doesn't mean the update introduced a bug, especially since a bug of this magnitude would be squashed very quickly. I'd wager that the update changed how the software uses the hardware, and that change is now exposing an existing hardware issue in your system.

Off the cuff, the first thing that sticks out to me is your ASM1064 expansion card. I'd strongly recommend an HBA card over a SATA expansion card; they can be had very cheaply second hand, and even brand new they aren't that expensive relative to the other hardware you're playing with.

Low-hanging fruit to check for hardware problems:
  • Full memory test.
  • Run a sustained network transfer to saturate your NIC.
  • Run a scrub and a CPU stress test at the same time to check for power issues.
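
Alongside those checks, it can help to look at how the journal of the previous boot ends: an orderly reboot usually leaves shutdown messages, a kernel fault usually leaves an error trace, and a power or hardware reset tends to leave nothing at all. The following is only a rough sketch in Python. It assumes the system journal is kept across reboots (persistent storage), and the marker strings are guesses at typical systemd and kernel wording, not an exhaustive or verified list for SCALE. The function names are made up for illustration.

```python
import subprocess

# Assumed wording of orderly-shutdown messages; may differ between systemd versions.
CLEAN_MARKERS = (
    "Journal stopped",
    "Reached target Shutdown",
    "Reached target Reboot",
    "System is rebooting",
)

# Assumed wording of kernel-level failures (panic, oops, machine check).
ERROR_MARKERS = ("Kernel panic", "Oops:", "BUG:", "Hardware Error", "Machine check")


def classify_boot_tail(lines):
    """Classify the final journal lines of a boot.

    Returns 'kernel-error' if a fault marker is present, 'clean' if only
    shutdown markers are present, and 'abrupt' if neither appears.
    """
    text = "\n".join(lines)
    if any(marker in text for marker in ERROR_MARKERS):
        return "kernel-error"
    if any(marker in text for marker in CLEAN_MARKERS):
        return "clean"
    return "abrupt"


def previous_boot_tail(count=200):
    """Return the last `count` journal lines of the previous boot.

    Requires journalctl and a journal that survives reboots; returns an
    empty list if journalctl produced no output.
    """
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(count), "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

Calling `classify_boot_tail(previous_boot_tail())` after an unexpected reboot gives a first hint. An "abrupt" result means the log simply stops, which would point toward power, memory, or controller trouble rather than a software-initiated restart, though an empty or non-persistent journal would produce the same result.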
 

skoop

Cadet
Joined
May 27, 2022
Messages
3
I have a similar issue with unscheduled system reboots.

I have a clean installation of TrueNAS-SCALE-22.02.1.

I have no idea what to look for.
 