ans40
Cadet
- Joined
- Oct 9, 2023
- Messages
- 4
Hi, I have a bare-metal install of TrueNAS-SCALE-22.12.3.3. I'm using a Windows VM to remotely edit large video files locally on the system so I don't have to transfer huge amounts of data across the internet. Usually when processing files (even if I limit the VM CPU to one core and one thread), the entire TrueNAS box will crash and reboot. I'm not sure why. The timing is somewhat random: sometimes it happens quickly, sometimes a file will process intensely for an hour before the crash.
Early on, I thought it might be a hardware issue, so I've tried:
- Memtest - ran a full memtest overnight, which passed. Also, with the VM running, I usually have 20+ GB of RAM free.
- Measured power draw - the max power draw of my entire box under heavy load is about 250 watts. The PSU 12V rail is rated for 576 watts, so I'm well within spec. Not to say the PSU couldn't be the issue, but it seems unlikely.
I also looked through the system log with
more /var/log/messages
and saw that for most crashes, the last logged message involves split lock detection. On reboot, one of the first log entries also includes some split lock info.
Example (crashed around 13:18:00, startup one minute later):
Code:
Oct 9 13:12:45 TrueNAS-Server kernel: x86/split lock detection: #AC: CPU 3/KVM/2244533 took a split_lock trap at address: 0x404756
Oct 9 13:20:04 TrueNAS-Server syslog-ng[4240]: syslog-ng starting up; version='3.28.1'
Oct 9 13:19:17 TrueNAS-Server kernel: Linux version 5.15.107+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Tue Jul 25 00:05:02 UTC 2023
Oct 9 13:19:17 TrueNAS-Server kernel: Command line: BOOT_IMAGE=/ROOT/22.12.3.3@/boot/vmlinuz-5.15.107+truenas root=ZFS=boot-pool/ROOT/22.12.3.3 ro libata.allow_tpm=1 amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=on zfsforce=1 nvme_core.multipath=N i915.force_probe=4692
Oct 9 13:19:17 TrueNAS-Server kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
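To check how often these traps precede a crash, the log can be filtered for them. The following is only a minimal sketch: it assumes every trap line has the same wording as the example above, and the names `TRAP_RE`, `find_split_lock_traps`, and `scan_log` are made up for illustration.

```python
import re

# Matches the trap message format seen in the log excerpt. Other kernel
# versions may word this differently (assumption), so adjust as needed.
TRAP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"x86/split lock detection: #AC: (?P<task>.+?) took a split_lock trap "
    r"at address: (?P<addr>0x[0-9a-fA-F]+)"
)


def find_split_lock_traps(lines):
    """Return (timestamp, task, address) for each split_lock trap line."""
    events = []
    for line in lines:
        match = TRAP_RE.search(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("task"), match.group("addr"))
            )
    return events


def scan_log(path="/var/log/messages"):
    """Read a syslog-style file and return the split_lock traps found in it."""
    with open(path, errors="replace") as log:
        return find_split_lock_traps(log)
```

Calling `scan_log()` on the box and comparing the returned timestamps with the reboot times would show whether a trap is logged shortly before each crash or whether traps also occur without one.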
I was reading some forums (a few for Proxmox and Unraid, but I couldn't find any for TrueNAS) and saw people mentioning that this feature can simply be turned off with the kernel parameter
split_lock_detect=off
(documentation), but I'm not sure how to set it, because sysctl is telling me it doesn't exist in the kernel (double-check me on that).
I've also looked through the debug logs and I don't see any crash files or anything like that. But I'm not very familiar with logging in TrueNAS, so direction here would be helpful.
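For context, split_lock_detect is documented as a boot-time kernel command-line parameter, which would explain why sysctl does not list it on this 5.15 kernel. Whether it is currently set can be read from the command line the kernel was booted with (the "Command line:" entry in the log, or /proc/cmdline). A minimal sketch, with hypothetical helper names, that only inspects the setting and does not change it:

```python
def split_lock_setting(cmdline):
    """Return the split_lock_detect value from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "split_lock_detect" and sep:
            value = val  # keep the last occurrence if the option is repeated
    return value


def current_split_lock_setting(path="/proc/cmdline"):
    """Read the running kernel's command line and look up split_lock_detect."""
    with open(path) as cmdline_file:
        return split_lock_setting(cmdline_file.read())
```

A result of `None` means the parameter was not passed at boot, so the kernel is using its default mode. The command line in the log excerpt above does not contain the parameter.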
This is where I'm at now. Any help with diagnosing the issue further would be great. Are there any common problems I'm overlooking? If split lock detection isn't a red herring, could someone help me disable it so I can see whether the problems resolve? Thanks!