Help diagnosing kernel panic

Astrodonkey

Explorer
Joined
Jul 18, 2017
Messages
72
Hi TrueNAS'ers,

I'm hoping for some help diagnosing a kernel panic that has started happening on my system consistently, about 15-20 minutes after boot. This makes the system quite unusable, as you might imagine. I suspect a hardware failure, but I haven't been able to isolate it.

Things I've tried so far include disabling snapshots and running memtest86 (no errors). There have been no recent hardware changes, and I am running TrueNAS-12.0-U7. "zpool status" reports everything as healthy.

Any help with next steps would be appreciated.

From /data/crash:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x3801c040
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80a685b1
stack pointer = 0x28:0xfffffe010aa4b760
frame pointer = 0x28:0xfffffe010aa4b7a0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3229 (PLUGIN[freebsd])
trap number = 12
panic: page fault
cpuid = 3
time = 1643123039

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010aa4b420
vpanic() at vpanic+0x17b/frame 0xfffffe010aa4b470
panic() at panic+0x43/frame 0xfffffe010aa4b4d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe010aa4b530
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe010aa4b580
trap() at trap+0x286/frame 0xfffffe010aa4b690
calltrap() at calltrap+0x8/frame 0xfffffe010aa4b690
--- trap 0xc, rip = 0xffffffff80a685b1, rsp = 0xfffffe010aa4b760, rbp = 0xfffffe010aa4b7a0 ---
cpufreq_curr_sysctl() at cpufreq_curr_sysctl+0x31/frame 0xfffffe010aa4b7a0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x8a/frame 0xfffffe010aa4b7e0
sysctl_root() at sysctl_root+0x249/frame 0xfffffe010aa4b860
userland_sysctl() at userland_sysctl+0x178/frame 0xfffffe010aa4b910
sys___sysctl() at sys___sysctl+0x5f/frame 0xfffffe010aa4b9c0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe010aa4baf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe010aa4baf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800c582ca, rsp = 0x7fffdf9fa598, rbp = 0x7fffdf9fa5d0 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS
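
For anyone reading a dump like this one, the frame printed directly below the "--- trap" marker is where the fault was taken; here that is cpufreq_curr_sysctl, reached through a sysctl syscall made by the process PLUGIN[freebsd]. The following is a minimal sketch, assuming the dump text has already been extracted to a string, that pulls out those fields. The function name summarize_panic and the SAMPLE excerpt are illustrative only and are not part of TrueNAS or FreeBSD.

```python
import re
from datetime import datetime, timezone

# Excerpt copied from the dump above; in practice, read the extracted text file.
SAMPLE = """\
current process = 3229 (PLUGIN[freebsd])
time = 1643123039
trap() at trap+0x286/frame 0xfffffe010aa4b690
calltrap() at calltrap+0x8/frame 0xfffffe010aa4b690
--- trap 0xc, rip = 0xffffffff80a685b1, rsp = 0xfffffe010aa4b760, rbp = 0xfffffe010aa4b7a0 ---
cpufreq_curr_sysctl() at cpufreq_curr_sysctl+0x31/frame 0xfffffe010aa4b7a0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x8a/frame 0xfffffe010aa4b7e0
"""


def summarize_panic(text):
    """Return the current process, the panic time in UTC, and the first
    stack frame listed below the '--- trap' marker (hypothetical helper)."""
    summary = {"process": None, "time_utc": None, "faulting_frame": None}
    after_trap = False
    for raw in text.splitlines():
        line = raw.strip()

        m = re.match(r"current process\s*=\s*\d+\s*\((.+)\)$", line)
        if m:
            summary["process"] = m.group(1)

        m = re.match(r"time\s*=\s*(\d+)$", line)
        if m:
            # The dump records the panic time as seconds since the Unix epoch.
            ts = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            summary["time_utc"] = ts.isoformat()

        if line.startswith("--- trap"):
            after_trap = True
            continue

        if after_trap and summary["faulting_frame"] is None:
            m = re.match(r"(\w+)\(\) at", line)
            if m:
                summary["faulting_frame"] = m.group(1)
    return summary


if __name__ == "__main__":
    print(summarize_panic(SAMPLE))
```

The faulting frame only says where the kernel tripped, not why, so it is a starting point for searching bug reports rather than a diagnosis.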
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I would start immediately with the assumption that it will be either:

The Realtek NIC (its driver is known to cause performance issues and, sometimes, kernel panics). Add a PCIe Intel NIC and disable the onboard Realtek NIC to remedy that.

or

The Ryzen Cool 'n' Quiet and C-States settings not being turned off in the BIOS. Set them to off (search the forums for the thread that covers exactly what to change and how).
 

Astrodonkey

Thanks, those are good suggestions. I do have an Intel NIC; I will be sure to disable the Realtek one going forward.

I did learn this morning that if I stop all my jails, the panic doesn't happen. I'll try turning them back on one by one to see if I can isolate which one it is, though knowing which jail triggers it probably still won't give me a root cause.
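
The one-by-one approach can be sketched as below. Because a panic takes the whole box down, the sketch writes each step to a log and syncs it to disk before starting the jail, so the last "starting" entry without a matching "survived" entry names the suspect after the reboot. This assumes the jails are managed with iocage; the log path, the function names, and the 30-minute soak time (chosen to exceed the 15-20 minutes the panic takes to appear) are all hypothetical.

```python
import os
import subprocess
import time
from pathlib import Path

# Hypothetical location; use a path on a dataset that survives a reboot.
PROGRESS_LOG = Path("/mnt/tank/jail_isolation.log")


def start_with_iocage(jail):
    # Assumes iocage-managed jails.
    subprocess.run(["iocage", "start", jail], check=True)


def _log(log_path, event, jail):
    # Flush and fsync so the entry is on disk before anything can panic.
    with open(log_path, "a") as log:
        log.write(f"{int(time.time())} {event} {jail}\n")
        log.flush()
        os.fsync(log.fileno())


def reenable_one_by_one(jails, log_path=PROGRESS_LOG, start=start_with_iocage,
                        soak_seconds=30 * 60, sleep=time.sleep):
    """Start jails one at a time, waiting soak_seconds after each one."""
    for jail in jails:
        _log(log_path, "starting", jail)
        start(jail)
        sleep(soak_seconds)
        _log(log_path, "survived", jail)


def last_unconfirmed_jail(log_path=PROGRESS_LOG):
    """Return the jail that was started but never logged as survived, if any."""
    pending = None
    for line in Path(log_path).read_text().splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue
        _, event, jail = parts
        if event == "starting":
            pending = jail
        elif event == "survived" and jail == pending:
            pending = None
    return pending
```

Earlier jails stay running while later ones are started, so a panic could also come from a combination of jails rather than from the last one alone.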
 

Astrodonkey

Just wanted to report this does indeed appear related to the Realtek NIC. I disabled the port in the BIOS and there have been no further crashes.

This one had me scratching my head. I had been using this NIC to run a Unifi controller on the Management LAN interface. I wasn't too concerned about performance, and since the on-board NIC was spare, I decided to use it for that purpose. I had no issues with it for years, until I disconnected it over the weekend (I decided to run the controller on another box).

Never would have guessed that just disconnecting a network cable would cause my system to crash over and over!

Thanks for your help, sretalla. It would have taken me a while to figure this out!