New System & Scrub System Ryzen Stability Issue

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
Hello all,
I built a new system but reused the existing pool HDDs from my old HP Microserver build. TrueNAS is installed on a dedicated SSD, alongside five 10TB Ironwolf drives.

The system installed and boots without issue, yet when I start a scrub of the "Media" pool the system crashes within 45 mins or so and TrueNAS performs an unscheduled reboot.

I initially thought it was related to my cross-flashed IBM M1015 card, so I moved the pool disks onto the internal SATA controller, but the issue is still present. I've run RAM tests with both MemTest86 and MemTest86+ with no errors found. I can see no issues with system temperatures, and the case has good airflow.

System consists of the following -
OS Build: TrueNAS CORE 12.0-U3.1
Motherboard: Gigabyte B450 AORUS M [BIOS: 61a - latest]
CPU: AMD Ryzen 3 3200G
RAM: 32GB [PNY - XLR8 DDR4 3200 MHz PC RAM - 16 GB x 2]
OS HDD: Samsung 840 256GB SSD
ZFS HDD: x5 10TB Seagate Ironwolf
PSU: Corsair 750W HX
IBM M1015 <suspect unrelated, as the same issue occurs on the internal SATA controller>

Can anyone offer any advice on where to start troubleshooting?
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
467
Don't know if it's still an issue, but earlier I've seen some posts regarding AMD instability - I believe a lot of people fixed it by disabling .... C6 state, or something like that, in the BIOS (I don't have an AMD system to check on).
 

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
AMD instability - I believe a lot fixed it by disabling .... C6 state...

Thanks for the assistance - I've found a post and altered the settings accordingly. The scrub is running again, so I should find out soon.
 

Attachments

  • BIOS_SysInfo.jpg (186.7 KB)
  • BIOS_AMD_AdvancedPower.jpg (138.9 KB)

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
Well, it stayed up for longer, but around 4 hours in, while I was asleep, the system rebooted again. The scrub is still in progress with about an hour to go. At 5am the following was logged in the console -
Code:
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): READ_FPDMA_QUEUED. ACB: 60 a8 d0 9e e2 40 15 00 00 00 00 00
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): CAM status: CCB request was invalid
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): Error 22, Unretryable error
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): READ_FPDMA_QUEUED. ACB: 60 b0 30 b0 e4 40 15 00 00 00 00 00
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): CAM status: CCB request was invalid
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): Error 22, Unretryable error
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): READ_FPDMA_QUEUED. ACB: 60 c0 28 fc e8 40 15 00 00 07 00 00
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): CAM status: CCB request was invalid
May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): Error 22, Unretryable error


The pool is not marked as unhealthy, so hopefully this is repairable.
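
For anyone wanting to check whether errors like the ones above are confined to one disk or one controller channel, here is a minimal Python sketch that groups the CAM lines by device and channel. The function name `summarize_cam_errors` and the regular expression are illustrative assumptions based only on the line layout shown in this post, not part of TrueNAS or FreeBSD, and the sample is simply three of the lines pasted above.

```python
import re
from collections import Counter

# Matches the CAM prefix "(ada5:ahcich9:0:0:0): message" as it appears in the
# console output above. The pattern is an assumption based on those lines only.
CAM_LINE = re.compile(
    r"\((?P<dev>\w+):(?P<bus>\w+):\d+:\d+:\d+\):\s*(?P<msg>.*)$"
)


def summarize_cam_errors(lines):
    """Count CAM messages per (device, controller channel, message kind)."""
    counts = Counter()
    for line in lines:
        m = CAM_LINE.search(line)
        if not m:
            continue  # not a CAM line, ignore it
        # Drop the per-command ACB bytes so identical commands group together.
        kind = m.group("msg").split(". ACB:")[0].strip()
        counts[(m.group("dev"), m.group("bus"), kind)] += 1
    return counts


# Three lines copied from the console output in this post.
sample = [
    "May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): READ_FPDMA_QUEUED. ACB: 60 a8 d0 9e e2 40 15 00 00 00 00 00",
    "May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): CAM status: CCB request was invalid",
    "May 25 05:51:20 truenas (ada5:ahcich9:0:0:0): Error 22, Unretryable error",
]

if __name__ == "__main__":
    for (dev, bus, kind), n in summarize_cam_errors(sample).items():
        print(f"{dev} on {bus}: {kind} x{n}")
```

Fed the full console log, this would show whether every error names the same device (ada5 on ahcich9 in the excerpt) or whether several disks are affected. It only counts lines and does not by itself say whether the disk, the cable or the controller is at fault.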
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
467
Well, it stayed up for longer, but around 4 hours in, while I was asleep, the system rebooted again. The scrub is still in progress with about an hour to go. At 5am the following was logged in the console -
The pool is not marked as unhealthy, so hopefully this is repairable.

Have you connected to the M1015 again? Otherwise that would be my next try, since that part is something a lot of users have great experience with, and I have no clue what the onboard controller is :)
Verifying the PSU might also be worth a try (a scrub draws a lot of power on both the disks and the CPU).

Oh, and when this issue has been resolved - save yourself a lot of trouble and get an Intel NIC (the onboard one is Realtek, which normally gives a lot of trouble).
 

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
Have you connected to the m1015 again?
Verifying the PSU..
get an Intel NIC..

The system rebooted again about 10 mins ago - I'm waiting for the scrub to finish and I will revert to the M1015 once a heatsink cooling fan arrives later today.

The PSU came from my old base with an i7 and 5 HDDs too - I can't see it being an issue, but I can try swapping in a new 850W unit that is running my new i11 build.

The dual port Intel based NIC is on order already :)

Update - The scrub resumed and completed, with the previously reported read errors no longer listed, although it reports having repaired 1.14M of data.
 

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
Well, I fitted the fan to my M1015 and checked its temperature, which is under 40°C. I started a scrub [with new SAS/SATA cables] and within 10 mins the system rolled over. I had previously enabled autotune and the debug kernel.

Code:
May 25 17:06:04 truenas #17 0xffffffff80fd168e at fork_trampoline+0xe
May 25 17:13:50 truenas syslog-ng[1195]: syslog-ng starting up; version='3.29.1'
May 25 17:13:50 truenas kernel trap 12 with interrupts disabled
May 25 17:13:50 truenas Fatal trap 12: page fault while in kernel mode
May 25 17:13:50 truenas cpuid = 0; apic id = 00
May 25 17:13:50 truenas fault virtual address    = 0x0
May 25 17:13:50 truenas fault code        = supervisor write data, page not present
May 25 17:13:50 truenas instruction pointer    = 0x20:0xffffffff80afbb4a
May 25 17:13:50 truenas stack pointer            = 0x28:0xfffffe00c733a870
May 25 17:13:50 truenas frame pointer            = 0x28:0xfffffe00c733a8d0
May 25 17:13:50 truenas code segment        = base 0x0, limit 0xfffff, type 0x1b
May 25 17:13:50 truenas             = DPL 0, pres 1, long 1, def32 0, gran 1
May 25 17:13:50 truenas processor eflags    = resume, IOPL = 0
May 25 17:13:50 truenas current process        = 11 (idle: cpu0)
May 25 17:13:50 truenas trap number        = 12
May 25 17:13:50 truenas panic: page fault
May 25 17:13:50 truenas cpuid = 0
May 25 17:13:50 truenas time = 1621959150
May 25 17:13:50 truenas KDB: stack backtrace:
May 25 17:13:50 truenas db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c733a520
May 25 17:13:50 truenas vpanic() at vpanic+0x17b/frame 0xfffffe00c733a570
May 25 17:13:50 truenas panic() at panic+0x43/frame 0xfffffe00c733a5d0
May 25 17:13:50 truenas trap_fatal() at trap_fatal+0x391/frame 0xfffffe00c733a630
May 25 17:13:50 truenas trap_pfault() at trap_pfault+0x99/frame 0xfffffe00c733a690
May 25 17:13:50 truenas trap() at trap+0x2be/frame 0xfffffe00c733a7a0
May 25 17:13:50 truenas calltrap() at calltrap+0x8/frame 0xfffffe00c733a7a0
May 25 17:13:50 truenas --- trap 0xc, rip = 0xffffffff80afbb4a, rsp = 0xfffffe00c733a870, rbp = 0xfffffe00c733a8d0 ---
May 25 17:13:50 truenas callout_process() at callout_process+0x20a/frame 0xfffffe00c733a8d0
May 25 17:13:50 truenas handleevents() at handleevents+0x185/frame 0xfffffe00c733a910
May 25 17:13:50 truenas timercb() at timercb+0x196/frame 0xfffffe00c733a960
May 25 17:13:50 truenas lapic_handle_timer() at lapic_handle_timer+0x9b/frame 0xfffffe00c733a990
May 25 17:13:50 truenas Xtimerint() at Xtimerint+0xb1/frame 0xfffffe00c733a990
May 25 17:13:50 truenas --- interrupt, rip = 0xffffffff811fa9a6, rsp = 0xfffffe00c733aa60, rbp = 0xfffffe00c733aa60 ---
May 25 17:13:50 truenas acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe00c733aa60
May 25 17:13:50 truenas acpi_cpu_idle() at acpi_cpu_idle+0x288/frame 0xfffffe00c733aaa0
May 25 17:13:50 truenas cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe00c733aac0
May 25 17:13:50 truenas cpu_idle() at cpu_idle+0x9f/frame 0xfffffe00c733aae0
May 25 17:13:50 truenas sched_idletd() at sched_idletd+0x3f1/frame 0xfffffe00c733abb0
May 25 17:13:50 truenas fork_exit() at fork_exit+0x80/frame 0xfffffe00c733abf0
May 25 17:13:50 truenas fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c733abf0
May 25 17:13:50 truenas --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
May 25 17:13:50 truenas KDB: enter: panic
May 25 17:13:50 truenas ---<<BOOT>>---


I've opened a bug report ticket and am awaiting feedback. I will try moving the pool into an old i7 P8P67 PRO system, using the same PSU and pool drives, as a temporary measure to see if it remains stable.
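
To make a panic dump like the one above easier to read or attach to a ticket, here is a rough Python sketch that pulls the function names out of the KDB backtrace and converts the `time = ...` epoch value into a readable timestamp. The helper name `parse_panic` and both regular expressions are my own assumptions based on the lines pasted in this post. They are not a FreeBSD or TrueNAS tool.

```python
import re
from datetime import datetime, timezone

# Frame lines in the dump look like "name() at name+0x2b/frame 0x...".
# Both patterns are assumptions derived from the log lines in this post.
FRAME = re.compile(
    r"(?P<func>\w+)\(\) at (?P=func)\+0x[0-9a-f]+/frame 0x[0-9a-f]+"
)
PANIC_TIME = re.compile(r"\btime = (?P<epoch>\d+)")


def parse_panic(lines):
    """Return (panic time in UTC or None, list of function names in order)."""
    frames, when = [], None
    for line in lines:
        m = FRAME.search(line)
        if m:
            frames.append(m.group("func"))
            continue
        t = PANIC_TIME.search(line)
        if t:
            # The kernel prints the panic time as seconds since the Unix epoch.
            when = datetime.fromtimestamp(int(t.group("epoch")), tz=timezone.utc)
    return when, frames


# A few lines copied from the dump in this post.
sample = [
    "May 25 17:13:50 truenas time = 1621959150",
    "May 25 17:13:50 truenas trap_pfault() at trap_pfault+0x99/frame 0xfffffe00c733a690",
    "May 25 17:13:50 truenas callout_process() at callout_process+0x20a/frame 0xfffffe00c733a8d0",
    "May 25 17:13:50 truenas acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe00c733aa60",
]

if __name__ == "__main__":
    when, frames = parse_panic(sample)
    print(when.isoformat())
    print(" <- ".join(frames))
```

For the dump above, the epoch value 1621959150 works out to 16:12:30 UTC on 25 May 2021, a little before the syslog-ng startup line, which would fit the syslog timestamps being local time. The frame list reads newest first, so the functions printed after the trap handlers (callout_process, then the timer interrupt and the acpi_cpu_c1 idle path) are what the CPU was running when the fault hit.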
 

Starman

Dabbler
Joined
Feb 15, 2014
Messages
15
So I have transferred the PSU, Ironwolf drives and SAS card into an old P8P67 motherboard, i7 2660k, 16GB DDR3 system with a fresh SSD install of TrueNAS (the existing SSD install didn't work with networking). The scrub completed without incident overnight, so this does seem to suggest a CPU/chipset compatibility issue.
 