Hi! I'm looking for advice with some crashes I've been getting.
I just rebuilt my TrueNAS machine from various parts I had lying around, but the system crashes and reboots every time I try to move data onto it.
The console outputs:
Code:
panic: APEI Fatal Hardware Error!
Following the textdumps, I find:
Code:
root@truenas[~]# cat /data/crash/info.last
Dump header from device: /dev/ada6p1
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 401408
  Blocksize: 512
  Compression: none
  Dumptime: 2023-10-14 07:24:39 -0700
  Hostname: truenas.localdomain
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS
  Panic String: APEI Fatal Hardware Error!
  Dump Parity: 3924496761
  Bounds: 2
  Dump Status: good
I don't believe the error is specific to device ada6: across repeated crashes, different SATA drives have been named here.
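In case it helps with comparing crashes, here is a minimal sketch for lining up the dump device and panic string from each crash. The helper names `parse_dump_info` and `summarize_crashes` are my own, not anything from TrueNAS. It assumes the info files use the `Key: value` layout shown above and that earlier dumps sit next to `info.last` as `info.*` files in `/data/crash`; I haven't confirmed that layout beyond my own box.

```python
import glob
import os


def parse_dump_info(text):
    """Parse the 'Key: value' lines of a crash info file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so timestamps keep their colons.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def summarize_crashes(crash_dir="/data/crash"):
    """Return (file name, dump device, panic string) for each info.* file.

    The directory and file pattern are assumptions based on the path above.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(crash_dir, "info.*"))):
        with open(path, errors="replace") as handle:
            fields = parse_dump_info(handle.read())
        rows.append((
            os.path.basename(path),
            fields.get("Dump header from device", "?"),
            fields.get("Panic String", "?"),
        ))
    return rows


if __name__ == "__main__":
    for name, device, panic in summarize_crashes():
        print(f"{name}: device={device} panic={panic}")
```

Run on the NAS itself, this prints one line per saved crash, which makes it easy to see whether the panic string stays the same while the named device changes.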
And I've attached the full textdump here: https://pastebin.com/YH2JT5Qz
I'm not sure how to continue troubleshooting this - none of the logs make any sense to me. As far as hardware goes, everything in this system is known good and was pulled from various other systems I've tested (and have been using).
General hardware list:
Intel i3-7320
Intel S1200SPL motherboard
2x32GB DDR4 ECC UDIMM
Intel SSD boot drive
6x 14TB WD HDDs in RAIDZ2
Samsung SM953 480GB for cache
Drives were used in a previous TrueNAS setup, have tested good, and show no signs of failing.
CPU & motherboard were also previously used in a different build and have tested stable.
The RAM is the only thing I can think of that might be causing errors. These sticks were known to be good with another motherboard, and this CPU/RAM/motherboard configuration passes MemTest86 on a warm boot, but it is susceptible to the ECC cold boot bug. See this link for what I'm referring to: https://forums.passmark.com/memtest...r-only-after-cold-boot-not-after-repeat-reset
I'm unsure how to proceed here. I would swap the RAM, but this is the only kit of ECC UDIMM I have lying around. Is there a configuration issue or some obvious mistake I'm missing?