SOLVED The mv command crashes TrueNAS

jwong858

Dabbler
Joined
Nov 25, 2022
Messages
28
The mv command crashes TrueNAS, while the cp command works without any issue. I was trying to mv some directories from one ZFS location to another, and TrueNAS crashed every time I ran the command. I am running TrueNAS SCALE, version TrueNAS-SCALE-22.12.3.3.
Here is my hardware list:
Motherboard: ASRock X570M PRO4
CPU: AMD Ryzen 9 5900X
GPU1: MSI Gaming GeForce GTX 16650
GPU2: PNY Nvidia T1000
RAM: 128GB (TTCED464G3600HC18JDC01)
M2: 2x4TB
Disks: 4x14TB

I tested the RAM for 3 days and found no issues.

I'm getting hardware errors:

Aug 17 00:06:09 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 17 00:06:09 truenas kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 0: bc00080001010135
Aug 17 00:06:09 truenas kernel: mce: [Hardware Error]: TSC 0
Aug 17 00:06:09 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692255957 SOCKET 0 APIC 8 microcode a20120a
Aug 17 00:23:04 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 17 00:23:04 truenas kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 0: bc00080001010135
Aug 17 00:23:04 truenas kernel: mce: [Hardware Error]: TSC 0 ADDR 1ba8ea440 MISC d012000000000000 IPID 1000b000000000
Aug 17 00:23:04 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692256973 SOCKET 0 APIC a microcode a20120a
Aug 17 20:49:29 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 17 20:49:29 truenas kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 0: bc00080001010135
Aug 17 20:49:29 truenas kernel: mce: [Hardware Error]: TSC 0 ADDR 2109f6700 MISC d012000000000000 IPID 1000b000000000
Aug 17 20:49:29 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692330557 SOCKET 0 APIC a microcode a20120a
Aug 18 03:51:15 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 18 03:51:15 truenas kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: faa000000000080b
Aug 18 03:51:15 truenas kernel: mce: [Hardware Error]: TSC 0 MISC d012000200000000 SYND 5d000000 IPID 1002e00000500
Aug 18 03:51:15 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692355863 SOCKET 0 APIC 0 microcode a20120a
Aug 18 04:35:05 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 18 04:35:05 truenas kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 1: bc800800060c0859
Aug 18 04:35:05 truenas kernel: mce: [Hardware Error]: TSC 0 ADDR 1dce3f4640 MISC d012000000000000 IPID 100b000000000
Aug 18 04:35:05 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692358494 SOCKET 0 APIC a microcode a20120a
Aug 18 14:29:25 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 18 14:29:25 truenas kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 1: fc800800060c0859
Aug 18 14:29:25 truenas kernel: mce: [Hardware Error]: TSC 0 ADDR 1e22215340 MISC d012000000000000 IPID 100b000000000
Aug 18 14:29:25 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692394154 SOCKET 0 APIC a microcode a20120a
Aug 18 14:34:50 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 18 14:34:50 truenas kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 27: faa000000000080b
Aug 18 14:34:50 truenas kernel: mce: [Hardware Error]: TSC 0
Aug 18 14:34:50 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692394479 SOCKET 0 APIC 0 microcode a20120a
Aug 18 14:59:50 truenas kernel: mce: [Hardware Error]: Machine check events logged
Aug 18 14:59:50 truenas kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 0: bc00080001010135
Aug 18 14:59:50 truenas kernel: mce: [Hardware Error]: TSC 0 ADDR 433afba40 MISC d012000000000000 IPID 1000b000000000
Aug 18 14:59:50 truenas kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692395979 SOCKET 0 APIC 8 microcode a20120a
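For anyone reading entries like these, the 16-digit hex value after `Bank N:` is that bank's machine check status register, and its top bits are flags. Below is a rough sketch of how the log lines above could be parsed and those flags decoded. It only covers the architectural flag bits 63 to 57 and the low 16-bit error code; the vendor-specific fields are not interpreted, and the function names (`decode_status`, `parse_mce_lines`) are made up for this example.

```python
import re

# Architectural flag bits in the upper part of an x86 MCi_STATUS register.
# Vendor-specific fields (for example AMD extended error codes) are NOT decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}

# Matches lines such as:
#   ... CPU 4: Machine Check: 0 Bank 0: bc00080001010135
LINE_RE = re.compile(
    r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]{16})"
)


def decode_status(status):
    """Return the set flag names (highest bit first) and the low 16-bit error code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": flags, "error_code": status & 0xFFFF}


def parse_mce_lines(lines):
    """Extract CPU, bank, and decoded status from kernel MCE log lines."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            cpu = int(match.group(1))
            bank = int(match.group(2))
            status = int(match.group(3), 16)
            events.append({"cpu": cpu, "bank": bank, **decode_status(status)})
    return events


if __name__ == "__main__":
    sample = [
        "Aug 17 00:06:09 truenas kernel: mce: [Hardware Error]: "
        "CPU 4: Machine Check: 0 Bank 0: bc00080001010135",
        "Aug 18 03:51:15 truenas kernel: mce: [Hardware Error]: "
        "CPU 0: Machine Check: 0 Bank 27: faa000000000080b",
    ]
    for event in parse_mce_lines(sample):
        print(event)
```

Decoded this way, every status value in the log above has the VAL, UC and EN bits set. Working out which unit each bank belongs to and what the error codes mean would need the processor vendor's documentation.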

I ran a memory test for 3 days and it found no errors.

I am also getting "broken BIOS" errors:

Aug 17 00:06:10 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 17 00:06:10 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 17 00:23:05 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 17 00:23:05 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 17 00:35:58 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 17 00:35:58 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 17 20:49:30 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 17 20:49:30 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 03:51:16 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 03:51:16 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 04:35:06 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 04:35:07 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 14:25:57 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 14:25:57 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 14:29:26 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 14:29:26 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 14:34:51 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 14:34:51 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.
Aug 18 14:59:51 truenas kernel: ccp 0000:0d:00.1: enabling device (0000 -> 0002)
Aug 18 14:59:51 truenas kernel: ccp 0000:0d:00.1: ccp: unable to access the device: you might be running a broken BIOS.

I contacted the motherboard manufacturer, ASRock, and they said they don't support Linux.

Any ideas? Your assistance is greatly appreciated.
 

jwong858

Dabbler
Joined
Nov 25, 2022
Messages
28
Actually, the cp command just crashed TrueNAS as well. It looks like a ZFS issue, right?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Pool layout?
What command exactly?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Actually the cp command just crashed Truenas. It looks like a ZFS issue, right?
It's not ECC RAM... How did you test it to know that it won't ever cause a bit flip?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694

jwong858

Dabbler
Joined
Nov 25, 2022
Messages
28
This problem was resolved by replacing the motherboard with one from a different manufacturer (Asus TUF Gaming B550M-PLUS).
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
This problem has been resolved by replacing a motherboard (Asus TUF Gaming B550M-PLUS) from a different manufacturer.

Thanks for the update. Did you identify the fault in the motherboard?

Without ECC RAM, any signal integrity issue is undetectable and can cause crashes. Is that a likely culprit?
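On the practical side, moving files between two ZFS datasets is effectively a copy followed by a delete, so one cautious approach on non-ECC hardware is to copy, compare checksums, and only then remove the source. Here is a minimal sketch of that idea. The function names (`sha256_of`, `verified_move`) are made up for this example, it handles single files only, and the re-read may be served from cache, so it will not catch every memory or signal fault.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_move(src, dst):
    """Copy src to dst, compare checksums, and delete src only if they match."""
    shutil.copy2(src, dst)  # copies data plus timestamps and permission bits
    if sha256_of(src) != sha256_of(dst):
        os.remove(dst)  # discard the suspect copy, keep the original
        raise OSError(f"checksum mismatch copying {src} to {dst}")
    os.remove(src)
    return dst
```

If the checksums differ, the source file is left untouched, which is the main point: a failed or corrupted transfer should never cost the only good copy.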
 