MCA: CPU 0 COR OVER RD channel 1 memory error

PDM

Dabbler
Joined
Dec 17, 2011
Messages
24
Over the last couple of days I have seen the following error message appear in the console.
Apr 25 03:27:25 freenas MCA: Bank 5, Status 0xd400008000910091
Apr 25 03:27:25 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 25 03:27:25 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 25 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 25 03:27:25 freenas MCA: Address 0x5aa320380


Having searched the forums, I found the following thread: HOWTO: Troubleshooting faulty RAM

I need someone's second opinion as to which of my DDR3 ECC memory modules needs replacing.

mcelog:
Code:
root@freenas[~]# mcelog

Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5 TSC 7ddec99c5a40
ADDR 5aa320380
TIME 1619336966 Sun Apr 25 09:49:26 2021
MCG status:
STATUS d400008000910091 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 77 Step 8

Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 5 TSC 85ba92baf978
ADDR 5aa320380
TIME 1619336966 Sun Apr 25 09:49:26 2021
MCG status:
STATUS 9400004000910091 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 77 Step 8

Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 5 TSC 13a7a07bf6650
ADDR 5aa320380
TIME 1619336966 Sun Apr 25 09:49:26 2021
MCG status:
STATUS d400008000910091 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 77 Step 8


MCA error messages:
Code:
root@freenas[~]# cat /var/log/messages | grep MCA
Apr 23 11:29:54 freenas Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Apr 23 11:29:55 freenas Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Apr 24 03:27:24 freenas MCA: Bank 5, Status 0xd400008000910091
Apr 24 03:27:24 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 24 03:27:24 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 24 03:27:24 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 24 03:27:24 freenas MCA: Address 0x5aa320380
Apr 24 04:27:24 freenas MCA: Bank 5, Status 0x9400004000910091
Apr 24 04:27:24 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 24 04:27:24 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 24 04:27:24 freenas MCA: CPU 0 COR RD channel 1 memory error
Apr 24 04:27:24 freenas MCA: Address 0x5aa320380
Apr 25 03:27:25 freenas MCA: Bank 5, Status 0xd400008000910091
Apr 25 03:27:25 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 25 03:27:25 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 25 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 25 03:27:25 freenas MCA: Address 0x5aa320380
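
For anyone wondering why some entries say "COR OVER RD" and one says "COR RD", here is a minimal sketch that decodes the Status values from the log. It assumes the standard Intel layout of the machine-check status register (flag bits 63 to 57, MCA error code in bits 15:0). The `decode` helper and the table names are made up for illustration and are not part of FreeBSD or mcelog.

```python
# Architectural flag bits in the machine-check bank status register
# (assumed Intel IA32_MCi_STATUS layout).
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten before it was read
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}

# Memory transaction types used in memory controller error codes.
MMM = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode(status):
    """Hypothetical helper: split a status value into flags and error code."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code, bits 15:0
    result = {"flags": flags, "mca_code": code}
    # Memory controller errors have the form 000F 0000 1MMM CCCC;
    # bit 12 (F) is a filtering flag and is ignored by the mask.
    if (code & 0xEF80) == 0x0080:
        result["type"] = MMM.get((code >> 4) & 0x7, "reserved")
        result["channel"] = code & 0xF  # 0xF would mean "channel not specified"
    return result


for status in (0xD400008000910091, 0x9400004000910091):
    print(hex(status), decode(status))
```

With these inputs, UC is clear in both values (a corrected error), the error code decodes to a memory read on channel 1, and the only flag that differs between the two is OVER.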



dmidecode:
Code:
root@freenas[~]# dmidecode -t 20
# dmidecode 3.2
# SMBIOS entry point at 0x000f0560
Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 2.8 present.

Handle 0x002E, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x001FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x002D
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0030, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00200000000
        Ending Address: 0x003FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x002F
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0032, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00400000000
        Ending Address: 0x005FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x0031
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0034, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00600000000
        Ending Address: 0x007FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x0033
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown



Memory modules:
Code:
root@freenas[~]# dmidecode -t memory | grep -A23 "0x002"
Handle 0x002B, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Single-bit ECC
        Maximum Capacity: 64 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x002D, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMA1
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14254030
        Asset Tag:  BANK 0 DIMMA1 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x002F, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMA2
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14254021
        Asset Tag:  BANK 0 DIMMA2 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x0031, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMB1
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 11231852
        Asset Tag:  BANK 0 DIMMB1 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x0033, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMB2
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14253919
        Asset Tag:  BANK 0 DIMMB2 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s


Is DIMMB1 failing/dying? Or have I mis-read the logs/information?
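
This is how I arrived at DIMMB1, written out as a small sketch. It looks up the reported address in the mapped ranges from `dmidecode -t 20` and then follows the physical device handle to the locator from `dmidecode -t memory`. It assumes the firmware's mapped ranges reflect how addresses are really distributed across the modules, which may not hold if channel interleaving is in use. The `locate` function and table names are only for illustration.

```python
# (start, end, physical device handle), copied from `dmidecode -t 20`.
MAPPED_RANGES = [
    (0x00000000000, 0x001FFFFFFFF, 0x002D),
    (0x00200000000, 0x003FFFFFFFF, 0x002F),
    (0x00400000000, 0x005FFFFFFFF, 0x0031),
    (0x00600000000, 0x007FFFFFFFF, 0x0033),
]

# Handle -> Locator, copied from `dmidecode -t memory`.
LOCATORS = {
    0x002D: "DIMMA1",
    0x002F: "DIMMA2",
    0x0031: "DIMMB1",
    0x0033: "DIMMB2",
}


def locate(address):
    """Return the locator whose mapped range contains address, or None."""
    for start, end, handle in MAPPED_RANGES:
        if start <= address <= end:
            return LOCATORS[handle]
    return None


print(locate(0x5AA320380))  # DIMMB1
```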
 
Joined
Dec 29, 2014
Messages
1,135
It could be a couple of things. It could be a DIMM failing. It can also be a problem with certain pins on the CPU. The one thing that is pretty certain is that it is some kind of hardware problem.
 

PDM

It's strange that it's almost at the same time of day.

Code:
Apr 24 03:27:24 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 24 03:27:24 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 24 03:27:24 freenas MCA: Address 0x5aa320380
Apr 24 04:27:24 freenas MCA: Bank 5, Status 0x9400004000910091
Apr 24 04:27:24 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 24 04:27:24 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 24 04:27:24 freenas MCA: CPU 0 COR RD channel 1 memory error
Apr 24 04:27:24 freenas MCA: Address 0x5aa320380

Apr 25 03:27:25 freenas MCA: Bank 5, Status 0xd400008000910091
Apr 25 03:27:25 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 25 03:27:25 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 25 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 25 03:27:25 freenas MCA: Address 0x5aa320380

Apr 26 03:27:25 freenas MCA: Bank 5, Status 0xd40000c000910091
Apr 26 03:27:25 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 26 03:27:25 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 26 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 26 03:27:25 freenas MCA: Address 0x5aa320380
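
To put a number on "almost the same time of day", here is a small sketch that takes the error lines above and works out how many seconds past the hour each one was logged. The `seconds_past_hour` helper is hypothetical. All four land within a second of each other, which could mean the reports come from a periodic scan (FreeBSD has a `hw.mca.interval` sysctl for that) rather than from the moment the error occurred. That reading is an assumption on my part; the logs do not confirm it.

```python
import re

# Error lines copied from the log above.
LOG = """\
Apr 24 03:27:24 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 24 04:27:24 freenas MCA: CPU 0 COR RD channel 1 memory error
Apr 25 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 26 03:27:25 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
"""

# Month, day, hour, then captured minute and second.
TIMESTAMP = re.compile(r"\w{3}\s+\d+\s+\d\d:(\d\d):(\d\d)\s")


def seconds_past_hour(line):
    """Return how many seconds after the full hour a syslog line was written."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    minute, second = match.groups()
    return int(minute) * 60 + int(second)


offsets = [seconds_past_hour(line) for line in LOG.splitlines()]
print(offsets)  # [1644, 1644, 1645, 1645]
```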
 
Joined
Dec 29, 2014
Messages
1,135
It's strange that it's almost at the same time of day.
Is it when some system jobs are running? If so, maybe it is using more memory at that time than normal. You can also try re-seating or moving the DIMMs around. See if the error follows the DIMM slot or the DIMM module.
 

PDM

Is it when some system jobs are running? If so, maybe it is using more memory at that time than normal. You can also try re-seating or moving the DIMMs around. See if the error follows the DIMM slot or the DIMM module.
That was also my initial suspicion yesterday. However, all snapshots/replication start/finish before midnight.
No cron jobs, scrub is due to run in 6 days.

I think I'll bring the server down and move the 8GB SO-DIMM I suspect may be the culprit.
 

PDM

Having swapped the SO-DIMMs between DIMMB1 and DIMMB2 I get the following error message:
Apr 26 22:34:53 freenas MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Apr 26 22:34:53 freenas MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Apr 26 22:34:53 freenas MCA: CPU 0 COR OVER RD channel 1 memory error
Apr 26 22:34:53 freenas MCA: Address 0x5a2320380

Code:
root@freenas[~]# dmidecode -t 20
# dmidecode 3.2
# SMBIOS entry point at 0x000f0560
Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 2.8 present.

Handle 0x002E, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x001FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x002D
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0030, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00200000000
        Ending Address: 0x003FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x002F
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0032, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00400000000
        Ending Address: 0x005FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x0031
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown

Handle 0x0034, DMI type 20, 35 bytes
Memory Device Mapped Address
        Starting Address: 0x00600000000
        Ending Address: 0x007FFFFFFFF
        Range Size: 8 GB
        Physical Device Handle: 0x0033
        Memory Array Mapped Address Handle: 0x002C
        Partition Row Position: Unknown


Code:
root@freenas[~]# dmidecode -t memory | grep -A23 "0x002"
Handle 0x002B, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Single-bit ECC
        Maximum Capacity: 64 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x002D, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMA1
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14254030
        Asset Tag:  BANK 0 DIMMA1 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x002F, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMA2
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14254021
        Asset Tag:  BANK 0 DIMMA2 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x0031, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMB1
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 14253919
        Asset Tag:  BANK 0 DIMMB1 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s

Handle 0x0033, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: DIMMB2
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1600 MT/s
        Manufacturer: Hynix
        Serial Number: 11231852
        Asset Tag:  BANK 0 DIMMB2 AssetTag
        Part Number: HMT41GA7AFR8A-PB
        Rank: 2
        Configured Memory Speed: 1600 MT/s
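
To compare the two reports, here is a rough sketch that checks how the address changed after the swap and whether both addresses sit in the same mapped range. The range is the one `dmidecode -t 20` lists for handle 0x0031 (DIMMB1). The variable names are only for illustration, and the same caveat applies: if the memory controller interleaves channels, the firmware's ranges may not match the physical modules.

```python
BEFORE = 0x5AA320380  # address reported before swapping the SO-DIMMs
AFTER = 0x5A2320380   # address reported after swapping the SO-DIMMs

# Mapped range for physical device handle 0x0031 (DIMMB1), from dmidecode.
B1_START, B1_END = 0x00400000000, 0x005FFFFFFFF

# XOR leaves a 1 in every bit position where the two addresses differ.
diff = BEFORE ^ AFTER
changed_bits = [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

in_b1 = [B1_START <= address <= B1_END for address in (BEFORE, AFTER)]

print(hex(diff), changed_bits)  # 0x8000000 [27]
print(in_b1)                    # [True, True]
```

So the address differs in a single bit, and both addresses fall in the range mapped to the DIMMB1 slot, even though the module in that slot changed (serial 11231852 before, 14253919 after). On its own that is not enough to say whether the slot or the module is at fault.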


Not sure what to do now. Memtest86 tests will take some time.
 