Memory errors

TobiT92

Cadet
Joined
Apr 17, 2017
Messages
8
Hi, every 2-4 weeks I get one or two memory errors reported in the daily security run output. The last two were just 2 days apart, which is making me paranoid.
These are all the messages I have received over the last 2 months (i.e. since the machine went online):


MCA: Bank 12, Status 0x8c000043000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Misc 0x123180004000468c


MCA: Bank 9, Status 0x8c000042000800c0
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 0 memory error
MCA: Address 0x13c001cb00
MCA: Misc 0x122942000200048c


MCA: Bank 12, Status 0x8c00004d000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x164d8dcbc0
MCA: Misc 0x1229500010001a8c


MCA: Bank 7, Status 0x8c00004000010093
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) RD channel 3 memory error
MCA: Address 0x27db12fa80
MCA: Misc 0x15030b086
MCA: Bank 12, Status 0x8c000043000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Misc 0x123180004000468c



Does "Bank" indicate the RAM stick?
If so, since Banks 7, 9 and 12 are affected, I'm wondering whether this can really be three bad RAM sticks, or whether it more likely has to do with the CPU or the motherboard.
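On the "Bank" question: in an MCA record the bank number refers to one of the CPU's machine-check register banks, not to a DIMM slot. The part of the record that points at memory is the low 16 bits of the Status value, which for memory-controller errors follow the architectural pattern 0000 0000 1MMM CCCC (MMM = transaction type, CCCC = channel). Below is a minimal sketch of that decode, assuming the generic Intel layout; the function and table names are made up for illustration, and the channel number still has to be mapped to physical slots using the board manual.

```python
# Sketch: decode the "MCA error code" (low 16 bits) of an IA32_MCi_STATUS value.
# Assumes the architectural memory-controller pattern 0000 0000 1MMM CCCC.
# Names below are illustrative, not taken from FreeNAS or FreeBSD.

TRANSACTION_TYPES = {
    0b000: "generic",
    0b001: "read (RD)",
    0b010: "write (WR)",
    0b011: "address/command (AC)",
    0b100: "memory scrub (MS)",
}


def decode_memory_error(status):
    """Return (transaction_type, channel), or None if not a memory-controller code."""
    code = status & 0xFFFF
    # Memory-controller form: bits 15..8 are zero and bit 7 is set.
    if code >> 7 != 1:
        return None
    mmm = (code >> 4) & 0x7
    cccc = code & 0xF
    channel = None if cccc == 0xF else cccc  # 1111 means "channel not specified"
    return TRANSACTION_TYPES.get(mmm, "reserved"), channel


# Status values copied from the records in this thread.
for status in (0x8C000043000800C3, 0x8C000042000800C0, 0x8C00004000010093):
    print(hex(status), decode_memory_error(status))
```

For the three distinct Status values posted here this gives a memory scrub on channel 3, a memory scrub on channel 0, and a read on channel 3, which matches the "MS channel 3", "MS channel 0" and "RD channel 3" text in the log lines.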

Nothing is logged in the event log of the IPMI interface.

System specs are:

256 GB of DDR4 Samsung reg. ECC LRDIMM 2400 MHz RAM (8x 32 GB sticks)
Supermicro X10SRL-F Mainboard
Xeon 2630L V3 CPU
6x 12 TB IronWolf HDD (3 striped mirrors)

FreeNAS 11.1-U7



In other threads I read that these are most probably ECC-corrected errors, but I wonder why the IPMI event log doesn't show anything; it was often mentioned that such errors come with an IPMI event log entry. I can swap out the RAM, as I have another 256 GB in another system, but before I take the machines offline I'd like your thoughts on whether this can actually be the RAM sticks when the errors are on 3 different banks.
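Whether an error was corrected can be read from the top bits of the Status value itself. The following is a rough sketch, assuming the architectural IA32_MCi_STATUS bit positions (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, corrected error count in bits 52:38); the function name and dictionary keys are illustrative only.

```python
# Sketch: pull the architectural flag bits out of an IA32_MCi_STATUS value.
# Bit positions are assumed from the generic Intel machine-check layout.

def decode_status_flags(status):
    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: record contains valid data
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: error was NOT corrected by hardware
        "enabled": bit(60),          # EN: error reporting was enabled
        "misc_valid": bit(59),       # MISCV: the Misc value is meaningful
        "addr_valid": bit(58),       # ADDRV: the Address value is meaningful
        "context_corrupt": bit(57),  # PCC: processor context may be corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
    }


# Status value copied from the first record in this thread.
flags = decode_status_flags(0x8C000043000800C3)
for name, value in flags.items():
    print(f"{name:16} {value}")
```

For the Status values posted in this thread the UC and PCC bits are clear and the corrected count is 1, which is consistent with the "COR (1)" shown in the log lines, i.e. single corrected errors rather than uncorrected ones.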

The system has been working without any noticeable failures for the last 2 months. Only the security output makes me paranoid.


Thanks for your help!
 

TobiT92

Thanks, that is the same error on the same motherboard. Unfortunately, the original poster of that thread does not report what the problem turned out to be. The answers say that it can be any one of RAM, CPU and motherboard.

My question is basically: do 3 different banks indicate that it is more likely the CPU or the motherboard, since it is highly unlikely that 3 different RAM sticks are bad?
 
Joined
Jan 7, 2015
Messages
1,155
I took it as the CPU being the likely culprit. Have you reseated the CPU? I agree it's hard to say all the RAM is bad at the same time. If you can, try running on a stick or two of RAM at a time; if the errors persist on every combination, I'd point toward the CPU. Does this memory pass a 24-hour memtest, or 4 passes? Any errors in memtest are a fail. One of the lines in these errors is consistent: "CPU 0 COR (1) MS channel 3 memory error".

I'd rank the board as least likely, and the RAM stick in slot 3 as most likely.
 

TobiT92

In one line it says channel 0, not 3.

Had another error today:

MCA: Bank 12, Status 0x8c000043000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Misc 0x123180004000468c


That is the same as yesterday.
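To see how the reports cluster, the records can be tallied by channel and by physical address. This is only a sketch: the log text is the Bank, CPU and Address lines copied from the records posted in this thread (the other lines are left out for brevity), and the regular expressions and variable names are made up for this example. Grouping by 4 KiB page is an arbitrary choice to catch neighbouring addresses; it does not identify a DIMM, since the address-to-DIMM mapping depends on the platform's interleaving.

```python
import re
from collections import Counter

# Bank / CPU / Address lines copied from the records posted in this thread.
LOG = """\
MCA: Bank 12, Status 0x8c000043000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Bank 9, Status 0x8c000042000800c0
MCA: CPU 0 COR (1) MS channel 0 memory error
MCA: Address 0x13c001cb00
MCA: Bank 12, Status 0x8c00004d000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x164d8dcbc0
MCA: Bank 7, Status 0x8c00004000010093
MCA: CPU 0 COR (1) RD channel 3 memory error
MCA: Address 0x27db12fa80
MCA: Bank 12, Status 0x8c000043000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Bank 12, Status 0x8c000043000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
"""

BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
CHANNEL_RE = re.compile(r"MCA: CPU \d+ .*channel (\d+) memory error")
ADDRESS_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")


def parse_records(text):
    """Group MCA lines into one dict per record; a 'Bank' line starts a record."""
    records = []
    for line in text.splitlines():
        if m := BANK_RE.match(line):
            records.append({"bank": int(m.group(1)), "channel": None, "address": None})
        elif records and (m := CHANNEL_RE.match(line)):
            records[-1]["channel"] = int(m.group(1))
        elif records and (m := ADDRESS_RE.match(line)):
            records[-1]["address"] = int(m.group(1), 16)
    return records


records = parse_records(LOG)
by_channel = Counter(r["channel"] for r in records)
by_page = Counter(r["address"] >> 12 for r in records if r["address"] is not None)

print("records:", len(records))
for channel, count in by_channel.most_common():
    print(f"channel {channel}: {count}")
for page, count in by_page.most_common():
    print(f"4 KiB page {page:#x}: {count}")
```

On the six records posted so far, five are on channel 3 and four fall in the same 4 KiB page (0x27db12f), including the Bank 7 read error, so most of the reports come back to one small region of memory rather than being spread across the whole address space.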
 
Joined
Jan 7, 2015
Messages
1,155
4 pass memtest or different ram. I'd rule that out first since it's the easiest to test. Good luck.
 

TobiT92

I changed the RAM 2 weeks ago and haven't had an error since.

The RAM is running in a workstation now. I haven't gotten around to running memtest yet, but Windows runs normally without any errors.
Thanks for the answers, I will update this thread when I have done the memtests.
 