Hi, every 2-4 weeks I get one or two memory errors reported in the daily security output. The last two were only 2 days apart, which does make me paranoid now.
These are all the messages I have received over the last 2 months (i.e. since the machine went online):
MCA: Bank 12, Status 0x8c000043000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Misc 0x123180004000468c
MCA: Bank 9, Status 0x8c000042000800c0
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 0 memory error
MCA: Address 0x13c001cb00
MCA: Misc 0x122942000200048c
MCA: Bank 12, Status 0x8c00004d000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x164d8dcbc0
MCA: Misc 0x1229500010001a8c
MCA: Bank 7, Status 0x8c00004000010093
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) RD channel 3 memory error
MCA: Address 0x27db12fa80
MCA: Misc 0x15030b086
MCA: Bank 12, Status 0x8c000043000800c3
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Misc 0x123180004000468c
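For reference, the Status values above can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS layout described in Intel's SDM (VAL in bit 63, UC in bit 61, ADDRV in bit 58, corrected-error count in bits 52:38, MCA error code in bits 15:0) and the simple memory-controller error-code form `000F 0000 1MMM CCCC`. It roughly reproduces the "COR (1) MS channel 3" summary in the log. The function name `decode_status` is only illustrative, and the model-specific bits 31:16 are deliberately left uninterpreted.

```python
# Minimal decoder for IA32_MCi_STATUS values as printed in the MCA log lines.
# Assumes the architectural bit layout from the Intel SDM; model-specific
# fields (bits 31:16 and 37:32) are ignored.

# Memory-controller transaction types (the MMM field of the error code).
TRANSACTIONS = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_status(status: int) -> dict:
    """Return the main architectural fields of an MCi_STATUS value."""
    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    info = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC: 0 means the error was corrected
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: MCi_ADDR holds an address
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38 (15 bits)
        "mca_code": mca_code,
    }
    # Memory-controller errors: 000F 0000 1MMM CCCC (F = bit 12, masked out here).
    if (mca_code & 0xEF80) == 0x0080:
        info["transaction"] = TRANSACTIONS.get((mca_code >> 4) & 0x7, "?")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF = not specified
    return info


if __name__ == "__main__":
    # The five Status values from the log above, in the same order.
    for s in (0x8C000043000800C3, 0x8C000042000800C0, 0x8C00004D000800C3,
              0x8C00004000010093, 0x8C000043000800C3):
        d = decode_status(s)
        kind = "UNCOR" if d["uncorrected"] else "COR"
        print(f"{s:#018x}: {kind} ({d['corrected_count']}) "
              f"{d.get('transaction')} channel {d.get('channel')}")
```

Run against the five values, this prints "COR (1)" for every record, with "MS channel 3", "MS channel 0", "MS channel 3", "RD channel 3" and "MS channel 3", which matches the CPU lines of the log.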
Does "Bank" indicate the RAM stick?
If so, since banks 7, 9 and 12 are affected, could there really be three bad RAM sticks? Or is this more likely a CPU or motherboard problem?
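An alternative to looking at the bank number is to group the records by the decoded channel and by physical address. The sketch below parses lines in exactly the format shown above; the regular expressions and the helper names `parse_records` and `group_by_page` are hypothetical and assume this log format, not a general FreeBSD parser. The embedded sample is a trimmed copy of the log (Global Cap, Vendor and Misc lines omitted). A physical address alone does not identify a DIMM slot, since that mapping depends on how the memory controller interleaves channels and ranks.

```python
import re
from collections import defaultdict

# Hypothetical helpers: parse "MCA:" log lines like the ones above into one
# record per error; each record starts at a "Bank" line.
BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
ERR_RE = re.compile(r"MCA: CPU (\d+) (\w+) \((\d+)\) (\w+) channel (\d+) memory error")
ADDR_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")


def parse_records(text: str) -> list:
    records = []
    for line in text.splitlines():
        if m := BANK_RE.search(line):
            # A "Bank" line begins a new record.
            records.append({"bank": int(m.group(1)), "status": int(m.group(2), 16)})
        elif records and (m := ERR_RE.search(line)):
            records[-1].update(kind=m.group(2), txn=m.group(4), channel=int(m.group(5)))
        elif records and (m := ADDR_RE.search(line)):
            records[-1]["addr"] = int(m.group(1), 16)
    return records


def group_by_page(records: list, page_size: int = 4096) -> dict:
    """Group bank numbers by (channel, physical page) instead of by MCA bank."""
    groups = defaultdict(list)
    for r in records:
        page = r["addr"] & ~(page_size - 1)  # round address down to the page start
        groups[(r.get("channel"), page)].append(r["bank"])
    return dict(groups)


LOG = """\
MCA: Bank 12, Status 0x8c000043000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
MCA: Bank 9, Status 0x8c000042000800c0
MCA: CPU 0 COR (1) MS channel 0 memory error
MCA: Address 0x13c001cb00
MCA: Bank 12, Status 0x8c00004d000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x164d8dcbc0
MCA: Bank 7, Status 0x8c00004000010093
MCA: CPU 0 COR (1) RD channel 3 memory error
MCA: Address 0x27db12fa80
MCA: Bank 12, Status 0x8c000043000800c3
MCA: CPU 0 COR (1) MS channel 3 memory error
MCA: Address 0x27db12fac0
"""

if __name__ == "__main__":
    for (channel, page), banks in group_by_page(parse_records(LOG)).items():
        print(f"channel {channel}, page {page:#x}: banks {banks}")
```

On these five records, three of them (one from bank 7 and two from bank 12, all on channel 3) fall in the same 4 KiB page, 0x27db12f000; the bank 9 record is on channel 0.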
Nothing is logged in the IPMI event log.
System specs:
256GB of DDR4 Samsung reg. ECC LRDIMM 2400MHz RAM (8x 32GB sticks)
Supermicro X10SRL-F mainboard
Xeon E5-2630L v3 CPU
6x 12TB IronWolf HDDs (3 striped mirrors)
FreeNAS 11.1-U7
In other threads I read that these are most probably ECC-corrected errors. However, I wonder why the IPMI event log doesn't show anything, since it was often mentioned that such errors also appear there. I could swap out the RAM, as I have another 256GB in another system, but before I take the machines offline I'd like to hear your thoughts on whether this can actually be the RAM sticks when the errors are on 3 different banks.
The system has been working without any noticeable failures for the last 2 months. Only the security output makes me paranoid.
Thanks for your help!