MCA errors output from system email

Nacman

Dabbler
Joined
Jul 23, 2018
Messages
11
Hi All,

Super green new user of FreeNAS 11.2 U3.

Last June I built my system to handle my home automation server and media library/server. I started with 64GB of ECC RAM.

No sooner was it built than I had to pack up my house and move. I finally got into my new house last month, and after connecting to my LAN I configured UPS monitoring, email for logs, etc. I started reviewing the logs once they started arriving, and I must report that all seems to be working fine. However, according to the email output below, it seems I have a faulty module. I say module because it appears to me to reference the same location four times.

1) I want to make sure I am reading it right.
2) I did some googling, and it was just enough to make me think it may be other hardware, hence Q#1.
3) I could, and will if that is the advice here, start halving my installed RAM and testing to find the bad module. Is there a better way or tool for this?
4) Why are there four entries of apparently the same thing? Is the log cumulative, or am I reading it wrong?

Thanks for any help/advice.
-Nac



kernel log messages:
MCA: Bank 11, Status 0x8c000050000800c2
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x406f1, APIC ID 0
MCA: CPU 0 COR (1) MS channel 2 memory error
MCA: Address 0x3effcfc80
MCA: Misc 0x122100004000608c

MCA: Bank 11, Status 0x8c000050000800c2
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x406f1, APIC ID 0
MCA: CPU 0 COR (1) MS channel 2 memory error
MCA: Address 0x3effcfc80
MCA: Misc 0x122100004000608c

MCA: Bank 11, Status 0x8c000050000800c2
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x406f1, APIC ID 0
MCA: CPU 0 COR (1) MS channel 2 memory error
MCA: Address 0x3effcfc80
MCA: Misc 0x122100004000608c

MCA: Bank 11, Status 0x8c000050000800c2
MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x406f1, APIC ID 0
MCA: CPU 0 COR (1) MS channel 2 memory error
MCA: Address 0x3effcfc80
MCA: Misc 0x122100004000608c
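
For anyone who wants to check the reading of the Status value by hand, here is a minimal Python sketch that pulls apart the architectural bits of a machine-check bank status register. The bit positions are the ones documented in Intel's Software Developer's Manual (machine-check architecture chapter); the function name, dictionary keys, and table name are made up for illustration and are not part of FreeNAS or FreeBSD. It only handles the "memory controller error" code layout (0000 0000 1MMM CCCC) and does not attempt to interpret the model-specific bits or the Misc register.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Names here are illustrative assumptions, not a FreeNAS/FreeBSD API.

# MMM field of a memory controller error code (0000 0000 1MMM CCCC)
MEMORY_TRANSACTION_TYPES = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def decode_mci_status(status: int) -> dict:
    """Return the main architectural fields of a 64-bit bank status value."""
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was lost
        "uncorrected": bool((status >> 61) & 1),  # UC: error was NOT corrected
        "misc_valid": bool((status >> 59) & 1),   # MISCV: Misc register is valid
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: Address register is valid
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_error_code": status & 0xFFFF,           # bits 15:0
    }
    code = fields["mca_error_code"]
    # Only decode further if the code matches the memory controller pattern.
    if code & 0xFF80 == 0x0080:
        mmm = (code >> 4) & 0x7
        channel = code & 0xF
        fields["transaction"] = MEMORY_TRANSACTION_TYPES.get(mmm, "reserved")
        # Channel 0b1111 means "channel not specified".
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    for name, value in decode_mci_status(0x8C000050000800C2).items():
        print(f"{name}: {value}")
```

Fed the Status value from the log, this reports a valid, corrected (not uncorrected) error with a corrected count of 1, a memory scrubbing transaction, and channel 2, which lines up with the kernel's own "COR (1) MS channel 2 memory error" summary line.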
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is this the X10SRH-CLN4F? It's important that you be specific about your hardware when asking for help with hardware issues.

If the answer is "yes" or "similar board", open the IPMI webGUI and have a look at the IPMI event log. It will have date and time and we can work from there.
 

Nacman

It is the system in my build and, as you asked, the board is the X10SRH-CLN4F. I am in the event log, but there's nothing except fan entries from back in 2018 when I was building the system. Wrong log?
 

Ericloewe

No, it should be the right log. Which is weird, although there are some rumors of vendors suppressing logging of ECC errors so as to not scare customers.

Try running memtest for a couple of passes and see if anything gets logged to the IPMI event log. You might want to try the latest memtest UEFI version that supposedly can decode ECC RAM registers to figure out there was an error. I say "supposedly" because their ECC error injection feature was a complete bust when tested on several different platforms.
 

Nacman

I am running the UEFI version now. Letting it run four passes overnight. I did not select or make configuration changes to Memtest, it just started running. Will report back what I find. Thank you for helping me out!
 

Nacman

I have run 4 passes of Memtest with no errors. I'm not sure if I needed to change any defaults, but it didn't report any errors. Too strange.
[Attachments: 17953EA0-5F4C-4607-909A-965DA7FEA2F0.jpeg, 75E199A4-4C27-4A38-95EE-A344082FD74A.jpeg]
 

Nacman

I received another FreeNAS security output email detailing the same errors in the same sequence as before. Maybe a bug?
 

JediDan

Dabbler
Joined
Apr 9, 2019
Messages
11
Any progress on this issue? I recently received a similar system report email and am curious what the resolution was.
 