Possible memory error?

Status
Not open for further replies.
Joined
Oct 2, 2014
Messages
925
FreeNAS kernel log messages:
> MCA: Bank 8, Status 0x8c0000400001009f
> MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x106a5, APIC ID 0
> MCA: CPU 0 COR (1) RD channel ?? memory error
> MCA: Address 0x20530d00
> MCA: Misc 0xea22961000085b43

-- End of security output --

Memtest originally passed all the RAM after 5 days of testing. Should I remove the possibly bad RAM, if it is in fact defective?

EDIT: System specs:

FreeNAS Server 9.3
Supermicro X8DTE-CS045, dual E5530 @ 2.40GHz with Noctua heatsinks
48GB of RAM (for now)
M1015 with 2 SAS cables to the backplane
x2 80GB Intel SSDs for boot
x7 4TB WD Reds RAIDZ3
x7 2TB WD RE4s RAIDZ3
x7 2TB WD RE4s RAIDZ3
Chelsio T420-CR 10GbE card
Supermicro SC846E16-R1200B case
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, I'm not sure what your hardware is that reported this error.

But, memtesting a system that is allegedly using ECC is almost useless.

If you assume ECC is working and not just "supported", any error should be caught by the hardware and not by memtest's test pattern. The correction via ECC would occur at a lower level than memtest is working at, so it would be unaware. I don't know whether memtest monitors the memory controller for ECC errors. I asked this question in the forum a year or two ago and nobody ever responded. So the more conservative answer is to assume it doesn't monitor the memory controller. I'm not sure I'd expect it to, since it boots standalone without an operating system.

Then, if you assume ECC is working, any error that memtest is going to catch is likely to be found during regular use of the server (and corrected during regular use of the server).

Then, based on discussions with lots of guys around iXsystems, they find that memtest really doesn't stress the RAM enough to catch all errors. They've found a large compile operation, like compiling FreeNAS, to be a very good test, as it will generally abort on a single-byte error.

So nothing is lost by doing a memtest on an ECC-based system aside from the time spent waiting on the tests, and there is a non-zero probability of gain. The gain is probably essentially zero, hence I call it "almost" useless.
 
Joined
Oct 2, 2014
Messages
925
Well, it got sent to me as an email, so I wasn't sure if FreeNAS picked something up or not. Guess I might run some calculations or something to stress it a little.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sorry, I mean you've got multiple systems in your sig, so I can't tell which system produced that error. Also, system info in a sig is useless, as many people use Tapatalk and no sigs are shown in Tapatalk. So it's best to quote your own hardware, even if it's in the sig.
 
Joined
Oct 2, 2014
Messages
925
I'll make that change :P sorry about that. The FreeNAS server:

FreeNAS Server 9.3
Supermicro X8DTE-CS045, dual E5530 @ 2.40GHz with Noctua heatsinks
48GB of RAM (for now)
M1015 with 2 SAS cables to the backplane
x2 80GB Intel SSDs for boot
x7 4TB WD Reds RAIDZ3
x7 2TB WD RE4s RAIDZ3
x7 2TB WD RE4s RAIDZ3
Chelsio T420-CR 10GbE card
Supermicro SC846E16-R1200B case
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, I have good news and bad.

Good news: A stick of RAM is bad, but the error was correctable. There is a decoder table to figure out which stick of RAM it is.
Bad news: I don't know where to get the table, but Supermicro knows. So you'll have to contact them to get the decoder table.

The bank 8 may mean that it's the 9th stick, but I'm not sure what the order is. Also the bank number doesn't always correlate to the stick/slot, hence the need for the decoder table. It's got a bunch of "if you have this model of board, do this; if you have that model do that" stuff.
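
As a rough cross-check on the "correctable" reading, the Status value can be decoded by hand. The sketch below assumes the IA32_MCi_STATUS bit layout as documented in the Intel SDM (bit 61 = uncorrected, bits 52:38 = corrected error count, low 16 bits = error code, with the pattern 1MMM CCCC marking a memory controller error). The function and key names are made up for illustration and are not part of FreeNAS.

```python
# Illustrative decoder for an MCA bank Status value (IA32_MCi_STATUS).
# Bit positions are assumed from the Intel SDM layout; names are hypothetical.

MEM_TRANSACTION = {0: "generic", 1: "RD", 2: "WR", 3: "address/command", 4: "scrub"}


def decode_mca_status(status):
    """Return the main fields of a 64-bit MCA bank Status value as a dict."""
    code = status & 0xFFFF  # MCA error code, bits 15:0
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),      # more errors than were logged
        "uncorrected": bool((status >> 61) & 1),   # False means the error was corrected
        "address_valid": bool((status >> 58) & 1),
        # Assumption: only meaningful when bit 10 of Global Cap is set.
        "corrected_count": (status >> 38) & 0x7FFF,
        "memory_error": False,
        "transaction": None,
        "channel": None,
    }
    # Memory controller errors use the pattern 000F 0000 1MMM CCCC
    # (F is a filtering flag, so bit 12 is masked out of the comparison).
    if (code & 0xEF80) == 0x0080:
        fields["memory_error"] = True
        fields["transaction"] = MEM_TRANSACTION.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        fields["channel"] = None if channel == 0xF else channel  # 0xF = not specified
    return fields


if __name__ == "__main__":
    # The two Status values posted in this thread.
    for raw in (0x8C0000400001009F, 0xD40000C000900090):
        print(hex(raw), decode_mca_status(raw))
```

Under those assumptions, both Status values posted in this thread decode as corrected memory read errors, which lines up with the "COR ... RD ... memory error" text the kernel printed.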
 
Joined
Oct 2, 2014
Messages
925
Well, the gentleman I spoke to was almost completely useless. He said the decoder table is something I would give to Supermicro after running "dmi decoding". I'm going to poke around Google and see what I can find.
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
Wow, how did I miss this thread!?!?

I'm having a very similar error, and it appears we had it at almost the same time!

Code:
> MCA: Bank 5, Status 0xd40000c000900090
> MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
> MCA: CPU 0 COR OVER RD channel 0 memory error
> MCA: Address 0x12ef39498


I memtest'd 3 passes and, same as you, didn't get any errors. My system is running just fine.

https://forums.freenas.org/index.php?threads/mca-memory-error.30276/
 
Joined
Oct 2, 2014
Messages
925
I saw your thread earlier. I just removed the *possible* bad stick. I have 12 DIMM slots, and it says DIMM 8 has errors, yet my DIMMs aren't labeled that way, so I counted and pulled what I think is DIMM 8 lol
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
As far as stress testing goes, I just set Plex to "Make my CPU hurt" and played full 1080p video files on as many devices as I could. Not too long after, I would get a single error.

I recently added some RAM I bought used on eBay. That's gotta be it; I'm just trying to isolate which specific stick it might be. See my thread above.
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
LOL... Yeah, not too sure that would work for me, as I have only 4 slots and my error says Bank 5... I dunno.

As a matter of fact, I think I'll remove one right now and see what happens.
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
Well, my system apparently won't let me use 3 slots. It's either 4, 2, or 1. So I've got the 2 suspect ones in there now. Let's see what happens.
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
What kind of RAM are you running, btw? Kingston?

I heard Cyberjock mention Kingston has been up to shenanigans recently. And it isn't listed on Supermicro's approved list.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Note that memory banks and memory slots are two different things.
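
One way to see this from the logs themselves: assuming the "Bank" in these messages is the machine-check register bank index (the way the Intel SDM uses the term), the low 8 bits of Global Cap give the number of such banks the CPU exposes. A small sketch, with a made-up helper name:

```python
def mca_bank_count(global_cap):
    """Number of machine-check banks, read from bits 7:0 of Global Cap."""
    return global_cap & 0xFF


if __name__ == "__main__":
    # (Global Cap, reported bank) pairs copied from the two logs in this thread.
    for global_cap, bank in ((0x1C09, 8), (0x806, 5)):
        count = mca_bank_count(global_cap)
        print(f"Bank {bank}: index {bank} out of banks 0..{count - 1}")
```

On that reading, Bank 8 and Bank 5 are simply the highest-numbered machine-check bank on each CPU, which would explain how a board with only 4 slots can report Bank 5. It says nothing by itself about which DIMM slot is involved.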
 

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
Cool, thx. I think I'm going to head back over to my thread so as not to hijack the OP's thread...
 
Joined
Oct 2, 2014
Messages
925
No, it isn't Kingston. I need to look into what RAM it is. I pulled it, rebooted, and let it run. I don't have a spare server to take down to memtest the RAM in, even if memtest may or may not find the error(s).
 
Joined
Oct 2, 2014
Messages
925
Well, so far so good. No emails. I pulled the 8th DIMM and all seems to be good.
 