SOLVED Supermicro SEL "Assertion: Memory"

Status
Not open for further replies.

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
Hi,

I got the hardware for my new FreeNAS build. The board is a Supermicro X10SL7-F with 32 GB of Samsung memory (M391B1G73QH0-YK0). The CPU is a Xeon E3-1230.

I have now assembled the whole build and started the burn-in tests. Memtest has run for 50+ hours and reported no errors, but when I check my SEL I find four messages saying:

Code:
Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)


I played around a bit and found that this error seems to occur from time to time when I reboot the system. It never occurred while Memtest was running. Is one of my modules (DIMMB1) faulty?
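
For anyone who wants to tally these entries rather than count them by eye, here is a minimal sketch in Python. It assumes the SEL has been exported as plain text with one event per line in exactly the format quoted above; the function name `count_ecc_events` and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches the event text quoted in the post, e.g.
# "Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)"
ECC_EVENT = re.compile(r"Correctable ECC@(DIMM[A-Z]\d+)\((CPU\d+)\)")

def count_ecc_events(sel_lines):
    """Return a Counter mapping (slot, cpu) to the number of correctable ECC events."""
    counts = Counter()
    for line in sel_lines:
        match = ECC_EVENT.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

# Hypothetical sample: four identical entries, as described in the post.
sample = ["Assertion: Memory| Event = Correctable ECC@DIMMB1(CPU1)"] * 4
for (slot, cpu), n in count_ecc_events(sample).items():
    print(f"{slot} ({cpu}): {n} correctable ECC event(s)")
```

A count that keeps growing for one slot across reboots is the kind of pattern worth chasing. A single stray event is much less informative.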
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Could be. Could also be bad power (sounds more likely if it only happens at boot).
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Swap your RAM around and see if the failure follows the stick of RAM or if it remains at DIMMB1. If the problem moves then it is likely a faulty stick of RAM. If it remains then it could be the power supply (the easiest thing to replace) or possibly the motherboard. Also, ensure your BIOS is set up for the RAM properly (speed, timings, etc.)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I was thinking that it might follow the DIMM, if it's a marginal (on the "working" side) one and bad power is causing this.

@Harsesis - what PSU are you using?
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
Hi,

Thanks for your response! The PSU I'm using is a new Seasonic G450. I will start by checking the BIOS (I changed nothing there), then swap the DIMMs. If that does not make any difference I could change the PSU (I should have some older ones lying around).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, Seasonic does reduce the probability of it being the PSU - always a possibility, though.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
So I checked the BIOS and I'm not 100% confident whether the settings are correct. Furthermore, I don't know how I can change them. Is there a way to change the timings? I've made a screenshot of the current settings, you can find it here. The datasheet of the RAM can be found here. If I understand the datasheet correctly, the settings of tRCDmin, tRPmin and tRASmin are wrong and should be 13.75-13.75-35?
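
A note on units that may help here: datasheets give these minimums in nanoseconds, while BIOS screens typically show clock cycles, so the two sets of numbers will not look alike even when they agree. The sketch below converts the values quoted above (taken as given from the post, not checked against the datasheet) into cycles. The clock periods are the standard DDR3 ones (1.25 ns at DDR3-1600); the helper name `ns_to_cycles` is made up, and the plain round-up is a simplification of whatever rounding the BIOS actually applies.

```python
# Clock periods in picoseconds for common DDR3 speed grades.
TCK_PS = {1066: 1875, 1333: 1500, 1600: 1250}

def ns_to_cycles(t_ns, speed_grade):
    """Convert a minimum timing in nanoseconds to whole clock cycles (rounded up)."""
    t_ps = round(t_ns * 1000)      # work in integer picoseconds to avoid float noise
    tck_ps = TCK_PS[speed_grade]
    return -(-t_ps // tck_ps)      # integer ceiling division

# Values as quoted in the post above.
for name, t_ns in [("tRCDmin", 13.75), ("tRPmin", 13.75), ("tRASmin", 35.0)]:
    print(f"{name}: {t_ns} ns -> {ns_to_cycles(t_ns, 1600)} cycles at DDR3-1600")
```

Under these assumptions 13.75 ns works out to 11 cycles and 35 ns to 28 cycles at DDR3-1600, so a BIOS showing 11-11-28 would match the datasheet rather than contradict it.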

I also swapped DIMMB1 and DIMMA1 on the board. After restarting there was no error in the SEL, but after booting into Memtest86+ version 5.01 a new error appeared in the SEL. Now it is reporting the error from DIMMA1. So there are two things going on:
- the error moves with the (possibly faulty) DIMM
- the error occurs when the system loads Memtest86+
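
The swap test suggested earlier in the thread boils down to a small decision rule, sketched here in Python. The function name `diagnose_swap` and the wording of the verdicts are made up for illustration, and the rule is only a first-pass heuristic: as noted above, a marginal DIMM that is exposed by poor power would also "follow the module".

```python
def diagnose_swap(error_slot_before, error_slot_after, swapped_slots):
    """Interpret a two-slot swap test.

    error_slot_before: slot that logged the error before the swap
    error_slot_after:  slot that logged the error after the swap (None if no error)
    swapped_slots:     pair of slot names whose modules were exchanged
    """
    a, b = swapped_slots
    if error_slot_before not in (a, b):
        return "inconclusive: the suspect module was not moved"
    if error_slot_after is None:
        return "inconclusive: no error logged after the swap"
    # Slot where the originally suspect module now sits.
    new_slot = b if error_slot_before == a else a
    if error_slot_after == new_slot:
        return "error followed the module: the DIMM is the likely culprit"
    if error_slot_after == error_slot_before:
        return "error stayed with the slot: look at PSU, motherboard, or BIOS settings"
    return "inconclusive: error showed up in an unrelated slot"

# The situation described in this post: B1 and A1 swapped, error now at A1.
print(diagnose_swap("DIMMB1", "DIMMA1", ("DIMMB1", "DIMMA1")))
```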

So what do you think, should I simply contact the vendor and RMA the possibly faulty DIMM?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Manually set your RAM speed to 1333 MHz and that should take care of the timings. They actually look fine for 1600 MHz, but if the RAM is actually being pushed up to 1600 MHz, you are safer manually dialing it down.

As to your DIMM modules, I'd reseat them again. Just be careful to not physically break them.

As for RMA, try the above steps first. Your system may not be as stable running the RAM in a turbo speed situation.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
By setting the RAM speed, do you mean setting the memory frequency limiter? Assuming this helps, is it the proper way of dealing with it? I mean, the modules are specified as 1600, shouldn't they deliver that?

I did not quite understand your comment on the RMA. What do you mean by the system possibly being unstable in turbo speed situations? How can I check this, and what could be the reason for it?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Since everything is supposed to support DDR3-1600, not running at that speed is plenty of reason to RMA. No need to keep marginal stuff around, even if it stabilizes with a workaround.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
So you would go for RMA now? Or is there anything else I could try? Did one of you take a look at my BIOS timing settings? Still not sure if they are correct and if not how I can change them in BIOS...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That's taken care of automatically with SPD. CL11 sounds right, too.

I'd just try a different PSU first. If everything stays the same, RMA the DIMM.
 

Harsesis

Explorer
Joined
Jan 21, 2014
Messages
95
So, good news: I just decided to order one extra DIMM and replace the suspect one with it. I already got my new DIMM today and so far it seems to work just fine. I decided to do it this way because I can return either the new DIMM or one DIMM from the old order without any cost. That was the fastest way, and I don't have to go through the RMA procedure.

If anything changes and new errors unexpectedly occur, I will let you know! Thank you all for your help!
 