Failing Supermicro X11 motherboard

Jasse Jansson

Explorer
Joined
Mar 19, 2017
Messages
71
Have been messing around with my servers lately.

This server setup:
Supermicro X11SSM
32 GB ECC RAM
Seasonic PSU
LSI controller (9210 with IT firmware)
6 WD Red 3 TB disks

Boots from mirrored USB sticks because I used all 8 SATA ports until about 2 weeks ago.

Ran memtest86 a week ago and found that one memory module was reporting recoverable failures.
This was with modules in slots dimmB2 and dimmA2 (16 GB modules).

(Note: per the manual, dimmB2 should be the first slot populated, then dimmA2.)

Removed the module from slot dimmB2, leaving the module in slot dimmA2.
No errors reported.

Reinserted the module in dimmB2 and removed module from slot dimmA2.
Recoverable error happened again.

OK, now I know which RAM module is failing; it's put aside for recycling.
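The swap test above can be written down as a small table of runs, which makes it easier to see what each run actually rules out. This is only a sketch: the labels `module-1` and `module-2` and the function name `suspects` are made up for illustration, and it assumes each run had a single module installed.

```python
from collections import defaultdict


def suspects(runs):
    """Return the modules and slots that showed errors in every run they took part in.

    runs: list of (module, slot, had_errors) tuples, one per single-module test.
    """
    seen = defaultdict(list)
    for module, slot, had_errors in runs:
        seen[("module", module)].append(had_errors)
        seen[("slot", slot)].append(had_errors)
    return sorted(key for key, results in seen.items() if all(results))


# The two single-module runs described above (module labels are hypothetical).
runs = [
    ("module-1", "dimmA2", False),  # no errors reported
    ("module-2", "dimmB2", True),   # recoverable error happened again
]
print(suspects(runs))
# [('module', 'module-2'), ('slot', 'dimmB2')]
```

With only these two runs, both the module and the slot stay on the list, because each module was tested in just one slot. Running each module in each slot would separate the two.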

Installed more disks and just checked that the LSI controller found all disks.

A couple days later (today) I decided to fix this server for future use.
Remembered the working RAM module is in the "second" slot, so I moved it to dimmB2.

Now I couldn't create a pool out of my 6 WD Red 3 TB disks, nor could I upgrade the system to U2.
What??? It worked a few days ago.

Got a fancy idea: move the RAM module to the secondary slot (dimmA2).
Now I have created a nice RAIDZ pool and am right now rebooting into U2.

Question:
Is it the motherboard itself or the CPU that's failing?
 

Jasse Jansson

Forgot to mention that the CPU is an i3-6100.
The memory controller is inside the CPU, right?

Does it make sense to redo the memory tests with another CPU?
I have a spare.
 

Jasse Jansson

Moved the good module to slot dimmB2.

Just started memtest86.

About 3 minutes in I got this:
ECC errors detected (col, row, rank, bank) (88, EC26, 3, 2)

6 minutes in, another error in the same row.

Both are correctable errors, but something is not right here.
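To check whether repeated errors really land in the same row, the reported lines can be tallied by location. A rough sketch, assuming the log lines look exactly like the one quoted above (the real memtest86 log format may differ); `parse_ecc_line` and `count_by_row` are made-up helper names, and the values are kept as plain strings rather than interpreted as hex or decimal.

```python
import re
from collections import Counter

# Matches the final parenthesised group on a line, e.g. "(88, EC26, 3, 2)".
_LAST_GROUP = re.compile(r"\(([^()]*)\)\s*$")


def parse_ecc_line(line):
    """Return (col, row, rank, bank) as strings, or None if the line does not fit."""
    if "ECC errors detected" not in line:
        return None
    match = _LAST_GROUP.search(line.strip())
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(",")]
    if len(fields) != 4:
        return None
    return tuple(fields)


def count_by_row(lines):
    """Count errors per (row, rank, bank), ignoring the column."""
    counts = Counter()
    for line in lines:
        parsed = parse_ecc_line(line)
        if parsed is not None:
            _col, row, rank, bank = parsed
            counts[(row, rank, bank)] += 1
    return counts


line = "ECC errors detected (col, row, rank, bank) (88, EC26, 3, 2)"
print(parse_ecc_line(line))
# ('88', 'EC26', '3', '2')
```

Feeding every error line from a run into `count_by_row` shows at a glance whether the hits cluster on one row, rank, and bank or are spread around.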
 

Jasse Jansson

Good module back to slot dimmB2.
First pass without any errors.
I'm going nuts.
 