Security Report shows Error. Memtest shows nothing!

Status
Not open for further replies.

SumitB (Dabbler, joined Aug 26, 2014, 15 messages)
I recently built a Supermicro-based FreeNAS server with the following components:


Supermicro Server Motherboard X10SRI-F
Xeon E5-1650V3 with Supermicro Heat Sink
8 x 16 GB DDR4 ECC RDIMM 2133 MHz Samsung (Supermicro Certified)
Intel X540 Dual Port 10G BaseT
LSI 9211-8i HBA Controller with Cables
4U Supermicro Chassis CSE847E16-R1400LPB with Slide Rails

Before deploying the server, I ran a memtest for about 72 hours. No errors were detected. About 20 days later, this is what the Daily Security Report showed:

freenas.local kernel log messages:
> da5: quirks=0x8<4K>
> SMP: AP CPU #8 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #7 Launched!
> SMP: AP CPU #11 Launched!
> SMP: AP CPU #6 Launched!
> Timecounter "TSC-low" frequency 1750034808 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x454 offMax=0x16db
> arp: 10.0.1.200 moved from a0:f3:c1:f2:32:2e to 00:25:90:a3:1b:28 on epair2b
> arp: 10.0.1.200 moved from 00:25:90:a3:1b:28 to a0:f3:c1:f2:32:2e on epair0b
> arp: 10.0.1.200 moved from a0:f3:c1:f2:32:2e to 00:25:90:a3:1b:28 on epair0b
> arp: 10.0.1.200 moved from 00:25:90:a3:1b:28 to a0:f3:c1:f2:32:2e on epair0b
> MCA: Bank 11, Status 0x8c00004d000800c2
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f2, APIC ID 0
> MCA: CPU 0 COR (1) MS channel 2 memory error
> MCA: Address 0x1db3d74180
> MCA: Misc 0x908400080009a8c

Checking for packages with security vulnerabilities:
samba43-4.3.4_100116
pcre-8.38

-- End of security output --
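The MCA lines in the report are raw machine-check register dumps, and the kernel's summary line (`COR (1) MS channel 2 memory error`) is derived from them. As a rough sketch of where that summary comes from, the Python below decodes the logged values using bit positions from the machine-check architecture chapter of the Intel Software Developer's Manual, Vol. 3B. The helper names and dictionary keys are my own labels, not a FreeBSD or Intel API. For this report it should show one corrected error (UC bit clear) of the "memory scrubbing" request type on channel 2, which suggests the memory controller's background scrubber, rather than a program's read, found it. The logged address is in physical-address mode and only meaningful down to bit 12 (4 KiB granularity); mapping 0x1db3d74180 to a specific DIMM slot depends on the controller's interleaving and is not attempted here.

```python
# Sketch: decode the raw MCA values from the FreeBSD kernel log above.
# Bit positions follow the Intel SDM Vol. 3B machine-check chapter;
# the dictionary keys are descriptive labels chosen for this example.

STATUS = 0x8C00004D000800C2   # "MCA: Bank 11, Status ..."
MCG_CAP = 0x0000000007000C16  # "MCA: Global Cap ..."
MISC = 0x0908400080009A8C     # "MCA: Misc ..."

# MMM field of a memory-controller compound error code
MMM_NAMES = {0: "generic", 1: "read", 2: "write",
             3: "address/command", 4: "scrubbing"}
# Address-mode field of IA32_MCi_MISC
ADDR_MODES = {0: "segment offset", 1: "linear", 2: "physical",
              3: "memory address", 7: "generic"}


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def field(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_status(status: int) -> dict:
    """Decode IA32_MCi_STATUS flags and, for memory errors, MMM/CCCC."""
    code = field(status, 15, 0)
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "enabled": bit(status, 60),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "pcc": bit(status, 57),
        "corrected_count": field(status, 52, 38),
        "mca_code": code,
        "model_code": field(status, 31, 16),
    }
    # Memory-controller errors have the form 000F 0000 1MMM CCCC;
    # F is the filter bit, so it is masked out of the comparison.
    if (code & 0xEF80) == 0x0080:
        info["request"] = MMM_NAMES.get(field(code, 6, 4), "reserved")
        chan = field(code, 3, 0)
        info["channel"] = None if chan == 0xF else chan  # 0xF = unspecified
    return info


def decode_cap(cap: int) -> dict:
    """Decode a few IA32_MCG_CAP capability bits."""
    return {
        "bank_count": field(cap, 7, 0),
        "cmci": bit(cap, 10),   # corrected machine-check interrupt support
        "ser": bit(cap, 24),    # gates the MISC address-LSB/mode fields
    }


def decode_misc(misc: int) -> dict:
    """Decode the address-precision fields of IA32_MCi_MISC (SER CPUs only)."""
    return {
        "addr_lsb": field(misc, 5, 0),
        "addr_mode": ADDR_MODES.get(field(misc, 8, 6), "reserved"),
    }


if __name__ == "__main__":
    print(decode_cap(MCG_CAP))
    print(decode_status(STATUS))
    print(decode_misc(MISC))
```

Running it prints three dictionaries whose flags line up with the kernel's own summary: 22 banks with CMCI and SER support, a single corrected scrubbing error on channel 2, and a physical address valid from bit 12 up.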

I took the server down and have been running memtest for the past 2 days. The screenshot is attached (Screenshot 2016-04-20 12.43.23.png).


The question is: what should I do now?
 
(Joined Apr 9, 2015, 1,258 messages)
MemTest86 has to be the Pro version to inject and test for ECC errors.

http://www.memtest86.com/features.htm

It could be as simple as a piece of dust or some oil from your fingers on a contact causing a problem, or possibly power fluctuations or something nearby. If the system is not on a UPS, it needs to be on one. I have seen places where one outlet has noisy, substandard power that won't properly run a computer, while another on the same wall a mere 4 feet away is fine. It is also possible the RAM is getting too hot, so make sure there is adequate airflow through the case and over the RAM.

I would power down, open it up, and remove and reseat all the RAM, making sure the contacts are clean and the slots are blown out. I would also make sure it is on a good UPS and possibly pull power from another source.

From what I remember, the error message shows a couple of corrected errors, so ECC did its job.
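If you want to keep an eye on whether corrected errors recur after reseating, one rough approach is to tally the kernel's MCA summary lines per channel. The sketch below is a hypothetical helper (`tally_memory_errors` is my own name, not a FreeNAS tool). It assumes the summary lines look like the one in the report (`MCA: CPU 0 COR (1) MS channel 2 memory error`) and that uncorrected errors are printed with `UNCOR` and no count; other FreeBSD/FreeNAS versions may word these lines differently, so check against your own logs before relying on it.

```python
import re
from collections import Counter

# Assumed shape of the kernel's per-error summary line, e.g.
#   "MCA: CPU 0 COR (1) MS channel 2 memory error"
# UNCOR lines are assumed to have no "(n)" count; each counts as 1.
MCA_MEM_LINE = re.compile(
    r"MCA: CPU (?P<cpu>\d+) (?P<kind>COR|UNCOR)"
    r"(?: \((?P<count>\d+)\))?"          # corrected-error count, if printed
    r".*?channel (?P<chan>\d+|\?\?) memory error"
)


def tally_memory_errors(log_text: str) -> Counter:
    """Sum memory errors per (kind, channel) over a block of log text."""
    totals = Counter()
    for line in log_text.splitlines():
        match = MCA_MEM_LINE.search(line)
        if match is None:
            continue  # not an MCA memory-error summary line
        count = int(match.group("count") or 1)
        totals[(match.group("kind"), match.group("chan"))] += count
    return totals


if __name__ == "__main__":
    excerpt = """\
> MCA: Bank 11, Status 0x8c00004d000800c2
> MCA: CPU 0 COR (1) MS channel 2 memory error
> MCA: Address 0x1db3d74180
"""
    for (kind, chan), n in sorted(tally_memory_errors(excerpt).items()):
        print(f"{kind} channel {chan}: {n}")
```

For the excerpt from this report it prints `COR channel 2: 1`. Keying on channel rather than address is a deliberate simplification: a growing count on one channel narrows the search to the DIMMs on that channel, but which physical slot that is depends on the board's channel-to-slot mapping.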
 

Dice (Wizard, joined Dec 11, 2015, 1,410 messages)