Single Bit ECC Memory Errors during Burn-in and Testing

Status
Not open for further replies.

NAScent

Cadet
Joined
Jul 19, 2017
Messages
6
I've finally gotten around to assembling and testing my FreeNAS HPE ML10 build.

After spending a little over a week running 130 iterations of memtest86 and a day of CPU stress testing, I rebooted to double-check some BIOS settings before working on drive tests and burn-in. In the BIOS logs, I see 10 "Single Bit ECC Memory Error" Smbios 0x01 errors.

I know ECC's purpose is to correct these, but is this something I should be worried about? If so, is there a way to know which module threw the errors?

In case I'm doing something wrong with my RAM, here is how I have it set up.

The original 4GB DIMM that came with the system is in slot 3 (zero-based) and the 16GB module I added is in slot 1, which I believe is the order in which the user manual said to populate them. Memory Timings (tCL-tRCD-tRP-tRAS) are listed as 15-36 and the "Memory Scrambler" is Enabled.

MemTest86 lists the two modules as:

1. PC4-17000 DDR4 ECC 2134MHz / 15-15-15-36 / Samsung M391A2K43BB1-CPB
2. PC4-17000 DDR4 ECC 2134MHz / 15-15-15-36 / SK Hynix HMA451U7AFR8

The Samsung is a dual-rank DIMM, while the Hynix is single-rank, and there are minor differences in some of the more advanced timings MemTest86 can display, like tRCD timings.
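For anyone reading the MemTest86 lines above, here is a rough sketch of how the numbers relate to each other. It assumes the nominal DDR4-2133 transfer rate of 2133.33 MT/s (MemTest86 rounds this up to 2134) and a 64-bit data bus; the function names are just made up for illustration.

```python
# Rough arithmetic behind "PC4-17000 ... 15-15-15-36".
# Assumptions: nominal DDR4-2133 rate and a 64-bit (8-byte) data path.
# The extra 8 ECC bits on the DIMM carry check bits, not user data.

def peak_bandwidth_mb_s(transfer_rate_mt_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one module in MB/s."""
    return transfer_rate_mt_s * bus_bytes


def cycles_to_ns(cycles: int, transfer_rate_mt_s: float) -> float:
    """Convert a timing given in clock cycles to nanoseconds.

    DDR moves data twice per clock, so the clock runs at half the
    transfer rate.
    """
    clock_mhz = transfer_rate_mt_s / 2
    return cycles * 1000.0 / clock_mhz


RATE_MT_S = 2133.33  # assumed nominal rate for DDR4-2133
TIMINGS = {"tCL": 15, "tRCD": 15, "tRP": 15, "tRAS": 36}

bandwidth = peak_bandwidth_mb_s(RATE_MT_S)
timings_ns = {name: round(cycles_to_ns(c, RATE_MT_S), 2) for name, c in TIMINGS.items()}

if __name__ == "__main__":
    print(f"peak bandwidth: {bandwidth:.0f} MB/s (marketed as PC4-17000)")
    for name, ns in timings_ns.items():
        print(f"{name}: {ns} ns")
```

Since both modules report the same primary timings, they should at least agree at this level; the differences are in the secondary timings.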
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,358
Great that it’s working, but I’d determine whether it’s one of the DIMMs or one of your slots that is faulty, and resolve it.
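A minimal sketch of the reasoning, assuming you run each DIMM on its own in each slot and note how many corrected errors the BIOS log shows for each run. The `diagnose` function, the labels, and the error counts are all hypothetical; this is only the decision logic, not a tool that reads the log for you.

```python
from typing import Dict, Tuple

# Key: (DIMM label, slot label); value: corrected errors logged in that run.
Results = Dict[Tuple[str, str], int]


def diagnose(results: Results) -> str:
    """Guess whether the errors follow a DIMM or a slot."""
    bad_runs = {key for key, errors in results.items() if errors > 0}
    if not bad_runs:
        return "no errors reproduced"

    dimms = sorted({dimm for dimm, _ in results})
    slots = sorted({slot for _, slot in results})

    # Errors "follow" a DIMM if they show up in every run with that DIMM
    # and in no other run. At least two runs are needed to say that.
    for dimm in dimms:
        runs = {key for key in results if key[0] == dimm}
        if len(runs) > 1 and runs == bad_runs:
            return f"DIMM {dimm} suspected"

    # Same idea for a slot.
    for slot in slots:
        runs = {key for key in results if key[1] == slot}
        if len(runs) > 1 and runs == bad_runs:
            return f"slot {slot} suspected"

    return "inconclusive"


if __name__ == "__main__":
    # Made-up example: errors appear wherever the Hynix module is installed.
    example = {
        ("Samsung", "1"): 0,
        ("Samsung", "3"): 0,
        ("Hynix", "1"): 4,
        ("Hynix", "3"): 6,
    }
    print(diagnose(example))
```

Keep in mind that 10 errors over more than a week of testing is a low rate, so each run may need to go for a long time before a clean result means anything.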
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
After spending a little over a week running 130 iterations of memtest86 and a day of CPU stress testing, I rebooted to double-check some BIOS settings before working on drive tests and burn-in. In the BIOS logs, I see 10 "Single Bit ECC Memory Error" Smbios 0x01 errors.

I know ECC's purpose is to correct these, but is this something I should be worried about? If so, is there a way to know which module threw the errors?
I don't think you are doing anything wrong, but to test which module is causing the error, test them one at a time. Many of the HP systems I have worked on have built-in diagnostics for testing the memory. If this unit has that, use it to test the RAM. If the memory that was supplied with the system fails diagnostics, HP will replace it under warranty. If the module you purchased separately fails the HP diagnostic, it is likely that the memory company has a warranty that can be accessed. In either case, I would attempt to replace a module that was giving errors, even if the ECC system is catching and correcting the errors. Best to have it working as well as possible before putting your data on it.
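To illustrate what "catching and correcting" a single-bit error means, here is a toy sketch using a Hamming(7,4) code. This is not what the memory controller actually runs: ECC DIMMs protect 64 data bits with 8 check bits, and the exact code is up to the platform. The principle of locating a flipped bit from the parity checks is the same, though.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Layout by position 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(codeword: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, position of the corrected bit or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    original = [1, 0, 1, 1]
    stored = encode(original)
    stored[4] ^= 1  # simulate a single-bit flip at position 5
    recovered, fixed_at = correct(stored)
    print(recovered == original, fixed_at)
```

The data comes back intact, which is why the system keeps running, but every correction still points at a cell, a module, or a slot that is misbehaving.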
 