ECC working? Single/multi bit? Testing ECC RAM?

Mirax

Cadet
Joined
Jan 7, 2019
Messages
2
Hi!

I'm building my second FreeNAS server. (I'm aiming for a low-power NAS that can run 24/7/365 without me feeling that it's using too much electricity. My old NAS uses >100 W at idle.)

Relevant to this thread is:
Motherboard: Supermicro X11SSM-F
CPU: Intel Pentium G4560
RAM: 1 x Samsung M391A2K43BB1-CRC (16GB DDR4 2400MHz ECC UDIMM)

(I know 2 DIMMs of the same type will be faster, but since the bottleneck will most likely be 1 Gbit LAN I figure it doesn't matter. I assume I will save a little power by using only one DIMM, and it's probably better to have 1 x 16 GB instead of 2 x 8 GB in the future, if I want to expand or repurpose the RAM.)

How can I know whether ECC is actually working? I can't find anything about ECC in BIOS. I think sometimes you can enable or disable ECC in BIOS, but I didn't find ECC mentioned at all.

When I run "sudo dmidecode -t memory" from Ubuntu Live, it says "Single-bit ECC".
After some googling I found that for some people it says "Multi-bit ECC".
(Dmidecode does not seem to prove that ECC is working though. Someone said that dmidecode reported ECC on a computer he was sure didn't have ECC.)
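For anyone who wants to pull the relevant fields out of the dmidecode output, here is a minimal Python sketch. It only parses text; the sample below is illustrative and not taken from this machine, and the helper name summarize_ecc is made up for the example. As noted above, these fields show what the firmware's SMBIOS tables claim, so they do not prove that ECC is active.

```python
import re

# Illustrative text shaped like `dmidecode -t memory` output (values are assumed).
SAMPLE = """\
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: Single-bit ECC
	Maximum Capacity: 64 GB

Memory Device
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 16384 MB
"""

def summarize_ecc(text):
    """Return (reported ECC type, total width, data width) from dmidecode text.

    Only the first occurrence of each field is used, so on a multi-DIMM
    system this looks at the first memory device only.
    """
    def field(name):
        m = re.search(r"^\s*" + re.escape(name) + r":\s*(.+)$", text, re.MULTILINE)
        return m.group(1).strip() if m else None

    def bits(value):
        m = re.match(r"(\d+) bits", value or "")
        return int(m.group(1)) if m else None

    return (field("Error Correction Type"),
            bits(field("Total Width")),
            bits(field("Data Width")))

ecc_type, total, data = summarize_ecc(SAMPLE)
print("Reported type:", ecc_type)
if total is not None and data is not None:
    # A 72-bit total width over a 64-bit data width means 8 extra bits per word.
    print("Extra bits per word:", total - data)
```

To try it on real output, something like summarize_ecc(subprocess.check_output(["dmidecode", "-t", "memory"], text=True)) run as root should work.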

So there are different kinds of ECC? Should I have bought something else?

I kind of feel like "Hey! There is multi-bit ECC, why do I only have single-bit ECC?"

In case ECC would fix a bit error, or detect >1 bit errors, how will I know?

I was also wondering whether it's any use to run MemTest86 on ECC RAM? Maybe ECC will fix single-bit errors (if any), and MemTest86 will report 0 errors?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I assume I will save a little power by using only one DIMM,
Yes, but not a significant amount of power.
How can I know whether ECC is actually working? I can't find anything about ECC in BIOS. I think sometimes you can enable or disable ECC in BIOS, but I didn't find ECC mentioned at all.
Why would you think it is not working? The system board and processor support ECC, so if the memory is ECC, it should work.
I kind of feel like "Hey! There is multi-bit ECC, why do I only have single-bit ECC?"

In case ECC would fix a bit error, or detect >1 bit errors, how will I know?
ECC memory is supposed to automatically correct single bit errors and notify of multi bit errors. That is what the ECC memory standard calls for and it has been a standard for a long time. I don't know of any change to that.

Building, Burn-In, and Testing your FreeNAS system
https://forums.freenas.org/index.php?resources/building-burn-in-and-testing-your-freenas-system.38/

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/

Uncle Fester's Basic FreeNAS Configuration Guide
https://www.familybrown.org/dokuwiki/doku.php?id=fester:intro
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
ECC memory is sometimes described as:
Single bit detect & correct
Multi-bit detect & report, (but not correct)
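
To make "single bit correct, multi-bit detect" concrete, here is a toy sketch of an extended Hamming (8,4) SECDED code in Python. This is an assumption-laden illustration: real ECC DIMMs protect 64 data bits with 8 check bits, and the exact code is up to the memory controller, so this is only the same principle at a smaller scale. The function names encode and decode are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the overall parity even
    return code

def decode(code):
    """Return (status, nibble); nibble is None when the error cannot be corrected."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: treat it as a single-bit error and fix it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble

word = encode(0b1011)
one_flip = list(word); one_flip[5] ^= 1
two_flips = list(one_flip); two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

Any single flipped bit is repaired silently, while two flipped bits are only reported, which matches the two lines above.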

As long as you have ECC support in each of the following, you are good:
  • Motherboard, (and with ECC supporting BIOS)
  • CPU
  • Memory
  • OS, (FreeBSD & FreeNAS support ECC memory)
As a side note, it appears DDR5 memory, which is just coming out, may support more ECC options. From my reading, you can get 2 x 32 bit channels, (each with 8 bits of ECC), on each DIMM. Weird, but it's supposed to allow independent R/W from each half.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Yes, but not a significant amount of power.
Could be upwards of 7 watts.
ECC memory is supposed to automatically correct single bit errors and notify of multi bit errors. That is what the ECC memory standard calls for and it has been a standard for a long time. I don't know of any change to that.
Multi bit correcting ECC is a thing. I have never seen it though. If you are that concerned about downtime caused by memory, many high end server boards have options for mirroring DIMMs. As you can imagine that can get quite expensive as half your RAM is unusable.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

rvassar

Guru
Joined
May 2, 2018
Messages
972
Multi bit correcting ECC is a thing. I have never seen it though.

I'm sitting here trying to remember if the old Sun SPARC memory was multi-bit ECC. I know the end of the 32 bit era (SS20...) saw memory modules 144 bits wide. But I always thought that was 128 bits tag line width + ECC. I have no idea if it was Hamming code or Reed-Solomon... I suspect the latter would be required at some point for multi-bit.

They did later develop something they called Extended ECC, but it's my understanding it was more like RAID for RAM, ala IBM's Mainframe stuff.
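
As a rough sanity check on those widths, here is a small Python sketch that computes the minimum number of check bits a plain Hamming SECDED code needs for a given data width. The function name is made up for the example, and it says nothing about which code any particular vendor actually used.

```python
def secded_check_bits(data_bits):
    """Minimum check bits for a Hamming SECDED code over `data_bits` data bits."""
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit adds double-error detection.
    return r + 1

for width in (32, 64, 128):
    print(width, "data bits ->", secded_check_bits(width), "check bits")
```

This gives 7 check bits for 32 data bits, 8 for 64 and 9 for 128. So if the 144-bit modules really were 128 data bits plus 16 extra, that is more than plain SECDED needs, and the width alone does not tell you which code was in use.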
 

Mirax

Cadet
Joined
Jan 7, 2019
Messages
2
Thank you for your replies! I appreciate all input.

If I would save 3 W by having one DIMM instead of two, that's more than 10% of the total system power, so I think it's significant enough to me. (On my old server it wouldn't make much of a difference though.)
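
For scale, here is a quick Python sketch that turns a constant saving in watts into energy per year. The 3 W and 7 W figures are the ones mentioned in this thread; no electricity price is assumed.

```python
HOURS_PER_YEAR = 24 * 365

def yearly_kwh(watts):
    """Energy in kWh used by a constant load of `watts` over one year."""
    return watts * HOURS_PER_YEAR / 1000

for saving in (3, 7):
    print(f"{saving} W continuous = {yearly_kwh(saving):.1f} kWh per year")
```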

The hard drives were running for a couple of months last year (before I decided on the MB/CPU), and I recently ran a CPU stress test for over a week. I'm also testing the pico PSU on an old computer (less of a loss if the PSU damages the computer).

Now I'm running Memtest86+ 5.01, but I have no idea whether it will report single bit errors on ECC memory. Hopefully ECC errors will appear in the Event Log (that I can see via IPMI), but I don't know. Memtest has been running for around a week now, and I guess I'll let it run for at least one week more.

ECC should work, but how can I be sure? When making an effort to build a reliable system, and spending maybe $200 extra (more expensive MB and RAM) it would be nice to confirm that it's actually working. I haven't even seen ECC mentioned at all in BIOS. It can't be that unusual that people want to confirm that ECC is working?
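
One thing that can be checked from a Linux live system (such as the Ubuntu Live used earlier) is the kernel's EDAC counters in sysfs. Below is a minimal Python sketch that reads them. It assumes an EDAC driver for the memory controller is loaded; FreeNAS itself is FreeBSD-based and reports memory errors differently, and an empty result here does not by itself prove that ECC is off. The function name read_edac_counts is made up for the example.

```python
from pathlib import Path

def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from Linux EDAC sysfs counters."""
    root = Path(base)
    counts = {}
    if not root.is_dir():
        return counts                   # no EDAC memory-controller directory at all
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())   # corrected errors
            ue = int((mc / "ue_count").read_text())   # uncorrected errors
        except (OSError, ValueError):
            continue                    # skip controllers we cannot read
        counts[mc.name] = (ce, ue)
    return counts

if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found under the sysfs path.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that goes up during a memory test would be direct evidence that single-bit errors are being caught and fixed.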

It's a little weird that the same tool (dmidecode) reports "Single-bit ECC" to some people and "Multi-bit ECC" to other people. Maybe the first one is error correction and the latter is detection, like Arwen said. But still weird, since it's the same tool.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm sitting here trying to remember if the old Sun SPARC memory was multi-bit ECC. I know the end of the 32 bit era (SS20...) saw memory modules 144 bits wide. But I always thought that was 128 bits tag line width + ECC. I have no idea if it was Hamming code or Reed-Solomon... I suspect the latter would be required at some point for multi-bit.

They did later develop something they called Extended ECC, but it's my understanding it was more like RAID for RAM, ala IBM's Mainframe stuff.

"Chipkill" is the most commonly known multi-bit ECC correction, probably just because of the cool name. I believe they used a BCH code, or maybe it was multiple Hamming iterations with the checksums spread across multiple physical RAM chips.

Most vendors have a "RAM RAID" or "DIMM sparing" option nowadays as well, but I rarely see it in practical use.
 