Here's my attempt to explain how it works:
Remember the Pythagorean theorem from grade school/high school? A^2 + B^2 = C^2
So think of it like this. A & B are your RAM locations, C is your ECC portion of RAM. Using the data actually stored in A and B, you calculate a check value, C. So if A or B is wrong, you can recalculate it. It's simple algebra to solve for any one missing letter.
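Here's a minimal sketch of that analogy in Python. The function names are made up for illustration, and it stores C^2 instead of C purely to keep everything in whole numbers (that's my assumption, not part of the analogy). It also assumes the values are non-negative.

```python
import math

def make_check(a, b):
    """Compute the 'ECC' value (C squared) from the two data values."""
    return a * a + b * b

def recover_a(b, c_squared):
    """Solve A^2 + B^2 = C^2 for A when A has been lost."""
    return math.isqrt(c_squared - b * b)

a, b = 3, 4
c_squared = make_check(a, b)          # stored alongside the data
recovered = recover_a(b, c_squared)   # pretend A went bad and rebuild it
print(recovered)                      # prints 3
```

Recovering B works the same way with the roles of A and B swapped.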
Inside the computer, all that happens is that when RAM is read it goes through the memory controller (for ECC RAM you get 72 bits instead of 64 bits, despite 64 bits always being requested). Even if you request only a single bit, you will still end up retrieving 72 bits from RAM, as the ECC check is part of the pipeline to the CPU. It compares A, B, and C. If all is well, then A and B go through to the CPU for processing/execution. C isn't important for your program to run.
Now let's say that A is wrong. The memory controller calculates the correct A, writes that to RAM, and then sends the correct A & B to your CPU (normally it goes to L2 cache). I've been a little foggy on this and have not gotten what I consider 100% proof of it, but the memory controller then reads back the location it wrote to, to verify it was written correctly. If it wasn't, you get your BIOS error log entry that RAM is bad. This prevents you from getting error messages from radiation, etc. After all, since all RAM has random errors from radiation, you don't want a log filling up with those kinds of errors since you can't really fix them. You just want errors from things you can control, like stuck bits, so you can RMA that memory stick, right?
Now what if A & B are both wrong? Now you have a problem. If you remember from math class, a single equation can only be solved for 1 unknown variable. You now have 2: A & B. So you get a system halt and an error message in your system log.
So as you can see, that ECC stuff is actually pretty cool and very helpful. You're protected from trashing any data on any disks because the corrections are made as the system runs through its normal routine. And if the memory controller hits a situation it can't get out of, the system halts.
It's not the greatest example, but it has the virtue of being extremely simple to understand. So easy a high schooler will get it.
Now, to see how the real ECC RAM works, it's a bit more complicated. It doesn't do the Pythagorean theorem because 1/3 of your information is not actual data. You only get to store 2 pieces of data out of 3. That's kind of a bad return on investment for RAM. Good RAM is not that expensive. We also need to do this on a scale that protects large amounts of RAM while not requiring large amounts of RAM to correct the errors. par/par2s use something called Reed-Solomon error correction (R/S). An alternative to R/S is XOR. The link explains it well enough that I won't explain it here. (If you check out that link, you'll see parity is mentioned. Anyone remember "parity RAM" from way back in the day?)
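For anyone who doesn't follow the link, here's a rough sketch of the XOR idea in Python. The byte values and function names are made up for illustration. The parity block is just every data block XORed together, and XORing the parity with the surviving blocks gives back the one that was lost.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR all data blocks together to get the parity block."""
    return reduce(xor, blocks, 0)

def rebuild(surviving_blocks, parity_block):
    """Rebuild one missing block from the survivors plus the parity block."""
    return reduce(xor, surviving_blocks, parity_block)

data = [0b10110010, 0b01101100, 0b11110000]
p = parity(data)

# Pretend the middle block was lost and rebuild it from the other two.
rebuilt = rebuild([data[0], data[2]], p)
print(rebuilt == data[1])
```

Note that this only works when you already know which block is missing. With two blocks gone, plain XOR can't bring either one back.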
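Anyway, that's the general idea behind how ECC does its calculation (ECC RAM itself typically uses a Hamming-style code that corrects single-bit errors and detects double-bit errors, rather than full Reed-Solomon). With given inputs you will get a given output. With a certain amount of bad input you can identify and fix the errors; beyond that you cannot fix them, but you can still identify that "something is wrong" (this is what I tried to explain above). It should sound pretty similar to par2s.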
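To make "fix some errors, only detect more" concrete, here's a toy sketch in Python. ECC memory is commonly described as using a Hamming-style SECDED code (single error correct, double error detect) over 64 data bits plus 8 check bits. The version below is shrunk to 4 data bits plus 4 check bits so it fits on a screen. The bit layout and function names are my own choices for illustration, not what any particular memory controller does.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Index 0 is an overall parity bit; indexes 1, 2, 4 are Hamming
    parity bits; indexes 3, 5, 6, 7 hold the data.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # makes the whole word even parity
    return code

def decode(code):
    """Return (nibble, status); nibble is None if the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # a valid codeword XORs to 0
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1        # one bit flipped; syndrome is its index
        status = "corrected"
    else:
        return None, "uncorrectable"  # two bits flipped: detect only
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status

word = encode(0b1011)
one_bad = list(word); one_bad[5] ^= 1
two_bad = list(word); two_bad[5] ^= 1; two_bad[6] ^= 1
print(decode(word), decode(one_bad), decode(two_bad))
```

One flipped bit gets fixed silently, and two flipped bits get flagged but not fixed, which is the "system halt" case from the analogy above.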
Reed-Solomon error correction is pretty awesome. It's been used in tons of devices: CD/DVD/Blu-Ray checksums, some transmission protocols, SSDs use it for checksums on memory pages, and it's in probably every RAID controller you have ever owned that did RAID6 (RAID5 gets by with plain XOR parity, and RAID6 adds a Reed-Solomon-style second parity on top; bet you wondered how it came up with that "parity" data, huh? Now you know...). It's used in tons of places to protect information because it is so robust.
So what's the difference between ECC and non-ECC RAM physically/electrically? Literally, you have 8 more bits. Typically, RAM has 2, 4, or 8 chips on it. In the case of 2 or 4 there will be a smaller extra chip that handles the extra 8 bits. But for 8 chip RAM sticks, some are special. Some manufacturers make only ECC PCBs. This helps with production costs as they have to make only 1 PCB to cover their entire line. Then, when they decide to make a RAM stick, they decide to either go with 8 chips (non-ECC RAM) or 9 chips (ECC RAM). All of the chips are identical and each one supplies 8 bits (8 bits per chip * 8 chips = 64 bits; 8 bits per chip * 9 chips = 72 bits). That's it! (Don't confuse this with registered RAM that has to mitigate capacitance from high density RAM.) Here's a picture comparing ECC and non-ECC RAM. Notice the number of chips.
Check out the below picture:
See the white square? That's where the 9th memory chip would go. So I know without looking up this RAM model that it is definitely non-ECC RAM. ECC RAM will always have 3, 5 or 9 RAM chips (basically an odd number of chips). But don't confuse the RAM chips with the extra chips on registered RAM, which are there to deal with capacitance from high density DIMMs.
So now you are wondering why ECC RAM is so much more expensive than non-ECC RAM. After all, they're only adding 1/8th of the cost (minus the practically nil cost of the PCB). Simple answer: price gouging. They know you'll pay for it because they know you want it. It's as simple as that. The reality of it is that ECC RAM, if it weren't for price gouging, shouldn't cost more than 1/8th more than non-ECC RAM. (I tried to find a picture, but I couldn't find one.) Of course, some manufacturers do heat stress testing and heat accelerated aging on ECC RAM to get past the early-failure part of what is known as the bathtub curve.
Now I'll mention registered RAM only briefly because I mentioned it above. RAM in very high densities causes a lot of capacitance. You've seen capacitance if you've ever touched a doorknob and gotten shocked. And just like how it hurts you, it can kill RAM. Since RAM really is nothing more than a bunch of microscopic capacitors, the more you have in parallel the more it hurts. And to prevent one stick of RAM from damaging the other stick, they are electrically isolated. Registered sticks usually have one or more chips that look different from the chips that actually store data. Here's a picture:
Notice the 9 chips (so this is ECC RAM), but also see the 2 smaller chips in the middle. That's your giveaway that this is registered RAM.
So let's apply this newfound knowledge... Go to this Amazon page and look at that. It's listed as ECC RAM, but the stick in the picture has only 8 chips. The seller is selling it as ECC Registered RAM. Guess what? You and I both know that the picture is not of ECC RAM, nor is it registered.
Check out this Kingston RAM. One picture shows what is definitely ECC + Registered RAM, but the other picture is clearly non-ECC unregistered RAM. Again, our secret...
Here's a test. Look at this stick...is it ECC and is it registered? Look up the model number on Kingston's website for the answer...
So now, at a glance, you can look at a stick of RAM and without even looking up the model you should be able to identify ECC from non-ECC and registered from non-registered.
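If it helps, the rule of thumb from this post can be written down as a few lines of Python. This is only a sketch of the eyeball test described above: the function name and inputs are hypothetical, and it only knows about the chip counts mentioned in this post (3, 5 or 9 memory chips for ECC, extra smaller chips for registered), so treat the result as a first guess and still check the model number.

```python
def classify_dimm(memory_chips, has_extra_small_chips):
    """Guess ECC/registered status from what you can see on the stick.

    memory_chips: number of identical-looking memory chips
    has_extra_small_chips: True if there are smaller, different-looking
        chips (the register/buffer chips) besides the memory chips
    """
    ecc = "ECC" if memory_chips in (3, 5, 9) else "non-ECC"
    reg = "registered" if has_extra_small_chips else "unregistered"
    return f"{ecc}, {reg}"

print(classify_dimm(9, True))    # the registered ECC stick pictured above
print(classify_dimm(8, False))   # a stick with an empty 9th chip spot
```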