IEEE Spectrum Article on DRAM Errors

Status
Not open for further replies.

ewhac

Contributor
Joined
Aug 20, 2013
Messages
177
https://spectrum.ieee.org/computing/hardware/drams-damning-defects-and-how-they-cripple-computers

TL;DR: Based on the authors' studies, the Received Wisdom on DRAM errors (i.e. that they are mostly caused by cosmic rays and other random events) appears to be wrong. Their analysis of data from large-scale systems in the field shows that most DRAM errors are reproducible, with a high degree of address/row/column locality. They suggest implementing a page retirement policy to spare out memory pages once bad cells are detected.

Of course, if you don't use ECC RAM, you'd never know you had a problem in the first place...
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
They suggest implementing a page retirement policy to spare out memory pages once bad cells are detected.
Does this mean that RAM cells wear out in similar manner to the memory used in SSDs?
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
I read it as some locations being just mediocre from the start. Possibly this is due to impurities during chip manufacture, because they say that errors tended to be close to one another. So a daemon monitoring memory errors could help with this, particularly if it kept long-term results. Eventually, we'll probably end up with that in hardware, along with sparing like hard drives already do for bad sectors.
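To make the daemon idea a bit more concrete, here is a minimal sketch of the bookkeeping such a monitor might do: count corrected errors per physical page and flag a page for retirement once it repeats. Everything in it is an assumption for illustration (the `ErrorTracker` class, the 4096-byte page size, the threshold of 2, and the made-up addresses); how the OS actually reports errors or retires pages is not modeled, and this is not the policy from the article.

```python
from collections import Counter

PAGE_SIZE = 4096        # assumed page size in bytes
RETIRE_THRESHOLD = 2    # assumed: retire a page after this many corrected errors


class ErrorTracker:
    """Hypothetical tracker: counts corrected errors per physical page."""

    def __init__(self, threshold=RETIRE_THRESHOLD, page_size=PAGE_SIZE):
        self.threshold = threshold
        self.page_size = page_size
        self.counts = Counter()   # page number -> corrected errors seen so far
        self.retired = set()      # pages already flagged for retirement

    def record(self, phys_addr):
        """Record one corrected error; return True if its page is newly retired."""
        page = phys_addr // self.page_size
        if page in self.retired:
            return False
        self.counts[page] += 1
        if self.counts[page] >= self.threshold:
            self.retired.add(page)
            return True
        return False


tracker = ErrorTracker()
for addr in (0x1000, 0x1008, 0x9000):   # made-up error addresses
    if tracker.record(addr):
        print(f"retire page {addr // PAGE_SIZE:#x}")
```

A real daemon would also persist the counts across reboots, since the point of long-term results is catching cells that fail repeatedly over weeks or months.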
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This is another study suggesting that memory error rates, as reported by ECC, are much higher in datacenter environments than you would expect based on the outside world. Interesting...

Does this mean that RAM cells wear out in similar manner to the memory used in SSDs?
No, they're very different. NAND flash is programmed by applying high voltages to charge a floating gate in the transistor. Over time, this high voltage breaks down the oxide layer that keeps the gate floating, so the gate effectively leaks and can no longer hold the charge it needs to represent data.

DRAM is just a bunch of capacitors, so it's not subject to the same sort of wear.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Use ECC, because you’re screwed without it
 