I believe anyone serious about ZFS will have gone through all the reading about the importance of ECC and its implications.
During my research I came across this article:
[The Great Debate] ecc-vs-non-ecc-RAM by Andrew Galloway a.k.a. nex7 (the link is dead but available through the Wayback Machine),
which is linked from
[Why I Chose Non-ECC RAM for my FreeNAS] by briancmoses, which I believe most people will have read.
The article states that
My takeaway is that ECC still isn't the magic solution over non-ECC, just vastly better: ECC failure rates average 0.22/DIMM/yr, versus 8.2/DIMM/yr for correctable errors (which would have been uncorrectable without ECC). Although that is significantly lower than for non-ECC, the chance of failure is still there and non-zero.
The remaining articles at the bottom, meanwhile, seem to suggest that ZFS is resilient to RAM errors.
Both sides of the ECC debate have valid points, and the ZFS community's stance is still the //Nobody got fired for choosing IBM/Intel// one: go for ECC as much as possible.
That got me thinking: if ECC isn't 100% reliable either and we are already RAIDing our disks, why can't we apply the same to our RAM?
Essentially, RAM is not much different from disks, apart from being volatile rather than non-volatile.
If it were possible to RAID5/RAIDZ the RAM, the ECC requirement would matter less and error rates could be reduced even further.
It would also remove the need to choose certain brands or models over others because they are statistically more reliable, and it would remove the single point of failure, since ECC RAM can fail too.
The requirement would then shift from having ECC to having more RAM sticks.
I'm curious whether it is implementable and whether the tradeoffs (CPU overhead etc.) are worth it.
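
To make the RAID5-over-RAM idea a bit more concrete, here is a minimal userspace sketch in Python. It is only a toy under stated assumptions, not how a memory controller or ZFS actually works: each "stick" is just a bytearray, and the names write_stripe, read_stripe and xor_blocks are made up for illustration. XOR parity alone can rebuild a chunk but cannot tell which chunk is wrong, so the sketch also assumes a CRC32 per chunk, in the same spirit as ZFS pairing checksums with redundancy. Note that the checksums themselves still live in ordinary RAM here.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data, n_data_sticks):
    """Split data into one chunk per data stick and append an XOR parity chunk."""
    chunk_len = -(-len(data) // n_data_sticks)  # ceiling division
    padded = data.ljust(chunk_len * n_data_sticks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len]
              for i in range(n_data_sticks)]
    chunks.append(xor_blocks(chunks))  # the last "stick" holds the parity
    checksums = [zlib.crc32(c) for c in chunks]
    return [bytearray(c) for c in chunks], checksums


def read_stripe(sticks, checksums, data_len):
    """Verify every chunk, rebuild at most one bad chunk, and return the data."""
    bad = [i for i, c in enumerate(sticks)
           if zlib.crc32(bytes(c)) != checksums[i]]
    if len(bad) > 1:
        raise ValueError("more than one stick is corrupt; single parity cannot recover")
    if bad:
        # XOR of all the healthy chunks reproduces the missing one.
        good = [bytes(c) for i, c in enumerate(sticks) if i != bad[0]]
        sticks[bad[0]] = bytearray(xor_blocks(good))
    return b"".join(bytes(c) for c in sticks[:-1])[:data_len]


data = b"ZFS loves redundancy"
sticks, sums = write_stripe(data, 3)   # 3 data sticks + 1 parity stick
sticks[1][0] ^= 0x04                   # simulate a single bit flip on one stick
print(read_stripe(sticks, sums, len(data)) == data)
```

Even in this toy, every read pays for a checksum pass over all sticks and every write pays for a parity update, which hints at the kind of CPU overhead the question is about. As with RAID5, two bad sticks in the same stripe are not recoverable.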