Another ECC RAM question

Isfn

Cadet
Joined
Sep 29, 2019
Messages
2
Is the level of risk different for different types of RAID when using non-ECC RAM?

Most of what I've read about memory errors talks about losing the whole pool. What about corruption of a single file? I am not too concerned about losing the whole pool (this is for home use). I am more concerned about a corrupted file that goes unnoticed.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Using non-ECC RAM exposes you to some additional risk, but the same risk exists in any filesystem and RAID/drive configuration. It's not ZFS-specific.

The concern with non-ECC memory is that data sent to your NAS is cached in memory before being written, and it can be corrupted there before it is checksummed and written to disk. A "verify after write" would catch this; however, that isn't always practical from a speed perspective. Once the file is written to disk, ZFS checksums will take care of protecting it.
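
To make the timing concrete, here is a minimal Python sketch. It is not how ZFS is implemented: `write_block` and `read_ok` are hypothetical stand-ins for a checksumming write and read path, and SHA-256 is just an assumed checksum. It compares a bit flip that happens before the checksum is computed with one that happens after the block is stored.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a memory error."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; it stands in for whatever the filesystem uses.
    return hashlib.sha256(data).hexdigest()

def write_block(data: bytes) -> dict:
    """Hypothetical write path: checksum the buffer, then store both."""
    return {"data": data, "checksum": checksum(data)}

def read_ok(block: dict) -> bool:
    """Hypothetical read path: recompute the checksum and compare."""
    return checksum(block["data"]) == block["checksum"]

original = b"family photo bytes" * 100

# Case 1: the bit flips in RAM BEFORE the checksum is computed.
# The checksum is taken over already-damaged data, so it "verifies" forever.
stored_early = write_block(flip_bit(original, 12345))
early_flip_passes_checksum = read_ok(stored_early)
early_flip_matches_source = stored_early["data"] == original

# Case 2: the bit flips AFTER the block and its checksum were stored.
stored_late = write_block(original)
stored_late["data"] = flip_bit(stored_late["data"], 12345)
late_flip_passes_checksum = read_ok(stored_late)

print(early_flip_passes_checksum, early_flip_matches_source, late_flip_passes_checksum)
```

In the first case the stored block passes its checksum but no longer matches what the client sent, which is the unnoticed single-file corruption being asked about. Only a comparison against the source would reveal it. In the second case the checksum catches the damage.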

In the words of Matthew Ahrens, one of the co-founders of ZFS at Sun and current Delphix ZFS developer:

There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. Actually, ZFS can mitigate this risk to some degree if you enable the unsupported ZFS_DEBUG_MODIFY flag (zfs_flags=0x10). This will checksum the data while at rest in memory, and verify it before writing to disk, thus reducing the window of vulnerability from a memory error.

I would simply say: if you love your data, use ECC RAM. Additionally, use a filesystem that checksums your data, such as ZFS.
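
The idea in that quote, checksumming data while it sits in memory and verifying it before it goes to disk, can be sketched roughly as follows. This is only an illustration of the concept, not the actual ZFS_DEBUG_MODIFY code: `GuardedBuffer` and `flush` are hypothetical names, and CRC32 is an assumed checksum.

```python
import zlib

class GuardedBuffer:
    """Hypothetical in-memory buffer that remembers a checksum of its contents."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Checksum taken as soon as the data arrives in memory.
        self.crc = zlib.crc32(self.data)

    def verify(self) -> bool:
        # Recompute over the current contents and compare with the saved value.
        return zlib.crc32(self.data) == self.crc

def flush(buf: GuardedBuffer, disk: list) -> bool:
    """Append the buffer to a pretend disk only if it still matches its checksum."""
    if not buf.verify():
        return False  # refuse to write data that changed while at rest
    disk.append(bytes(buf.data))
    return True

disk = []

clean = GuardedBuffer(b"tax return 2019")
clean_written = flush(clean, disk)

hit = GuardedBuffer(b"tax return 2019")
hit.data[3] ^= 0x04            # simulate a single-bit memory error while waiting
hit_written = flush(hit, disk)

print(clean_written, hit_written, len(disk))
```

Note that this narrows the window rather than closing it: a flip that lands before the in-memory checksum is taken would still go through.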
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
I think that someone also made a point that bad RAM can corrupt good data during scrubs. This could be the only case where ZFS benefits more from ECC RAM than a non-checksummed filesystem does, but the chance seems exceptionally small.
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
"I think that someone also made a point that bad RAM can corrupt good data during scrubs."
To make a long story short, the "scrub of death" theory held that this was a serious threat. It used to be widely believed, but it has been pretty thoroughly debunked.

"the chance seems exceptionally small"
Yes. It's easy to see how non-ECC RAM could produce spurious checksum errors during scrubs (and, quite frankly, if you value your data enough to use FreeNAS rather than some simpler, shinier alternative, I don't see why you would tolerate that), but the multiple precisely correlated errors (in both the data and its checksums) that would be required to actually corrupt data are going to be extremely rare.
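
A toy simulation of that argument, under loud assumptions: `scrub_block` is a hypothetical, much-simplified scrub of one mirrored block, `flaky_ram` is an invented fault model that flips a bit on about half of all reads, and the stored checksum itself is assumed to be read correctly. Real ZFS is far more involved; this only shows why a repair that must verify against the checksum does not overwrite good data with bad.

```python
import hashlib
import random

def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def flip_random_bit(data: bytes, rng: random.Random) -> bytes:
    buf = bytearray(data)
    i = rng.randrange(len(buf) * 8)
    buf[i // 8] ^= 1 << (i % 8)
    return bytes(buf)

def scrub_block(disk_copies: list, stored_sum: bytes, read_through_ram):
    """Toy scrub of one mirrored block. Returns (reported_error, repaired)."""
    first = read_through_ram(disk_copies[0])
    if digest(first) == stored_sum:
        return (False, False)
    # Mismatch: try the other copies, and repair only with data that verifies.
    for other in disk_copies[1:]:
        candidate = read_through_ram(other)
        if digest(candidate) == stored_sum:
            disk_copies[0] = candidate
            return (True, True)
    return (True, False)  # error reported, nothing written

rng = random.Random(0)
good = b"important data block" * 50
stored_sum = digest(good)
disk = [good, good]  # both on-disk copies are actually fine

def flaky_ram(data: bytes) -> bytes:
    # Assumed fault model: roughly half of all reads get one flipped bit.
    return flip_random_bit(data, rng) if rng.random() < 0.5 else data

spurious_errors = 0
for _ in range(1000):
    reported, _repaired = scrub_block(disk, stored_sum, flaky_ram)
    spurious_errors += reported

disk_intact = disk == [good, good]
print(spurious_errors > 0, disk_intact)
```

Even with absurdly bad RAM, the sketch reports plenty of spurious checksum errors while the on-disk copies stay intact, because damaged data never matches the stored checksum. Corrupting the disk would need the data and its checksum to be damaged in a matching way.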
 