RAM memory scrub option?

Status
Not open for further replies.

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
I look at threads like this and I wonder if they are some social engineering designed to wear down the FreeNAS staff so that they unwittingly clobber the latest newbie that asks a stupid question. Cyberjock, why don't you create a sticky of "shit we are not even going to debate with you" and lock every stupid thread that comes along and point them at the sticky?

The FreeNAS "experts" only have a limited number of cycles and I'd rather them expend it on something productive and not something worthless.....

Don't you find data integrity important? It seems to be maybe the number one reason for using ZFS.

Have a look at the following research paper, in particular the part about possible memory problems with ZFS.
http://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf

Recently I spent quite a long time looking for a way to know whether ECC RAM was working. Today cyberjock hinted that a properly configured FreeNAS server could send e-mails when single-bit ECC errors were reported, but I didn't find it in the GUI, on the forum, or via Google (searching with "freenas"). Hence my question.

But the GUI reminded me that FreeNAS uses a separate swap file area (2GB/drive). Hopefully none of the ZFS structures are swappable.

BTW, I know how ECC RAM works, and most implementations don't detect 3-bit errors. So ECC RAM lowers the odds of memory errors but doesn't eliminate them.
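
To make the point about 3-bit errors concrete, here is a minimal sketch of a single-error-correct, double-error-detect (SECDED) code. It is a toy extended Hamming (8,4) code, not the wider code a real ECC DIMM uses, and the names `encode`, `decode` and `flip` are made up for this illustration. The assumption is only that the principle carries over: one flipped bit is corrected, two are detected, and three can look like a correctable single-bit error.

```python
from functools import reduce
from operator import xor

# Hamming positions 1..7: parity bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
# Index 0 holds an extra overall parity bit (this is what adds double-error detection).
DATA_POS = (3, 5, 6, 7)


def encode(data):
    """Encode 4 data bits into an 8-bit toy SECDED codeword."""
    c = [0] * 8
    for pos, bit in zip(DATA_POS, data):
        c[pos] = bit
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = reduce(xor, c[1:])  # overall parity over positions 1..7
    return c


def decode(word):
    """Return (status, data) where status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(word)
    # XOR of the positions of all set bits is 0 for a valid codeword.
    syndrome = reduce(xor, (pos for pos in range(1, 8) if c[pos]), 0)
    parity_odd = reduce(xor, c)  # 1 if an odd number of bits flipped
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Looks like a single-bit error; syndrome 0 means the overall parity bit itself.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not fixable.
        status = "uncorrectable"
    return status, [c[pos] for pos in DATA_POS]


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    c = list(word)
    for pos in positions:
        c[pos] ^= 1
    return c


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)
    print(decode(flip(word, 5)))        # one flipped bit
    print(decode(flip(word, 3, 5)))     # two flipped bits
    print(decode(flip(word, 1, 2, 4)))  # three flipped bits
```

In this toy code the three-bit case comes back with status 'corrected' but with data that differs from what was stored, which is the silent failure I mean. Real memory controllers use stronger codes and may catch some of these patterns, so take this as an illustration of the principle only.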
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But are servers like the Sun servers (with mostly system admins looking after them) the target for FreeNAS?

Of course Sun servers are not the target of FreeNAS. That's just absurd. ZFS was developed by Sun for Sun servers. ZFS was open-sourced and then ported to FreeBSD, and then someone else created FreeNAS. The problem with this is that one of the platform properties (very reliable server runs the software) is no longer a given.

Your statement suggests you aren't grasping what I'm saying there. The fact that ZFS came from Sun does not imply that FreeNAS is intended to run on Sun gear and is clearly a flawed conclusion on a logical basis.

Talking about how "almost no users look at their ecc-error logs" is meaningless fearmongering; we actually don't care so much if errors are corrected (which ECC happens to be capable of for single bit errors) but rather merely care that they are detected. Since detection of an uncorrectable error is sufficient to prevent corruption (by panicking the system), and since modern systems actually correct errors, merely having tested ECC memory on a platform that supports ECC is a very comprehensive solution to the general problem of memory corruption.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
@scotch_tape After reading this thread I can tell only one iXsystems employee responded to it and well after it had many responses. I don't see the need to lock it or delete it. If the folks posting to it feel like they are done, then they will stop posting. I for one do not like forcing super strict rules on a forum. Short of severe harassment, flaming, or SPAM, I think everything else within reason is game.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Of course Sun servers are not the target of FreeNAS. That's just absurd. ZFS was developed by Sun for Sun servers. ZFS was open-sourced and then ported to FreeBSD, and then someone else created FreeNAS. The problem with this is that one of the platform properties (very reliable server runs the software) is no longer a given.

Your statement suggests you aren't grasping what I'm saying there. The fact that ZFS came from Sun does not imply that FreeNAS is intended to run on Sun gear and is clearly a flawed conclusion on a logical basis.

I know Sun servers aren't the target; I wrote "servers like the Sun servers" to describe a certain high level class of servers.

Talking about how "almost no users look at their ecc-error logs" is meaningless fearmongering; we actually don't care so much if errors are corrected (which ECC happens to be capable of for single bit errors) but rather merely care that they are detected. Since detection of an uncorrectable error is sufficient to prevent corruption (by panicking the system), and since modern systems actually correct errors, merely having tested ECC memory on a platform that supports ECC is a very comprehensive solution to the general problem of memory corruption.

From a software and user standpoint I also don't care about them if they are corrected. I got to thinking about the "if".

Well, I started by thinking: what if it doesn't work, and how would I test for and find that? A software memory scrub using the available checksums would find quite a lot of those problems (even if they are very unlikely). Is it worth it? Well, if some fatal pool losses are avoided, it could be very useful.
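
As a rough sketch of what I mean by a software memory scrub (this is not how ZFS manages its in-memory data, and `ChecksummedCache` with its methods is a hypothetical name for this example only): keep a checksum next to every cached block, then periodically walk the cache and recompute.

```python
import hashlib


class ChecksummedCache:
    """Hypothetical in-RAM block cache that keeps a SHA-256 digest beside each block."""

    def __init__(self):
        self._blocks = {}  # key -> (bytearray contents, digest taken at insert time)

    def put(self, key, data):
        buf = bytearray(data)
        self._blocks[key] = (buf, hashlib.sha256(buf).digest())

    def corrupt(self, key, byte_index, mask=0x01):
        """Test helper: simulate a memory bit flip in a cached block."""
        self._blocks[key][0][byte_index] ^= mask

    def scrub(self):
        """Recompute every digest and return the keys whose contents no longer match."""
        return [
            key
            for key, (buf, digest) in self._blocks.items()
            if hashlib.sha256(buf).digest() != digest
        ]


if __name__ == "__main__":
    cache = ChecksummedCache()
    cache.put("a", b"hello")
    cache.put("b", b"world")
    cache.corrupt("b", 2)  # flip one bit in block "b"
    print(cache.scrub())
```

Such a scrub only detects damage, and only in data that sits idle between checks. It cannot repair anything unless a good copy exists elsewhere, for example on disk.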

I have done a long burn-in and test of my hardware, but it's certainly not at the level at which Sun built and tested its hardware.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I know Sun servers aren't the target; I wrote "servers like the Sun servers" to describe a certain high level class of servers.

Certainly FreeNAS will run on "high level class" servers, but it will also run on very modestly priced server gear like the stuff suggested in my hardware sticky. But that isn't what your previous message implied.

Well, I started by thinking: what if it doesn't work, and how would I test for and find that? A software memory scrub using the available checksums would find quite a lot of those problems (even if they are very unlikely). Is it worth it? Well, if some fatal pool losses are avoided, it could be very useful.

I do not advocate building cars without seat belts. Putting more air bags in the car is not an acceptable substitute for having seat belts.

With that, I am out of this discussion because it is relatively pointless.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
...
I do not advocate building cars without seat belts. Putting more air bags in the car is not an acceptable substitute for having seat belts.

With that, I am out of this discussion because it is relatively pointless.

Well, in Europe it's very difficult to buy a car without air bags. My car didn't even start when the brake lights were broken. It was quite a scary experience: no message, no code, no limited driving speed, nothing worked any more. I would rather have gotten a message when the first bulb broke.

Are those air bags a substitute for seat belts? No, but they save lives, often of people who are wearing a seat belt.

Maybe I'm paranoid, but:
I know that even good server hardware will sometimes fail, and as far as I know the memory path is not completely covered by ECC: for example, calculations inside the CPU, swap files, ...
For business use I would think very carefully about which of the ZFS tools to rely on. A second ZFS box with replication as backup is clearly a no-go. Separate tape backups, which back up the contents rather than the structure, are a minimum.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Nice way to rewrite what I wrote to try to continue the discussion. I deem this sufficiently close to trolling that I'm closing this thread as nonproductive. If you wish to suggest this as a feature, the bug tracker is available at the top of the page, little green link.
 