ECC vs non-ECC RAM and ZFS

Status
Not open for further replies.

ss4johnny

Explorer
Joined
Nov 15, 2013
Messages
55
I initially set up my FreeNAS box with non-ECC RAM and a consumer-grade motherboard, but I decided to move those components into an HTPC and buy the products recommended earlier in the thread for the FreeNAS box. When I receive the new parts, what's the best way to switch over? (E.g., if I don't have any permanent errors on the drives, I'm not sure how much sense it makes to delete my existing zpool, but in the interest of avoiding any risk I could delete the zpool before installing the new hardware.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just install the old disks in your new hardware and boot'er up. The pool will mount if your SATA controller is compatible. You might have to reset your network configuration, but other than that, it should work just as before.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Just to clarify:

Cyberjock means: Take your drives, put them in the new machine, AND TAKE YOUR BOOT DEVICE (usually a USB thumb drive), plug it right in there, make sure it's set to boot in the BIOS, and you should not have to do anything more. Boom done.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
In case device names have changed, I usually just delete /etc/zfs/zpool.cache (on FreeNAS it's /boot/zfs/zpool.cache, which is a link to /data/zfs/zpool.cache, so that's the one to delete). That makes importing the old pool much easier, and makes it less likely that the pool mounts in a state that appears degraded and scares the crap out of you.

That also works when going cross-platform (e.g. booting Linux with ZoL, Nexenta, Solaris, etc.).
 

ss4johnny

Explorer
Joined
Nov 15, 2013
Messages
55
Thanks (to both of you) for the quick reply. I'm gonna keep the same case for the FreeNAS box, so I'll just swap out the MB/CPU.
 

ss4johnny

Explorer
Joined
Nov 15, 2013
Messages
55
Sorry for the double post (I can't edit), but I was going to add that the reason I brought it up is that I get a significant number of permanent errors when copying my data over to the ZFS pool (maybe 1 for every 50 GB or so copied). So what I've been doing is copying some data, scrubbing, re-copying any files listed under permanent errors, and re-scrubbing to make sure everything's copacetic. My guess is that switching to ECC will be a more lasting fix than this painful process. Based on what you said, I assume that as long as it isn't listing any permanent errors before I make the switch, I'll be fine.
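
The copy, scrub, re-copy loop described above might look roughly like this from the command line. This is only a sketch: `tank` is a hypothetical pool name, and the `run` helper and `DRY_RUN` variable are illustrative additions (not from the thread) that print each command and execute it only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch only. "tank" is a hypothetical pool name; substitute your own.

# Print the command; execute it only when DRY_RUN=0 (default is print-only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Start a scrub. The command returns at once; the scrub runs in the background.
start_scrub() {
    run zpool scrub "$1"
}

# Show pool status. With -v, files affected by permanent errors are listed.
show_errors() {
    run zpool status -v "$1"
}
```

With the functions loaded, `DRY_RUN=0 start_scrub tank` would start the scrub; `DRY_RUN=0 show_errors tank` is worth reading only once the status output reports that the scrub has completed. Any files it lists are the ones to copy over again before the next scrub.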
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You shouldn't need to delete the zpool.cache on FreeNAS. The way FreeNAS is set up, it's designed to let you walk the disks to a new machine and have it "just work", minus some minimal network setup of course.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
The reason I mentioned it is that I had that issue just two days ago. I moved the 6 drives of one zpool from one chassis to another (identical chassis/motherboard). In one machine they were all plugged into the LSI adapter; in the other, 2 were on the LSI and the remaining 4 on the Intel controller.

The zpool got mounted as degraded. If you're not familiar with ZFS, I'm sure this could give a few people a cold sweat.

I had to export it and re-import it with -f.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
FreeNAS 9.1.1 already imports with -f. In fact, if you run zpool history you can see all the commands that have been run on your pool.

For example, mine mounts with the command:

zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 15524257174459755049
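
To pull just the import commands out of a pool's history, something like the following could work. It is a minimal sketch: `tank` is a hypothetical pool name and `only_imports` is a helper invented for illustration.

```shell
#!/bin/sh
# Sketch only. "tank" is a hypothetical pool name; substitute your own.

# Keep only the lines of `zpool history` output that record an import.
only_imports() {
    grep 'zpool import'
}

# Typical use (needs a real pool, so it is left commented out here):
# zpool history tank | only_imports
```

The filter reads from standard input, so it can also be run against a saved copy of the history output.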
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
A bit off topic in this thread, but as we're talking migration: is there a way to easily copy the settings from one box to another? That includes setting up the zpool to have the same layout, same datasets, etc.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
The key here is to export the pool before doing the hardware swap. It will then import cleanly even if device names have changed, and you won't have to do any zpool.cache tricks.
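
On a generic ZFS system the export-then-import sequence might be sketched as below (on FreeNAS itself the GUI's volume detach and auto-import are presumably the usual route). Assumptions: `tank` is a hypothetical pool name, and the `run` helper with its `DRY_RUN` variable is an illustrative addition that prints each command and executes it only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch only. "tank" is a hypothetical pool name; substitute your own.

# Print the command; execute it only when DRY_RUN=0 (default is print-only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 1: on the old hardware, before powering down.
export_pool() {
    run zpool export "$1"
}

# Step 2: on the new hardware, once the disks are connected.
import_pool() {
    run zpool import          # no arguments: list pools available for import
    run zpool import "$1"     # import by name; a cleanly exported pool should not need -f
}
```

Splitting the two steps into separate functions mirrors the fact that they run on different machines, with the hardware swap in between.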
 

panz

Guru
Joined
May 24, 2013
Messages
556
You're right, Dusan; there are plenty of folks who don't read the manual:

FreeNAS 9.1 Guide Section 6.3.9: "If you will be moving a ZFS drive from one system to another, perform this exportation first. This operation flushes any unwritten data to disk, writes data to the disk indicating that the export was done, and removes all knowledge of the pool from the system". :)

http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/ch04s06.html
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
I've managed to reproduce the pool being mounted in degraded state...

Two identical machines: Supermicro X10SL7-F boards; 12 SATA ports linked to 12 hot-swap bays (SM 2U chassis): 8 connected to the LSI adapter, 4 to the Intel SATA controller.
One runs Ubuntu 13.10, the other FreeNAS.

Boot FreeNAS and create a pool of 6 disks on the LSI adapter. Halt the system (don't export the pool first).
Move the drives to the 2nd machine, this time connecting them to the last two LSI SATA ports and 4 Intel ports. Boot the machine using Ubuntu with the ZFS kernel driver.
Import the pool in Linux, then halt (don't export there either).

Move the drives back to the FreeNAS box; connect them to 2 LSI SATA + 4 Intel SATA ports.

Boot FreeNAS: the pool is mounted in a degraded state.

zpool export followed by zpool import -f pool remounts the pool properly, as if nothing happened.
Now if I halt FreeNAS, move the disks all back to the LSI ports, and reboot FreeNAS, it's all fine: the pool is mounted as healthy.

A bit twisted, I know... but it's one I managed to reproduce twice.
Maybe I did boot Linux last time (I was doing a lot of testing at the time between FreeNAS and Ubuntu, so it's certainly a possibility that that's what happened at first).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There we go. That would explain a lot. ;)

I still wouldn't consider this worthwhile to discuss here in the forums as a regular course of business, because we don't provide ZFS on Linux support here at all. If we did, then suddenly we'd be expected to support ZFS for the Mac OS X project (I forget the name) as well as Solaris.

Generally, I think the response would be "if you want to go off and use something other than FreeNAS, you are free to do what you want, at your own benefit or peril".

It is something I'll keep in the back of my mind if someone talks about using ZFS on Linux with a FreeNAS pool, as I'm sure there are some users who do it even if they don't immediately admit it (or even remember) on the forums.

As for your "maybe I did boot linux last time" I can totally believe it. When you start doing stuff you get those 5 steps in your head, but then step 2 doesn't work and what should have taken 20 minutes has now taken 2 days and you can't remember every little step you did yesterday. You just remember where you are right now and where you are trying to get to. It happens to all of us. I see it fairly frequently on the forums with new users that are very confused and are only concerned with one thing... not losing their data. It just shows that we're all human here and we don't all remember every detail of everything we do.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
You're unbelievable. There's no user error here; there's no "failure to connect the right SATA cable".

Just one scenario (and there's always more than one way to skin a cat) to prove that you should export the pool before doing anything, as recommended in the FreeNAS manual or the ZFS manual.

FWIW, FreeNAS doesn't export the pool before shutting down. That's why trying to import the pool on another system will fail without the -f flag: it reports that the pool is currently in use by another system (this wouldn't happen if the pool had been exported).
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
If I had followed the manual, nothing would have happened. I would have exported the pool first and everything would have been dandy.

But I didn't follow the manual. I simply shut down and swapped the disks, which is what you advise people to do here: don't export, do nothing, because you've done it plenty of times and never had a problem.

Fact: export the pool first before doing any swapping around, be it OS, hardware, or SATA ports.
This is what the manual recommends; it takes zero time and prevents any issues, whatever those may be.

Discussion closed.
 

snicker

Dabbler
Joined
Dec 9, 2013
Messages
10
Just came here to say that I'm the latest idiot... Just built my first FreeNAS box in March, and it has been cranking along quite happily, with me blissfully thinking that I had such safe and redundant storage... My shares were inaccessible this morning, so I hooked a monitor up to the box and discovered that the machine had rebooted.

On the screen: "Fatal Trap 12".... rebooted and watched it happen after it tried to mount the filesystems.

Googled around a bunch, had a couple of things to try, nothing worked. Ran memtest86 and got several (read: well over 70,000) errors on one of my 8 GB sticks. Non-ECC memory, of course. I pulled it out and started troubleshooting... and then came across this thread.

Tried:
* Booting with one drive in the array disconnected, testing with each drive
* Booting with two drives in the array disconnected in each permutation
* Fresh installation of FreeNAS and manually importing the pool
* Booting with OpenIndiana to import the pool (OpenIndiana doesn't even show the pool)

Nothing worked.

Suffice to say, I wish that I had seen this thread when I built the machine in March. I had no idea of the risks I was taking. I have 6 TB of data that is now *gone*. At least I think so. Please, someone tell me that there is a way to recover it?

Except I know there's not.

I didn't know I was taking this gamble, as I didn't do the research or think it out beforehand.


Let me tell you, reader of this thread, if you are thinking that "oh, I'll save a couple bucks by getting non-ECC memory/mobo/processor"... you're wrong. No matter what kind of data you're storing, you're delusional if you think it's going to be okay if you lose it. Why are you even using redundant storage in the first place?

GET ECC RAM. YOU'RE AN IDIOT LIKE ME IF YOU DON'T.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sorry to hear that, snicker. I just saw your post on IRC, and I wish people would take the warning seriously.

You're pretty lucky that things went bad really fast. Not sure how you did backups, but it probably saved your backups from shuffling off this mortal coil.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It's terrible that you had a problem like this, but it's good you shared it to show people that you need ECC RAM. It took me a few years to make the change, and I got lucky that I didn't have any failures, although I did have a backup of all my important data just in case something went wrong. And I would not have been happy trying to restore the non-important data (movies and computer backups) over time.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A friend's FreeNAS server was originally built with non-ECC RAM (I didn't know better at the time, and it was a temporary emergency). The system was up 24x7 for weeks at a time (uptime was usually about the same as the RELEASE schedule). It turned into a system that used non-ECC RAM for 6 months.

Anyway, one thing I never got sorted out was the resilvering of data during scrubs. Every time a scrub was run, without exception, every disk in his pool would have a small number of CHKSUM errors. This was never more than 20-30 errors, but was never zero. But the fact that every single disk had them piqued my interest. We tried all sorts of things. We tried a different PSU and a different SATA controller (the hard drives were normally half onboard and half on a 3ware, so we didn't think this was the problem), and SMART never showed a disk as anything less than pristine. Now, we never had a problem with the pool being unmountable or anything. Never any problem at all. But those CHKSUM errors were never resolved. Eventually, when he was upgraded to Supermicro stuff, the first scrub had the same number of CHKSUM errors. But all subsequent scrubs had zero for the remaining 3 months before we replaced the drives and built a new pool. And now the new pool has never had any CHKSUM errors on scrub.

It *really* makes me think it was non-ECC RAM errors. But the quantity of errors makes me question it. I'm just not buying that non-ECC error rates are so high that ZFS is catching that many errors. But there's very little data out there on actual in-field RAM error rates. This is something I'd love to see!

Unfortunately, this is a question to which I will probably never know the answer. :(
 