SOLVED Lots of g_dev_taste: make_dev_p() failed (gp->name=zvol/pool/.../...@auto-20131006-010000s1, error=6

Status
Not open for further replies.

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
BuddyButterfly said:
Thanks for the heads up! You are right, all disks are AF. Regarding ECC: do you really think it is the cause of the problem? I need some more information on why ECC is so important for ZFS. I am running lots of other servers which have more RAM, do not have ECC, and have had no problems for years. I agree that it helps in rare cases, but such errors should be sporadic, right? I cannot imagine an ECC problem here, as the issue is reproducible and the memory has been checked intensively.

Then, regarding the amount of RAM: shouldn't 8GB be enough for less than 3TB of data? I only have a 2x3TB mirror. My rough understanding was that 1GB per 1TB of data is enough, and 8GB is stated as the minimum. What I do not understand is that the amount of RAM is always given as the explanation for unpredictable behaviour. An enterprise system should always behave in a well-defined and predictable way; there should be no doubt at all, as NAS systems always grow. Too little memory should cause performance degradation, but not critical errors and data loss; if it did, no company would ever buy such a system! So I guess it is more like "it could be" or "who knows" (because we do not know the real answer)... If ZFS really were so sensitive to RAM size that it caused data corruption, I would turn away from it. But that contradicts my experience. I switched to ZFS quite a while ago and also use it heavily on Linux with ZoL. I even use it for the disks I swap between workstations and my laptop. It has worked like a charm for years, and I have configured a RAM limit on all of those machines. I came to ZFS after spending a lot of time hunting for the cause of bluescreens etc. on a hybrid disk with broken firmware; ZFS was the only filesystem that immediately and consistently surfaced the errors. Since then it has been my number one filesystem.

You'll find that Cyberjock's guide will answer your basic ECC questions.
There's also the ECC sticky which goes a bit more in-depth.

http://forums.freenas.org/index.php...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/
 

BuddyButterfly

Dabbler
Joined
Jun 18, 2014
Messages
28
Hi Ericloewe,

thanks a lot for the info material. I will read it when I have the time.

To all the others, thanks a lot for the help. I renamed all zvols (by the way, I find it very confusing that FreeNAS does not stick to ZFS naming: ZFS clearly separates datasets and volumes, whereas FreeNAS talks generically about "volumes", which is why I have to write "zvols" to be clear ;-)) and re-imported the pool. No more error messages. I hope this was also the cause of the crashes I experienced. I will let you know how it goes once I have set up the second pool running FreeNAS.
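In case it helps anyone else: the rename itself is a one-liner per zvol; the point is just to make the resulting device path shorter. A rough sketch of what I mean (the dataset names are made up, and Python is only used here as a wrapper around the zfs CLI):

```python
import subprocess

# Hypothetical names. The device node GEOM creates is
# zvol/<pool>/<dataset>@<snapshot> plus a slice suffix, and that whole
# string must stay under the name-length limit, so shorten the dataset.
old = "pool/backups/workstations/office/winserver/system-disk0"
new = "pool/bak/win-sys0"

# Renaming the zvol also renames all of its snapshots along with it.
subprocess.run(["zfs", "rename", old, new], check=True)
```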

Again, thanks a lot for the enlightening discussion, and please regard this thread as solved (I do not know where to set this).
 

BuddyButterfly

Dabbler
Joined
Jun 18, 2014
Messages
28
eraser said:
Not sure if I am looking at the correct list of error numbers, but according to sys/errno.h on FreeBSD, error 63 is ENAMETOOLONG - "File name too long".

Not sure if that helps at all. Just curious: how many characters long is the full path to the snapshot listed in your error message?

Again, thanks a lot, eraser. A rename fixed all these errors. Hopefully the crashes too.
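For reference, the errno lookup eraser did can be reproduced on the box itself. A quick sketch (note that errno values are platform-specific, so this has to run on FreeBSD to match the kernel message; the long name below is a hypothetical example, not the real one from my pool):

```python
import errno
import os

# errno 63 is ENAMETOOLONG on FreeBSD; the same number means something
# else on Linux, so decode it on the NAS itself.
print(errno.errorcode.get(63), "-", os.strerror(63))

# The failing name in the error message is the full device path GEOM
# tries to create, so its total length is what matters.
name = "zvol/pool/backups/workstations/office/winserver/system-disk0@auto-20131006-010000s1"
print(len(name), "characters")
```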
 

BuddyButterfly

Dabbler
Joined
Jun 18, 2014
Messages
28
Just to finalize this topic.

I am a happy and satisfied FreeNAS user now. I migrated from NAS4Free to FreeNAS. The result:

1. No more crashes when writing large amounts of data to the NAS via iSCSI, NFS, or CIFS.
2. Much lower CPU consumption than under NAS4Free.

How did I do the migration? To end up with as standard a system as possible, I set up FreeNAS from scratch on two new disks, with the following steps:

1. Set up FreeNAS on two new disks (with the old ones pulled out).
2. Created the same users and groups.
3. Created the same datasets and zvols.
4. Set the permissions on the datasets.
5. Put the two NAS4Free disks back in.
6. Decrypted and imported the old pool.
7. Copied the file data between the pools' datasets with rsync (so as not to carry over any ZFS metadata corruption).
8. Created a migration snapshot of every zvol on the old pool.
9. Sent all zvols over to the new pool with zfs send/receive (see the sketch below).
10. Re-did the configuration for iSCSI, CIFS, NFS, etc.

Voilà, done.
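Steps 8 and 9 in script form, roughly as I ran them. The pool and zvol names are made up; with real names, a plain `zfs send | zfs recv` in the shell does the same thing. This assumes the parent dataset already exists on the new pool (step 3) but the zvols themselves do not, since recv creates them:

```python
import subprocess

# Hypothetical pool and zvol names; substitute your own layout.
SRC, DST = "oldpool", "tank"
ZVOLS = ["vm/win-sys0", "vm/win-data0"]

for zvol in ZVOLS:
    snap = f"{SRC}/{zvol}@migrate"
    # Step 8: one-off migration snapshot on the old pool.
    subprocess.run(["zfs", "snapshot", snap], check=True)
    # Step 9: stream the snapshot into the new pool (send | recv pipe).
    send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
    subprocess.run(["zfs", "recv", f"{DST}/{zvol}"],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError(f"zfs send failed for {snap}")
```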

I would like to add to the discussion above about the "correct" RAM (ECC etc.). As expected, it turned out that the problems were NOT caused by using non-ECC RAM or by having only 8GB of RAM! This was totally in line with my expectations, since the issue was reproducible, happened far too frequently, and the memory had been tested intensively beforehand. In FreeNAS I used the "Tuning" function, which works great, even with 8GB of RAM. Why shouldn't it? My personal experience with small disks on my laptop and workstation, where I limit ZFS to 2GB of RAM (see the sketch below), is that I have had no problems for over a year now.
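For the curious, the 2GB limit I mentioned is just the usual ARC cap. A quick sketch for checking the current value on FreeBSD, where the tunable is vfs.zfs.arc_max (set persistently in /boot/loader.conf; on ZoL the equivalent is the zfs_arc_max module parameter):

```python
import subprocess

def sysctl(name: str) -> str:
    # Read a FreeBSD sysctl value via the sysctl(8) CLI.
    out = subprocess.run(["sysctl", "-n", name],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

# vfs.zfs.arc_max is the ARC size cap in bytes (0 means "auto-sized").
# To pin it at 2 GiB persistently, /boot/loader.conf would carry:
#   vfs.zfs.arc_max="2147483648"
print("arc_max:", sysctl("vfs.zfs.arc_max"), "bytes")
```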

I have seen in lots of posts that the ECC question and the amount of RAM are stressed heavily before the problem is analyzed any deeper. I would therefore ask you to be a bit more facts-oriented and not lead with these areas. Asking for a thorough memory test is valid, but after that, ECC should not be discussed any further. RAM exhaustion can be monitored very well, and only in that case should the amount of RAM be questioned.

Nevertheless, thanks for the competent help!
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@BuddyButterfly, I am glad things worked out for you.

However, you are 100% mistaken about non-ECC RAM. If non-ECC RAM develops an error while running, then (due to the way ZFS operates) a pool scrub might erase the entire pool. No amount of prior testing can protect against that.

Is that possible with ECC RAM? Yes, but look up the chances of undetected (silent) ECC RAM errors and compare.

P.S.
FreeNAS, up to the current version, has allowed the use of either ZFS or UFS. You may want to file a bug report suggesting that, going forward, the terminology be aligned more closely with ZFS.
 

BuddyButterfly

Dabbler
Joined
Jun 18, 2014
Messages
28
solarisguy said:
@BuddyButterfly, I am glad things worked out for you.

However, you are 100% mistaken about non-ECC RAM. If non-ECC RAM develops an error while running, then (due to the way ZFS operates) a pool scrub might erase the entire pool. No amount of prior testing can protect against that.

Is that possible with ECC RAM? Yes, but look up the chances of undetected (silent) ECC RAM errors and compare.

P.S.
FreeNAS, up to the current version, has allowed the use of either ZFS or UFS. You may want to file a bug report suggesting that, going forward, the terminology be aligned more closely with ZFS.

Hi solarisguy,

after digging a bit deeper into the topic, I have to agree with you here. Apparently ZFS is more sensitive to RAM errors precisely because of its much-loved checksumming feature. What I still do not understand is why it would erase the entire pool. Shouldn't the damage be local to the place of the error, corrupting some data? Why the whole pool?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
BuddyButterfly said:
Hi solarisguy,

after digging a bit deeper into the topic, I have to agree with you here. Apparently ZFS is more sensitive to RAM errors precisely because of its much-loved checksumming feature. What I still do not understand is why it would erase the entire pool. Shouldn't the damage be local to the place of the error, corrupting some data? Why the whole pool?

If it's a single bit-flip, yes, you should be fine in most cases. If it's really bad RAM, it'll start causing a lot of "corrections", destroying a lot of data and probably even metadata along the way. Bam, one pool destroyed.
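To make the mechanism concrete, here is a deliberately oversimplified toy model (plain Python, nothing ZFS-specific) of the failure mode described above: with a stuck bit in RAM, every block a scrub pulls in is damaged the same way, every redundant copy therefore fails its checksum, and a naive "repair" ends up rewriting the disk with the bad in-memory data. Real ZFS is far more involved than this; the sketch only shows why one bad DIMM can touch every block a scrub visits.

```python
import random
import zlib

random.seed(1)

# A toy "disk" of blocks with their known-good checksums.
disk = {i: bytes(random.randrange(256) for _ in range(32)) for i in range(8)}
checksums = {i: zlib.crc32(block) for i, block in disk.items()}

def read_through_bad_ram(block: bytes) -> bytes:
    damaged = bytearray(block)
    damaged[0] |= 0x80  # a stuck-high bit in the buffer the block lands in
    return bytes(damaged)

for i in disk:
    copy = read_through_bad_ram(disk[i])      # every redundant copy passes
    if zlib.crc32(copy) != checksums[i]:      # through the same bad DIMM,
        disk[i] = copy                        # so "repair" writes bad data

bad = sum(zlib.crc32(b) != checksums[i] for i, b in disk.items())
print(f"{bad} of {len(disk)} blocks destroyed by one stuck bit")
```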
 