"fatal trap 12: page fault while in kernel mode" Dead pool after scrub?

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Doh! My bad. Could have sworn I saw RAIDZ1 above. That's what I get for doing 3 things at once.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just a word of warning.. If you are using registered memory you will get random corruption as the registers mess with the timing of the CPU and RAM. It will be intermittent and you may or may not be able to prove it with a memory test.
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
I am using a server motherboard that supports xeon and i5 processors. Not sure if that makes a difference.

How can I mount the pool as read only to take the data off? The zpool import command with the txg argument will import the pool, but it is not mounted.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I am using a server motherboard that supports xeon and i5 processors. Not sure if that makes a difference.

Not really. But now I'm curious to know what board you have... got a model number?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Aha. I see my confusion. The 1155 socket is more expanded in terms of families that are supported on server-grade 1155s. But, if you look at the server-grade 1155s (mine is the X9SCM-F personally), they don't do i5s because those don't support ECC.

In either case you are looking at a new CPU, new motherboard, or both. Pretty sure in your case it will be both since server-grade motherboards don't support i5s.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I think if you could give us a full hardware list, including the model number of the CPU, not that it's just an i5, that would help. Maybe you do have a system that supports ECC RAM, only that listing will help out. If it turns out your system doesn't in fact support ECC RAM, then a recommendation could be made so you don't have to go through this issue again. There are utilities on the Ultimate Boot CD which can read the CPU info if you do not recall what you purchased. Not sure if your BIOS will tell you that info either but it might.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I think if you could give us a full hardware list, including the model number of the CPU, not that it's just an i5, that would help. Maybe you do have a system that supports ECC RAM, only that listing will help out. If it turns out your system doesn't in fact support ECC RAM, then a recommendation could be made so you don't have to go through this issue again. There are utilities on the Ultimate Boot CD which can read the CPU info if you do not recall what you purchased. Not sure if your BIOS will tell you that info either but it might.

FreeNAS says it's an i5 760, so we're looking at an LGA 1156 system. The real question is "What motherboard is that?". If that CPU really is using registered memory, it's a damned miracle the thing was stable enough for you to reach this point...
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
I am using an X8SIE-O. I don't believe i5s are supported with this motherboard, but it does work.

Motherboard: SUPERMICRO MBD-X8SIE-O LGA 1156
RAID Card (Pass-through): areca ARC-1680IX-24
CPU: Core i5-760


Need any more information?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Ouch! What is the cheapest route to upgrade in order to gain ECC capability? Changing the CPU is not cheap at all. Looked at the Xeon X3480 and it's about $400, but it might be cheaper somewhere.

Question: Does this MB perform its own ECC checks? I know I read about that in the past that a MB could do this if it was designed to.
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
After rebooting the server back to FreeNAS, I attempted to import my pool using the previous transaction group again. This time, it did not work. Anyone have any explanation why it worked the first time and not now?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Ouch! What is the cheapest route to upgrade in order to gain ECC capability? Changing the CPU is not cheap at all. Looked at the Xeon X3480 and it's about $400, but it might be cheaper somewhere.

Question: Does this MB perform its own ECC checks? I know I read about that in the past that a MB could do this if it was designed to.

My guess is that only applies to motherboards that contain the memory controller... Since Intel moved to CPU memory controllers precisely with this generation (Nehalem), I'd expect it to be absolutely impossible (well, at a reasonable price. I'm sure something could be hacked together).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
After rebooting the server back to FreeNAS, I attempted to import my pool using the previous transaction group again. This time, it did not work. Anyone have any explanation why it worked the first time and not now?

Yep. If you do a rollback it opens up Pandora's box. That's why I said the following and in the way that I did:

If you can mount your pool by rolling back some transactions, my advice is to mount it as read-only, and only when you are ready to start copying your data off the pool as fast as you can. It is very likely to crash again and you are very likely to not get all of your files back.

There's never a guarantee it will work tomorrow or in the future. Normally when I have to mount pools for people and it works my first question is "what is THE most important folder and don't you dare tell me 'all of it' because I'll punch you in the balls?"

Yeah.. bad idea to use an unsupported CPU in a board regardless of "if it works". I'm seeing a recurring theme of "stuff I wouldn't do for the stability of my server".
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@Abel408
I don't get the feeling that you will be able to recover from this issue. Once you are done with this mess I would recommend you focus on restructuring your hardware in order to prevent this from happening again. Also, I would never run an Alpha on my system unless I didn't care if my data just goes away. I'm not saying it was the Alpha code but it could have been a contributor.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Abel408, can you post the outputs of when you were successful and the next try with a failure?

I am asking, since you had said that you imported read-only, so no change should have happened. Also there should be an error...

Anyway, here is the code for FreeNAS 9.2.1.5, please take a look and read my comments.
Code:
[root@freenas] ~# zpool history Abel408
History for 'Abel408':
2014-06-23.12:25:44 zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /Abel408 -o altroot=/mnt Abel408 /dev/gptid/af8cf655-fb2d-11e3-9167-94de80223d14
2014-06-23.12:25:49 zfs inherit mountpoint Abel408
2014-06-23.12:25:49 zpool set cachefile=/data/zfs/zpool.cache Abel408
2014-06-23.12:26:34 zpool export Abel408
2014-06-23.12:27:08 zpool import -f -R /mnt 8540557898973534518
2014-06-23.12:27:08 zfs inherit -r mountpoint Abel408
2014-06-23.12:27:08 zpool set cachefile=/data/zfs/zpool.cache Abel408
2014-06-23.12:27:08 zfs set aclmode=passthrough Abel408
2014-06-23.12:27:13 zfs set aclinherit=passthrough Abel408

I have created a volume (ZFS pool) using the defaults in the GUI, then exported it and imported it again in the GUI. zpool history shows the commands run on the pool. As you can see, FreeNAS did not do a straight import. What happens if you do a straight import (again, after exporting in the GUI) is shown below.
Code:
[root@freenas] ~# zpool import Abel408
cannot mount '/Abel408': failed to create mountpoint
[root@freenas] ~# mount
/dev/ufs/FreeNASs1a on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /mnt (ufs, local)
/dev/md2 on /var (ufs, local)
/dev/ufs/FreeNASs4 on /data (ufs, local, noatime, soft-updates)
[root@freenas] ~# zpool export Abel408
[root@freenas] ~# zpool import -f -R /mnt Abel408
[root@freenas] ~# mount
/dev/ufs/FreeNASs1a on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /mnt (ufs, local)
/dev/md2 on /var (ufs, local)
/dev/ufs/FreeNASs4 on /data (ufs, local, noatime, soft-updates)
Abel408 on /mnt/Abel408 (zfs, local, nfsv4acls)
[root@freenas] ~#
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
Thanks for everyone's help. At this point, I have abandoned the pool and focused my efforts on restoring my backup and making sure this doesn't happen again.

I'm seeing a recurring theme of "stuff I wouldn't do for the stability of my server".

If by recurring theme, you mean I made an honest mistake by using a CPU that I thought supported ECC, then yes... I wonder how many others are using an i5 CPU and running scrubs on their pools. If this is a very big issue and a ticking time bomb for zfs pools, it should be well documented and probably put as a warning on the zfs scrub GUI.
@Abel408
I don't get the feeling that you will be able to recover from this issue. Once you are done with this mess I would recommend you focus on restructuring your hardware in order to prevent this from happening again. Also, I would never run an Alpha on my system unless I didn't care if my data just goes away. I'm not saying it was the Alpha code but it could have been a contributor.
The memtest was run for 20 hours and made 4 passes without a single error. I'm having a hard time believing it was a hardware problem, but if the general consensus is that I should get a CPU that supports ECC, then I will do that before redeploying my pool.
I am only running Alpha because one of the contributors told me that my fix was the only change made between the Alpha and official release.
Abel408, can you post the outputs of when you were successful and the next try with a failure?

I am asking, since you had said that you imported read-only, so no change should have happened. Also there should be an error...

Thanks for your efforts. This was the output after my successful attempt:

Code:
[root@nas] ~# zpool import -f -T 2136314 -o readonly=on Vo2
cannot mount '/Vo2': failed to create mountpoint


I was stuck there. I didn't know how to mount the pool. I figured the import command would automatically mount the pool; I didn't know I had to also issue a mount command. I can't get the pool to mount at all now. I tried going back a few txgs and was unsuccessful. At this point, I have basically given up hope. Looking at my backups, which are stored on another FreeNAS server, I don't think I lost any data. Restoring from them is probably a better option than mounting a previous txg anyway, seeing that I probably wouldn't recover all my data that way.
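
For anyone who hits the same "failed to create mountpoint" message: the mount listing earlier in the thread shows / mounted read-only on FreeNAS, and the import only mounted once -R /mnt was given. So it seems likely the read-only rollback import needed the same altroot. Below is a minimal sketch that only prints the combined command (it does not run it). The pool name and txg are copied from my command above, /mnt is the altroot from the earlier example, and build_import_cmd is just a helper name made up for this post. I have not verified this against the damaged pool.

```shell
#!/bin/sh
# Dry-run sketch: prints the import command instead of executing it.
# POOL and TXG come from the failed attempt above; ALTROOT is the
# writable location FreeNAS used in the earlier zpool history output.
POOL="Vo2"
TXG="2136314"
ALTROOT="/mnt"

# build_import_cmd is a hypothetical helper, not part of ZFS or FreeNAS.
# $1 = pool name, $2 = txg to roll back to, $3 = altroot
build_import_cmd() {
    printf 'zpool import -f -T %s -o readonly=on -R %s %s\n' "$2" "$3" "$1"
}

build_import_cmd "$POOL" "$TXG" "$ALTROOT"
```

If the printed command is run and the import succeeds, the pool should then show up in the mount output under /mnt/Vo2, the same way Abel408 appeared under /mnt/Abel408 in the example above.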

Off-topic question, but does anyone know how to rsync data while keeping Windows ACLs?

Thanks again.
 