"fatal trap 12: page fault while in kernel mode" Dead pool after scrub?

Status
Not open for further replies.

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
Hello all,

I seem to have a big issue on my hands. Before I get into detail, I have attached my server information.

After every reboot, I get this error when the system tries to mount the local file systems. I have 3 ZFS pools. I took out all of my drives and the server booted up fine. I then put each pool back in the server one at a time until I hit the problem. I now know which pool is giving me problems, but I don't know how to get it back.

I think the problem occurred during a scrub. I have a scrub scheduled for the first Sunday of each month for all my pools. What is odd is that I got an email stating that a scrub had started for this pool. I don't remember ever getting an email for scrubs, and I never got one for the other scrubs.

This server and pool have been working without an issue for months. I haven't made any hardware or software changes in weeks.

I now have the server booted up without the pool that has the error. After the server booted up, I plugged in my disks. The server sees all the disks, but the status of that pool is unknown.

Thanks for any support you can provide. Let me know if you need more information.

-Chris
 

Attachments

  • Screen Shot 2014-06-23 at 11.43.23 AM.png

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
Hey danb35... I was just going to edit my post to include this.

   pool: Vo2
     id: 10570967669751284945
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        Vo2                                             ONLINE
          raidz2-0                                      ONLINE
            gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e836d167-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6  ONLINE

It seems to be fine, but when I run zpool import Vo2, the system crashes.
 

SmallGuy

Guru
Joined
Jun 7, 2013
Messages
560
You're not using ECC RAM (the Core i5 doesn't support it).
What you describe looks like your pool has been eaten by defective RAM.
My advice is to test your RAM first:
http://www.memtest.org
 

SmallGuy

Guru
Joined
Jun 7, 2013
Messages
560
Look at this: http://ark.intel.com/search/advanced?s=t&ECCMemory=true
There isn't any Core i5 that supports ECC memory.
For ECC memory to be fully functional, both the motherboard and the processor need to be ECC compatible.
If I were you, I would begin with a memory test...
[edit] Oops, it looks like some recent i5s are ECC compatible (very recently).
Can you provide your detailed system specs?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The memory test is likely a good place to start. Again, what is the output (in code tags--the curly braces) of zpool status? And does the system crash when you try to import from the command line, or only from the GUI?
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
UPDATE:

I tried Richard's suggestion here: http://forums.freenas.org/index.php?threads/zfs-pool-import-crashes-freenas-9-2-0-x64-vm.17785/

I was able to import the pool using one txg prior, but the pool was not mounted (is this because I imported it as read-only?)... Here is the zpool status now:

  pool: Vo2
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub in progress since Sun Jun 22 00:00:00 2014
        499G scanned out of 827G at 1/s, (scan is slow, no estimated time)
        0 repaired, 60.29% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vo2                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            7285988012032049399                         FAULTED      0     0     0  was /dev/gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6
            gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e836d167-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
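
For anyone reading along, here is a minimal sketch of how the config table in output like this could be filtered down to the rows that are not ONLINE. It is only an illustration: the function name `unhealthy_devices` and the `SAMPLE` string are made up for this example, and the parsing assumes the whitespace-separated NAME/STATE layout shown in this post rather than any guaranteed format.

```python
def unhealthy_devices(status_text):
    """Return (name, state) pairs for config rows whose state is not ONLINE.

    Assumes the config table starts at a header row beginning with NAME
    and that each row is 'name state ...' separated by whitespace.
    """
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and "STATE" in fields:
            in_config = True  # rows after this header are pool/vdev/disk entries
            continue
        if in_config and fields[0].startswith("errors:"):
            break  # end of the config table
        if in_config and len(fields) >= 2 and fields[1] != "ONLINE":
            rows.append((fields[0], fields[1]))
    return rows


# Sample rows copied from the status output in this post.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
Vo2 DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
7285988012032049399 FAULTED 0 0 0 was /dev/gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6
gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6 ONLINE 0 0 0
gptid/e836d167-98bc-11e3-ad1d-0025903503b6 ONLINE 0 0 0
gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6 ONLINE 0 0 0
"""

for name, state in unhealthy_devices(SAMPLE):
    print(name, state)
```

On the sample above this lists the pool, the raidz2-0 vdev, and the single FAULTED member; the other three disks are ONLINE and are filtered out.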
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
SmallGuy said:
Look at this: http://ark.intel.com/search/advanced?s=t&ECCMemory=true
There isn't any Core i5 that supports ECC memory.
For ECC memory to be fully functional, both the motherboard and the processor need to be ECC compatible.
If I were you, I would begin with a memory test...
[edit] Oops, it looks like some recent i5s are ECC compatible (very recently).
Can you provide your detailed system specs?

The CPU is at least 3 years old, so I doubt it is one of those newer i5s... I will run a memtest.
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
Does anyone have any suggestions for me? At this point, I think I am in good shape since I was able to import the volume read-only using the previous transaction group, but I am in uncharted waters now. I am afraid that if I try to fix that one drive or import read/write, I could cause even more damage.

Also, the scrub seems to be stuck at 60.29%
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Is there any particular reason to run ALPHA software?

Could you please tell me whether gcc is included in that particular build? If it were, there would be a trivial way to confirm or deny whether ECC works on your system.

GCC is included with FreeNAS 9.2.1 - 9.2.1.5
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
We are concerned that without ECC RAM, your scrub might damage your pool.

So a scrub is good with ECC RAM (and let it run for as long as it takes), but a scrub could be evil with non-ECC RAM...
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
solarisguy said:
Is there any particular reason to run ALPHA software?

Could you please tell me whether gcc is included in that particular build? If it were, there would be a trivial way to confirm or deny whether ECC works on your system.

GCC is included with FreeNAS 9.2.1 - 9.2.1.5

I am running the alpha because I had an issue with SMB crashing my server and, at the time, this alpha release was the only release that had a fix for it. I have the system down for memtest right now, so I do not know if it has GCC or not.
 

Abel408

Dabbler
Joined
Oct 15, 2012
Messages
32
So far, memtest isn't showing anything. Would it be bad if I imported this pool using the previous txg in read/write? If that works and I can access my data, how can I use this txg over the most recent one?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, gonna clarify a few things in no particular order:

1. Registered memory means nothing except you can do higher density. I can guarantee you that you aren't using registered memory with an i5. If you are, that is a major problem, as i5s do not support registered memory. Registered RAM has no bearing on its ability to store data properly or on error correction.
2. ECC is important. The fact that you aren't using ECC may or may not be your problem. The whole problem with the cosmic ray bitflip thing is that *if* it is because of that, you have no way of proving it... ever.
3. No i5 that has ever existed has supported ECC RAM (at least AFAIK). So you definitely are NOT utilizing ECC regardless of your RAM type. To actually use ECC features with Intel hardware the motherboard *must* have a server chipset (this is the main reason we recommend server-grade stuff). Server-grade motherboards also have a unique socket that is not compatible with i5s. It's a game Intel plays with you. If you want ECC, your options are to go low end (some Celerons, Pentiums, and i3s) or go high end with a Xeon. Notice I didn't mention i5 or i7.
4. Your pool is obviously damaged. Being that one disk seems to be faulted, you probably have that condition where you have one bad disk and a URE on another disk, resulting in a trashed pool. This is why we tell people not to do RAIDZ1.
5. Regardless of how you manage to mount your pool, you need to get your data off the pool so you can destroy and recreate it. Consider your pool to be "unstable" forever now. ZFS doesn't auto-correct what it can't fix, and if it's giving a fault, it has encountered problems it can't fix because of inadequate redundancy.
6. You very likely have a hardware problem somewhere. Bad hard drive(s), bad choice of SATA controller, something. You should seek it out. You don't suddenly have a corrupted pool because you sneezed. Something has to have gone wrong.
7. If you can mount your pool by rolling back some transactions, my advice is to mount it as read-only, and only when you are ready to start copying your data off the pool as fast as you can. It is very likely to crash again and you are very likely to not get all of your files back.
8. When you create your new pool don't do RAIDZ1 again.
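
As a rough sketch of point 7, assuming the pool is already imported read-only and its datasets are mounted, something like the following could copy everything it can and keep going past files that fail to read, leaving a list of what was lost. The function name `rescue_copy` and the paths in the usage comment are hypothetical and only for illustration; a tool such as rsync may well be the better choice in practice.

```python
import os
import shutil


def rescue_copy(src_root, dst_root):
    """Copy every file under src_root into dst_root, skipping unreadable files.

    Returns the list of source paths that could not be copied, so they can
    be reviewed afterwards instead of aborting the whole copy.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same relative directory layout under dst_root.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also tries to preserve timestamps and permissions.
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                # A read error on a damaged file should not stop the rest.
                failed.append(src)
                print(f"skipped {src}: {exc}")
    return failed


# Hypothetical usage; both paths are assumptions, not taken from this thread:
# lost = rescue_copy("/mnt/Vo2", "/mnt/backup")
```

The only design choice here is that a failure is recorded and skipped rather than raised, which matches the expectation that not every file will come back.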
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194