"fatal trap 12: page fault while in kernel mode" Dead pool after scrub?

Abel408 · Jun 23, 2014

Hello all,

I seem to have a big issue on my hands. Before I get into detail, I have attached my server information.

After every reboot, I get this error when the server tries to mount the local file systems. I have 3 ZFS pools. I took out all of my drives and the server booted up fine. I then put each pool back in the server one at a time until I hit the problem. I now know which pool is giving me problems, but I don't know how to get it back.

I think the problem occurred during a scrub. I have a scrub scheduled for the first Sunday of each month for all my pools. What is odd is that I got an email stating that a scrub was started for this pool. I don't ever remember getting an email for scrubs and I never got one from the other scrubs.

This server and pool have been working without an issue for months. I haven't made any hardware or software changes in weeks.

I now have the server booted up without the pool that has the error. After the server booted up, I plugged in my disks. The server sees all the disks, but the status of that pool is unknown.

Thanks for any support you can provide. Let me know if you need more information.

-Chris

danb35 · Jun 23, 2014

what is the output of 'zpool import' at the shell prompt?

Abel408 · Jun 23, 2014

Hey danb35... I was just going to edit my post to include this.

   pool: Vo2
     id: 10570967669751284945
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
         some features will not be available without an explicit 'zpool upgrade'.
 config:

        Vo2                                             ONLINE
          raidz2-0                                      ONLINE
            gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e836d167-98bc-11e3-ad1d-0025903503b6  ONLINE
            gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6  ONLINE

Seems to be fine, but when I try 'zpool import Vo2', the system crashes.

danb35 · Jun 23, 2014

Looks to me like Vo2 is already online (i.e., imported, mounted, and ready to go). What's the output of 'zpool status'?

SmallGuy · Jun 23, 2014

You're not using ECC RAM (the Core i5 doesn't support it).
What you describe looks like your pool has been eaten by defective RAM.
My advice is to test your RAM first:
http://www.memtest.org

Abel408 · Jun 23, 2014

I am using registered ECC memory. This is the memory I bought: http://www.newegg.com/Product/Product.aspx?Item=N82E16820148648

Abel408 · Jun 23, 2014

Should I still try memtest?

Abel408 · Jun 23, 2014

danb35 said:
Looks to me like Vo2 is already online (i.e., imported, mounted, and ready to go). What's the output of 'zpool status'?

Volume is not mounted...

SmallGuy · Jun 23, 2014

Look at this: http://ark.intel.com/search/advanced?s=t&ECCMemory=true
There isn't any Core i5 which supports ECC memory.
ECC memory needs an ECC-compatible motherboard and an ECC-compatible processor to be fully functional.
If I were you, I would begin with a memory test...
[edit] Oops, it looks like some recent i5s are ECC compatible (very recently).
Can you provide your detailed system specs?

danb35 · Jun 23, 2014

The memory test is likely a good place to start. Again, what is the output (in code tags--the curly braces) of zpool status? And does the system crash when you try to import from the command line, or only from the GUI?

Abel408 · Jun 23, 2014

UPDATE:

I tried Richard's suggestion here: http://forums.freenas.org/index.php?threads/zfs-pool-import-crashes-freenas-9-2-0-x64-vm.17785/

I was able to import the pool using one txg prior, but the pool was not mounted (is this because I imported it as read-only?)... Here is the zpool status now:

  pool: Vo2
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub in progress since Sun Jun 22 00:00:00 2014
        499G scanned out of 827G at 1/s, (scan is slow, no estimated time)
        0 repaired, 60.29% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vo2                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            7285988012032049399                         FAULTED      0     0     0  was /dev/gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6
            gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e836d167-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
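
Editor's note: a status report like the one above can be checked mechanically. The following is a minimal Python sketch, not a FreeNAS tool. It assumes the text has been captured from 'zpool status' with its usual column layout (the indentation in the sample is reconstructed), and the function names are made up for illustration. It lists every entry whose state is not ONLINE and pulls out the scrub percentage.

```python
import re

# Sample text modelled on the status output in this post (layout reconstructed).
STATUS = """\
  pool: Vo2
 state: DEGRADED
  scan: scrub in progress since Sun Jun 22 00:00:00 2014
        499G scanned out of 827G at 1/s, (scan is slow, no estimated time)
        0 repaired, 60.29% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vo2                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            7285988012032049399                         FAULTED      0     0     0  was /dev/gptid/e79e3d7f-98bc-11e3-ad1d-0025903503b6
            gptid/e7e6e270-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e836d167-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
            gptid/e87ca4cd-98bc-11e3-ad1d-0025903503b6  ONLINE       0     0     0
"""

# A config row is: name, state, then three error counters (READ WRITE CKSUM).
# The "state:" header line has no counters, so it does not match.
DEVICE_LINE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)\s+\S+\s+\S+\s+\S+"
)


def unhealthy_devices(status_text):
    """Return (name, state) for every config row that is not ONLINE."""
    found = []
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match and match.group(2) != "ONLINE":
            found.append((match.group(1), match.group(2)))
    return found


def scrub_percent(status_text):
    """Return the 'NN.NN% done' figure from the scan section, or None."""
    match = re.search(r"([\d.]+)% done", status_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    for name, state in unhealthy_devices(STATUS):
        print(f"{state:9} {name}")
    print("scrub progress:", scrub_percent(STATUS), "%")
```

Run against the sample, this flags the pool, the raidz2-0 vdev, and the one FAULTED member, which matches what the report says: one of the four disks has a missing or invalid label, and the scrub is sitting at 60.29%.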

Abel408 · Jun 23, 2014

SmallGuy said:
Look at this: http://ark.intel.com/search/advanced?s=t&ECCMemory=true
There isn't any Core i5 which supports ECC memory.
ECC memory needs an ECC-compatible motherboard and an ECC-compatible processor to be fully functional.
If I were you, I would begin with a memory test...
[edit] Oops, it looks like some recent i5s are ECC compatible (very recently).
Can you provide your detailed system specs?

The CPU is at least 3 years old, so I doubt it is one of those newer i5s... I will run a memtest.

Abel408 · Jun 23, 2014

Anyone have any suggestions for me? At this point, I think I am in good shape since I was able to import the volume read-only using the previous transaction group, but I am in uncharted waters now. I am afraid that if I try to fix that one drive or import read/write, I could cause even more damage.

Also, the scrub seems to be stuck at 60.29%

solarisguy · Jun 23, 2014

Is there any particular reason to run ALPHA software?

Could you please tell me, whether gcc is included in that particular build? If it were, there would be a trivial way to confirm or deny whether ECC works on your system.

GCC is included with FreeNAS 9.2.1 -9.2.1.5

solarisguy · Jun 23, 2014

We are concerned that without ECC RAM, your scrub might damage your pool.

So a scrub is good with ECC RAM (and let it run for as long as it takes), but a scrub could be evil with non-ECC RAM...

Abel408 · Jun 23, 2014

solarisguy said:
Is there any particular reason to run ALPHA software?

Could you please tell me, whether gcc is included in that particular build? If it were, there would be a trivial way to confirm or deny whether ECC works on your system.

GCC is included with FreeNAS 9.2.1 -9.2.1.5

I am running Alpha because I had an issue with smb crashing my server and at the time, this alpha release was the only release that had a fix for it. I have the system down for memtest right now so I do not know if it has GCC or not.

Abel408 · Jun 23, 2014

So far, memtest isn't showing anything. Would it be bad if I imported this pool using the previous txg in read/write? If that works and I can access my data, how can I use this txg over the most recent one?

cyberjock · Jun 23, 2014

Ok, gonna clarify a few things in no particular order:

1. Registered memory means nothing except you can do higher density. I can guarantee you that you aren't using registered memory with an i5. If you are, that is a major problem, as i5s do not support registered memory. Whether RAM is registered has no bearing on its ability to store data properly or on error correction.
2. ECC is important. The fact that you aren't using ECC may or may not be your problem. The whole problem with the cosmic ray bitflip thing is that *if* it is because of that you have no way of proving it... ever.
3. No i5 that has ever existed has supported ECC RAM (at least AFAIK). So you definitely are NOT utilizing ECC regardless of your RAM type. To actually use ECC features with Intel hardware the motherboard *must* have a server chipset (this is the main reason we recommend server-grade stuff). Server-grade motherboards also have a unique socket that is not compatible with i5s. It's a game Intel plays with you. If you want ECC, your options are to go low end (some Celerons, Pentiums, and i3s) or go high end with a Xeon. Notice I didn't mention i5 or i7.
4. Your pool is obviously damaged. Being that one disk seems to be faulted you probably have that condition where you have one bad disk and a URE on another disk resulting in a trashed pool. This is why we tell people not to do RAIDZ1.
5. Regardless of how you manage to mount your pool you need to get your data off the pool so you can destroy and recreate it. Consider your pool to be "unstable" forever now. ZFS doesn't auto-correct what it can't fix, and if it's giving a fault it has encountered problems it can't fix because of inadequate redundancy.
6. You very likely have a hardware problem somewhere. Bad hard drive(s), bad choice of SATA controller, something. You should seek it out. You don't suddenly have a corrupted pool because you sneezed. Something has to have gone wrong.
7. If you can mount your pool by rolling back some transactions, my advice is to mount it as read-only, and only when you are ready to start copying your data off the pool as fast as you can. It is very likely to crash again and you are very likely to not get all of your files back.
8. When you create your new pool don't do RAIDZ1 again.
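
Editor's note: for the "copy your data off as fast as you can" step in point 7, here is a rough Python sketch of a copy that keeps going when individual files cannot be read. It is an illustration only, not something posted in the thread. The function name salvage_tree and the example paths are hypothetical, and it assumes the damaged pool is already mounted read-only and the destination is on separate, healthy storage.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file from src_root into dst_root.

    Unreadable files and directories are skipped and reported instead of
    aborting the run. Returns a list of (path, error message) tuples.
    """
    failures = []

    def on_walk_error(err):
        # Called by os.walk when a directory cannot be listed.
        failures.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                # copy2 also tries to preserve timestamps and permissions.
                shutil.copy2(src, dst)
            except OSError as err:
                failures.append((src, str(err)))
    return failures


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the pool and the backup are mounted.
    for path, error in salvage_tree("/mnt/Vo2", "/mnt/backup/Vo2"):
        print("FAILED:", path, "-", error)
```

The point of collecting failures rather than stopping is that a damaged pool may return I/O errors on only some files; one pass recovers whatever is readable and leaves a list of what was lost. If the system panics mid-copy, as point 7 warns it might, the run has to be restarted.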

danb35 · Jun 23, 2014

The pool as posted here is a RAIDZ2, not a RAIDZ1.

Ericloewe · Jun 23, 2014

Abel408 said:
I am using registered ECC memory. This is the memory I bought: http://www.newegg.com/Product/Product.aspx?Item=N82E16820148648

a) No i5 has ever supported ECC
b) No i5 has ever supported Registered memory

You may have Unbuffered ECC RAM, but you certainly are not using ECC. And you most certainly are not using Registered RAM.
