Please help me: Fatal Trap 12

Status
Not open for further replies.

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
My system: Intel D875PBZ, P4 (3.2GHz), 4GB RAM, BFG Tech BFGR76512GSOC GeForce 7600GS (AGP), boot device: SD-CF-IDE-DI adapter w/8GB Transcend CompactFlash card; SYBA SY-PCI40010 PCI SATA II Controller Card, Antec True430w.
ZFS: (3 x 2TB HDDs) in RAIDZ1, dedup=off.

I had FreeNAS 8.3.0 installed and running fine, with the pool healthy. I started copying some files over a Windows share when the system crashed. After a hard reboot, I got the "Fatal Trap 12" error and was unable to access the GUI or shell. I suspected that bad SATA cables might have caused an HDD to be dropped from the system, so I changed the cables and reinstalled FreeNAS. I was able to boot, get to the GUI, and set up SSH, but any attempt to import the pool causes the "Fatal Trap 12" error.

Fatal_Trap_12_(FreeNASv830).jpg

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x3c
fault code = supervisor read, page not present
instruction pointer = 0x20:0x81055156
stack pointer = 0x28:0x873d719c
frame pointer = 0x28:0x873d719c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (system_taskq_1)


The drives and the pool are recognized when I run "zpool import", but running "zpool import -f tank" crashes the system.

[root@freenas] ~# zpool import
   pool: tank
     id: 2820821072774409311
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        tank        ONLINE
          raidz1-0  ONLINE
            ada0p2  ONLINE
            ada1p2  ONLINE
            ada2p2  ONLINE


I tried booting off a USB stick with the FreeBSD 9.1 LiveCD and got the same problem: the attempt to import the pool caused "Fatal Trap 12". I even connected the HDDs to a different system and got the same error when trying to import.

Fatal_Trap_12_(FreeBSD9.1LiveCD).jpg

Any help you guys can give me will be greatly appreciated. The data is extremely valuable to me, no backups (I know, I feel incredibly stupid), as I was in the process of ordering an external backup solution when all this occurred. I love FreeNAS, love ZFS, but this has me really concerned...
Thank you all for your time, patience, and efforts.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
OK, here are a few questions and ideas. Not sure if I'll be much help, but here goes:

Have you tried upgrading your RAM? There's a reason why the manual says 6GB of RAM minimum for ZFS. Yes, ZFS can work great with less than 6GB, but a lot of people have had stability issues with less RAM.

Did you have deduplication enabled? If you did, it's possible that the Fatal Trap 12 error is caused by not having enough RAM. You may need a boatload of RAM in this case.

Since you have 3 drives in a RAIDZ1, I'd try unplugging one and then try to mount the zpool. If it fails, reconnect it, try the second drive, and so on. It's possible one of the drives is badly corrupted and removing it will solve the problem. If you try this, use the command 'zpool import -f -o readonly=on -R /mnt/tank tank'. That way, if something goes awry with one disk removed, the zpool is still read-only, so you can't do more damage. If this works, report back with what you see. Don't try to do a scrub or anything like that. Forcing a zpool to mount with -f can be very damaging, which is all the more reason to keep it read-only if you do force it. If you do get access, it may be a good time to back up anything important ;)
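For anyone following along, here is a minimal sketch of the read-only import described above. It only prints the command so you can review it before typing it yourself. The helper name `ro_import_cmd` and the default altroot `/mnt` are assumptions for illustration (the post uses `/mnt/tank`, which you can pass as the second argument); the pool name `tank` comes from the thread.

```shell
#!/bin/sh
# Sketch only: build a read-only import command for review.
# ro_import_cmd is a hypothetical helper, not part of FreeNAS or ZFS.

ro_import_cmd() {
    pool="$1"
    altroot="${2:-/mnt}"   # assumed default altroot; adjust to your layout
    # -f forces the import, readonly=on imports the pool without allowing
    # writes, and -R sets the altroot that datasets are mounted under.
    echo "zpool import -f -o readonly=on -R $altroot $pool"
}

# Print the command instead of running it.
ro_import_cmd tank
ro_import_cmd tank /mnt/tank
```

Printing rather than executing is a deliberate choice here: on a pool that panics the kernel on import, it is worth reading the exact command twice before running it.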

Edit: Do realize that it's very possible your data is safe. Don't do anything stupid while emotions are high and you're in a panic to get to your data. I'm sure that one of the more knowledgeable people will provide some insight into the problem if it's fixable. The forum has seen quite a few people do very stupid things to try to get their data back that actually caused them to lose all of their data because of their haste. I'd provide links, but that would only pour salt in the wounds of the people I'd call out. It REALLY sucks losing your data.

More Edits: I found a thread where someone had a similar problem and found that his RAM was bad. So running a RAM test may be a good place to start too (besides, it's free via Memtest86+). Of course, the person with the bad RAM also found that it had corrupted the entire zpool, resulting in the loss of the zpool. He had to resort to backups.

Also, it may be a good idea to wait for protosd or PaleoN. Those 2 know their sh*t and I'm sure one of them will post in this thread in the next 12 hours.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Edit: Do realize that it's very possible your data is safe. Don't do anything stupid while emotions are high and you're in a panic to get to your data. I'm sure that one of the more knowledgeable people will provide some insight into the problem if it's fixable. The forum has seen quite a few people do very stupid things to try to get their data back that actually caused them to lose all of their data because of their haste. I'd provide links, but that would only pour salt in the wounds of the people I'd call out. It REALLY sucks losing your data.

++++1

Sit tight and don't freak out. If you decide to try something, ask first and wait for confirmation.

I wouldn't try what Noobsauce suggested about removing disks one at a time. It will just cause your pool to become faulted and add to the problem.

What I would try is running a memtest on your RAM to start. www.memtest.org

You can also try searching the tickets at support.freenas.org for ideas from other people who have had that error; it can be caused by a ton of different things.


If anyone else wants to jump in with ideas, please feel free!
 

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
Thanks for the reply noobsauce80.

1) 4GB is the system's max; originally I had 2GB. I'm looking into getting a new system.
2) I had deduplication set to OFF. I too read how horrible it is unless you have tons of RAM.
3) I'm going to be a little cautious about unplugging one HDD at a time. It's an interesting theory, but I'm afraid it might "fault" the pool. Before posting here, I had tried "zpool import -f tank", but I was still getting the Fatal Trap 12. I'll try the read-only method once I've run MEMTest for several hours.

Thanks for the reply Protosd.

I am trying to stay calm. I've been looking through similar posts, even Googled to see if there's a hint at what's going on. Lots of links, very difficult to say. People mention it could be hardware related, but I've tried importing the pool on a different system with no success.

Everyone's welcome to help, please help me save my data. Thank you all.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It will just cause your pool to become faulted and add to the problem.

If it mounts but has a fault that's better than no mount at all. One webpage I found had someone remove one drive at a time and he found that one of his disks was bad. He was able to mount the zpool without forcing it with that one disk missing. He found it odd because SMART had no indication of failure.. until he ran a long test of that disk.

OP - Yeah, ignore me. While my idea may make sense, protosd is far more knowledgeable than I. So I'd just go with whatever he says :) He may get to the point of pulling one disk at a time, but let him try to lead you down the path. I know if he and I argued over who was 'more' right.. he'd eat me for breakfast :P
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
If it mounts but has a fault that's better than no mount at all. One webpage I found had someone remove one drive at a time and he found that one of his disks was bad. He was able to mount the zpool without forcing it with that one disk missing. He found it odd because SMART had no indication of failure.. until he ran a long test of that disk.

I suppose it's an option, but one I would save until later. Think of it like this: with a z1 you can lose one disk, so if the disk you pull isn't the one that's bad, it's like losing 2 disks. I understand the logic, but I think it's risky.
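To spell out that arithmetic, here is a small sketch. It treats a badly corrupted disk the same as a missing one, which is a simplification, and `raidz_readable` is a hypothetical helper written for this post, not a ZFS command.

```shell
#!/bin/sh
# Sketch only: can a RAIDZ vdev still be read with some disks unavailable?
# raidz_readable is a hypothetical helper, not a ZFS command.

raidz_readable() {
    parity="$1"        # 1 for RAIDZ1, 2 for RAIDZ2, and so on
    unavailable="$2"   # disks that are failed, corrupt, or unplugged
    # A RAIDZ vdev tolerates at most as many unavailable disks as it has parity.
    if [ "$unavailable" -le "$parity" ]; then
        echo "readable"
    else
        echo "faulted"
    fi
}

# RAIDZ1 with one bad disk, and you happen to unplug that same disk:
raidz_readable 1 1   # prints "readable"
# RAIDZ1 with one bad disk, and you unplug a healthy one instead:
raidz_readable 1 2   # prints "faulted"
```

With three disks and no idea which one is bad, two of the three possible pulls land in the second case.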


OP - Yeah, ignore me. While my idea may make sense, protosd is far more knowledgeable than I. So I'd just go with whatever he says :) He may get to the point of pulling one disk at a time, but let him try to lead you down the path. I know if he and I argued over who was 'more' right.. he'd eat me for breakfast :P

Nah, I've seen you come up with stuff that I hadn't thought of, so it never hurts to have some extra ideas thrown out there.

RFeynman, If you've tried it on other hardware, then that would seem to eliminate the memory. It's either a bad disk or something corrupt with the pool.

If you do a search for one of mine or PaleoN's posts, there's an import command you can try to force it to mount, and/or rewind the metadata to a point where it can mount. I don't recall it off the top of my head, it's something with -fFX or similar.

If you really have the time, patience, and money, you could get new disks, clone the current ones with ddrescue, and put aside the originals while you try stuff with the copies.
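A rough sketch of what that cloning could look like with GNU ddrescue. The source disks `ada0` to `ada2` match the pool listing earlier in the thread, but the target disks `ada3` to `ada5` and the map file paths are purely assumed, and device names can change when disks are moved to another machine. The helper `clone_cmds` is hypothetical and only prints the commands.

```shell
#!/bin/sh
# Sketch only: print ddrescue commands for cloning each pool disk to a new disk.
# clone_cmds is a hypothetical helper; target devices and map paths are assumptions.

clone_cmds() {
    src="$1"; dst="$2"; map="$3"
    # Pass 1: copy everything that reads cleanly and skip the slow scraping
    # phase (-n). -f is required to overwrite an existing output device.
    echo "ddrescue -f -n /dev/$src /dev/$dst $map"
    # Pass 2: reuse the same map file and retry the bad areas up to 3 times.
    echo "ddrescue -f -r3 /dev/$src /dev/$dst $map"
}

clone_cmds ada0 ada3 /root/ada0.map
clone_cmds ada1 ada4 /root/ada1.map
clone_cmds ada2 ada5 /root/ada2.map
```

Because `-f` will happily overwrite whatever device is given as the output, check source and target against the drive serial numbers before running any of the printed commands.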
 

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
So, I've been running Memtest86 for a couple of hours now - starting to get this sinking feeling as many errors are being reported...
Memtest86_001.jpg
Since it looks like the memory is bad, I'll be working to save my pool on a different system. How should I proceed? Any suggestions, given that even a different system gives "Fatal Trap 12"?

Protosd, I'll try to find that post you talked about - for importing the pool.
Thanks guys. Great feedback.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Found it, there's:

zpool import -nfF pool (the -n does sort of a trial run and tells you what would happen if -n wasn't used)

and then there's an undocumented "-X" option you can try if that doesn't work. So that would look like:

zpool import -fFX pool

I know I saw another strange incantation here somewhere, but I don't remember enough to find it.
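Put in order from least to most invasive, the commands above could be laid out like this. The middle step (the same rewind without `-n`) is implied by the dry run rather than stated in the post, and `recovery_steps` is a hypothetical helper that only prints the commands. Treat it as a sketch, not a tested recovery procedure.

```shell
#!/bin/sh
# Sketch only: list the recovery imports from least to most invasive.
# recovery_steps is a hypothetical helper; it prints commands and runs nothing.

recovery_steps() {
    pool="$1"
    # 1. Dry run: -n with -F reports whether a rewind would make the pool
    #    importable, without actually performing the recovery.
    echo "zpool import -nfF $pool"
    # 2. Real rewind: -F discards the last few transactions to try to reach
    #    an importable state.
    echo "zpool import -fF $pool"
    # 3. Last resort: -X allows a more extreme rewind further back in time.
    echo "zpool import -fFX $pool"
}

recovery_steps tank
```

Each step can discard more recent writes than the one before it, so it makes sense to stop at the first one that works.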
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
So, I've been running Memtest86 for a couple of hours now - starting to get this sinking feeling as many errors are being reported...

So it's starting to sound like what Noobsauce was mentioning with the other person having bad memory and causing his pool to become too corrupt to fix. It doesn't sound promising, but keep us posted and confirm stuff with us if you're not sure.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Oh, and if you're going to try those import commands, try it on a system with good memory ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've always thought that bad RAM could be responsible for corrupting a zpool beyond repair, but I considered it something that I'd likely never actually see. I've seen so little RAM go bad, only 2 sticks in over 15 years in computers, I considered it very unlikely. Looks like maybe you are the example of why ECC RAM is more important than we all thought.


I thought that trying to mount 2 disks with the read-only attribute would be fine. The worst thing that could happen is it wouldn't mount, and it couldn't make things worse since it would be imported as read-only. But eh, let's see what else goes on first.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Also, I'm off to sleep now, so if I don't reply for awhile that's why.
Sleep? Heck, I've been in the ER with the daughter all AM, I'm beat. Time for me to sleep too.

@RFeynman
If you do not have another computer to use, then, if you can, remove a stick of RAM and run MEMTest again. Try to get the failed memory out of there. Sure, you will have less RAM, but you might be able to get things working so you can back up any valuable data. If you have other RAM modules you could put in, try those too and then run MEMTest again.
 

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
I really don't want to be the poster-child for ECC RAM. LoL. I know how important it is to have ECC, I was going to build a new ultimate server in the beginning of the new year, just had to let this system maintain my pool until then...

For now, I'm in the process of Memtest86'ing each memory module to isolate the bad ones. A little later I'll try the import commands on a different system. Keep you guys updated, I'm desperately praying my pool can still be saved. I have so much valuable data on it. Thank you all for your quick responses, ideas, and help. :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I've seen so little RAM go bad, only 2 sticks in over 15 years in computers, I considered it very unlikely. Looks like maybe you are the example of why ECC RAM is more important than we all thought.

Wow. I've seen memory come in bad, I've seen memory go bad, and a hell of a lot more than two sticks every decade or two. We buy mostly ECC here, and in the systems that support reporting, the odd thing is that mostly we don't get many reports, and of the ones we do, either they tend to be the same ones over and over (bad module that is only slightly flaky with a single bit error every so often) or the module is a complete disaster. I have a bit of trouble reconciling the Google memory error research with what we've observed. But then again, it is very possible that we have very different ideas than Google does of build quality... for them, cram the cheapest stuff they can get in a rack and accept some failures is a sensible business model. It isn't unusual around here to bench test something for a week or even a month, then rack it, then stick trite loads on it, and we're not building with the cheapest available components. I have to wonder if that's part of the difference.

That having been said, if you're building a FreeNAS box, you're probably spending a ton on hard drives and controllers and stuff. The cost differential to get a server-class mainboard and some ECC memory can really pale in comparison to other parts of the system. If your data isn't worth it, don't save it to begin with. :smile:
 

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
Of the four 1GB RAM sticks, one was really bad. So now I'm running with 3GB of RAM. I did a fresh install of FreeNAS v8.3.0, followed by a GUI upgrade to p1, rebooted, and ran this from the shell: zpool import
Code:
[root@freenas] ~# zpool import

   pool: tank
     id: 2820821072774409311
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        tank        ONLINE
          raidz1-0  ONLINE
            ada0p2  ONLINE
            ada1p2  ONLINE
            ada2p2  ONLINE
But, unfortunately - I still get the Fatal Trap 12 error when trying to execute any of the other import commands. I'm forced to reboot without my pool being mounted. Any other suggestions? Please...?
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
I've seen so little RAM go bad, only 2 sticks in over 15 years in computers, I considered it very unlikely. Looks like maybe you are the example of why ECC RAM is more important than we all thought.

I'm with you here, I've also seen very little RAM go bad in 30 years. Although it is quite funny that 2 weeks ago one of my clients had a stick go bad, and this is the same place that had some other RAM go bad a few years ago on different hardware in a different location. Personally though, I've *maybe* had 2 or 3 sticks go bad in about 30 years. Coming from a background in electronics though, I can say I think proper handling in a static free work area can make a difference too, as well as having a clean power supply.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Maybe try to run a scrub again? Maybe ZFS can do its magic better without the bad RAM?
 

RFeynman

Dabbler
Joined
Dec 15, 2012
Messages
15
I hear you, Protosd, but my problem is that I can't even import or mount my pool to do the scrub. I was thinking of creating a FreeBSD LiveUSB boot stick and trying from the shell.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
That's really, really strange, since it contradicts itself by telling you the pool can be imported. There's definitely a need for some ZFS recovery and diagnostic tools.

FreeBSD 9.1 Live should be ok to try.
 