FreeNAS 9.10.1 Continuous Reboot, "Fatal trap 12" RAIDZ2 Volume

Status
Not open for further replies.

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
I've had six 4TB drives in a RAIDZ2 volume for two or three years now, staying on the latest stable build of FreeNAS. I just noticed tonight that the server has apparently started rebooting itself and reporting a "Fatal trap 12: page fault while in kernel mode" error. I have tried booting into a previous version, reinstalling and importing a backup, and installing from scratch, reconfiguring, and attempting to import the pool.

None of these have worked for me. I have 12-14TB of data in this pool that I would really like to salvage. Yes, the important stuff is backed up, but ZFS has always been so rock solid that I can't imagine my data is gone. When I attempt to import the volume from either the GUI or the command line, I see the identifier listed and it lets me start the import. After several seconds the machine reboots, and the storage volume is still not available.

Any ideas? I have already done quite a bit of Google and forum searching, and none of the suggestions I have read appear to apply to my situation.

Please let me know if this should be posted elsewhere, and thanks in advance for the assistance.
 

Attachments

  • Capture.PNG (54.8 KB)

dlavigne

Guest
Did you try all of this from the same boot device? In other words, have you tried another USB stick?
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
Thank you for the response.

I reinstalled onto the original USB stick. I didn't think that would be an issue, since a fresh install seems to work with no problem, but I could try that.

I was also wondering whether intentionally degrading the array by removing a drive might help in some way. I know that sounds counterintuitive, but maybe there is some corrupt parity data on one of the drives. Is there any way to check the "metadata" of the pool on the individual drives without importing the volume? Where does FreeNAS get the information that the volume even exists? I'm guessing it is stored across the drives with parity like the other data, so when it scans the drives it can tell that they are intended to be a single volume, with enough redundancy to lose two drives and still function.
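On the question of where the pool information lives: ZFS keeps copies of its vdev labels on each member disk, and a bare `zpool import` (no pool name) only scans those labels and lists what it finds without importing anything. `zdb -l` prints the labels of a single device. The Python sketch below just builds and runs one `zdb -l` per disk; the disk names ada0 through ada5 and the assumption that the ZFS data sits on partition 2 of each disk are guesses to check against `gpart show`, and `DISKS`, `label_commands` and `dump_labels` are hypothetical helper names.

```python
import subprocess

# Assumption: pool members are ada0..ada5 and the ZFS data lives on
# partition 2 of each disk (FreeNAS normally puts swap on p1).
# Check the real layout with `gpart show` before relying on this.
DISKS = [f"ada{i}" for i in range(6)]


def label_commands(disks, partition=2):
    """Build one read-only `zdb -l` command per member partition."""
    return [["zdb", "-l", f"/dev/{disk}p{partition}"] for disk in disks]


def dump_labels(disks, partition=2):
    """Run each `zdb -l` and return {device: output}; nothing is imported."""
    results = {}
    for cmd in label_commands(disks, partition):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr when zdb fails, e.g. if the partition number is wrong.
        results[cmd[-1]] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

Calling `dump_labels(DISKS)` as root and comparing the pool name, pool_guid and txg values across the six outputs would show whether every disk still agrees on the pool's identity.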

I'm very reluctant to go down that route because of the headaches I've had just trying to mirror and resilver USB drives as a boot volume. I have other devices that have been booting from the same USB drive for nearly a decade with no issues. FreeNAS seems to corrupt two or three USB boot drives a year.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Can you provide your system hardware specs? Also, what HBA/controller are you using for the drives?

A "Fatal trap 12" is often due to RAM issues, but other things, such as running hardware RAID, can also be the cause.

Here is a similar thread on that error: "My freenas wont start up. Fatal Trap 12"
I have other devices that have been booting from the same USB drive for nearly a decade with no issues. FreeNAS seems to corrupt two or three USB boot drives a year.
USB flash drives are not inherently that great compared to HDDs/SSDs as far as longevity goes; a lot of us just avoid using them altogether.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
USB flash drives are a high-failure component in a FreeNAS system. If you can replace yours with a single small SSD, you would be doing yourself a favor in the long run. For now, though, Dru asked you to try a new/different USB flash drive to see if that helps. Please use the same version you were running when the failure occurred; there is no need to inject another unknown yet.

My first thought was to ask whether you have been running routine SMART testing, because you may now have several failed drives, or you may have other hardware issues such as bad RAM or a bad power supply.

1) Please post your list of hardware and what version of FreeNAS you are running.
2) Run Memtest86 on your system for 48 hours minimum.
3) Post the output of "smartctl -a /dev/adax" (where x is the drive identifier)

EDIT: You can do step 3 before step 2 if you like but you need to do both.
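Step 3 has to be repeated for each drive. A minimal Python sketch, assuming the six pool disks are ada0 through ada5 and that it runs as root on the FreeNAS box; it saves one text file per drive, named the same way as the ada0.txt through ada5.txt reports attached later in the thread. `smart_report_command` and `save_smart_reports` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Assumption: the six pool disks are ada0..ada5; run as root on the FreeNAS box.
DISKS = [f"ada{i}" for i in range(6)]


def smart_report_command(disk):
    """`smartctl -a` prints identity, attributes and the self-test log."""
    return ["smartctl", "-a", f"/dev/{disk}"]


def save_smart_reports(disks, out_dir="."):
    """Write one <disk>.txt per drive and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for disk in disks:
        proc = subprocess.run(smart_report_command(disk),
                              capture_output=True, text=True)
        # smartctl's exit status is a bit mask, so a non-zero code can still
        # come with a useful report; save whatever was printed.
        path = out / f"{disk}.txt"
        path.write_text(proc.stdout)
        written.append(path)
    return written
```

`save_smart_reports(DISKS)` leaves the six reports in the current directory, ready to attach to a post.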
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I was also wondering if I intentionally degraded the array by removing a drive if it might help in some way??
I would advise against doing that.

My guess is that it's a hardware problem, i.e., mobo/CPU/RAM.

I ran into the "Fatal trap 12" error back in May 2012 and couldn't track down the problem. With more money than time, I ended up replacing the server. My drives imported just fine and I haven't had a problem since then.
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
Within the last six months I completely overhauled this system, moving from consumer-grade hardware I had lying around to hardware recommended in the forums, wikis, etc. I ran the old setup, non-ECC RAM and all, with no issues whatsoever for two years or so. Now that I have more robust components, I seem to be having more issues. See below.

SuperMicro X10SLH-F
i3-4130 @3.4GHz
2x Crucial 16GB Kit (8GBx2) DDR3/DDR3L-1600MT/s (PC3-12800) DR x8 ECC UDIMM Server Memory (32GB)
2x Intel PRO/1000 Dual Port Server Adapters
6x 4TB WD Red NAS drives connected directly to the six SATA ports on the board.

A PCIe SATA adapter is my next purchase, but I have not gone down that road yet. That said, I do not have a spare SATA port for a boot SSD, although I have about half a dozen 64GB drives lying around my office. I guess I could use a USB-to-SATA adapter, but I have not really explored that either.

I could be forgetting something; I'm doing this in a hurry. I will say that I have been building consumer-grade PCs for 20+ years, but the setup for this board has several options that I've had to dig into to know for sure whether they are set correctly for FreeNAS and ZFS RAID.

I was just going through it and saw a setting called "SATA Frozen" that was enabled. That didn't look right to me, as I am using all six SATA ports on this board for my NAS drives. Is this ok to have enabled?

Also, it appears that for some reason there is a watchdog on this board. I'm not sure if it is from the BIOS/EFI part of the board or if IPMI is doing it. It is getting logged in IPMI, but the setup utility has an option to disable it.

I'm working on exploring these options. I just think it's funny that it worked for 6+ months without a hitch.

If anyone can shed some light on any of the topics above, it would be greatly appreciated. I will get the requested smartctl output and run Memtest86, probably sometime tonight. Thank you for all of the responses. I hope I put my faith in the right file system this time. :)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I guess I could use a USB to SATA adapter, but have not really explored that either.
Don't do that. If you don't have an available SATA port, then use a USB flash drive.

I would still run Memtest86 just to make sure nothing is wrong. Sometimes parts fail, and you might be the lucky dog who has to deal with one. You might even run a CPU stress test. Have you verified that all the fans are running, including the power supply fan? A trap 12 is typically a hardware failure, but hopefully that isn't the case here.

I wouldn't mess with any settings if the thing had been working fine, all you would do is possibly inject another unknown.

One other thing, what version of FreeNAS are you running and have you recently (within the last 2 months) upgraded? If yes, can you roll back to the previous version? But I wouldn't do that until you have tested out the hardware.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I hope I put my faith in the right file system this time.
You did, but no matter what, sometimes parts go wacky or "give up the ghost". However, FreeNAS is so versatile that, as long as the hardware lets it run and access the drives, things can be swapped or replaced without any real concerns.

On the SuperMicro, I will have to defer to others since I don't run one. You might want to check whether there are any BIOS or firmware updates anyway.
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
Also, I assume that Hot Plug should remain enabled for RAIDZ2 drives, but I know it can do funny things in Windows. Should it be enabled or disabled for FreeNAS?
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
Thank you for the prompt responses. I just started looking through my BIOS setup utility, and some settings looked a little wonky, things that should have jumped out at me in the beginning.

You are correct about not introducing further problems; I will stop fiddling with the BIOS. I would still like a good answer on the SATA Frozen option, though. It seems to be linked to the setup admin password, and that scares me, as if it might be able to prevent the OS from accessing the drives. That would be a huge problem, especially if it is writing anything to the drives, or, God forbid, encrypting them.

The same goes for the spontaneous reboots from the watchdog, which I obviously need to read up on. It seems to be a timer designed to automatically perform a clean reboot at set intervals, but I'm watching it reboot every couple of minutes, almost as if it were a case-intrusion security feature. I may be chasing ghosts here, but I guess I'm learning some lessons about big-boy hardware.
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
Also, should I even bother trying to boot FreeNAS via UEFI? I've had trouble with it and I'm not sure it's worth it. I'm trying to reinstall, and it won't boot the USB installer in UEFI mode.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I would not change anything in the BIOS. If you have a question about a BIOS setting, I'd recommend the user manual. If a setting is at its default value and you never touched it before, I'd leave it alone.

Booting to Legacy (not UEFI) is less problematic.

With respect to hot plugging a hard drive: DON'T, if you can just shut the server down and replace the drive. Hot plugging a drive can lead to problems and should only be done on a server that must remain up and running.

Report back once you have run the hardware tests. Don't second guess the BIOS yet.
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
So, I have made a bit of progress, but I'm still trying to make sense of it all. If I manually import the volume from the command line using the -f switch to force the import, the computer reboots, and when it comes back up, "zpool status" on the command line shows the volume. The web GUI does not. The web GUI actually shows the drives as available for creating a new pool, which is very unsettling. Could this possibly be a simple mounting issue at this point?

I also read a thread on a forum not specific to FreeNAS, but about the same situation, which said that because ZFS is a transactional file system with snapshots, etc., there is a way during the import process, using a few other command switches, to roll the pool back to an earlier point in time and get past the corruption.
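The switches in question are most likely `zpool import`'s recovery options: `-F` tries to return a non-importable pool to an importable state by discarding the last few transactions (data in those transactions is lost for good), and `-n` combined with `-F` only reports whether that would work, without doing it. A cautious approach is to dry-run first and to import read-only (`-o readonly=on`) so nothing is written while data is copied off. The Python sketch below only assembles the argument lists; the pool name `tank` is a placeholder, the `/mnt` altroot mirrors where FreeNAS mounts pools, `import_command` is a hypothetical helper, and `-f` is deliberately left out, in line with the advice given later in this thread.

```python
# Placeholder pool name: use the name a bare `zpool import` lists.
POOL = "tank"


def import_command(pool, rewind=False, dry_run=True, readonly=True,
                   altroot="/mnt"):
    """Assemble a cautious `zpool import` argument list.

    -o readonly=on : import without writing anything to the pool
    -R altroot     : mount under an alternate root (FreeNAS uses /mnt)
    -F             : recovery mode, discards the last few transactions
    -n             : with -F, only report whether recovery would work
    -f is intentionally never added.
    """
    cmd = ["zpool", "import"]
    if readonly:
        cmd += ["-o", "readonly=on"]
    if altroot:
        cmd += ["-R", altroot]
    if rewind:
        cmd.append("-F")
        if dry_run:
            cmd.append("-n")
    cmd.append(pool)
    return cmd
```

For example, `" ".join(import_command(POOL, rewind=True))` gives `zpool import -o readonly=on -R /mnt -F -n tank`. Only after that dry run reports that a rewind is possible would `dry_run=False` be worth considering.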

I'll keep looking in to it. In the meantime, I have attached the requested output for all six physical devices in my pool. If anyone sees something obvious that I'm missing, please let me know.

Thanks again. Great group of people here.
 

Attachments

  • ada0.txt (4.5 KB)
  • ada1.txt (4.7 KB)
  • ada2.txt (4.5 KB)
  • ada3.txt (4.5 KB)
  • ada4.txt (4.6 KB)
  • ada5.txt (4.6 KB)

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I only looked at ada0, but in 12,652 hours, no self-tests have ever been run. If this is the case for the others too, you should run short tests on all of them (only takes a couple of minutes). Look at the results. Depending on what you see, run the long test (takes hours).
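Checking the other five reports by eye is error-prone. A small Python sketch that scans saved `smartctl -a` output (such as the attached ada0.txt through ada5.txt) for smartctl's "No self-tests have been logged" message and pulls the raw value of attribute 9, Power_On_Hours. It assumes the usual ATA attribute table layout, where the raw value is the tenth column; the function names are hypothetical.

```python
import re
from pathlib import Path

# Message smartctl prints when a drive's self-test log is empty.
NO_TESTS = "No self-tests have been logged"


def power_on_hours(report):
    """Raw value of attribute 9 (Power_On_Hours), or None if absent."""
    for line in report.splitlines():
        fields = line.split()
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def never_self_tested(report):
    """True if the report contains smartctl's empty-log message."""
    return NO_TESTS in report


def summarize(paths):
    """Print one line per saved report, e.g. ada0.txt ... ada5.txt."""
    for path in paths:
        text = Path(path).read_text()
        hours = power_on_hours(text)
        status = "NO self-tests logged" if never_self_tested(text) else "has self-test entries"
        print(f"{Path(path).name}: {hours} power-on hours, {status}")
```

Running `summarize([f"ada{i}.txt" for i in range(6)])` in the folder holding the reports prints one line per drive.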
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
I noticed that it said that too, but I could have sworn there were tasks scheduled in FreeNAS to do that regularly. How does Windows handle this? IRST, WD drive tools, etc., or is it just part of the built-in drive maintenance? These drives have never been in anything but this pool.

Thanks for taking the time to review it, though. From glancing over the other five, they all look about the same.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Also, I just assume that Hot Plug should remain enabled for RAIDZ 2 drives, but I know it can do funny things in windows.
I noticed that it said that too, but I could have sworn that there were tasks scheduled in FreeNAS to do that regularly. How does Windows handle this? IRST, WD Drive Tools, Etc, or is it just part of the built in drive maintenance? These drives have never been in anything but this pool.
Thanks for taking the time to review it though. From glancing over the other five, they all look about the same.
You keep mentioning Windows... Is there something particular about your configuration that we should be aware of?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
A few things. First, your last two drives had a SMART short test done, probably right after they were installed, and that was it; none of the other drives have been tested. As previously indicated, run a SMART short test and review the results. Read the troubleshooting guides for the hard drive ID info. Run a SMART long test after that.
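The short-then-long sequence above, applied to all six drives, as a Python sketch. It assumes the disks are ada0 through ada5; `self_test_commands`, `result_commands` and `run_all` are hypothetical helper names. Note that `smartctl -t` only starts a test: the drive runs it in the background, and the result shows up later in the self-test log, read with `smartctl -l selftest`. This does not replace scheduling recurring SMART tests in the FreeNAS GUI.

```python
import subprocess

# Assumption: the pool disks are ada0..ada5.
DISKS = [f"ada{i}" for i in range(6)]


def self_test_commands(disks, kind):
    """`smartctl -t short|long` starts a test that runs inside the drive."""
    if kind not in ("short", "long"):
        raise ValueError(f"unsupported test type: {kind}")
    return [["smartctl", "-t", kind, f"/dev/{disk}"] for disk in disks]


def result_commands(disks):
    """`smartctl -l selftest` reads the self-test log once a test finishes."""
    return [["smartctl", "-l", "selftest", f"/dev/{disk}"] for disk in disks]


def run_all(commands):
    """Run each command (as root) and return the stdout of each one."""
    return [subprocess.run(cmd, capture_output=True, text=True).stdout
            for cmd in commands]
```

Start the short tests with `run_all(self_test_commands(DISKS, "short"))`, wait a few minutes, review `run_all(result_commands(DISKS))`, and only then start the long tests, which on 4TB drives take hours.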

Second, SMART testing is not set up by default. This is not Windoze, and while it does look like a nicely finished program, a certain level of setup is required. For instance, do you have your automatic emails set up? I suspect not, since you don't have SMART testing set up. Make sure you set up the emails, because they are crucial when a failure occurs.

Third, you should not be using the "-f" parameter unless you are just asking to accidentally corrupt your pool.
 

Joseph Sharbutt

Dabbler
Joined
Apr 12, 2014
Messages
28
You keep mentioning Windows... Is there something particular about your configuration that we should be aware of?

Only because I am more familiar with NTFS and Windows systems; I am just making comparisons. What do you mean by something particular about my config? The box in question is a standalone, physical FreeNAS machine built from the components listed above. I also run a dedicated Windows server for AD, DNS, DHCP, etc., and many client versions of Windows in different flavors of physical and virtual machines, if that helps answer your question.
 