fatal trap 30 / hang at boot

proto

Patron
Joined
Sep 28, 2015
Messages
269
Hi all,
I got a "fatal trap 30 reserved (unknown) fault [...] stopped at intr_init_final" this morning after a scheduled reboot.

What is going on?
Is it something I should be worried about?

Server specs in brief:
Supermicro X10SL7
CPU Xeon E3-1230 v3
32 GB RAM

The FreeBSD source code comment says:
Code:
/*
 * Enable interrupts on the BSP after all of the interrupt
 * controllers are initialized.  Device interrupts are still
 * disabled in the interrupt controllers until interrupt
 * handlers are registered.  Interrupts are enabled on each AP
 * after their first context switch.
 */
 


Ref:
https://github.com/freebsd/freebsd/...9dd42367f813087472/sys/x86/x86/intr_machdep.c
lines 488-499


Here is the screenshot:

Screenshot 2019-06-16 12.06.02.png
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
My first thoughts are:
If there have been no changes to the system since the previous reboot, I'd submit a bug report. If you have made changes, I'd restore a previous configuration to see if you can get the system running again; worst case, roll back to a previous boot environment.

But I assume you have powered down and attempted to reboot by now. Also, since I don't know this error message, I'd run some CPU and RAM tests to check system stability. I don't think it's a hardware stability issue, but you never know unless someone can tell you that for certain.
 

proto

I can confirm no changes at all since my last system update and a full clean install of 11.2-U4.1 on a new boot disk (single SSD).
The only "big" hardware change was swapping a quad-port Ethernet card for a dual-port one, but that was weeks before installing the latest FreeNAS release.

Yes, I simply typed "reboot" at that debug console and the system booted normally. Before this scheduled reboot I had only a couple of unscheduled shutdowns due to power failures last month (there is an open bug about the UPS not shutting down the server).

I'll do a Memtest!
What would you recommend for a CPU test?

This week I'll take some time to perform other "scheduled" reboots and stress/check tests; then I will submit a bug.

Thanks!


#### UPDATE ####
I performed another reboot moments ago because I had seen an error during the boot process but could not catch it in time, so I made a capture from my IPMI console... and I got this:

BIOS drive C: ...
BIOS drive D: ...
ZFS: i/o error - all block copies unavailable
ZFS: failed to read pool Edge0 directory object
BIOS ...


Then it loads... and Edge0 pool is imported.

But Edge0 is my RAID-Z2 data pool, not my freenas-boot pool, and it's strange because it is ONLINE and the last scrub was good:
scan: scrub repaired 0 in 0 days 01:32:53 with 0 errors on Sun Jun 9 01:32:54 2019

Same for other pools.
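
A scrub result line like the one above can also be checked by a script rather than by eye. Here is a minimal Python sketch, assuming the "scan:" line keeps exactly the format quoted here (other ZFS versions print it differently). The function name parse_scrub_line is hypothetical, used only for illustration.

```python
import re
from datetime import timedelta

# Regex for the "scan:" line format quoted in this post. This format is an
# assumption: other ZFS versions word the line differently.
SCRUB_RE = re.compile(
    r"scrub repaired (?P<repaired>\S+) in (?P<days>\d+) days "
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+) with (?P<errors>\d+) errors on (?P<when>.+)$"
)


def parse_scrub_line(line):
    """Return a dict for a finished-scrub line, or None if it does not match."""
    match = SCRUB_RE.search(line.strip())
    if match is None:
        return None
    duration = timedelta(
        days=int(match["days"]),
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
    )
    return {
        "repaired": match["repaired"],
        "errors": int(match["errors"]),
        "duration": duration,
        "finished": match["when"],
    }


line = "scan: scrub repaired 0 in 0 days 01:32:53 with 0 errors on Sun Jun 9 01:32:54 2019"
result = parse_scrub_line(line)
print(result["errors"], result["duration"])
```

A non-zero "errors" value, or a line that does not parse at all, would be the cue to look at the full pool status by hand.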

This server build is panicking...
 

joeschmuck

After the RAM test I think you should run a scrub on pool Edge0 and see what happens. Also, have you checked your SMART results, assuming you are running SMART Long/Extended tests? You may have a failing hard drive in that pool. It's tough to give you the exact cause; you just have to rule out what it isn't. What I don't understand is the original error message you got. Maybe the RAM test or the CPU test will find something; it could be an unstable system. An example of a CPU test is Prime95. There are several out there, and the Ultimate Boot CD (UBCD) has a few that work fine, plus the RAM test.
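
The scrub and the long SMART tests suggested above can be listed as commands with a short script. This is only a sketch: it builds and prints the commands without running them. The disk names da0 to da5 are an assumption taken from the attachment names posted later in this thread, and they may not all belong to Edge0. The helper name build_check_commands is hypothetical.

```python
def build_check_commands(pool, disks):
    """Return argv lists: one scrub for the pool, one long SMART test per disk."""
    commands = [["zpool", "scrub", pool]]
    for disk in disks:
        commands.append(["smartctl", "-t", "long", f"/dev/{disk}"])
    return commands


# Assumed device names (da0..da5); adjust to the disks actually in the pool.
disks = [f"da{i}" for i in range(6)]

# Dry run: print each command instead of executing it.
for argv in build_check_commands("Edge0", disks):
    print(" ".join(argv))
```

To actually run them, each list could be passed to subprocess.run(argv, check=True) as root. The results are then read with "zpool status Edge0" and "smartctl -a /dev/da0" (and so on per disk) once the tests have finished.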
 

proto

Oh many thanks for that hint again, I will come back with results in a few days.

I ran a scrub on the Edge0 pool just this morning after the restart and it seems good: no errors logged. I have now launched a long SMART test on those disks (my fault: I had scheduled only short weekly tests... argh!). After the RAM/CPU tests I'll repeat the scrub.

"What I don't understand is the original error message you got."

Me neither... that fatal trap 30 happened yesterday and may be transient, but it's scarcely documented or too specific.
 

joeschmuck

Hope it all turns out well.
 

proto

I'm back with some results.

MEMTEST: passed
CPU Prime95 test (Small FFTs and Blend): passed. These tests did not detect errors in more than 24 hours of execution. CPU was relatively hot between 45-65°C; normally it runs @ 35-40°C.

DISKS: passed. I don't see any relevant errors in the smartctl output.
The SMART disk tests run both before and after the CPU and memory tests seem to have passed.

That ZFS error @ loading time disappeared too!
A double mystery!
 

Attachments

  • smart_x_da5.txt (12.5 KB)
  • smart_x_da4.txt (12.2 KB)
  • smart_x_da3.txt (12.3 KB)
  • smart_x_da2.txt (12.3 KB)
  • smart_x_da1.txt (10.1 KB)
  • smart_x_da0.txt (9.9 KB)

joeschmuck

"CPU was relatively hot between 45-65°C"
That is one of the purposes of the test: heat can cause solder joints or the CPU itself to fail, and if you have a bad CPU fan/heatsink the CPU would get really hot and should throttle itself. So 65°C is not bad.

All your SMART data looks good as well. You may have dodged a bullet. Or the problem will come back and haunt you. ;)
 

proto

Still lucky then! Eheheh : - )
Lesson learned: once in a while it's worth spending some time on hardware tests.

I performed a real CPU burn-in test in February when I upgraded the system:
https://www.ixsystems.com/community/threads/x10sl7-f-cpu-temp-during-rsync.73470/#post-509794

Bonus: a few years ago my system literally burned because of that lousy SATA connector that connected the case fan controller (Node 804). Once again lucky because I was in the room and I felt a strong smell of burnt plastic...
 

joeschmuck

Ah yes, burning plastic smell, not a good thing.
 