TrueNAS Core 12.0-U7 Unexpected reboot

adorobis

Dabbler
Joined
Oct 16, 2017
Messages
27
I had an unexpected reboot today, the first one ever, and I'm not sure whether I should start worrying about it. Below are the last entries in the messages log before the reboot.
They don't mean much to me; maybe someone can help me understand what could have caused it:
Code:
Feb  4 10:38:56 homeserwer syslog-ng[840]: syslog-ng starting up; version='3.29.1'
Feb  4 10:38:56 homeserwer kernel: vnet0.17: link state changed to DOWN
Feb  4 10:38:56 homeserwer kernel: epair0b: link state changed to DOWN
Feb  4 10:38:56 homeserwer Fatal trap 12: page fault while in kernel mode
Feb  4 10:38:56 homeserwer cpuid = 3; apic id = 06
Feb  4 10:38:56 homeserwer fault virtual address    = 0x218
Feb  4 10:38:56 homeserwer fault code        = supervisor read data, page not present
Feb  4 10:38:56 homeserwer instruction pointer    = 0x20:0xffffffff80c1df75
Feb  4 10:38:56 homeserwer stack pointer            = 0x28:0xfffffe00c3ff38b0
Feb  4 10:38:56 homeserwer frame pointer            = 0x28:0xfffffe00c3ff38e0
Feb  4 10:38:56 homeserwer code segment        = base 0x0, limit 0xfffff, type 0x1b
Feb  4 10:38:56 homeserwer             = DPL 0, pres 1, long 1, def32 0, gran 1
Feb  4 10:38:56 homeserwer processor eflags    = interrupt enabled, resume, IOPL = 0
Feb  4 10:38:56 homeserwer current process        = 0 (softirq_3)
Feb  4 10:38:56 homeserwer trap number        = 12
Feb  4 10:38:56 homeserwer panic: page fault
Feb  4 10:38:56 homeserwer cpuid = 3
Feb  4 10:38:56 homeserwer time = 1643967395
Feb  4 10:38:56 homeserwer KDB: stack backtrace:
Feb  4 10:38:56 homeserwer db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c3ff3570
Feb  4 10:38:56 homeserwer vpanic() at vpanic+0x17b/frame 0xfffffe00c3ff35c0
Feb  4 10:38:56 homeserwer panic() at panic+0x43/frame 0xfffffe00c3ff3620
Feb  4 10:38:56 homeserwer trap_fatal() at trap_fatal+0x391/frame 0xfffffe00c3ff3680
Feb  4 10:38:56 homeserwer trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c3ff36d0
Feb  4 10:38:56 homeserwer trap() at trap+0x286/frame 0xfffffe00c3ff37e0
Feb  4 10:38:56 homeserwer calltrap() at calltrap+0x8/frame 0xfffffe00c3ff37e0
Feb  4 10:38:56 homeserwer --- trap 0xc, rip = 0xffffffff80c1df75, rsp = 0xfffffe00c3ff38b0, rbp = 0xfffffe00c3ff38e0 ---
Feb  4 10:38:56 homeserwer igmp_change_state() at igmp_change_state+0x45/frame 0xfffffe00c3ff38e0
Feb  4 10:38:56 homeserwer in_leavegroup_locked() at in_leavegroup_locked+0xa4/frame 0xfffffe00c3ff3940
Feb  4 10:38:56 homeserwer inp_freemoptions() at inp_freemoptions+0x15a/frame 0xfffffe00c3ff3980
Feb  4 10:38:56 homeserwer in_pcbfree_deferred() at in_pcbfree_deferred+0x157/frame 0xfffffe00c3ff39d0
Feb  4 10:38:56 homeserwer epoch_call_task() at epoch_call_task+0x19a/frame 0xfffffe00c3ff3a20
Feb  4 10:38:56 homeserwer gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe00c3ff3a80
Feb  4 10:38:56 homeserwer gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe00c3ff3ab0
Feb  4 10:38:56 homeserwer fork_exit() at fork_exit+0x7e/frame 0xfffffe00c3ff3af0
Feb  4 10:38:56 homeserwer fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c3ff3af0
Feb  4 10:38:56 homeserwer --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Feb  4 10:38:56 homeserwer KDB: enter: panic
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Looks like a panic in the network stack. If this is the only time you've seen this, you can probably ignore it. If it becomes a recurring thing, you can file a ticket on https://jira.ixsystems.com so we can take a look.
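
To show where the "network stack" reading comes from, here is a minimal sketch that pulls the faulting call chain out of a KDB backtrace like the one posted above. The helper name faulting_frames and both regular expressions are illustrative assumptions based only on the line format visible in that log; this is not an official tool. The sample is a shortened excerpt of the posted lines.

```python
import re

# One KDB backtrace frame, e.g.
# "igmp_change_state() at igmp_change_state+0x45/frame 0xfffffe00c3ff38e0"
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")
# Marker line that separates the trap handler frames from the faulting code
TRAP_RE = re.compile(r"--- trap 0x[0-9a-f]+, rip = ")


def faulting_frames(log_text):
    """Return function names from the faulting call chain, innermost first."""
    frames = []
    in_fault = False
    for line in log_text.splitlines():
        if TRAP_RE.search(line):
            in_fault = True  # frames after this marker belong to the fault
            continue
        if not in_fault:
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
        elif "--- trap 0," in line:
            break  # end-of-stack marker
    return frames


# Shortened excerpt of the log posted above (hypothetical hard-coded sample)
SAMPLE = """\
Feb  4 10:38:56 homeserwer calltrap() at calltrap+0x8/frame 0xfffffe00c3ff37e0
Feb  4 10:38:56 homeserwer --- trap 0xc, rip = 0xffffffff80c1df75, rsp = 0xfffffe00c3ff38b0, rbp = 0xfffffe00c3ff38e0 ---
Feb  4 10:38:56 homeserwer igmp_change_state() at igmp_change_state+0x45/frame 0xfffffe00c3ff38e0
Feb  4 10:38:56 homeserwer in_leavegroup_locked() at in_leavegroup_locked+0xa4/frame 0xfffffe00c3ff3940
Feb  4 10:38:56 homeserwer inp_freemoptions() at inp_freemoptions+0x15a/frame 0xfffffe00c3ff3980
Feb  4 10:38:56 homeserwer --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

print(faulting_frames(SAMPLE))
# ['igmp_change_state', 'in_leavegroup_locked', 'inp_freemoptions']
```

The innermost names (igmp_change_state, in_leavegroup_locked, inp_freemoptions) all belong to the kernel's IPv4 multicast and socket cleanup code, which is consistent with a fault in the network stack rather than in storage.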
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Your board uses a Realtek RTL8111E NIC. The in-kernel re NIC driver isn't the greatest, stability-wise. Try loading the newer if_re.ko driver instead by setting a loader tunable if_re_load="YES", and then rebooting.

For best stability, try using an Intel PCI-E 1G NIC instead. These are pretty inexpensive on Amazon.
 

adorobis

Dabbler
Joined
Oct 16, 2017
Messages
27
Many thanks for your quick responses! It's the first time I've had such a reboot or any other instability, but I'll give your suggested tunable a try.
 

adorobis

Dabbler
Joined
Oct 16, 2017
Messages
27
After setting the if_re_load="YES" tunable, my SMB server is no longer discoverable by name (it still works by IP). After removing the tunable, it works again. Any ideas what else needs to be configured to make it work?
 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
Hi guys, I'm new to the community.
Sorry for any Google Translate errors.

I have a similar problem: unexpected reboots, with the following messages:
------------------------------------------------------------------------------------------
truenas.192.168.0.1 had an unscheduled system reboot. The operating system successfully came back online on Thu Apr 28 07:46:41 2022.
2022-04-28 07:46:41 (America/São_Paulo)

truenas.192.168.0.1 had an unscheduled system reboot. The operating system successfully came back online on Tuesday, May 10 at 15:55:58, 2022.
2022-05-10 15:55:58 (America/São_Paulo)
------------------------------------------------------------------------------------------

My setup is as follows:
Supermicro chassis – 24-bay case
Supermicro X11SPI-T motherboard
IPMI device rev. 1, firmware rev. 1.73, version 2.0, device support mask 0xbf

01 Intel(R) Xeon(R) Silver 4214 CPU
190 GB of ECC RAM
02 integrated Intel(R) Ethernet Connection X722 NICs for 10GBASE-T
01 AVAGO MegaRAID SAS FreeBSD mrsas driver version: 07.709.04.00-fbsd
24 HDDs ST18000NM000J-2T SN02 - (SPC-4 Fixed Direct Access SCSI device)
01 NVMe SSD
01 SanDisk SDSSDA240G Z32070RL - (ACS-2 ATA SATA 3.x device) ada0 boot-pool / OS Installed
03 SanDisk SDSSDA240G Z32070RL - (ACS-2 ATA SATA 3.x device)
System Cache: not configured
01 Samsung SSD 970 EVO Plus 250GB
System Cache: active

The controller is operating in HBA mode; that is, it presents the disks directly to TrueNAS without creating any kind of RAID.

TrueNAS-12.0-U8 / FreeBSD 12.2-RELEASE-p12

Pool overview
Data: 01 vdev / Caches: 01 / Spares: 02 / Logs: 0

Active services:
SMB and S.M.A.R.T

Directory Services: not using any kind of LDAP
Non-TrueNAS local users and groups
Directory Services and Deduplication = OFF
 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
Looking at the Netgear switch logs, I see that the 10 Gbps NICs go down and come back up on their own, up to the point where the host reboots.

The IPMI does not report any problem with the hardware.

The environment has dedicated air conditioning and a sine-wave UPS.
 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
Attached is a screenshot of the logs from the reboot 15 days ago.
 

Attachments

  • Print-06 TrueNas var_log_message Erro Muitos Erros de autenticação.png

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
01 AVAGO MegaRAID SAS FreeBSD mrsas driver version: 07.709.04.00-fbsd

Your HBA is probably the issue. Just running it in HBA mode isn't enough. It has to be flashed to IT mode.

 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
Hi,
Thanks for the quick response.
What does "has to be flashed to IT mode" mean?

How should I proceed to activate this mode?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What's the exact model of your controller?
 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
AVAGO MegaRAID SAS 9361-24i
 

Attachments

  • Print-05 Properties Tela-1 - Firmware Controller LSI AVAGO.png
  • Print-05 Properties Tela-2 - LSI AVAGO.png

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
I was reading here about IT mode.
I understand that you need to reflash the controller's firmware, correct?

Question: if I ran it in RAID mode with each disk as its own RAID-0, would that work without problems?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I'm sorry, but your adapter can't be flashed to IT mode. You'll need to select a different HBA. Running it in RAID mode is not recommended, and will eventually result in data loss.
 

pbasque

Cadet
Joined
Apr 26, 2022
Messages
7
Samuel Tai, thanks for the feedback.
Is there any way to confirm that the problem is the controller itself?
For example, an error message in the logs, or anything else that would make it more evident.

Or could it be some configuration we missed in TrueNAS, or a hardware defect that the IPMI did not detect?
I'm trying to rule out all other possibilities before telling the client that the controller will have to be replaced.

If there is something in the logs, it would help a lot to close this issue on my side.
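
As a starting point for that log review, here is a rough sketch that counts lines of interest in a messages log. The panic and link-state patterns are copied from the log excerpt at the top of this thread. The "mrsas" pattern simply matches the driver name and does not assume any particular error text. The names MARKERS and summarize are made up for illustration, and a count of zero proves nothing by itself.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt earlier in this thread. The "mrsas"
# entry only matches the driver name; it is not a known error message.
MARKERS = {
    "panic": re.compile(r"Fatal trap \d+|panic:"),
    "link_change": re.compile(r"link state changed to (UP|DOWN)"),
    "mrsas": re.compile(r"\bmrsas\d*\b"),
}


def summarize(lines):
    """Count how many lines match each marker (at most once per line)."""
    counts = Counter()
    for line in lines:
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the first post; in practice you would pass
# open("/var/log/messages") instead of this hard-coded list.
SAMPLE_LINES = [
    "Feb  4 10:38:56 homeserwer kernel: vnet0.17: link state changed to DOWN",
    "Feb  4 10:38:56 homeserwer kernel: epair0b: link state changed to DOWN",
    "Feb  4 10:38:56 homeserwer Fatal trap 12: page fault while in kernel mode",
    "Feb  4 10:38:56 homeserwer panic: page fault",
]

for name, count in sorted(summarize(SAMPLE_LINES).items()):
    print(name, count)
```

Lining up the timestamps of any matches with the reboot times and with the link flaps seen on the switch would show whether they cluster together.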
 