FreeNAS crash - "vm_fault: fault on nofault"

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
Hi,

I've been having crashes, too regularly for my liking, on one of my FreeNAS machines: twice lately, on the 27th and 29th of Feb. In the debug I have a file called panic.txt with the following content: "vm_fault: fault on nofault entry, addr: fffffe00597da000" (the addr differs slightly between the two crashes).

When these crashes occur, the machine becomes unresponsive for roughly 45 minutes, then reboots itself without problems and reports an unscheduled reboot, e.g.:
nasb.domain.li had an unscheduled system reboot. The operating system successfully came back online at Sat Feb 29 12:47:44 2020.

If anybody could help me figure out what that means I would be happy. I'm not running any virtual machines or jails on this machine.

I have (nearly) identical machines running FreeNAS. This one, called NASB, is the backup and only there for receiving snapshots from the primary machine. Since the update to 11.3, I have moved the snapshot replication task to NASB in 'pull' mode. Since then, I have updated to 11.3-U1. Before, while using 11.2, I had no issues like this.

The main machine has a bunch of datasets in one pool, and one jail running (FreeBSD for snapshot). The secondary machine does not have any jails. Both crashes occurred while replication was running. However, lately the replication has been more or less constant because a lot of data is being added to the main machine.

Some machine info:
Code:
HPE Proliant Microserver Gen10
AMD Opteron X3421
16 GB Memory
System is on two mirrored USB drives
Pool is on 4*8TB Seagate Exos drives, GELI encrypted.
FreeNAS-11.3-U1 (34d6357440)


In the debug I've downloaded, the info.0 file shows this (hostname changed)
Code:
Dump header from device: /dev/ada3p1
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 580608
  Blocksize: 512
  Dumptime: Thu Feb 27 13:54:31 2020
  Hostname: nasb.domain.li
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 11.3-RELEASE-p5 #0 r325575+8ed1cd24b60(HEAD): Mon Jan 27 18:07:23 UTC 2020
    root@tnbuild02.tn.ixsystems.com:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/Free
  Panic String: vm_fault: fault on nofault entry, addr: fffffe00183eb000
  Dump Parity: 1445934717
  Bounds: 0
  Dump Status: good


info.1:
Code:
Dump header from device: /dev/ada3p1
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 582144
  Blocksize: 512
  Dumptime: Sat Feb 29 12:02:37 2020
  Hostname: nasb.domain.li
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 11.3-RELEASE-p6 #0 r325575+d5b100edfcb(HEAD): Fri Feb 21 18:53:26 UTC 2020
    root@tnbuild02.tn.ixsystems.com:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/Free
  Panic String: vm_fault: fault on nofault entry, addr: fffffe00597da000
  Dump Parity: 2310618400
  Bounds: 1
  Dump Status: good
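
To compare several dump headers side by side, something like the following could help. It is only a sketch: the function name summarize_dump_headers is made up for illustration, and it assumes the header files use the "Dumptime:" and "Panic String:" labels exactly as shown in the two headers above.

```shell
# Print "file | dump time | panic string" for each dump header file given.
# summarize_dump_headers is a hypothetical helper, not part of FreeNAS.
summarize_dump_headers() {
    for f in "$@"; do
        awk -v file="$f" '
            # Strip the label and keep the rest of the line.
            /^[ \t]*Dumptime:/     { sub(/^[ \t]*Dumptime:[ \t]*/, "");     time = $0 }
            /^[ \t]*Panic String:/ { sub(/^[ \t]*Panic String:[ \t]*/, ""); panic = $0 }
            END { printf "%s | %s | %s\n", file, time, panic }
        ' "$f"
    done
}
```

Called as summarize_dump_headers info.0 info.1, it prints one line per file, which makes it easier to see whether the panic string is the same across crashes and only the addr changes.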


Does the line "Dump header from device: /dev/ada3p1" mean that something is wrong with disk ada3?

I've attached ddb.txt and msgbuf.txt for the two crashes (just slightly anonymised)

Any help is appreciated!
 

Attachments

  • ddb.1.txt
    534.8 KB · Views: 314
  • msgbuf.0.txt
    24.1 KB · Views: 381
  • ddb.1.txt
    535.5 KB · Views: 288
  • msgbuf.1.txt
    25.2 KB · Views: 271

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
Looks a bit more like something related to the hard drives. I got this message yesterday, a few hours before the most recent crash:
Code:
Device: /dev/ada3, not capable of SMART self-check.


Currently running a long SMART test, to see what happens then. The long self-test was scheduled to take place yesterday.
 

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
So the SMART test ran through when I started it manually, without reporting any errors. The results of the previous tests can't be seen though - somehow this disk never carried out the tests (even though it should have), or has forgotten the results.

Still haven't gotten any closer to an explanation, so any input would be appreciated :)
 

Attachments

  • ada0.txt
    5.5 KB · Views: 259
  • ada1.txt
    5.5 KB · Views: 262
  • ada2.txt
    5.5 KB · Views: 266
  • ada3.txt
    5.4 KB · Views: 266

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
Alright, no more crashes in the last 2.4 days since I disabled three options in the replication task:

Code:
Send Deduplicated Stream
Allow Blocks Larger than 128KB
Allow Compressed WRITE Records


These were enabled by default when setting up the task, and I assumed FreeNAS-FreeNAS replication could handle them. Right now I don't have a clue which of the three is a problem, and I'm going to wait a few days for confirmation that the crashes don't occur anymore.
 

ykhodo

Explorer
Joined
Oct 19, 2017
Messages
52
Are you still not seeing crashes? I'll try disabling these settings as well.
 

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
Yes, no more crashes for more than a week.

I have no idea which of the three settings is the culprit, or if it's a combination, since I haven't tested. I need this system to be running.

I've reported a bug here: NAS-105468
 

Elamir

Cadet
Joined
Apr 18, 2020
Messages
1
Hi,
I just discovered that I have the same 'panic: vm_fault' as you have, though the circumstances are different.

This crash has happened before (while running 11.2-U7), though I was never able to catch it and, until now, did not have the time to investigate.
I have no way of confirming that it was the same crash, but the behavior and circumstances were the same:
  • transfer of data to one of the pools (high disk usage): ssh <target> 'zfs send ...' | pv -s 100G | zfs recv ...
  • freeze
  • reboot

I only have one FreeNAS system, running FreeNAS 11.3-U2.

This time, it happened while trying to move data between two pools:
  • one raid-z2 consisting of 6 SATA disks
  • one mirror consisting of 2 SATA disks

The command was executed twice:
  • first: from the web interface
  • second: directly from the system terminal (without ssh)
In both cases, the crash happened after roughly 20GB+ of data had been transferred.

The command was:
zfs send -RDce Data/Media/Music@switch | pv -s 60G | zfs recv Media/Music

Attached is a picture of the terminal:
freenas_panic_vm_fault.jpg

The system is home-built:
  • CPU: AMD Ryzen Threadripper 2970WX
  • Memory: 64GB (4x 16GB DDR4-2133 UDIMM PC4-17000P-E Dual Rank x8 Module)
  • Motherboard: ASROCK X399 PROF. GAMING ATX AMD-STR4

As the panic message is the same, I am posting this here.
I did not yet post this on the Jira, as I am not sure whether this is related, or if I should create another ticket.

Thank you
 

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
Elamir, I see that you added your error to the bug report. At least I'm not the only one seeing this.

ykhodo: I've narrowed it down to these two options:
Code:
Send Deduplicated Stream
Allow Compressed WRITE Records

If these two are activated, I experience crashes, just as if all three are activated. Any one option on its own doesn't seem to cause the crashes. If you have experienced crashes, it would be good to add them to the bug report:
NAS-105468
 

francisaugusto

Contributor
Joined
Nov 16, 2018
Messages
153
I am having these crashes, but only on the FreeNAS that is receiving the replication snapshots. I have turned off the above-mentioned options to see if the crashes will stop.
 

AbsolutIggy

Dabbler
Joined
Feb 29, 2020
Messages
31
It was the same for me. The bug is fixed: the option to deduplicate streams will be removed in a future version. For now, the fix is, as you say, to disable this option.
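
For anyone replicating by hand with a zfs send pipeline like the one posted earlier in this thread, disabling this option should correspond to dropping the -D flag (in zfs send, -D requests a deduplicated stream, -c compressed WRITE records, -L large blocks, -e embedded data, -R a replication stream). A minimal sketch, reusing the dataset names from that post. The variable names are just for illustration, and the command is only printed, not executed:

```shell
# Snapshot and target names taken from the manual transfer posted earlier in the thread.
SRC="Data/Media/Music@switch"
DST="Media/Music"

# Original flags were -RDce; -D (deduplicated stream) is dropped here.
# Keeping -c is an assumption based on the report that a single option alone did not crash.
SEND_FLAGS="-Rce"

# Build and print the pipeline for review; run it manually once it looks right.
CMD="zfs send $SEND_FLAGS $SRC | pv -s 60G | zfs recv $DST"
echo "$CMD"
```

Whether this avoids the panic on a given system is untested here. It just mirrors the GUI workaround of turning off "Send Deduplicated Stream".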
 