nfsrv cache session: no session crashes TrueNAS

cole-maxwell

Cadet
Joined
Mar 18, 2022
Messages
3
Hi all,

I have an NFSv4 share that is causing my system to crash. I am not sure what "nfsrv cache session: no session" means. Has anyone seen this before? Any suggested resolutions?
rpviewer.png
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
From this, it's not clear that NFS is actually what's causing your system to crash; it could just be the messenger that's revealing some other underlying problem.

I would try the following:
  • Full scrub on your pool(s) to make sure that data is good.
  • Update TrueNAS
  • Fresh Install of TrueNAS, and only configure the minimum needed to test the NFS share.
  • Memory test of your hardware.
Going forward, please make sure to follow the forum rules by posting your hardware and attaching a debug file.
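
For the scrub step, here is a minimal Python sketch of the commands involved. It assumes it runs as root on the TrueNAS host with `zpool` on the PATH; the pool name `tank` and the helper names `scrub_commands` and `run_scrub` are made up for illustration, so substitute your own pool. The same thing can be started from the web UI.

```python
import subprocess


def scrub_commands(pool):
    """Build the zpool commands to start a scrub and then inspect the pool."""
    return [
        ["zpool", "scrub", pool],         # starts the scrub and returns immediately
        ["zpool", "status", "-v", pool],  # shows scrub progress and any errors found
    ]


def run_scrub(pool):
    """Run the commands in order, stopping if one of them fails."""
    for cmd in scrub_commands(pool):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "tank" is a hypothetical pool name, not taken from this thread.
    for cmd in scrub_commands("tank"):
        print(" ".join(cmd))
```

As written, running the file only prints the commands; call `run_scrub("tank")` yourself once the pool name is right. The scrub runs in the background, so re-run the status command later to see the final result.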
 

cole-maxwell

Hi Nick,

Thanks for the suggestions! I did your recommended steps and unfortunately, it did not turn up any issues. My apologies for not looking at the forum rules closely enough. Hopefully I can correct that mistake here.

Hardware Information:
CSCINAS || TrueNAS Core 12-U8 || PowerEdge FC630 in Dell Fx2s Blade Enclosure | PowerEdge FD332 drive bay in HBA | 1 x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz | 8x 16GB ECC RAM | 8x 2TB Crucial ATA CT250MX500SSD1 | 2x Dell 1600W PSUs (Part #: 095HR5A01)

See attached debug file:

Thanks for your help!
 

Attachments

  • debug-cscinas-20220319104258.tgz
    9.8 MB · Views: 135

Nick2253

No worries :smile:. That's some pretty impressive hardware. Is this a work system, or just a personal one? I ask, because if it's a work system, I'd recommend official iXsystems support. The volunteers here at the forum are pretty good, but we're no substitute for the pros.

How did you test your memory? Can you see any ECC errors through the BIOS (or iDRAC; not sure if the blade systems have that capability)?

Can you reliably reproduce this problem, or is it something that just happens periodically? Does it matter if you disable NFS entirely?
 

cole-maxwell

Haha yes, it is some pretty sweet hardware. This setup is part of our student-run Computer Science development lab at my university. The hardware was donated to us by alumni. We are pretty resource constrained, so official iXsystems support isn't really in the cards for us. I am just one of the student admins in the lab.

I tested the memory through the BIOS. I have not been able to reliably reproduce this issue since doing a cold boot of the server/sled setup. After the suggestion to attach the crash logs, I took a look through them and noticed that the dump header is from device /dev/da10p1. I believe this is a hot spare drive that I recently added to the system. In the past, I have noticed issues with the connection between the compute node and the drive sled if the system does not do a cold boot. I had not done a proper cold boot since installing that drive, so my best guess is that the cold boot fixed the issue. Here is the log I am referencing:
Code:
Dump header from device: /dev/da10p1 
Architecture: amd64  Architecture Version: 4 
Dump Length: 1118208  Blocksize: 512 
Compression: none 
Dumptime: Fri Mar 18 13:49:34 2022 
Hostname: cscinas.morris.umn.edu 
Magic: FreeBSD Text Dump 
Version String: FreeBSD 12.2-RELEASE-p12 ec84e0c52a1(HEAD) TRUENAS 
Panic String: page fault
Dump Parity: 483007013 
Bounds: 4 
Dump Status: good
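
If you want to check these fields on more than one crash dump, a small Python sketch like the one below can turn the header into a dictionary. The function name `parse_dump_header` is my own, and the parsing rules (fields on a shared line are separated by two or more spaces, keys end at the first ": ") are assumptions based only on the header pasted above, not on any documented format.

```python
import re

# Header text copied from the crash log above.
SAMPLE_HEADER = """\
Dump header from device: /dev/da10p1
Architecture: amd64  Architecture Version: 4
Dump Length: 1118208  Blocksize: 512
Compression: none
Dumptime: Fri Mar 18 13:49:34 2022
Hostname: cscinas.morris.umn.edu
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RELEASE-p12 ec84e0c52a1(HEAD) TRUENAS
Panic String: page fault
Dump Parity: 483007013
Bounds: 4
Dump Status: good
"""


def parse_dump_header(text):
    """Parse 'Key: value' pairs, allowing several pairs on one line."""
    fields = {}
    for line in text.splitlines():
        last_key = None
        # Assumption: two or more spaces separate fields that share a line.
        for segment in re.split(r"\s{2,}", line.strip()):
            key, sep, value = segment.partition(": ")
            if sep:
                last_key = key.strip()
                fields[last_key] = value.strip()
            elif segment and last_key is not None:
                # No "key: " prefix, so treat it as the rest of the previous
                # value (for example a date padded with an extra space).
                fields[last_key] += " " + segment
    return fields


if __name__ == "__main__":
    header = parse_dump_header(SAMPLE_HEADER)
    print(header["Dump header from device"], "|", header["Panic String"])
```

Comparing the dump device and panic string across several dumps is a quick way to see whether the same disk keeps showing up.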


Screen Shot 2022-03-20 at 11.08.31 PM.png

I also believe that I solved the original "nfsrv cache session: no session" issue. It was caused by some NFS permissions issues on our end. Enabling the "NFSv3 ownership model for NFSv4" setting in NFS prevents the permissions on the TrueNAS system from being pushed out to the clients.
 

Nick2253

Good catch on the drive sled. Definitely keep an eye on it. It might be worth getting into the connectors with a little alcohol and trying to wipe them down in case some dirt got inside.

For a memory test, I've found that Dell's onboard memory test is ok, but not great. I'd recommend a tool like MemTest86, which is both faster and more thorough.
 