nfsd hangs after reboot, unless turned off in advance

Status
Not open for further replies.
Joined
Feb 17, 2015
Messages
12
If I reboot FreeNAS (9.3) without first turning off NFS, then NFS is hung after the reboot. It doesn't accept connections and can't even be turned off. (The "wait" indicator in the web GUI stays on forever and nfsd becomes a zombie.)

If you reboot FreeNAS after attempting to turn NFS off, FreeNAS starts up again with NFS off. You can then turn NFS back on and everything is OK.

If I shut down NFS before I reboot and then turn it on again afterwards, everything is OK, although NFS clients also need to be rebooted.

I reported this as bug 7857 but it was closed as "cannot reproduce". I can still reproduce it and have to remember to turn off NFS before every update/reboot. I'd like some help trying to diagnose what the problem might be.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Hardware, version? Please read the rules. You have most likely messed up the configuration and need to fix it or start over. Also, your clients are ending up with a stale file handle. This doesn't mean you need to reboot them; you just need to mount with the soft flag, or force unmount and remount.
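As a rough illustration of the "force unmount and remount" suggestion, the sketch below only builds the two command lines; it does not run them. It assumes a Linux client, where `umount -f` and `mount -t nfs -o soft` exist; the export `nas1:/mnt/vol1/home` and the mountpoint `/home` are placeholder values, and `remount_commands` is a hypothetical helper name.

```python
import shlex


def remount_commands(export, mountpoint, soft=True):
    """Return argv lists for a forced unmount followed by a remount.

    Assumes a Linux NFS client. Nothing is executed here; the caller
    decides whether to run the commands (as root).
    """
    option = "soft" if soft else "hard"
    return [
        ["umount", "-f", mountpoint],
        ["mount", "-t", "nfs", "-o", option, export, mountpoint],
    ]


if __name__ == "__main__":
    # Placeholder export and mountpoint, for illustration only.
    for argv in remount_commands("nas1:/mnt/vol1/home", "/home"):
        print(shlex.join(argv))
```

Keep in mind that a soft mount gives up after its retries and returns an I/O error to the application, so it trades hangs for possible failed writes.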
 
Joined
Feb 17, 2015
Messages
12
Currently FreeNAS-9.3-STABLE-201506292130, but the bug has persisted throughout 9.3. (Details are in bug 7857, as mentioned.)

Hardware is an HP Gen8 MicroServer; debug info is attached.

Thanks for the info about the clients. The biggest problem is still that NFS starts up in a hung state after a reboot.
 

Attachments

  • debug-dawnstar-20150806204908.tgz
    769.4 KB · Views: 228
Joined
Feb 17, 2015
Messages
12
Further investigation:

When nfs is "stuck", attempting to stop it results in this command in the process list:

sh -c (/usr/sbin/service nfsd forcestop) 2>&1 | logger -p daemon.notice -t notifier

But the command never completes and sits there forever. If you reboot while it's there, FreeNAS starts up with NFS stopped.
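One way to narrow this down might be to look at which processes are in an uninterruptible wait (state `D`) or are zombies (state `Z`), and what they are waiting on. The sketch below parses text in the shape produced by `ps -axo pid,state,wchan,command` on the server. The sample listing is invented for illustration (the PIDs and wait channels are not from a real system), and `stuck_processes` is a hypothetical helper name.

```python
def stuck_processes(ps_output):
    """Return (pid, state, wchan, command) for processes in D or Z state.

    Expects text shaped like the output of `ps -axo pid,state,wchan,command`,
    including the header line.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, state, wchan, command = parts
        # The first character of the state column is the run state.
        if state[0] in ("D", "Z"):
            stuck.append((int(pid), state, wchan, command))
    return stuck


# Invented sample listing, for illustration only.
SAMPLE = """\
  PID STAT WCHAN  COMMAND
 1630 Ss   select /usr/sbin/syslogd
 2211 D    rpcsvc nfsd: server (nfsd)
 2305 S    wait   sh -c (/usr/sbin/service nfsd forcestop)
 2290 Z    -      nfsd
"""

if __name__ == "__main__":
    for pid, state, wchan, command in stuck_processes(SAMPLE):
        print(pid, state, wchan, command)
```

The wait channel of anything reported here would be worth including in the bug report.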
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just looked over your system. Unfortunately, I don't think your hardware is compatible with FreeBSD/FreeNAS.

Look at the list.. http://h20195.www2.hp.com/v2/GetDoc...kspecs&doclang=EN_US&searchquery=&cc=us&lc=en

Windows, Linux, and VMWare are supported. FreeBSD is not. Files in the debug file you posted normally provide info on the server, but instead error messages are provided.

I've seen that chassis before while trying to help others, and I'm pretty sure the only good solution is "replace the hardware".
 
Joined
Feb 17, 2015
Messages
12
It's entirely possible that it's officially incompatible, but there are many other Gen8 MicroServer users in these forums, and this weird NFS issue is literally the only issue I'm having. Everything else has been totally problem free.

It seems odd that a really specific issue with one specific service would be a hardware issue.

Thanks for looking into my details though! I appreciate the effort.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What always scares me with HP stuff is the way they don't separate product lines or revisions of their hardware.

I've seen two HP servers, one AMD-based and one Intel-based, that both were 'Gen(X) some-server-name'. Unless you can provide the specific model under that very wide name, then it's almost useless to argue what is or isn't supported.

This may or may not benefit you, though. It does add to the confusion and makes it even harder for someone to definitively say *your* hardware is/isn't compatible, because you have no clue what "the other guy" is using relative to yours. But the fact that the debug log contains errors where the hardware info should be makes me want to scream "not compatible".

I'll agree it is odd that NFS is the only issue you are having, but I also have to wonder what other problems exist behind the scenes that software is covering up for. How many network retransmits are having to take place? How many times are other protocols (if you are using any) having to renegotiate the protocol? Would you even have the knowledge/experience to find these clues? I won't lie, I don't go looking for problems unless I have a problem.

Intel 10Gb NICs had a problem about a year ago where only NFS would be affected by a bug, and only under *very* specific conditions. The cause was a bug in the Intel driver.

So while you might want to dismiss this because "only NFS is affected" I don't have that kind of confidence that this issue is the only issue you are having. In fact, I'd bet you're probably having several other problems (whether you know about them or not) just because of the hardware on that board. That onboard NIC is nothing we recommend.
 

dlavigne

Guest
Are there any NFS shares still mounted when you reboot the system?
 
Joined
Feb 17, 2015
Messages
12
There are NFS shares mounted when reboots occur, yes. They don't work after a reboot because NFS is hung after a reboot. If I remember to shut down NFS (on the server) before rebooting, everything continues to work.
 

dlavigne

Guest
Does the problem persist if you umount the shares first (you really should or else NFS gets unhappy)?
 

szczy7

Cadet
Joined
Aug 20, 2015
Messages
8
This is the exact same issue I am having! I thought my NFS service was completely locked until I followed the steps in your bug and got my NFS mounts back. It seems very strange, but I turned off the NFS service and got the endless "wait" indicator, powered off my machine, started it again, started NFS from the FreeNAS GUI, and rebooted my clients. Everything is happy, for now at least.

Release: 9.3-STABLE-201506292332
System:
X8DTU /w 2x Intel E5540.
32gig ECC Memory.
4x Intel PRO/1000 NICs
3ware SE-9650SE-12ML /w 12x2TB WD-RE4 drives in JBOD

I would love to figure out what is going on because I have run into this issue a few times now and each time I have had to rebuild my system.
 

szczy7

Cadet
Joined
Aug 20, 2015
Messages
8
dlavigne said:
Does the problem persist if you umount the shares first (you really should or else NFS gets unhappy)?

I just did a test where I unmounted and shut down all of my clients cleanly, then restarted FreeNAS with NFS enabled. When FreeNAS came back up, I started my clients and none of them were able to mount. They all timed out. So I would say yes, the problem persists for me.
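A quick client-side check that might help separate "the server is unreachable" from "nfsd is up but not answering" is to try a plain TCP connection to the NFS port with a short timeout. The sketch below assumes NFS over TCP on the standard port 2049; `nfs_port_open` is a hypothetical helper name and `nas1` a placeholder hostname.

```python
import socket

NFS_PORT = 2049  # standard NFS port


def nfs_port_open(host, port=NFS_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder hostname, for illustration only.
    print("nfs port reachable:", nfs_port_open("nas1"))
```

A successful connect only shows that something is listening on the port. It does not prove that nfsd is actually servicing requests, so a hung server could still pass this check.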
 
D

dlavigne

Guest
Please create a bug report at bugs.freenas.org and post the issue number here.
 

szczy7

Cadet
Joined
Aug 20, 2015
Messages
8
FreeNAS NFS in a working state, nfsstat -m on the client reveals:

/home from nas1:/mnt/vol1/home/ Flags: rw,noatime,vers=4,rsize=131072,wsize,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=xxx.yyy.zzz.BBB,minorversion=0,local_lock=none,addr=xxx.yyy.zzz.AAA
/mnt/stuff from nas1:/mnt/vol1/stuff/ Flags: rw,noatime,vers=4,rsize=131072,wsize,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=xxx.yyy.zzz.BBB,minorversion=0,local_lock=none,addr=xxx.yyy.zzz.AAA
/mnt/pic from nas1:/mnt/vol1/pic/ Flags: rw,noatime,vers=4,rsize=131072,wsize,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=xxx.yyy.zzz.BBB,minorversion=0,local_lock=none,addr=xxx.yyy.zzz.AAA
I created a bug: Bug #11099 --> https://bugs.freenas.org/issues/11099
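To make the `nfsstat -m` flags above easier to compare between clients, they can be split into a dictionary. This is only a sketch: `parse_mount_flags` is a hypothetical helper, and the flag string is copied from the first mount above, including the bare `wsize` entry exactly as posted.

```python
def parse_mount_flags(flags):
    """Split a comma-separated NFS flag string into a dict.

    key=value pairs keep the value as a string; bare flags map to True.
    """
    options = {}
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


FLAGS = (
    "rw,noatime,vers=4,rsize=131072,wsize,namlen=255,hard,proto=tcp,"
    "port=0,timeo=600,retrans=2,sec=sys,clientaddr=xxx.yyy.zzz.BBB,"
    "minorversion=0,local_lock=none,addr=xxx.yyy.zzz.AAA"
)

opts = parse_mount_flags(FLAGS)
print(opts["vers"], "hard" in opts, opts["timeo"], opts["retrans"])
# prints: 4 True 600 2
```

These are `hard` mounts, so the client keeps retrying indefinitely while the server is unresponsive. On Linux clients `timeo` is given in tenths of a second, so `timeo=600` would be a 60-second timeout per attempt.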
 
Joined
Feb 17, 2015
Messages
12
dlavigne said:
Does the problem persist if you umount the shares first (you really should or else NFS gets unhappy)?

Yes. It even happens if you never connect anything to the NFS server in the first place. All you have to do is reboot the server while NFS is turned on, and NFS is broken after a reboot.
 

zafiro17

Dabbler
Joined
Sep 14, 2014
Messages
13
Hello, sorry to hear this bug is 'unreproducible' because I reproduce it almost daily. I'm running FreeNAS-9.3-STABLE-201506292130 on an iXsystems FreeNAS Mini (the previous generation, not the new machine; still, nice specs). It's a really annoying bug, and I think it occasionally affected me in 9.2 as well. My symptoms are exactly as described above. Reading dmesg on the NAS I don't see anything out of the ordinary (though I'm no guru), other than "nfsd: can't register svc name" and immediately after, "pid 1630 (syslog-ng), uid 0: exited on signal 6 (core dumped)"

I don't have much to add to this conversation, unfortunately, other than to point out this is definitely NOT simply the consequence of David's hardware.
 

viniciusferrao

Contributor
Joined
Mar 30, 2013
Messages
192
I have this bug too, but I don't need to reboot the machine a second time. Just stopping and then starting the NFS service solves the problem.

Here's a snippet of the /var/log/messages file during the procedure:
Code:
Sep  7 15:49:33 storage mountd[3225]: mount request denied from 146.164.36.64 for /mnt/pool/ui.lape.if.ufrj.br
Sep  7 15:51:05 storage notifier: Stopping lockd.
Sep  7 15:51:05 storage notifier: Waiting for PIDS: 3262.
Sep  7 15:51:05 storage notifier: Stopping statd.
Sep  7 15:51:05 storage notifier: Waiting for PIDS: 3251.
Sep  7 15:51:05 storage notifier: Stopping nfsd.
Sep  7 15:51:05 storage notifier: Waiting for PIDS: 18511 18512.
Sep  7 15:51:05 storage notifier: Stopping mountd.
Sep  7 15:51:06 storage notifier: Waiting for PIDS: 3225, 3225.
Sep  7 15:51:06 storage notifier: Stopping nfsuserd.
Sep  7 15:51:06 storage notifier: Waiting for PIDS: 3188 3189 3190 3191 3193.
Sep  7 15:51:06 storage notifier: Stopping gssd.
Sep  7 15:51:06 storage notifier: Waiting for PIDS: 3195.
Sep  7 15:51:06 storage notifier: Stopping rpcbind.
Sep  7 15:51:06 storage notifier: Waiting for PIDS: 3221.
Sep  7 15:51:08 storage notifier: mountd not running? (check /var/run/mountd.pid).
Sep  7 15:51:08 storage notifier: Starting gssd.
Sep  7 15:51:08 storage notifier: Starting nfsuserd.
Sep  7 15:51:08 storage notifier: Starting rpcbind.
Sep  7 15:51:08 storage notifier: Starting mountd.
Sep  7 15:51:08 storage notifier: NFSv4 is disabled
Sep  7 15:51:08 storage notifier: Starting nfsd.
Sep  7 15:51:08 storage notifier: Starting statd.
Sep  7 15:51:08 storage nfsd: can't register svc name
Sep  7 15:51:09 storage notifier: Starting lockd.
Sep  7 15:51:13 storage mountd[19391]: mount request succeeded from 146.164.36.64 for /mnt/pool/ui.lape.if.ufrj.br
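For comparing logs between affected systems, the stop/start order and any "can't register svc name" lines can be pulled out of `/var/log/messages` with a small script. This is a sketch under the assumption that other systems log in the same `notifier:` format as the snippet above; `summarize` is a hypothetical helper, and the sample text is a subset of the lines posted above.

```python
import re

# Matches lines such as "... notifier: Starting nfsd."
EVENT = re.compile(r"notifier: (Stopping|Starting) (\w+)\.")
ERROR = "nfsd: can't register svc name"


def summarize(log_text):
    """Return (stopped, started, errors) in the order they were logged."""
    stopped, started, errors = [], [], []
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match:
            target = stopped if match.group(1) == "Stopping" else started
            target.append(match.group(2))
        elif ERROR in line:
            errors.append(line.strip())
    return stopped, started, errors


# Subset of the log lines quoted above.
LOG = """\
Sep  7 15:51:06 storage notifier: Stopping rpcbind.
Sep  7 15:51:08 storage notifier: Starting gssd.
Sep  7 15:51:08 storage notifier: Starting nfsuserd.
Sep  7 15:51:08 storage notifier: Starting rpcbind.
Sep  7 15:51:08 storage notifier: Starting mountd.
Sep  7 15:51:08 storage notifier: NFSv4 is disabled
Sep  7 15:51:08 storage notifier: Starting nfsd.
Sep  7 15:51:08 storage notifier: Starting statd.
Sep  7 15:51:08 storage nfsd: can't register svc name
Sep  7 15:51:09 storage notifier: Starting lockd.
"""

if __name__ == "__main__":
    stopped, started, errors = summarize(LOG)
    print("stopped:", stopped)
    print("started:", started)
    print("errors:", len(errors))
```

Note that in the log above the "can't register svc name" message is followed by a successful mount request, so the message on its own may not indicate the hang.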
 