FreeNAS 11.1u2 - zfs + nfs hang requiring reboot

Status
Not open for further replies.

segfault

Cadet
Joined
Sep 5, 2017
Messages
6
Every few weeks since upgrading to 11.1 I've encountered random ZFS + NFS hangs. These appear to be due to locks. By "hang" I mean the server no longer responds, and all clients (FreeBSD, Linux, ESXi) enter iowait until the box is rebooted.

--

PID TID COMM TDNAME KSTACK
2753 102815 nfsd nfsd: master mi_switch sleepq_wait _sleep svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall Xfast_syscall
2753 102883 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_pro
2753 102884 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup nfs
2753 102885 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_pro
2753 102886 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc
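
One way to summarize output like this is to count how many threads are parked in the vnode lock and which function called it. The following is only a rough sketch, not a DTrace script and not anything shipped with FreeNAS. The names `lock_waiters` and `SAMPLE` are made up for illustration, and it assumes the stack is printed innermost frame first, as in the rows above, so the frame right after `_vn_lock` is its caller.

```python
from collections import Counter

# Rows copied from the procstat output above, truncated exactly as posted.
SAMPLE = """\
2753 102815 nfsd nfsd: master mi_switch sleepq_wait _sleep svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall Xfast_syscall
2753 102883 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_pro
2753 102884 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup nfs
2753 102885 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc nfssvc_pro
2753 102886 nfsd nfsd: service mi_switch sleepq_wait sleeplk __lockmgr_args lockmgr_lock_fast_path VOP_LOCK1_APV _vn_lock zfs_fhtovp nfsvno_fhtovp nfsd_fhtovp nfsrvd_dorpc
"""


def lock_waiters(text, lock_frame="_vn_lock"):
    """Count threads whose stack contains lock_frame, keyed by the calling frame."""
    callers = Counter()
    for line in text.splitlines():
        frames = line.split()
        if lock_frame in frames:
            idx = frames.index(lock_frame)
            # Frames are innermost first, so the next token is the caller.
            caller = frames[idx + 1] if idx + 1 < len(frames) else "(truncated)"
            callers[caller] += 1
    return callers


if __name__ == "__main__":
    for caller, count in lock_waiters(SAMPLE).most_common():
        print(f"{count} thread(s) waiting in _vn_lock, called from {caller}")
```

On the rows above this counts three service threads entering `_vn_lock` from `zfs_fhtovp` and one from `vget`. It shows who is waiting, not who holds the lock.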
 

dlavigne

Guest
Hardware specs?

Also, anything in /var/log/messages around the time of the hang?
 

segfault

> Hardware specs?

Dell R710
2x Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
76 GB 10600R ECC RAM
2x LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
ConnectX-3 Pro

Dell Compellent HB-1235

Three raidz2 vdevs.


> Also, anything in /var/log/messages around the time of the hang?
There is nothing logged. I've got some procstat output. I wish I had some dtrace scripts to count the locks involved...
 

segfault

The system itself remains responsive to both web and SSH traffic but refuses to respond to any NFS requests. Attempting to restart or kill nfsd results in a hung console.
 

segfault

Here's a bit of the /var/log/messages log file from right before I rebooted it.


Mar 30 09:49:50 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:49:52 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:50:00 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:50:02 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:50:10 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:50:34 freenas daemon[4250]: 2018/03/30 09:50:34 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 09:52:34 freenas daemon[4250]: 2018/03/30 09:52:34 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 09:54:06 freenas uwsgi: GSSAPI Error: Miscellaneous failure (see text)MORDOR (unable to reach any KDC in realm MORDOR)
Mar 30 09:54:34 freenas daemon[4250]: 2018/03/30 09:54:34 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 09:56:35 freenas daemon[4250]: 2018/03/30 09:56:35 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 09:58:35 freenas daemon[4250]: 2018/03/30 09:58:35 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:00:35 freenas daemon[4250]: 2018/03/30 10:00:35 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:02:36 freenas daemon[4250]: 2018/03/30 10:02:36 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:04:36 freenas daemon[4250]: 2018/03/30 10:04:36 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:06:16 freenas shutdown: reboot by segfault:
Mar 30 10:06:17 freenas devd: notify_clients: send() failed; dropping unresponsive client
Mar 30 10:06:19 freenas kernel: epair0a: link state changed to DOWN
Mar 30 10:06:19 freenas kernel: epair0a: link state changed to DOWN
Mar 30 10:06:19 freenas kernel: epair0b: link state changed to DOWN
Mar 30 10:06:19 freenas kernel: epair0b: link state changed to DOWN
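
To line the timestamps up, a short script can pull out the last successful export, the first `freenas_health` warning, and the reboot. This is a sketch under assumptions: `summarize` and `parse_time` are hypothetical helper names, the year 2018 is taken from the daemon lines (syslog timestamps omit it), and nothing in the log itself proves that the first warning marks the start of the hang.

```python
from datetime import datetime

# A subset of lines copied from the excerpt above.
SAMPLE = """\
Mar 30 09:50:10 freenas mountd[2748]: export request succeeded from 192.168.1.36
Mar 30 09:50:34 freenas daemon[4250]: 2018/03/30 09:50:34 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 09:52:34 freenas daemon[4250]: 2018/03/30 09:52:34 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:04:36 freenas daemon[4250]: 2018/03/30 10:04:36 [WARN] agent: Check 'freenas_health' is now warning
Mar 30 10:06:16 freenas shutdown: reboot by segfault:
"""


def parse_time(line, year=2018):
    # The first 15 characters of a syslog line are "Mon DD HH:MM:SS".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(text):
    """Return (last export before first warning, first warning, reboot) times."""
    last_export = first_warning = reboot = None
    for line in text.splitlines():
        if first_warning is None and "mountd[" in line and "export request succeeded" in line:
            last_export = parse_time(line)
        elif first_warning is None and "freenas_health" in line:
            first_warning = parse_time(line)
        elif "shutdown: reboot" in line:
            reboot = parse_time(line)
    return last_export, first_warning, reboot


if __name__ == "__main__":
    last_export, first_warning, reboot = summarize(SAMPLE)
    print("last export before first warning:", last_export)
    print("first freenas_health warning:    ", first_warning)
    print("reboot:                          ", reboot)
    print("export -> warning:", first_warning - last_export)
    print("warning -> reboot:", reboot - first_warning)
```

For this excerpt the first health warning comes 24 seconds after the last logged export, and the reboot follows 15 minutes 42 seconds after that.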
 

dlavigne

Guest
Try updating to U4 to see if that resolves it. If it doesn't, it's worth reporting at bugs.freenas.org so a dev can help pinpoint the issue. Post the issue number here if you create one.
 

segfault

> Try updating to U4 to see if that resolves it. If it doesn't, it's worth reporting at bugs.freenas.org so a dev can help pinpoint the issue. Post the issue number here if you create one.
Ok, I'll file a bug. I don't see anything in the U4 changelog that touches where these locks are.
 

dlavigne

Guest
> Ok, I'll file a bug. I don't see anything in the U4 changelog that touches where these locks are.

The dev will want to use U4 as the base. There were some performance/resource fixes between U2 and U4.
 