ZFS Lock Order Reversal making system unresponsive

Tom Johnson

Cadet
Joined
Jan 3, 2016
Messages
6
TrueNAS 12.2 - release -p2
MB: Supermicro MNL-H8DM8E-2
CPU: 2@Six-Core AMD Opteron(tm) Processor 2425 HE (2110.87-MHz K8-class CPU)
64 GB Memory

For about a year I have been experiencing system freezes about once a day. I have done 24 hour stress tests on my cpus and memory. Both passed without an error. Enabling Witness, I would often get lock order reversal messages just before the freeze - always dealing with OpenZFS. I moved 20TB of the server, eliminated all ZFS information in the first and last sector of all connected hard drives and installed a clean new version of TrueNAS 12.2.

On the initial run, I enabled debug and kernel logging and then rebooted. In the messages file I have the following Lock Order Reversal:


Code:
Feb  6 09:31:13 truenas lock order reversal:
Feb  6 09:31:13 truenas 1st 0xfffff80005406228 db->db_rwlock (db->db_rwlock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs/dbuf.c:1256
Feb  6 09:31:13 truenas 2nd 0xfffff8085519d938 dn->dn_mtx (dn->dn_mtx) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs/dnode.c:2303
Feb  6 09:31:13 truenas stack backtrace:
Feb  6 09:31:13 truenas #0 0xffffffff80b49d41 at witness_debugger+0x71
Feb  6 09:31:13 truenas #1 0xffffffff80ae9e07 at _sx_xlock+0x67
Feb  6 09:31:13 truenas #2 0xffffffff82940608 at dnode_block_freed+0xb8
Feb  6 09:31:13 truenas #3 0xffffffff8290f31a at dbuf_read_impl+0x3ca
Feb  6 09:31:13 truenas #4 0xffffffff82905862 at dbuf_read_+0x622
Feb  6 09:31:13 truenas #5 0xffffffff82940fa8 at dnode_hold_impl+0x2e8
Feb  6 09:31:13 truenas #6 0xffffffff8291cc78 at dmu_buf_hold_noread+0x28
Feb  6 09:31:13 truenas #7 0xffffffff8291ce7c at dmu_buf_hold+0x1c
Feb  6 09:31:13 truenas #8 0xffffffff82a5a8ce at zap_lockdir+0x2e
Feb  6 09:31:13 truenas #9 0xffffffff82a5c06c at zap_lookup_norm+0x3c
Feb  6 09:31:13 truenas #10 0xffffffff82a5c021 at zap_lookup+0x11
Feb  6 09:31:13 truenas #11 0xffffffff829b299a at spa_ld_trusted_config+0x4a
Feb  6 09:31:13 truenas #12 0xffffffff829b1393 at spa_ld_mos_with_trusted_config+0x33
Feb  6 09:31:13 truenas #13 0xffffffff829af55c at spa_load_impl+0x7c
Feb  6 09:31:13 truenas #14 0xffffffff829a7bb3 at spa_load+0x53
Feb  6 09:31:13 truenas #15 0xffffffff829a7315 at spa_load_best+0x65
Feb  6 09:31:13 truenas #16 0xffffffff829a3521 at spa_open_common+0x131
Feb  6 09:31:13 truenas #17 0xffffffff8296dfc1 at dsl_pool_hold+0x21


Feb  6 09:31:13 truenas lock order reversal:
Feb  6 09:31:13 truenas 1st 0xfffff80004b67408 db->db_mtx (db->db_mtx) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs/dbuf.c:2819
Feb  6 09:31:13 truenas 2nd 0xfffff80006832938 dn->dn_mtx (dn->dn_mtx) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs/dbuf.c:2847
Feb  6 09:31:13 truenas stack backtrace:
Feb  6 09:31:13 truenas #0 0xffffffff80b49d41 at witness_debugger+0x71
Feb  6 09:31:13 truenas #1 0xffffffff80ae9e07 at _sx_xlock+0x67
Feb  6 09:31:13 truenas #2 0xffffffff8291106f at dbuf_dirty_compute_state+0x11f
Feb  6 09:31:13 truenas #3 0xffffffff8290734f at dbuf_dirty+0xdf
Feb  6 09:31:13 truenas #4 0xffffffff8294d126 at dsl_dataset_sync+0x56
Feb  6 09:31:13 truenas #5 0xffffffff8296c69a at dsl_pool_sync+0xca
Feb  6 09:31:13 truenas #6 0xffffffff829acea4 at spa_sync+0x1014
Feb  6 09:31:13 truenas #7 0xffffffff829c51e4 at txg_sync_thread+0x484
Feb  6 09:31:13 truenas #8 0xffffffff80aa08d0 at fork_exit+0x80
Feb  6 09:31:13 truenas #9 0xffffffff80fcffce at fork_trampoline+0xe
Feb  6 09:31:13 truenas lock order reversal:
Feb  6 09:31:13 truenas lock[1923]: Last message 'order reversal:' repeated 2 times, suppressed by syslog-ng on truenas.local

Feb  6 09:31:13 truenas 1st 0xfffff80820277400 mls->mls_lock (mls->mls_lock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfslock order reversal:
Feb  6 09:31:13 truenas 1st 0xfffff80820277500 mls->mls_lock (mls->mls_lock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs 1st 0xfffff80820277540 mls->mls_lock (mls->mls_lock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfs/multilist.c:288
Feb  6 09:31:13 truenas 2nd 0xfffff80004c42e10 buf->b_evict_lock (buf->b_evict_lock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/mo 1st 0xfffff80820277440 mls->mls_lock (mls->mls_lock) @ /wrkdirs/usr/ports/sysutils/openzfs-kmod/work/zfs-7ec2d8b9c/module/zfsdule/zfs/arc.c:6682


These repeat over and over during the bootup process. Clearly there is a OpenZFS issue going on.

I will once again be installing my Pools and then transfering back the 20TB of data. I expect it to start freezing up again, but will report back if it doesn't.

Any ideas what might be going on?
 

Tom Johnson

Cadet
Joined
Jan 3, 2016
Messages
6
Well, I can confirm this is a FREENAS (TRUENAS) problem and not my computer. I put on OpenMediaVault and have been transferring back the data for 3 days without a hiccup. Love Freenas, but it's gotta work.
 
Top