norbs
Explorer
Joined: Mar 26, 2013
Messages: 91
I've had the same pool for about 4 years. A few weeks ago it started crashing FreeNAS into a "db>" prompt.
My hardware:
My hardware was an i7-3770 with 32 GB of non-ECC RAM, running FreeNAS as an ESXi VM with VT-d passthrough to an LSI 9211-8i card flashed to IT mode. The VM had 2 vCPUs and about 20 GB of reserved memory assigned. I have since switched to a Xeon E3-1231 v3 with 32 GB of ECC RAM on one of the recommended Supermicro boards. I'm still running FreeNAS as a VM.
I tried quite a few troubleshooting suggestions I found all over this site, with no luck.
I have also tried importing the pool with one disk from the array unplugged (tried all 4 pool members, without any luck).
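For reference, these are the less invasive import variants I've seen recommended around here and can run from the console (flags are straight from zpool(8); RAIDZ is the pool name, and the /mnt altroot is just the usual FreeNAS convention):
Code:
# Read-only import: nothing gets written to the pool while poking around
zpool import -o readonly=on -f -R /mnt RAIDZ

# Recovery-mode dry run: reports whether discarding the last few
# transactions would make the pool importable, without changing anything
zpool import -F -n RAIDZ

# Import without mounting any datasets
zpool import -N -f -o readonly=on RAIDZ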
I ended up running zdb -e -bcsvL RAIDZ, with the following results:
Code:
Assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2), file /fusion/jkh/9.2.1/freenas/FreeBSD/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line 101.
Abort (core dumped)
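For anyone reading along: -e works on an exported pool, -b traverses and tallies every block, -c verifies metadata checksums, -s reports stats, and -L skips the leak-tracking pass. If more output would help anyone diagnose this, gentler zdb reads that don't traverse the whole pool would be something like:
Code:
# Print the cached pool config and uberblocks only (no dataset traversal)
zdb -e -C -u RAIDZ

# List datasets and per-object summaries without checksumming everything
zdb -e -d RAIDZ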
zpool import:
Code:
   pool: RAIDZ
     id: 16802863673492970021
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        RAIDZ                                           ONLINE
          raidz1-0                                      ONLINE
            gptid/c96a7313-705a-11e3-9480-000c29717331  ONLINE
            gptid/ca4e8476-705a-11e3-9480-000c29717331  ONLINE
            gptid/cb226ccf-705a-11e3-9480-000c29717331  ONLINE
            gptid/cbf24925-705a-11e3-9480-000c29717331  ONLINE
gpart status:
Code:
Name    Status  Components
da0s1       OK  da0
da0s2       OK  da0
da0s3       OK  da0
da0s4       OK  da0
da0s1a      OK  da0s1
da0s2a      OK  da0s2
da1p1       OK  da1
da1p2       OK  da1
da2p1       OK  da2
da2p2       OK  da2
da3p1       OK  da3
da3p2       OK  da3
da4p1       OK  da4
da4p2       OK  da4
da5p1       OK  da5
da5p2       OK  da5
da6p1       OK  da6
da6p2       OK  da6
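In case it helps to match the gptid labels from the zpool config back to physical disks, standard FreeBSD tooling can do that (da1 below is just an example member):
Code:
# Map each gptid/... label to its partition and disk
glabel status

# Show partition details (including rawuuid) for a single pool member
gpart list da1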
I was not running ECC RAM; I am now. I'm just posting here as a last attempt at rescuing my data before I recreate a fresh pool.
EDIT: I was pretty close to 90% capacity when all this happened, and I believe I do have snapshots configured. I'm wondering if this had more to do with it than anything. Can anyone chime in on this?
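If I can ever get the pool imported read-only, this is roughly how I'd check the capacity/snapshot theory (plain zpool/zfs commands):
Code:
# Pool-level occupancy; running near 90%+ full is known trouble for ZFS
zpool list -o name,size,allocated,free,capacity RAIDZ

# Space held by each snapshot, sorted smallest to largest
zfs list -t snapshot -o name,used,referenced -s used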
EDIT2:
Some emails I received before the pool stopped mounting entirely:
nas.local kernel log messages:
> panic: solaris assert: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2), file: /fusion/jkh/9.2.1/freenas/FreeBSD/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line: 101
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff85976ed170
> kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff85976ed230
> panic() at panic+0x1ce/frame 0xffffff85976ed330
> assfail3() at assfail3+0x29/frame 0xffffff85976ed350
> dsl_deadlist_open() at dsl_deadlist_open+0xd7/frame 0xffffff85976ed3c0
> dsl_dataset_hold_obj() at dsl_dataset_hold_obj+0x20b/frame 0xffffff85976ed480
> dsl_dataset_stats() at dsl_dataset_stats+0x23e/frame 0xffffff85976ed770
> dmu_objset_stats() at dmu_objset_stats+0x1a/frame 0xffffff85976ed790
> zfs_ioc_objset_stats_impl() at zfs_ioc_objset_stats_impl+0x63/frame 0xffffff85976ed7d0
> zfs_ioc_snapshot_list_next() at zfs_ioc_snapshot_list_next+0x156/frame 0xffffff85976ed810
> zfsdev_ioctl() at zfsdev_ioctl+0x58d/frame 0xffffff85976ed8b0
> devfs_ioctl_f() at devfs_ioctl_f+0x7b/frame 0xffffff85976ed920
> kern_ioctl() at kern_ioctl+0x106/frame 0xffffff85976ed970
> sys_ioctl() at sys_ioctl+0xfd/frame 0xffffff85976ed9d0
> amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff85976edaf0
> Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff85976edaf0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019b48fc, rsp = 0x7fffffffad38, rbp = 0x7fffffffae50 ---
> KDB: enter: panic
> Textdump complete.
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 1
> ugen0.2: <VMware> at usbus0
> uhid0: <VMware> on usbus0
> ums0: <VMware> on usbus0
> ums0: 16 buttons and [XYZT] coordinates ID=0
> Root mount waiting for: usbus1 usbus0
> uhub1: 6 ports with 6 removable, self powered
> ugen0.3: <vendor 0x0e0f> at usbus0
> uhub2: <VMware Virtual USB Hub> on usbus0
> Root mount waiting for: usbus0
> uhub2: 7 ports with 7 removable, self powered
> Root mount waiting for: usbus0
> ugen0.4: <CP1000PFCLCD> at usbus0
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff85976f2170
> kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff85976f2230
> panic() at panic+0x1ce/frame 0xffffff85976f2330
> assfail3() at assfail3+0x29/frame 0xffffff85976f2350
> dsl_deadlist_open() at dsl_deadlist_open+0xd7/frame 0xffffff85976f23c0
> dsl_dataset_hold_obj() at dsl_dataset_hold_obj+0x20b/frame 0xffffff85976f2480
> dsl_dataset_stats() at dsl_dataset_stats+0x23e/frame 0xffffff85976f2770
> dmu_objset_stats() at dmu_objset_stats+0x1a/frame 0xffffff85976f2790
> zfs_ioc_objset_stats_impl() at zfs_ioc_objset_stats_impl+0x63/frame 0xffffff85976f27d0
> zfs_ioc_snapshot_list_next() at zfs_ioc_snapshot_list_next+0x156/frame 0xffffff85976f2810
> zfsdev_ioctl() at zfsdev_ioctl+0x58d/frame 0xffffff85976f28b0
> devfs_ioctl_f() at devfs_ioctl_f+0x7b/frame 0xffffff85976f2920
> kern_ioctl() at kern_ioctl+0x106/frame 0xffffff85976f2970
> sys_ioctl() at sys_ioctl+0xfd/frame 0xffffff85976f29d0
> amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff85976f2af0
> Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff85976f2af0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019b48fc, rsp = 0x7fffffffad38, rbp = 0x7fffffffae50 ---
> panic: solaris assert: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2), file: /fusion/jkh/9.2.1/freenas/FreeBSD/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line: 101
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff859762a170
> kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff859762a230
> panic() at panic+0x1ce/frame 0xffffff859762a330
> assfail3() at assfail3+0x29/frame 0xffffff859762a350
> dsl_deadlist_open() at dsl_deadlist_open+0xd7/frame 0xffffff859762a3c0
> dsl_dataset_hold_obj() at dsl_dataset_hold_obj+0x20b/frame 0xffffff859762a480
> dsl_dataset_stats() at dsl_dataset_stats+0x23e/frame 0xffffff859762a770
> dmu_objset_stats() at dmu_objset_stats+0x1a/frame 0xffffff859762a790
> zfs_ioc_objset_stats_impl() at zfs_ioc_objset_stats_impl+0x63/frame 0xffffff859762a7d0
> zfs_ioc_snapshot_list_next() at zfs_ioc_snapshot_list_next+0x156/frame 0xffffff859762a810
> zfsdev_ioctl() at zfsdev_ioctl+0x58d/frame 0xffffff859762a8b0
> devfs_ioctl_f() at devfs_ioctl_f+0x7b/frame 0xffffff859762a920
> kern_ioctl() at kern_ioctl+0x106/frame 0xffffff859762a970
> sys_ioctl() at sys_ioctl+0xfd/frame 0xffffff859762a9d0
> amd64_syscall() at amd64_syscall+0x5ea/frame 0xffffff859762aaf0
> Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff859762aaf0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019b48fc, rsp = 0x7fffffffad38, rbp = 0x7fffffffae50 ---
> KDB: enter: panic
> Textdump complete.
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 1
> da5 at mps0 bus 0 scbus4 target 3 lun 0
> da5: <ATA ST2000DL003-9VT1 CC32> Fixed Direct Access SCSI-6 device
> da5: Serial Number 5YD3KXS1
> da5: 600.000MB/s transfers
> da5: Command Queueing enabled
> da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
> da5: quirks=0x8<4K>
> WARNING: /data was not properly dismounted
-- End of security output --
> pid 25259 (mv), uid 0 inumber 8 on /data: filesystem full
> pid 27624 (mv), uid 0 inumber 9 on /data: filesystem full
> pid 29950 (mv), uid 0 inumber 9 on /data: filesystem full
> pid 32275 (mv), uid 0 inumber 9 on /data: filesystem full
> pid 35783 (mv), uid 0 inumber 8 on /data: filesystem full
> pid 39280 (mv), uid 0 inumber 8 on /data: filesystem full
> pid 41698 (mv), uid 0 inumber 9 on /data: filesystem full
> pid 45221 (mv), uid 0 inumber 8 on /data: filesystem full
> pid 51193 (mv), uid 0 inumber 8 on /data: filesystem full
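Reading the panic: the assert fires in dsl_deadlist_open() because dmu_bonus_hold() returns 2 (ENOENT), i.e. a deadlist object that some dataset still references no longer exists, and the box panics every time anything enumerates snapshots (zfs_ioc_snapshot_list_next in the backtrace). If I understand zdb correctly, the same walk can be done from user space, where it aborts with the assertion from my earlier output instead of panicking the kernel:
Code:
# Dump dataset metadata; the last dataset printed before the abort
# should be the one with the damaged deadlist
zdb -e -dd RAIDZ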
EDIT 3:
Sample of what a crash looks like: