jasonsansone
Explorer · Joined Jul 18, 2019 · Messages: 79
My FreeNAS build has been rock solid since being built months ago... until last night. Something caused the NAS to panic and crash. I woke up to it in a constant loop of boot, panic, reboot, repeat. There wasn't a power loss, as the system is on redundant UPSes and my home didn't lose power. After the crash, my main pool can't be imported into a fresh install, nor will the system boot with the original config.
System Specs:
Chassis: SuperMicro CSE-864
Motherboard: SuperMicro X9DRi-F
RAM: 8 x 16GB 1866 MHz ECC (128GB total)
NIC: NC560SFP+ for 10GbE and Intel i350 for 1GbE
Hard Drives: Shucked WD white-label 5400 RPM 10TB
HBA: LSI SAS 9207-8i
CPU: 2x Intel E5-2697 v2
OS: FreeNAS 11.3-U1
Boot Pool: Mirrored Samsung SSDs
L2ARC: None
SLOG: None
The pool in question is two vdevs of six drives each in RAIDZ2 (12 drives total, 120TB raw capacity). The pool was under heavy IOPS load when it crashed, but I don't care about any data loss related to those in-flight writes. The storage is for media, and multiple transcodes were running when the system went down. Those files will be corrupted and will need to be rerun in any case. My primary concern is recovering the pool.
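For anyone reconstructing the layout: it is equivalent to a pool created roughly like the sketch below. The da0..da11 device names are placeholders only; my real pool was built through the FreeNAS UI and references disks by gptid, as shown in the zpool import output further down.
Code:
# Sketch only - two six-drive RAIDZ2 vdevs in one pool.
# Device names are placeholders, not my actual disks.
zpool create home-main \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11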
It's also important to note that I use a replication task to clone a single NVMe drive to the main pool each night. All jails and VMs are on the NVMe, so I back it up nightly to the main pool. The system crashed during the replication task: the main pool that cannot be imported had a zfs receive in progress.
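The nightly task boils down to a snapshot-based send/receive along these lines. The pool, snapshot, and dataset names here are illustrative; the actual task is configured through the FreeNAS replication UI.
Code:
# Sketch of the nightly replication, assuming an NVMe pool named "nvme"
# and a snapshot created by the periodic snapshot task.
zfs snapshot -r nvme@nightly-snap
zfs send -R nvme@nightly-snap | zfs receive -F home-main/nvme-backup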
The pool can only be mounted read-only using "zpool import -o readonly=on -fF -R /mnt home-main". A read-only import avoids any kernel panic, but the console did output "freenas savecore: /dev/ada0p3: Operation not permitted". Using the same command without "-o readonly=on", or booting normally, results in the following kernel panic backtrace:
Code:
panic: Solaris(panic): blkptr at 0xfffffe0036f5d580 has invalid TYPE 101
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe202d7c9960
vpanic() at vpanic+0x17e/frame 0xfffffe202d7c99c0
panic() at panic+0x43/frame 0xfffffe202d7c9a20
vcmn_err() at vcmn_err+0xcf/frame 0xfffffe202d7c9b50
zfs_panic_recover() at zfs_panic_recover+0x5a/frame 0xfffffe202d7c9bb0
zfs_blkptr_verify() at zfs_blkptr_verify+0x53/frame 0xfffffe202d7c9bf0
zio_read() at zio_read+0x2c/frame 0xfffffe202d7c9c30
arc_read() at arc_read+0x754/frame 0xfffffe202d7c9ce0
traverse_prefetch_metadata() at traverse_prefetch_metadata+0xbd/frame 0xfffffe202d7c9d20
traverse_visitbp() at traverse_visitbp+0x9dc/frame 0xfffffe202d7c9de0
traverse_visitbp() at traverse_visitbp+0x430/frame 0xfffffe202d7c9ea0
traverse_visitbp() at traverse_visitbp+0x430/frame 0xfffffe202d7c9f60
traverse_visitbp() at traverse_visitbp+0x430/frame 0xfffffe202d7ca020
traverse_visitbp() at traverse_visitbp+0x430/frame 0xfffffe202d7ca0e0
traverse_visitbp() at traverse_visitbp+0x430/frame 0xfffffe202d7ca1a0
traverse_dnode() at traverse_dnode+0xd3/frame 0xfffffe202d7ca210
traverse_visitbp() at traverse_visitbp+0x703/frame 0xfffffe202d7ca2d0
traverse_impl() at traverse_impl+0x317/frame 0xfffffe202d7ca3f0
traverse_dataset_destroyed() at traverse_dataset_destroyed+0x2b/frame 0xfffffe202d7ca420
bptree_iterate() at bptree_iterate+0x15f/frame 0xfffffe202d7ca570
dsl_scan_sync() at dsl_scan_sync+0x43a/frame 0xfffffe202d7ca770
spa_sync() at spa_sync+0xb67/frame 0xfffffe202d7ca9a0
txg_sync_thread() at txg_sync_thread+0x238/frame 0xfffffe202d7caa70
fork_exit() at fork_exit+0x83/frame 0xfffffe202d7caab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe202d7caab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
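For completeness, this is the exact command that gets the pool mounted; it is the only variant that doesn't panic:
Code:
# Read-only import is the only mode that survives.
# -f forces the import, -F rewinds to the last importable txg,
# -R sets the altroot so datasets mount under /mnt.
zpool import -o readonly=on -fF -R /mnt home-main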
There do not appear to be any physical or mechanical failures. All drives show as online and pass SMART testing. Everything was burned in before originally being deployed.
Code:
root@freenas[~]# zpool import
   pool: home-main
     id: 8732520593021902914
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        home-main                                       ONLINE
          raidz2-0                                      ONLINE
            gptid/14b5f8a8-0f34-11ea-b892-002590e49b70  ONLINE
            gptid/22f53628-0f34-11ea-b892-002590e49b70  ONLINE
            gptid/328e501f-0f34-11ea-b892-002590e49b70  ONLINE
            gptid/4168fa93-0f34-11ea-b892-002590e49b70  ONLINE
            gptid/50d48acd-0f34-11ea-b892-002590e49b70  ONLINE
            gptid/5b342562-0f34-11ea-b892-002590e49b70  ONLINE
          raidz2-1                                      ONLINE
            gptid/439e60f3-1223-11ea-b1fa-002590e49b70  ONLINE
            gptid/446b5b24-1223-11ea-b1fa-002590e49b70  ONLINE
            gptid/453c49d6-1223-11ea-b1fa-002590e49b70  ONLINE
            gptid/461a4f2a-1223-11ea-b1fa-002590e49b70  ONLINE
            gptid/46df8761-1223-11ea-b1fa-002590e49b70  ONLINE
            gptid/47a44bf9-1223-11ea-b1fa-002590e49b70  ONLINE
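The SMART results mentioned above were gathered per drive along these lines (da0 is a placeholder; I repeated this for each of the twelve disks behind the 9207-8i, and every one reports PASSED):
Code:
# Quick overall health verdict, then the full attribute dump for one drive.
smartctl -H /dev/da0
smartctl -a /dev/da0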