hrana
System Specs
- Motherboard make and model: Supermicro X10DRC-T4+ in SC846BE1C (SAS3) 24-bay case
- CPU make and model: 2x Intel Xeon E5-2699v4
- RAM quantity: 512GB RDIMM ECC
- Hard drives, quantity, model numbers, and RAID configuration, including boot drives: 12x WD Red 8TB as 2x 6-drive RAIDZ2 vdevs in one zpool. 24x WD Red 8TB as 3x 8-drive RAIDZ2 vdevs in one zpool. The boot drive is a Supermicro SATADOM 64GB module.
- Hard disk controllers: HBA 9405W-16e to SC847BE1C (SAS3) 36-bay case (1x 24-port Supermicro SAS3 expander and 1x 12-port Supermicro SAS3 expander) acting as a JBOD through the Supermicro PTJBOD-CB3 controller. The two external SAS3 cables connect through a 4-port ext-to-int adapter. Two cables go from the adapter to the 24-port expander. Two cables go from the 24-port expander to the 12-port expander. Finally, two cables go from the 12-port expander back to the ext-to-int adapter to allow for future daisy-chaining of another JBOD.
- Network cards: Intel X550-T2, X550-T4 (onboard), I350-T4
Over the last 3 days, I noticed that the rear 12 disks would not survive a scrub without TrueNAS having a kernel panic.
I have done the following:
1) Upgraded to U3.1. Result: No change.
2) Updated the firmware on the 9405W-16e. Result: No change.
3) Swapped power supplies (2x PWS-1K28P-SQ, 1000W). The system only uses 350-400W. Result: No change.
4) Scrubbed the front 24 disks without an issue. The rear 12 disks always cause a crash on scrub.
5) Searched to hell and back on Google and forums for an answer.
I have not yet done the following, but am ready to:
1) Replace the rear 12-port expander.
2) Swap out all of the cables (external, internal, and the internal-to-external connector that sits in a PCI bracket).
3) Ask for help here as I am sure it is probably something really basic that I am missing or got wrong.
All of the crash logs reference the aiodXX process.
Here is the latest crash log (and I can provide all of the others, if needed):
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 16
fault virtual address = 0x3454f8f9
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff827c62ce
stack pointer = 0x0:0xfffffe022f4227e8
frame pointer = 0x0:0xfffffe022f422820
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 34994 (aiod26)
trap number = 12
panic: page fault
cpuid = 11
time = 1621361081
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022f4224a0
vpanic() at vpanic+0x17b/frame 0xfffffe022f4224f0
panic() at panic+0x43/frame 0xfffffe022f422550
trap_fatal() at trap_fatal+0x391/frame 0xfffffe022f4225b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe022f422600
trap() at trap+0x286/frame 0xfffffe022f422710
calltrap() at calltrap+0x8/frame 0xfffffe022f422710
--- trap 0xc, rip = 0xffffffff827c62ce, rsp = 0xfffffe022f4227e8, rbp = 0xfffffe022f422820 ---
avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
zfs_rangelock_enter_impl() at zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880
zfs_get_data() at zfs_get_data+0x15f/frame 0xfffffe022f422910
zil_commit_impl() at zil_commit_impl+0xe11/frame 0xfffffe022f422a70
zfs_fsync() at zfs_fsync+0xc1/frame 0xfffffe022f422ab0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x7b/frame 0xfffffe022f422ae0
aio_process_sync() at aio_process_sync+0x121/frame 0xfffffe022f422b40
aio_daemon() at aio_daemon+0x227/frame 0xfffffe022f422bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe022f422bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe022f422bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Does anyone have an idea?
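For anyone who wants to compare several of these dumps side by side, here is a minimal Python sketch that pulls out the process name, the panic time, and the frames below the first trap marker. The function name `summarize_panic` and the shortened `SAMPLE` text are illustrative assumptions only; this is not part of any TrueNAS or FreeBSD tool, and the regular expressions assume the same layout as the log above.

```python
import re
from datetime import datetime, timezone

# Shortened excerpt of the panic text above, used only as sample input.
SAMPLE = """\
current process = 34994 (aiod26)
time = 1621361081
vpanic() at vpanic+0x17b/frame 0xfffffe022f4224f0
--- trap 0xc, rip = 0xffffffff827c62ce, rsp = 0xfffffe022f4227e8, rbp = 0xfffffe022f422820 ---
avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
zfs_rangelock_enter_impl() at zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880
zfs_get_data() at zfs_get_data+0x15f/frame 0xfffffe022f422910
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""


def summarize_panic(text):
    """Return process name, panic time (UTC), and frames below the first trap marker."""
    proc = re.search(r"current process\s*=\s*\d+\s*\((\w+)\)", text)
    when = re.search(r"^time\s*=\s*(\d+)", text, re.MULTILINE)
    # Frames printed before the first "--- trap" line belong to the panic
    # handler itself; the ones after it show where the fault happened.
    _, _, after_trap = text.partition("--- trap")
    frames = re.findall(r"^(\w+)\(\) at ", after_trap, re.MULTILINE)
    return {
        "process": proc.group(1) if proc else None,
        "time_utc": (
            datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc)
            if when else None
        ),
        "frames": frames,
    }


if __name__ == "__main__":
    info = summarize_panic(SAMPLE)
    print(info["process"], info["time_utc"].isoformat())
    print(" <- ".join(info["frames"]))
```

Running the same function over each saved crash log would show quickly whether every panic comes from an aiod process and whether the frames below the trap marker are always the same.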