Kernel Panic on ZFS import

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526

mtucker502

Dabbler
Joined
Aug 25, 2015
Messages
14
I didn't. zdb finished, but when I try to import the pool it still goes straight into a kernel panic.

I haven't had time with work and study to even take a look at it. But as I'm sitting here on a change call I can't think of anything else better to do so here it goes :)

My plan is to identify ada8 which had the CAM errors (SMART also noted 1 read error) and disconnect the drive. If that doesn't work, I may change out a few of the cables and worst case I'll throw a new HBA in there.

If that still doesn't work then I guess I'll buy a Synology.....

...Kidding!! I'll likely probe around the forums for some previous builds and buy that norco 24 bay I've always wanted.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Eh, buy a 24 bay Supermicro for less money on ebay. It'll be used but those things are beasts. That's how I got my Supermicro chassis and I love it. :)
 

mtucker502

Dabbler
Joined
Aug 25, 2015
Messages
14
The first 5-6 24-bay chassis I looked at came with motherboards that couldn't support >4TB drives. I have ten 2TB drives now, but if I have 12 more empty slots I can definitely see myself buying some early Christmas presents :).

I'll have to scour ebay some more. I still wish I knew what caused this in the first place. An 8x2TB raidz2 array is a lot of data to go bye bye. The important stuff is backed up but I can't afford to back it all up. :/
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Honestly, two things really make me think you did this to yourself..


1. RAM. We know from experience that if your system doesn't have at least 8GB of RAM, it's like a ticking time bomb. It may never go off, or it may corrupt your pool in ways that are unrecoverable.
2. The choice of SATA controllers. Those things have to pass through all your precious storage bits. If one controller goes bonkers it can potentially destroy data on multiple disks. That means you may not just lose some redundancy, but likely be left without enough redundancy to deal with the problem.

Unfortunately, we strongly recommend not using bargain bin controllers and Marvell stuff. What are you using?

  • HBA: Onboard & 1 Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)

Yeah.. that's not pretty. Not at all.

Edited: Removed an AMD reference. Confused you with another thread I had opened in another browser tab. :P
 

mtucker502

Dabbler
Joined
Aug 25, 2015
Messages
14
I'll take credit for picking the cheapest HBA that did the job :). I cannot figure out why BSD only shows 4GB of RAM now. It did not do this previously. BIOS/Memtest/Debian show 8GB.

It is still a bit of guessing, though, without a definitive RCA. Too many maybes. I'd rather know for sure why this happened, even if it's a poor hardware choice on my part. Otherwise, I may find myself seeing this same issue in the future even with appropriate hardware?

Again, don't get me wrong, I get that I'm running non-ECC memory with a flea market HBA and that it might/probably/maybe/likely/could be the cause of the kernel panic. (and a good spanking!). But I'll never know for sure.

I still think I'm seeing a kernel panic because of trying to import a 16TB raidz2 pool with only 4GB of available RAM. I just don't know enough about debugging a kernel panic to prove it.
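For reference, the arithmetic behind the "16TB" figure can be sketched as below. This is a rough illustration only: the function name `raidz_capacity_tb` is made up for this example, and it ignores ZFS metadata, padding, and TB/TiB differences, so real usable space will be somewhat lower.

```python
def raidz_capacity_tb(disk_count, disk_size_tb, parity_disks=2):
    """Return (raw_tb, approx_usable_tb) for a single raidz vdev.

    parity_disks=2 corresponds to raidz2. Filesystem overhead is
    deliberately ignored, so the usable figure is an upper bound.
    """
    if disk_count <= parity_disks:
        raise ValueError("need more disks than parity disks")
    raw = disk_count * disk_size_tb
    usable = (disk_count - parity_disks) * disk_size_tb
    return raw, usable


if __name__ == "__main__":
    # The pool described in this thread: 8 x 2TB in raidz2.
    raw, usable = raidz_capacity_tb(8, 2)
    print(f"raw: {raw}TB, approx usable: {usable}TB")
```

So the 16TB quoted above is the raw size; roughly two disks' worth of that goes to parity.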
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Kernel panics are typically the result of a hardware failure.

In this case we are talking about importing a zpool. Since ZFS itself runs in the kernel, a corrupt zpool can (and often will) cause ZFS to crash if there isn't enough redundancy to cover for the corruption. More than likely the problem is that your zpool has some kind of significant corruption: ZFS tries to mount the zpool, knows something is wrong, but cannot fix it. Since it cannot fix it, it simply accepts the data it is receiving and hopes it all works out. Unfortunately, it doesn't, and a kernel panic results.

This is an example of why many of us harp so much about "do it right or don't do it". ZFS is pretty awesome. Things work great when you use it appropriately. But if things go beyond the ability of ZFS to correct the issue, you are flat out screwed. You aren't partially screwed. You can't just buy some $50 recovery tool to get your data back. Your data is locked away, forever, since no recovery tools exist. It's a very big step from "everything is okay" to "my data is irretrievably lost forever". Once you've crossed that line, coming back is damn near impossible. :(

Even without "lots of RAM" a healthy zpool should mount, given enough time, without causing a panic. The exception to this is if you do something like use dedup, or you have so little RAM that you can't even store the primary ZFS metadata in RAM (I think it uses something like 16MB of RAM per device). The fact that you are getting a panic on import.. that's pretty much "worst case scenario" for your zpool.
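To put the "16MB of RAM per device" guess in perspective, here is a minimal sketch of the arithmetic. The 16MB figure is only the rough guess quoted above, not a documented ZFS constant, and the function names are invented for this example.

```python
# Assumption: ~16 MB of RAM per device, taken from the guess in the post
# above. It is NOT a documented ZFS constant.
MB_PER_DEVICE_GUESS = 16


def metadata_ram_mb(device_count, mb_per_device=MB_PER_DEVICE_GUESS):
    """Guessed RAM (in MB) needed to hold primary ZFS metadata."""
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    return device_count * mb_per_device


def fraction_of_ram(device_count, ram_gb, mb_per_device=MB_PER_DEVICE_GUESS):
    """Share of total RAM (0.0 to 1.0) that the guessed metadata would use."""
    return metadata_ram_mb(device_count, mb_per_device) / (ram_gb * 1024)


if __name__ == "__main__":
    # The pool in this thread: 8 devices, with only 4 GB visible to the OS.
    needed = metadata_ram_mb(8)
    share = fraction_of_ram(8, 4)
    print(f"~{needed} MB of metadata, about {share:.1%} of 4 GB")
```

If that guess is anywhere near right, an 8-disk pool needs a small fraction of 4GB just to import, which fits the point that low RAM alone shouldn't cause a panic on a healthy pool.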
 