Reboot when deleting/transferring large files


k_day

Cadet
Joined
Jan 31, 2015
Messages
5
Whenever I try to delete or transfer large files, I appear to get a kernel panic. A huge amount of text dumps to the screen too fast for me to read what is going on, and I can't find it logged anywhere after the system reboots.

My current hardware is:

-Supermicro MBD-A1Sri-2758F-O Mini ITX Server (8 core Intel Atom processor C2758)
-1x8GB Kingston ECC RAM

A few other things of note: this crash is easily reproducible, and it happened on my old hardware as well. My previous motherboard only supported 4GB of RAM, so I suspected that had something to do with it and upgraded my board, RAM, and power supply. The problem remains, though.

I tried letting FreeNAS autotune the system, but this did not help.

The easiest way for me to reproduce this is to try to delete a large directory that contains some old backups (the crash happens over both SMB and SSH).
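To be concrete, the sort of thing that triggers it (the directory name below is just an example, assuming the pool is mounted at /mnt/zfs):

    # run over SSH as root; removing a large backup directory is enough to panic the box
    rm -rf /mnt/zfs/Backups/old-backups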

My pool appears healthy. Scrubs aren't reporting errors (the only complaint is that a few newer disks have a larger native block size, which has never caused me any problems in the past):

      pool: zfs
     state: ONLINE
    status: One or more devices are configured to use a non-native block size.
            Expect reduced performance.
    action: Replace affected devices with devices that support the configured block size, or migrate data to a properly configured pool.
      scan: scrub repaired 0 in 2h12m with 0 errors on Sat Jan 17 22:59:30 2015
    config:

            NAME                                            STATE     READ WRITE CKSUM
            zfs                                             ONLINE       0     0     0
              raidz1-0                                      ONLINE       0     0     0
                ada1p2                                      ONLINE       0     0     0
                gptid/a3e05374-9277-11e1-987d-00259063b368  ONLINE       0     0     0
                gptid/a438b050-9277-11e1-987d-00259063b368  ONLINE       0     0     0
                gptid/a00b7c26-5c76-11e4-bc07-00259063b368  ONLINE       0     0     0  block size: 512B configured, 4096B native
                gptid/a6d64601-29c2-11e4-b0b1-00259063b368  ONLINE       0     0     0  block size: 512B configured, 4096B native

    errors: No known data errors

Anyone have any suggestions for how I might go about fixing/debugging this? Thank you!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Damage that has already been done to the pool can cause panics.

A scrub mostly checks the integrity of the data on disk, not the ZFS structure itself, so it is possible for a scrub to succeed while the pool is still corrupt. A scrub isn't looking closely at the structure of the pool, and it isn't clear how it could repair it even if it did.
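If you want to poke at the pool structure by hand, zdb can at least traverse it. A rough sketch (and a clean zdb pass still isn't a guarantee the pool is fine):

    # walk the pool and verify metadata checksums (-cc also checksums data blocks);
    # best run while the pool is otherwise idle
    zdb -c zfs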

If it is indeed the pool that is corrupt, then copy whatever data you can off the pool, nuke the pool, create a new pool, move the data back, and move on with life. You might be able to "fix" whatever is causing the current problem, but there's no guarantee that you will find and correct all damage.
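A rough sketch of that workflow, assuming you have a second pool to stage the data onto (the "backup" pool name and the disk names below are placeholders, not anything from this thread). On FreeNAS you would normally do the destroy/create steps through the GUI volume manager rather than at the shell:

    # stage the data elsewhere (a recursive zfs send keeps snapshots and properties; rsync also works)
    zfs snapshot -r zfs@migrate
    zfs send -R zfs@migrate | zfs receive -F backup/zfs-copy

    # nuke and recreate the pool (placeholder device names), then copy the data back
    zpool destroy zfs
    zpool create zfs raidz1 gptid/disk1 gptid/disk2 gptid/disk3 gptid/disk4 gptid/disk5
    zfs send -R backup/zfs-copy@migrate | zfs receive -F zfs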
 

k_day

Cadet
Joined
Jan 31, 2015
Messages
5
Thanks jgreco. I will probably go that route and see how it goes.

In the meantime, are there any easy ways to capture the dump log or debug this to confirm? I've poked around in /var/log quite a bit and haven't really found anything useful.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Think it's in the handbook. I'm on mobile so I'll invite you to look.
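For anyone hitting this later, a rough sketch of the stock FreeBSD crash-dump route (the swap device name is a placeholder; the handbook describes the supported FreeNAS way):

    # point kernel dumps at a swap partition (device name is a placeholder)
    dumpon /dev/ada0p3

    # after the panic and reboot, copy the dump off the swap device
    savecore /var/crash /dev/ada0p3

    # summarize the panic message and backtrace from the saved dump
    crashinfo -d /var/crash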
 

k_day

Cadet
Joined
Jan 31, 2015
Messages
5
Looks like something is in fact corrupt. Tracked it down to a single 9.9M file:
/mnt/zfs/Backups/Viper/Cobra.sparsebundle/bands/1c4c

Hoping I can find a way to remove this without a kernel panic so I don't have to recreate the entire pool...
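For anyone else hunting a file like this: since transfers panic as well, one option is simply to read everything and note where the box dies. A rough sketch (the path and approach are illustrative, not necessarily what was done here):

    # read every file under the suspect directory; the last name printed before the
    # panic (or a read error) points at the culprit
    find /mnt/zfs/Backups -type f -exec sh -c 'echo "$1"; dd if="$1" of=/dev/null bs=1m' _ {} \;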
 

k_day

Cadet
Joined
Jan 31, 2015
Messages
5
I was able to empty the contents of the file by doing

"> 1c4c"

So now it is empty, but rm still causes a crash.
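That would fit the directory/metadata theory: redirecting into the file only rewrites the file's own data blocks, while rm has to update the directory entry and the link count, which seems to be where the damage is. An equivalent way to do the truncation:

    # same effect as "> 1c4c" without relying on shell redirection
    truncate -s 0 1c4c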
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Guessing the directory itself is somehow screwed up.
 

k_day

Cadet
Joined
Jan 31, 2015
Messages
5
The dump files indicated a problem with the metadata on that file (something about the link count on a vnode being off). I decided against fumbling my way through trying to repair that metadata by hand. I was able to back up all the files I care about and recreate the pool. Back in business now.

Thanks for the help!
 