Server Crashing: not sure why

Status
Not open for further replies.

Bilco

Cadet
Joined
Dec 19, 2013
Messages
2
Hi All,

I'm hoping someone might be able to give me a hand. My server started crashing, and I believe it's due to a bad disk. During a normal boot the server goes into a kernel panic. If I pull the drive I think is bad, the server boots up, but then I can't mount the volume that drive belongs to. If I boot up without the disk, then put it back in and try to mount the volume, I get another crash.

Would someone be able to help me shed some light on this? I've exhausted my searching and I'm not sure how to proceed.

My setup:

I have 8 disks total, split into two 4-disk RAID-Z groups. One volume is called Vol_1 (the problem one) and the other is VOL_2.

[root@intersect] ~# cat /etc/version
FreeNAS-9.1.1-RELEASE-x64 (a752d35)

[root@intersect] ~# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
VOL_2 7.25T 5.01T 2.24T 69% 1.00x ONLINE /mnt


[root@intersect] ~# zpool status
pool: VOL_2
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub in progress since Sat Dec 21 09:19:54 2013
261G scanned out of 5.01T at 70.6M/s, 19h37m to go
0 repaired, 5.09% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        VOL_2                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/8d1f4eb1-54a6-11e3-a70b-5404a6497364  ONLINE       0     0     0
            gptid/8ec1a62b-54a6-11e3-a70b-5404a6497364  ONLINE       0     0     0
            gptid/9059a0dc-54a6-11e3-a70b-5404a6497364  ONLINE       0     0     0
            gptid/912caa24-54a6-11e3-a70b-5404a6497364  ONLINE       0     0     0

errors: 2 data errors, use '-v' for a list
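
If it helps, I can also grab the verbose status to see which two files those errors are in; I assume that would just be:

[root@intersect] ~# zpool status -v VOL_2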


[root@intersect] ~# camcontrol devlist
<ATA WDC WD20EZRX-00D 0A80> at scbus0 target 0 lun 0 (da0,pass0)
<ATA WDC WD20EARS-00J 0A80> at scbus0 target 1 lun 0 (da1,pass1)
<ATA WDC WD20EZRX-00D 0A80> at scbus0 target 3 lun 0 (da2,pass2)
<ATA WDC WD30EZRX-00M 0A80> at scbus0 target 4 lun 0 (da3,pass3)
<ATA WDC WD30EZRX-00M 0A80> at scbus0 target 5 lun 0 (da4,pass4)
<ATA WDC WD30EZRX-00M 0A80> at scbus0 target 6 lun 0 (pass8,da8)
<ATA WDC WD30EZRX-00M 0A80> at scbus0 target 7 lun 0 (pass9,da9)
<ATA WDC WD20EFRX-68E 0A80> at scbus0 target 8 lun 0 (da5,pass5)
< > at scbus12 target 0 lun 0 (da6,pass6)
<Generic- USB3.0 CRW -1 1.00> at scbus12 target 0 lun 1 (da7,pass7)


[root@intersect] ~# glabel status
Name Status Components
gptid/8d1f4eb1-54a6-11e3-a70b-5404a6497364 N/A da0p2
gptid/8ec1a62b-54a6-11e3-a70b-5404a6497364 N/A da1p2
gptid/9059a0dc-54a6-11e3-a70b-5404a6497364 N/A da2p2
gptid/b374bcae-5bcb-11e2-b965-5404a6497364 N/A da3p2
gptid/b2ae06a0-5bcb-11e2-b965-5404a6497364 N/A da4p2
gptid/90ed4dcf-54a6-11e3-a70b-5404a6497364 N/A da5p1
gptid/912caa24-54a6-11e3-a70b-5404a6497364 N/A da5p2
ufs/FreeNASs3 N/A da7s3
ufs/FreeNASs4 N/A da7s4
ufs/FreeNASs1a N/A da7s1a
gptid/b1d4f55f-5bcb-11e2-b965-5404a6497364 N/A da8p1
gptid/b1ec9c06-5bcb-11e2-b965-5404a6497364 N/A da8p2
gptid/b110613b-5bcb-11e2-b965-5404a6497364 N/A da9p1
gptid/b1299abc-5bcb-11e2-b965-5404a6497364 N/A da9p2




[root@intersect] ~# gpart show
=> 34 3907029101 da0 GPT (1.8T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834696 2 freebsd-zfs (1.8T)
3907029128 7 - free - (3.5k)

=> 34 3907029101 da1 GPT (1.8T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834696 2 freebsd-zfs (1.8T)
3907029128 7 - free - (3.5k)

=> 34 3907029101 da2 GPT (1.8T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834696 2 freebsd-zfs (1.8T)
3907029128 7 - free - (3.5k)

=> 34 5860533101 da3 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)

=> 34 5860533101 da4 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)

=> 34 3907029101 da5 GPT (1.8T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834696 2 freebsd-zfs (1.8T)
3907029128 7 - free - (3.5k)

=> 63 15564737 da7 MBR (7.4G)
63 1930257 1 freebsd [active] (942M)
1930320 63 - free - (31k)
1930383 1930257 2 freebsd (942M)
3860640 3024 3 freebsd (1.5M)
3863664 41328 4 freebsd (20M)
3904992 11659808 - free - (5.6G)

=> 0 1930257 da7s1 BSD (942M)
0 16 - free - (8.0k)
16 1930241 1 !0 (942M)

=> 34 5860533101 da8 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)

=> 34 5860533101 da9 GPT (2.7T)
34 94 - free - (47k)
128 4194304 1 freebsd-swap (2.0G)
4194432 5856338703 2 freebsd-zfs (2.7T)



[root@intersect] ~# zpool import -f
pool: Vol_1
id: 162564732152805204
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        Vol_1                                           ONLINE
          raidz1-0                                      ONLINE
            gptid/b1299abc-5bcb-11e2-b965-5404a6497364  ONLINE
            gptid/b1ec9c06-5bcb-11e2-b965-5404a6497364  ONLINE
            gptid/b2ae06a0-5bcb-11e2-b965-5404a6497364  ONLINE
            gptid/b374bcae-5bcb-11e2-b965-5404a6497364  ONLINE


**** Crash message below ****

Mounting local file systems:.
cannot import '162564732152805204': no such pool available
(da2:mps0:0:3:0): READ(10). CDB: 2800e8e0860000010000
(da2:mps0:0:3:0): CAM status: SCSI Status Error
(da2:mps0:0:3:0): SCSI status: Check Condition
(da2:mps0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:3:0): Info: 0xe8e08688
(da2:mps0:0:3:0): Error 5, Unretryable error


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address  = 0x40
fault code             = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff8167ddc1
stack pointer          = 0x28:0xffffffff8919928f00
frame pointer          = 0x28:0xffffffff89119s8f30
code segment           = base 0x0, limit 0xfffff, type 0x1b
processor eflags       = interrupt enabled, resume, IOPL = 0
current process        = 197 (txg_thread_enter)
[ thread pid 197 tid 100638 ]
Stopped at vdev_is_dead+0x1: cmpq $0x5,0x40(%rdi)


Any and all help is appreciated.

Thanks.
 

Bilco

Cadet
Joined
Dec 19, 2013
Messages
2
32 GB, non-ECC.

I have tried pulling some sticks out in various configurations to check for bad RAM.

It will boot fine as long as that one drive is removed.
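
If it would help, I can also pull SMART data from the drive that was throwing the medium errors in the console output (da2 there, assuming it keeps the same device name with everything plugged in), something like:

[root@intersect] ~# smartctl -a /dev/da2

and look at the reallocated/pending sector counts.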
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You are making a big mistake using non-ECC RAM.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The "stopped at vdev_is_dead" panic hints at unrecoverable pool damage. It may still be possible to recover the data from the pool if it's vital; someone had what looks to be a similar problem a year or two ago on freebsd-fs. But generally the fix is:

1) Use ECC RAM (suspect the RAM any time pool corruption shows up on a non-ECC system)
2) Strongly suggest RAIDZ2 when you rebuild your pool
3) Then restore the data from backup
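
If the data on Vol_1 really does have to come off first, one thing worth trying before rebuilding is importing the pool read-only from the console, so ZFS doesn't try to write or replay transactions on the damaged disks. No guarantee it gets past the vdev_is_dead panic, but roughly (the /mnt altroot is just an example, and the numeric id is the one from your zpool import output):

zpool import -o readonly=on -f -R /mnt Vol_1

or, if the name won't resolve, by id:

zpool import -o readonly=on -f -R /mnt 162564732152805204

Then copy off what you can before destroying and rebuilding the pool.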
 