Kernel Panic during pool import - Hardware RAID6 presented as 1 Drive

Ravi

Cadet
Joined
May 4, 2021
Messages
5
I have an NFS server that has been running well for the last year. A couple of days ago the server crashed and got stuck in a reboot cycle: every time the pool is imported, the kernel panics and the system reboots.

Here are the details of my installation:

Server Details
Server : Supermicro SuperChassis 826BA-R1K28WB 2U Server W/ X9DRW-3F
Memory : 64GB
Storage Controller : ADAPTEC 71605
Data Drives: 12 X 12TB 3.5 7.2K 6Gbps SAS
OS Drives: 2 X 256GB SSD

2x RAID6 volumes created with 6 drives each. The RAID controller shows the drives as Optimal, and I have also run a verification of the volumes, which came back good.

TrueNAS:
Version : 12.0-U1
Pools : Two pools, Vol-01 and Vol-02, each built on a single 43TB drive (one RAID6 volume presented as one disk).

I had to re-install TrueNAS 12.0-U3, and the system came up with neither pool present. In the UI, under Storage -> Pools, I did an Add Pool and selected "Import Existing Pool"; the system comes back with Vol-01 and Vol-02. Vol-02 imports fine, with no issues and no data corruption.

When I select Vol-01 and perform the import, the kernel panics again, which means I have to redo the whole process.

I then tried the following:

Boot, drop to the boot loader prompt, and set the following flags:
set zfs:zfs_debug=1
set zfs:zfs_recover=1
set aok=1
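
(For reference, those set zfs:... lines are Solaris /etc/system syntax. TrueNAS 12 is FreeBSD-based, so the equivalents are normally set at the boot loader prompt or as loader tunables; the sketch below assumes the FreeBSD-side names vfs.zfs.debug and vfs.zfs.recover, and aok is an illumos-era knob that may not exist here at all.)

At the loader prompt (menu option "Escape to loader prompt"):
set vfs.zfs.debug=1
set vfs.zfs.recover=1
boot

To make the settings persistent they can also be added in the UI under System -> Tunables with Type set to "loader".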


Boot into multi-user mode (not single-user), import Vol-02, and run the following command on Vol-01:

zdb -e -bcsvL Vol-01
Here is the output:

root@truenas[~]# zdb -e -bcsvL Vol-01
Traversing all blocks to verify checksums ...
385G completed ( 23MB/s) estimated time remaining: 442642hr 38min 50sec
bp count: 3790779
ganged count: 0
bp logical: 494650703872 avg: 130487
bp physical: 412161922048 avg: 108727 compression: 1.20
bp allocated: 412962242560 avg: 108938 compression: 1.20
bp deduped: 0 ref>1: 0 deduplication: 1.00
Normal class: 412947922944 used: 0.86%

additional, non-pointer bps of type 0: 6909
Dittoed blocks on same vdev: 27987

Blocks LSIZE PSIZE ASIZE avg comp %Total Type
- - - - - - - unallocated
2 32K 8K 24K 12K 4.00 0.00 object directory
45 54K 26K 540K 12K 2.08 0.00 object array
1 16K 4K 12K 12K 4.00 0.00 packed nvlist
- - - - - - - packed nvlist size
- - - - - - - bpobj
- - - - - - - bpobj header
- - - - - - - SPA space map header
7.42K 853M 277M 830M 112K 3.08 0.21 SPA space map
1 36K 36K 36K 36K 1.00 0.00 ZIL intent log
1.74K 33.5M 7.07M 20.3M 11.6K 4.73 0.01 DMU dnode
10 40K 40K 84K 8.40K 1.00 0.00 DMU objset
- - - - - - - DSL directory
12 6K 512 12K 1K 12.00 0.00 DSL directory child map
- - - - - - - DSL dataset snap map
21 290K 72K 216K 10.3K 4.02 0.00 DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
3.61M 460G 384G 384G 106K 1.20 99.78 ZFS plain file
272 2.60M 544K 1.45M 5.47K 4.89 0.00 ZFS directory
9 9K 9K 72K 8K 1.00 0.00 ZFS master node
- - - - - - - ZFS delete queue
- - - - - - - zvol object
- - - - - - - zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
- - - - - - - persistent error log
1 128K 8K 24K 24K 16.00 0.00 SPA history
- - - - - - - SPA history offsets
- - - - - - - Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
1 1K 1K 12K 12K 1.00 0.00 DSL dataset next clones
- - - - - - - scan work queue
- - - - - - - ZFS user/group/project used
- - - - - - - ZFS user/group/project quota
- - - - - - - snapshot refcount tags
- - - - - - - DDT ZAP algorithm
- - - - - - - DDT statistics
- - - - - - - System attributes
- - - - - - - SA master node
9 13.5K 13.5K 72K 8K 1.00 0.00 SA attr registration
18 288K 72K 144K 8K 4.00 0.00 SA attr layouts
- - - - - - - scan translations
- - - - - - - deduplicated block
- - - - - - - DSL deadlist map
- - - - - - - DSL deadlist map hdr
1 1K 1K 12K 12K 1.00 0.00 DSL dir clones
- - - - - - - bpobj subobj
- - - - - - - deferred free
- - - - - - - dedup ditto
36 475K 168K 576K 16K 2.84 0.00 other
3.62M 461G 384G 385G 106K 1.20 100.00 Total


Block Size Histogram
block psize lsize asize
size Count Size Cum. Count Size Cum. Count Size Cum.
512: 313 156K 156K 313 156K 156K 0 0 0
1K: 105 114K 270K 105 114K 270K 0 0 0
2K: 42 134K 405K 42 134K 405K 0 0 0
4K: 22.8K 91.3M 91.7M 73 330K 735K 17.2K 69.0M 69.0M
8K: 39.8K 415M 507M 24 204K 939K 29.6K 294M 363M
16K: 99.8K 2.04G 2.53G 2.70K 43.2M 44.1M 114K 2.38G 2.73G
32K: 248K 11.4G 13.9G 17.9K 573M 617M 249K 11.4G 14.1G
64K: 1.23M 117G 131G 30 3.48M 621M 1.23M 117G 131G
128K: 1.98M 253G 384G 3.59M 459G 460G 1.98M 253G 384G
256K: 0 0 384G 0 0 460G 1.33K 386M 385G
512K: 0 0 384G 0 0 460G 0 0 385G
1M: 0 0 384G 0 0 460G 0 0 385G
2M: 0 0 384G 0 0 460G 0 0 385G
4M: 0 0 384G 0 0 460G 0 0 385G
8M: 0 0 384G 0 0 460G 0 0 385G
16M: 0 0 384G 0 0 460G 0 0 385G
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
Vol-01 385G 43.2T 441 0 46.4M 0 0 0 0
/dev/gptid/19708e9b-6e31-11eb-9230-0cc47a17595c 385G 43.2T 441 0 46.4M 0 0 0 0


I tried to perform a scrub:

# zpool scrub Vol-01

and it says there is no such pool.

I tried to import the pool again, and the system panics.
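
(For what it's worth, zpool scrub and zpool status only operate on pools that are currently imported, so the scrub cannot run until the import itself succeeds. A quick way to see what is imported versus merely importable:)

# zpool list          (lists only pools that are currently imported)
# zpool import        (with no pool name, just scans the disks and lists pools that are available to import)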

Any help would be welcome, as we have data on this pool that needs to be recovered.
 

Ravi

Cadet
Joined
May 4, 2021
Messages
5
Further update:
I ran the following command successfully:
zpool import -o readonly=on -fF -R /mnt/Vol-01
pool: Vol-01
id: 9179814278270120964
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
Vol-01 ONLINE
gptid/19708e9b-6e31-11eb-9230-0cc47a17595c ONLINE

but
root@truenas[/var/log]# zpool scrub Vol-01
cannot open 'Vol-01': no such pool
and
# zpool status
pool: Vol-02
state: ONLINE
scan: scrub repaired 0B in 00:54:14 with 0 errors on Sun Apr 25 00:54:14 2021
config:
NAME STATE READ WRITE CKSUM
Vol-02 ONLINE 0 0 0
gptid/b897abdd-6e31-11eb-9230-0cc47a17595c ONLINE 0 0 0
errors: No known data errors

pool: boot-pool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada0p2 ONLINE 0 0 0
ada1p2 ONLINE 0 0 0

Both zpool status and zpool scrub tell me the pool is not there.
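
(Note: zpool import run without a pool name as its final argument only scans the disks and lists the pools that could be imported; it does not actually import anything, which would explain why zpool status and zpool scrub cannot find Vol-01 afterwards. A sketch of the actual read-only import, re-using the same flags and assuming the pool name Vol-01 goes on the end:)

# zpool import -o readonly=on -fF -R /mnt/Vol-01 Vol-01
# zpool status Vol-01        (should now show the pool if the import succeeded)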
 

Ravi

Cadet
Joined
May 4, 2021
Messages
5
Yeah, this is totally expected as the Adaptec isn't well supported. Please see the following post:

https://www.truenas.com/community/r...bas-and-why-cant-i-use-a-raid-controller.139/

The damage that can happen to pools when a storage controller is not 100% is generally unfixable, and if the pool is refusing to import, I've never seen a successful recovery in this sort of situation.
Thanks for the link, it is very helpful. I did not know that Adaptec is not fully supported. Based on the write-up, is it better to disable the write cache on the controller? This is also a good lesson to make sure the next storage I order does not need a RAID controller.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thanks for the link, it is very helpful. I did not know that Adaptec is not fully supported. Based on the write-up, is it better to disable the write cache on the controller?

Adaptec used to provide great documentation for their chipsets back in the day, but that stopped being the case maybe twenty years ago. This puts driver authors at a disadvantage. This compares poorly to vendors like LSI, who actually have staff authoring (or helping to author) drivers for FreeBSD and Linux.

It's better to ditch the controller entirely. The cache is just an edge issue compared to the overall reliability issues.

This is also a good lesson to make sure the next storage I order does not need a RAID controller.

Your *existing* storage does not need a RAID controller. All it needs is a SAS HBA.
 

Ravi

Cadet
Joined
May 4, 2021
Messages
5

Any help on how to proceed from here would be great. I tried re-importing the pool and the kernel panicked again, but this time around the system came back up, so it was not a huge waste of time.
 

Ravi

Cadet
Joined
May 4, 2021
Messages
5
Any help on how to proceed from here would be great. I tried re-importing the pool and the kernel panicked again, but this time around the system came back up, so it was not a huge waste of time.
Adaptec used to provide great documentation for their chipsets back in the day, but that stopped being the case maybe twenty years ago. This puts driver authors at a disadvantage. This compares poorly to vendors like LSI, who actually have staff authoring (or helping to author) drivers for FreeBSD and Linux.

It's better to ditch the controller entirely. The cache is just an edge issue compared to the overall reliability issues.



Your *existing* storage does not need a RAID controller. All it needs is a SAS HBA.
I will reconfigure the array once I get the volume to mount, and if it does ( :) ) I will use a SAS HBA to present all 12 drives to TrueNAS and bring them under ZFS (software RAID). For now I just want to recover the data and move forward.

I can mount the drive read-only and it states it is ONLINE. Is there any way I can at least read the data out of the pool? Both zpool status and zpool scrub refuse to acknowledge the presence of the pool.
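
(If the read-only import does go through, one way to read the data out is to copy it straight onto the healthy pool before rebuilding. A rough sketch with placeholder names, assuming Vol-02 has enough free space; TrueNAS normally uses /mnt as the altroot, so the pool should appear under /mnt/Vol-01:)

# zpool import -o readonly=on -f -R /mnt Vol-01
# zfs list -r Vol-01                                  (confirm the datasets are visible)
# rsync -aHv /mnt/Vol-01/ /mnt/Vol-02/recovered/      (file-level copy onto the good pool)

If an existing snapshot is available, zfs send also works from a read-only pool:

# zfs send Vol-01/data@snap | zfs recv Vol-02/recovered-data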
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Any help on how to proceed from here would be great. I tried re-importing the pool and the kernel panicked again, but this time around the system came back up, so it was not a huge waste of time.

Back up all your data, make sure it's good, then pull the RAID controller, replace it with an HBA, create a new pool, and reload the data onto the new pool.
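
(As a rough illustration only: once the twelve drives sit behind a plain SAS HBA they show up as individual disks, and the new pool can be built from raidz2 vdevs instead of hardware RAID6. On TrueNAS this is normally done through the Storage -> Pools UI so the disks get partitioned and labelled properly; the command-line equivalent would look something like the sketch below, assuming the drives appear as da0 through da11 and "tank" as a placeholder pool name:)

# zpool create tank raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11

Two 6-disk raidz2 vdevs roughly mirror the current two 6-drive RAID6 volumes, but with ZFS handling the redundancy end to end.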
 