ZFS crash on import

mrloop

Cadet
Joined
Sep 18, 2018
Messages
3
I have two systems running FreeNAS 11. One, called caolila, is an NFS server, AFP server, and Plex server, to which I also send snapshots from a remote Ubuntu ZFS pool. The other FreeNAS instance, xanadu, at a remote location, exists solely to replicate the first. Both FreeNAS instances are having issues after their pools are imported in read-write mode. See the errors here:

Caolila: Solaris(panic): blkptr at 0xfffffe00151abd80 has invalid TYPE 104
Xanadu: Solaris(panic): blkptr at 0xfffffe00151abd80 has invalid TYPE 104

Caolila
Proliant ML10Gen9
Intel Pentium G4400 3.30GHz
4GB ECC Hynix + 16GB ECC Micron
- Raid-Z2
1 * HGST HDN724030ALE640 2.7T
2 * WDC WD30EFRX-68N32N0 2.7T
2 * ST3000VN007-2E4166 2.7T
Boot USB stick 32GB Samsung
Intel I219-LM Ethernet

Xanadu
Prolient MicroServer - 663724-421
AMD Athlon(tm) II Neo N36L Dual-Core Processor 1.3GHz
8GB ECC
AMD/ATI SB7x0/SB8x0/SB9x0 SATA Controller AHCI mode
Raid-Z1
- 2 * Seagate IronWolf ST3000VN007-2E4166 2.7T
- 2 * Western Digital Red WDC WD30EFRX-68N32N0 2.7T
Boot USB stick 14G Kingston DataTraveler 2.0k
Broadcom Limited NetXtreme BCM5723 Gigabit Ethernet PCIe

Please see the linked / attached file for the xanadu debug log.

I can import the pool in readonly mode and read data from it, doing the following:
* `ls /dev/gptid`
* `geli attach -pk ~/geli.key /dev/gptid/disk_id_for_each_disk`
* `zpool import -o readonly=on base`
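For reference, the readonly-import steps above can be run as one script. The loop over `/dev/gptid`, the key path `~/geli.key`, and the pool name `base` are taken from the commands listed here; looping over every gptid provider (rather than only the pool members) is my assumption, and `geli attach` will simply fail on non-member providers.

```shell
#!/bin/sh
# Sketch: attach each GELI-encrypted provider, then import the pool
# readonly. Assumes the key at ~/geli.key unlocks every member disk.
for id in /dev/gptid/*; do
    geli attach -p -k ~/geli.key "$id"
done
# readonly=on keeps ZFS from writing anything to the damaged pool.
zpool import -o readonly=on base
```

Importing readonly first is the safe default: it lets you copy data off before attempting any recovery option that writes to the pool.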

I've tried verifying metadata for xanadu:

* `zdb -e -bcsvL base`
* runs for a few hours, then fails:
* `swap_pager_getswapspace(2): failed` / `Jun 23 21:53:25 xanadu kernel: pid 37952 (zdb), uid 0, was killed: out of swap space`
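The out-of-swap kill happens because `zdb` walks all pool metadata in user-space memory, which can exceed what an 8 GB box plus its default swap can hold. A temporary file-backed swap device may let the run finish; the size, file path, and `md` unit number below are assumptions, not values from this thread.

```shell
# Sketch: add temporary swap on FreeBSD, then retry the metadata walk.
dd if=/dev/zero of=/usr/swap0 bs=1m count=8192   # 8 GB swap file (assumed size)
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 99         # back it with md99 (assumed unit)
swapon /dev/md99
# Re-run against the exported pool:
zdb -e -bcsvL base
# Afterwards: swapoff /dev/md99; mdconfig -d -u 99; rm /usr/swap0
```

A successful `zdb -bcsvL` pass would at least tell you how widespread the corrupt block pointers are before you attempt a rewind import.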

On caolila the pool fails to import even in readonly mode, doing the following:

* `ls /dev/gptid`
* `geli attach -pk ~/geli.key /dev/gptid/disk_id_for_each_disk`
* `zpool import -o readonly=on dome`
* ERROR - a similar invalid blkptr panic to the one above
readonly-import.jpeg
Next, on caolila, I was going to try `zpool import -FX dome` as suggested in https://www.ixsystems.com/community/threads/zfs-has-failed-you.11951/. Is there anything I should do before this? If that doesn't work, it looks like I need to destroy the existing pool on caolila, replicate data from the readonly xanadu pool into a newly created pool on caolila to get a functioning system again, and then destroy the existing xanadu pool and replicate data from caolila back into a newly created pool on xanadu. However, xanadu is missing the most recent data: caolila had/has data newer than the last replication to xanadu.
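Before committing to the extreme rewind, `zpool import` accepts `-n` together with `-F` to report whether a rewind could succeed without actually writing anything, so a safe escalation (sketched below, not advice specific to this pool) would be:

```shell
# Dry run: would a recovery-mode import succeed? Writes nothing.
zpool import -F -n dome
# If promising, try the plain rewind first:
zpool import -F dome
# Only escalate to the extreme rewind if -F alone fails.
# -X can discard recent transactions, so copy what you can first.
zpool import -FX dome
```

Note that both `-F` and `-FX` modify the pool once run without `-n`, which is why getting a readonly copy of the data off first (if at all possible) is worth the effort.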

I'd like to be able to:
1) Make the existing caolila / xanadu pools read-writable again
2) If not, make caolila readable again to retrieve the most recent data
3) Understand what went wrong and how to avoid it in the future
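If it comes to rebuilding from xanadu, a readonly-imported pool can still serve as a `zfs send` source, since sending reads existing snapshots without writing. A minimal sketch, in which the snapshot name, the target pool name `newpool`, and sending as root over ssh are all assumptions:

```shell
# On xanadu (pool imported readonly): list the newest snapshots that
# replication already created, newest last.
zfs list -t snapshot -o name -s creation -r base | tail -5

# Send the full recursive tree to a freshly created pool on caolila.
# "base@auto-20190528" is a placeholder; substitute a real snapshot.
zfs send -R base@auto-20190528 | ssh root@caolila zfs recv -F newpool
```

You cannot create a *new* snapshot on a readonly pool, so this only recovers data up to the last replication snapshot, which matches the caveat above that xanadu is missing caolila's most recent data.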

Useful posts
https://www.ixsystems.com/community/threads/zfs-has-failed-you.11951/
https://www.ixsystems.com/community...nstallation-fails-because-its-too-full.75047/
 

Attachments

  • debug-xanadu-20190528215806.tgz
    1.6 MB · Views: 290
  • debug-caolila-20190528215921.tgz
    525 KB · Views: 313

mrloop

Cadet
Joined
Sep 18, 2018
Messages
3
No, I've not spent any more time on it since posting; I'm out of the country for a couple of weeks with no access to the machines at the moment, and was going to have a look when I get back. I was hoping somebody would reply, either with other suggestions or to confirm that yes, I should try `zpool import -FX dome`. Is it worth / possible to get the metadata-verifying command to work, and would that help understand what went wrong and maybe fix it?
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
Having this TYPE 101 on FreeNAS 11.3-RC, replication related. In debug mode I see tons of other errors (CHECKSUM, incorrect vdev). New error-free drives, UPS, lots of ECC memory. Usually memory modules are blamed, like it was 1995, but I believe in BUGS. You can't have the same stack traces happening at the same moment each time, on two different sets of ECC modules. A similar setup on ZoL never produced a panic; FreeBSD panicked 100 times for me in 2 months. I now have to quit replicating, as this is the cause of every single panic in 11.3, and apparently also in 11.2 as seen in open tickets.
 