zpool import shows ONLINE but will not import

recode

Cadet
Joined
Mar 11, 2024
Messages
3
Hello everyone. Since I cannot import the zpool despite all my efforts,
a day later I realize I have already stepped into the danger zone.
I want to know what options remain.
I haven't gotten enough sleep, so please excuse the organization of this post.

What happened? A short history

I'll call the system currently in danger DBOX. Below is a short history of DBOX.
  • DBOX is TrueNAS Scale inside a TrueNAS VM, using /dev/disk/by-id passthrough. It has one RAIDZ2 pool named hdd10x16t: 1 vdev, no log, no spare.
  • DBOX was built as a remote replica of ABOX. After the ABOX-to-DBOX replication completed, I ran a scrub on DBOX without error.
  • I set up Cloud Sync from DBOX to LBOX (TrueNAS Scale at my local site). Since the dataset layout differs between the two TrueNAS systems, I couldn't set up ZFS replication.
  • When Cloud Sync was about 30% done, I destroyed ABOX for cost reasons.
  • DBOX's boot-pool is OK, but the RAIDZ2 pool (hdd10x16t) cannot be imported.
    • No warnings from smartctl LONG tests, average HDD temperature 30~35°C over the month, no DEGRADED or OFFLINE logged in TrueNAS.
      • I couldn't save the logs on DBOX, since I restarted DBOX several times.
  • DBOX's hdd10x16t is the pool in question.
  • Host and VM timezones are Asia/Seoul for convenience.

Host Hardware, running Proxmox
- CPU: AMD Ryzen 9 3900
- Motherboard: ASRockRack - B565D4-V1L
- RAM: 128GB ECC
- 2x 1.92TB NVMe, 10x 16TB HDD

Proxmox VM, running Truenas Scale
- Processor: host type, 16 cores, SeaBIOS, i440fx
- RAM: 96GB
- VirtIO SCSI Single / OS disks are images on NVMe / hard disks are passed through using /dev/disk/by-id
- HDD: 10x 16TB SATA HDD, ST16000NM001J-2TW113_[SERIAL]
- No VMs and no sharing services like SMB are enabled; only Tailscale is running in Apps.
- Hard disks are connected to the SATA ports on the motherboard. There is no HBA card.

How did I realize I was in danger?

1 day ago (10 Mar 2024)

I had made a Cloud Sync job between DBOX and LBOX (TrueNAS Scale at my local site). All of a sudden, there was a Telegram alert from LBOX that Cloud Sync could not proceed.
So I logged into DBOX, and the hdd10x16t pool that Cloud Sync had been touching was all red.
I don't know why, but on intuition I moved my mouse and exported that pool named hdd10x16t.
Then I tried to import the pool, but the GUI showed Error: 2095 is not a valid Error.

I entered the shell and tried the commands below to import the pool.
Code:
zpool import
  pool: hdd10x16t
  id: 14448620205443767059
  state: ONLINE
  action: The pool can be imported using its name or numeric identifier.
  config:

    hdd10x16t                              ONLINE
      raidz2-0                              ONLINE
        ata-ST16000NM001J-2TW113_ZRS0ZEA4                              ONLINE
        ata-ST16000NM001J-2TW113_ZRS0ZD7P                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZESV                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZDE6                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZE7G                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZDMX                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZD05                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZED5                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZEEA                              ONLINE    
        ata-ST16000NM001J-2TW113_ZRS0ZE3T                              ONLINE

zpool status
  no pools available

I tried other commands found on the forum; -f, no luck. This time I ran them both inside TrueNAS and on the Proxmox host (I restarted the host with the VM shut down and the passthrough disabled).
The commands below were run on the host, not on the TrueNAS SCALE VM. The TrueNAS SCALE VM showed similar output.
Code:
zpool import hdd10x16t
  cannot import 'hdd10x16t': pool was previously in use from another system. Last accessed by truenas (hostid=7da77e55) at Mon Mar 11 02:04:48 2024
  The pool can be imported, use 'zpool import -f' to import the pool.

zpool import hdd10x16t -f
  cannot import 'hdd10x16t': insufficient replicas
  Destroy and re-create the pool from a backup source.



I have checked whether the devids changed for no reason, but they seem okay. Or should path and devid point to the whole disk rather than part1? (A quick check of the by-id links is sketched after the zdb output below.)
Code:
zdb -l hdd10x16t
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'hdd10x16t'
    state: 0
    txg: 1268725
    pool_guid: 14448620205443767059
    errata: 0
    hostid: 2285398396
    hostname: 'recode-hetzner10x16t'
    top_guid: 3320680006278793367
    guid: 9307791802452327806
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 3320680006278793367
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 160008854568960
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 9307791802452327806
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            phys_path: 'pci-0000:25:00.0-ata-4.0'
            whole_disk: 1
            DTL: 159
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 5722856747687887882
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            phys_path: 'pci-0000:01:00.0-ata-1.0'
            whole_disk: 1
            DTL: 158
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 12515852929378121397
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            phys_path: 'pci-0000:01:00.0-ata-2.0'
            whole_disk: 1
            DTL: 155
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 5347527874058330893
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            phys_path: 'pci-0000:01:00.0-ata-3.0'
            whole_disk: 1
            DTL: 154
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 16523393813715497135
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            phys_path: 'pci-0000:01:00.0-ata-4.0'
            whole_disk: 1
            DTL: 153
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 846333880863613494
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            phys_path: 'pci-0000:02:00.1-ata-1.0'
            whole_disk: 1
            DTL: 152
            create_txg: 4
        children[6]:
            type: 'disk'
            id: 6
            guid: 5875470405935328920
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            phys_path: 'pci-0000:02:00.1-ata-2.0'
            whole_disk: 1
            DTL: 151
            create_txg: 4
        children[7]:
            type: 'disk'
            id: 7
            guid: 7378420708011026499
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            phys_path: 'pci-0000:25:00.0-ata-1.0'
            whole_disk: 1
            DTL: 29336
            create_txg: 4
        children[8]:
            type: 'disk'
            id: 8
            guid: 12934111643194929302
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            phys_path: 'pci-0000:25:00.0-ata-2.0'
            whole_disk: 1
            DTL: 25315
            create_txg: 4
        children[9]:
            type: 'disk'
            id: 9
            guid: 7616272590451776413
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            phys_path: 'pci-0000:25:00.0-ata-3.0'
            whole_disk: 1
            DTL: 150
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3
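To double-check those by-id links, I believe something like this should show whether each -part1 symlink still resolves to the expected disk (device letters as in the lshw output at the end of this post; just a sanity check, not a fix):
Code:
# which by-id symlinks exist for the pool members, and where they point
ls -l /dev/disk/by-id/ | grep ST16000NM001J

# confirm every member still carries its GPT partition table
lsblk -o NAME,SIZE,PARTLABEL,PARTUUID /dev/sd[a-j]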
I also ran it with -mfR and -F, but the results looked the same to me.
This has run past my knowledge, and I can't think of new search terms. I've read the Oracle ZFS documentation and asked Gemini 1.5 Pro and Claude 3 Opus; they all tell the same story.
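If I understand zpool-import(8) correctly, the safer dry-run form of that rewind attempt would look roughly like this (a sketch only; -n reports whether discarding the last transactions would help, without actually doing it):
Code:
# dry run: report whether a rewind import could succeed, change nothing
zpool import -f -F -n hdd10x16t

# only if the dry run looks promising: perform the rewind
# (-X, extreme rewind, is a last resort and can run for a very long time)
#zpool import -f -F hdd10x16t
#zpool import -f -FX hdd10x16t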

Today (11 Mar 2024)

I found one success story on this forum, and tried the readonly parameter from the Oracle ZFS documentation.

  • loh's sysctl tuning: I made a simple systemd unit to set the parameters. The parameters were applied after reboot, but the zpool still cannot be imported.
    Code:
    [Unit]
    Description=Set ZFS parameters
    Before=zfs-import-cache.service
    Before=zfs-import-scan.service
    
    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c "echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds; \
    echo 0 > /sys/module/zfs/parameters/spa_load_verify_data; \
    echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata"
    
    [Install]
    WantedBy=multi-user.target
  • Oracle Document: Importing a Pool in Read-Only Mode
    • zpool import -o readonly=on hdd10x16t shows
      Code:
      cannot import 'hdd10x16t': insufficient replicas
      Destroy and re-create the pool from a backup source.
  • Tried importing from Proxmox again to see the syslog:
    Code:
    Mar 11 20:56:19 [HOSTNAME] zed[14219]: eid=3 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZD05-part1 size=4096 offset=15877681995776 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:19 [HOSTNAME] zed[14218]: eid=1 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1 size=4096 offset=15877681995776 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:19 [HOSTNAME] zed[14221]: eid=4 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZED5-part1 size=4096 offset=15870949339136 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:19 [HOSTNAME] zed[14217]: eid=2 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZED5-part1 size=4096 offset=15877681995776 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:19 [HOSTNAME] zed[14223]: eid=5 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZD05-part1 size=4096 offset=15870949339136 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:19 [HOSTNAME] zed[14225]: eid=6 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1 size=4096 offset=15870949339136 priority=0 err=52 flags=0x100080 bookmark=0:61:0:0
    Mar 11 20:56:41 [HOSTNAME] zed[14403]: eid=7 class=data pool='hdd10x16t' priority=0 err=52 flags=0x808081 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14407]: eid=8 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1 size=4096 offset=8460656586752 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14410]: eid=9 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1 size=4096 offset=8460656586752 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14413]: eid=10 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1 size=4096 offset=8460656582656 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14415]: eid=11 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1 size=4096 offset=7589513297920 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14419]: eid=12 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1 size=4096 offset=7589513297920 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14422]: eid=13 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1 size=4096 offset=7589513297920 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14425]: eid=14 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1 size=4096 offset=6631704514560 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14429]: eid=15 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1 size=4096 offset=6631704514560 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14430]: eid=16 class=checksum pool='hdd10x16t' vdev=ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1 size=4096 offset=6631704510464 priority=0 err=52 flags=0x100080 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14431]: eid=17 class=vdev.corrupt_data pool='hdd10x16t'
    Mar 11 20:56:41 [HOSTNAME] zed[14432]: eid=18 class=zpool pool='hdd10x16t'
    Mar 11 20:56:41 [HOSTNAME] zed[14563]: eid=19 class=data pool='hdd10x16t' priority=0 err=52 flags=0x808081 bookmark=0:0:0:1859
    Mar 11 20:56:41 [HOSTNAME] zed[14564]: eid=20 class=vdev.corrupt_data pool='hdd10x16t'
    Mar 11 20:56:41 [HOSTNAME] zed[14565]: eid=21 class=zpool pool='hdd10x16t'
    • Since I noticed err=52 in the syslog, I contacted the hosting provider and asked them to check my drives on DBOX. The provider found no errors on the drives:
Dear Client,

The server has finished the drive test without any issues.
-------------------------------------------------
Hardware Check report for IP [redacted]
Disk check: OK
NVMe SSD `697S102NTR0Q`: OK
S.M.A.R.T Tests: OK
Error counters: OK
NVMe SSD `697S102ITR0Q`: OK
S.M.A.R.T Tests: OK
Error counters: OK
SATA HDD `ZRS0ZESV`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZD7P`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZDMX`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZE7G`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZDE6`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZD05`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZED5`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZEEA`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZE3T`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
SATA HDD `ZRS0ZEA4`: OK
S.M.A.R.T Tests: OK
S.M.A.R.T Self-Test: OK
S.M.A.R.T Health self assessment: OK
Error counters: OK
-------------------------------------------------

We have started the server for you again.
Please do not hesitate to contact us again if you have further questions or requirements.
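For my own peace of mind I also reran the basic SMART checks locally on the host, roughly like this (device letters as in the lshw output at the end of this post):
Code:
# overall health assessment and the drive error log for each pool member
for d in /dev/sd{a..j}; do
    echo "== $d =="
    smartctl -H -l error "$d"
done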

So.. here is what I've tried

It might not be enough detail for everyone, but I feel this is beyond my knowledge.

Summary of my attempts

  • zpool import with -f, -mfR, readonly=on
    • With zpool import or zpool import -mfR alone, the drive list is shown, but there is no pool in zpool list.
    • When specifying the pool name or pool_guid, zpool import -f hdd10x16t says cannot import / insufficient replicas.
    • -f readonly=on gives the same result as specifying the pool name.
  • smartctl shows the drives are healthy.
  • I changed the ZFS parameters zfs_max_missing_tvds, spa_load_verify_data, and spa_load_verify_metadata (to bypass the checks zpool dislikes) using the script above, ran everything again, and all the output was the same as above.
  • There is a flawlessly running EBOX in almost the same environment as DBOX, whose zdb -l prints path: /dev/disk/by-partuuid, unlike path: /dev/disk/by-id on DBOX.
    • But I don't know how I can reconfigure that using zdb -e. (A rough sketch of what I mean follows this list.)
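What I have in mind, but haven't dared to run blindly, is pointing the import (and zdb) at an alternate device directory, roughly like this (the by-partuuid path is a guess based on how EBOX looks; readonly so nothing gets written):
Code:
# let zpool search a different device directory for the labels
zpool import -d /dev/disk/by-partuuid -o readonly=on -f hdd10x16t

# zdb can inspect an exported pool with -e; -p sets the search path,
# -C just prints the assembled configuration
zdb -e -p /dev/disk/by-partuuid -C hdd10x16t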

My endless question marks. Help needed.​

I've moved ZFS pools around a great deal, shelf to shelf, and re-inserted drives into different slots, but surprisingly this situation is a first for me.
Below is what I want to ask of everyone's wisdom.
  • Why did the alerts from DBOX not work? I clicked the test message and it works.
  • Why were no dangers detected by TrueNAS, the ZFS backend, or smartctl?
  • Why does zpool import print state: ONLINE, which looks like it's working, when it actually isn't?
  • Can I change those path/devid entries using the zdb -e command?
  • The output says insufficient replicas; if I insert a SPARE drive, is a resilver possible?
    • But the current state: ONLINE feels like misreporting everywhere.
    • If some drives have errors, how can I know how many I need? And can I add a SPARE to a pool that is currently not imported?
  • As far as I know, scrub or resilver need the pool imported first. Or.. can I?
  • Or are there any other options remaining?

Thank you.

Below is the lshw output, for information.
Code:
*-sata
description: SATA controller
product: JMB58x AHCI SATA controller
vendor: JMicron Technology Corp.
physical id: 0
bus info: pci@0000:01:00.0
logical name: scsi0
logical name: scsi1
logical name: scsi2
logical name: scsi3
version: 00
width: 32 bits
clock: 33MHz
capabilities: sata pm msi pciexpress ahci_1.0 bus_master cap_list rom emulated
configuration: driver=ahci latency=0
resources: irq:41 ioport:f200(size=128) ioport:f180(size=128) ioport:f100(size=128) ioport:f080(size=128) ioport:f000(size=128) memory:fcb10000-fcb11fff memory:fcb00000-fcb0ffff
*-disk:0
description: ATA Disk
product: ST16000NM001J-2T
physical id: 0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: SS04
serial: ZRS0ZD7P
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=7f414e0b-fc89-a24e-9918-c09b5941ce04 logicalsectorsize=512 sectorsize=4096
*-disk:1
description: ATA Disk
product: ST16000NM001J-2T
physical id: 1
bus info: scsi@1:0.0.0
logical name: /dev/sdb
version: SS04
serial: ZRS0ZESV
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=b22fd5d4-f32b-8a45-99aa-34f5513df680 logicalsectorsize=512 sectorsize=4096
*-disk:2
description: ATA Disk
product: ST16000NM001J-2T
physical id: 2
bus info: scsi@2:0.0.0
logical name: /dev/sdc
version: SS04
serial: ZRS0ZDE6
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=f74c9c5f-c813-f14b-8fe8-8bf4a7119a3f logicalsectorsize=512 sectorsize=4096
*-disk:3
description: ATA Disk
product: ST16000NM001J-2T
physical id: 3
bus info: scsi@3:0.0.0
logical name: /dev/sdd
version: SS04
serial: ZRS0ZE7G
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=e75d9f00-02ac-8041-9bef-3bf32d8de1f9 logicalsectorsize=512 sectorsize=4096
*-sata
description: SATA controller
product: 500 Series Chipset SATA Controller
vendor: Advanced Micro Devices, Inc. [AMD]
physical id: 0.1
bus info: pci@0000:02:00.1
logical name: scsi5
logical name: scsi6
version: 00
width: 32 bits
clock: 33MHz
capabilities: sata msi pm pciexpress ahci_1.0 bus_master cap_list rom emulated
configuration: driver=ahci latency=0
resources: irq:49 memory:fc480000-fc49ffff memory:fc400000-fc47ffff
*-disk:0
description: ATA Disk
product: ST16000NM001J-2T
physical id: 0
bus info: scsi@5:0.0.0
logical name: /dev/sde
version: SS04
serial: ZRS0ZDMX
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=a0ab55f9-6641-914d-83ad-825c5f91005b logicalsectorsize=512 sectorsize=4096
*-disk:1
description: ATA Disk
product: ST16000NM001J-2T
physical id: 1
bus info: scsi@6:0.0.0
logical name: /dev/sdf
version: SS04
serial: ZRS0ZD05
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=a771726f-a1cf-9f4e-a027-cb96b31a7748 logicalsectorsize=512 sectorsize=4096
*-sata
description: SATA controller
product: JMB58x AHCI SATA controller
vendor: JMicron Technology Corp.
physical id: 0
bus info: pci@0000:25:00.0
logical name: scsi11
logical name: scsi12
logical name: scsi13
logical name: scsi14
version: 00
width: 32 bits
clock: 33MHz
capabilities: sata pm msi pciexpress ahci_1.0 bus_master cap_list rom emulated
configuration: driver=ahci latency=0
resources: irq:50 ioport:e200(size=128) ioport:e180(size=128) ioport:e100(size=128) ioport:e080(size=128) ioport:e000(size=128) memory:fc310000-fc311fff memory:fc300000-fc30ffff
*-disk:0
description: ATA Disk
product: ST16000NM001J-2T
physical id: 0
bus info: scsi@11:0.0.0
logical name: /dev/sdg
version: SS04
serial: ZRS0ZED5
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=83c7107a-cbdc-a845-a001-f6b018489a47 logicalsectorsize=512 sectorsize=4096
*-disk:1
description: ATA Disk
product: ST16000NM001J-2T
physical id: 1
bus info: scsi@12:0.0.0
logical name: /dev/sdh
version: SS04
serial: ZRS0ZEEA
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=1bad2939-e035-9d44-9c28-eb11b947a10e logicalsectorsize=512 sectorsize=4096
*-disk:2
description: ATA Disk
product: ST16000NM001J-2T
physical id: 2
bus info: scsi@13:0.0.0
logical name: /dev/sdi
version: SS04
serial: ZRS0ZE3T
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=0a80beb8-22a4-9148-8bfc-297420f34835 logicalsectorsize=512 sectorsize=4096
*-disk:3
description: ATA Disk
product: ST16000NM001J-2T
physical id: 3
bus info: scsi@14:0.0.0
logical name: /dev/sdj
version: SS04
serial: ZRS0ZEA4
size: 14TiB (16TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=669afe70-aaa9-d441-a8e5-aed3f834e9b3 logicalsectorsize=512 sectorsize=4096
*-nvme
description: NVMe device
product: KXD51RUE960G TOSHIBA
vendor: Toshiba Corporation
physical id: 0
bus info: pci@0000:2c:00.0
logical name: /dev/nvme0
version: 1CEE6111
serial: 697S102NTR0Q
width: 64 bits
clock: 33MHz
capabilities: nvme pm msix pciexpress msi nvm_express bus_master cap_list
configuration: driver=nvme latency=0 nqn=nqn.2017-03.jp.co.toshiba:KXD51RUE960G TOSHIBA:697S102NTR0Q state=live
resources: irq:65 memory:fca00000-fca03fff
*-namespace:0
description: NVMe disk
physical id: 0
logical name: hwmon1
*-namespace:1
description: NVMe disk
physical id: 2
logical name: /dev/ng0n1
*-namespace:2
description: NVMe disk
physical id: 1
bus info: nvme@0:1
logical name: /dev/nvme0n1
size: 894GiB (960GB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: guid=63b2957f-2400-44fe-82e8-375603842aa2 logicalsectorsize=512 sectorsize=512 wwid=eui.00000000000000108ce38e0200083515
*-nvme
description: NVMe device
product: KXD51RUE960G TOSHIBA
vendor: Toshiba Corporation
physical id: 0
bus info: pci@0000:2d:00.0
logical name: /dev/nvme1
version: 1CEE6111
serial: 697S102ITR0Q
width: 64 bits
clock: 33MHz
capabilities: nvme pm msix pciexpress msi nvm_express bus_master cap_list
configuration: driver=nvme latency=0 nqn=nqn.2017-03.jp.co.toshiba:KXD51RUE960G TOSHIBA:697S102ITR0Q state=live
resources: irq:63 memory:fc900000-fc903fff
*-namespace:0
description: NVMe disk
physical id: 0
logical name: hwmon0
*-namespace:1
description: NVMe disk
physical id: 2
logical name: /dev/ng1n1
*-namespace:2
description: NVMe disk
physical id: 1
bus info: nvme@1:1
logical name: /dev/nvme1n1
size: 894GiB (960GB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: guid=41b9fa90-b74f-4341-b8b8-353f74247610 logicalsectorsize=512 sectorsize=512 wwid=eui.00000000000000108ce38e0200083510
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
DBOX is Truenas Scale inside Truenas VM. Using dev/disk/by-id passthrough.
WoW. ok. first of all, a TrueNAS VM should be passed a whole disk controller. the disk-by-id passthroughs and VM disks are usually asking for trouble. they often add things that make zfs unable to recover and otherwise obfuscate the disks so zfs and truenas cannot tell anything is wrong.
next, you virtualized truenas inside a virtualized truenas? what the heck?
As far as I know scrub or resilver need to import pool first.
scrub and resilver are VERY different things. you cannot do either if the pool is unavailable, as there is not enough redundancy for the pool to function.


(restarted host while vm turned off passthrough).
if this means you disabled the passthrough, you might need to re-enable it. whatever weird stuff proxmox does when it passes through disks can make it impossible to use without that passthrough in place. this is one of the reasons it is so highly discouraged to do so.
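for reference, the difference on the proxmox side looks roughly like this (VM ID 100 and the PCI address are only examples; check your own with lspci):
Code:
# per-disk by-id passthrough (what you described; discouraged for TrueNAS)
qm set 100 -scsi1 /dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEA4

# whole-controller PCI passthrough (preferred; the guest sees the real disks)
qm set 100 -hostpci0 0000:01:00.0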

you have panicked and made things far worse. do you have a backup? I wasn't fully able to parse everything you put here.

you appear to list 5 systems but do not indicate which is which so I am thoroughly confused
you should have something more like:
dbox:
amd epyc blah blah with #sata #nvme controller type
lbox
amd epyc blah blah with #sata #nvme controller type
unrelated hardware
amd epyc blah blah
 

recode

Cadet
Joined
Mar 11, 2024
Messages
3
Thank you for the reply, artlessknave!
And apologies for the late reply; I've been writing a damage report.

you have panicked and made things far worse. do you have a backup?
Yes, I have a backup, but only partially, because it was still cloud-syncing to the local TrueNAS ;)
And as you noticed, yes.. I panicked..

virtualized truenas inside a virtualized truenas? what the heck?
The TrueNAS currently in error is a Proxmox VM, not TrueNAS nested inside TrueNAS. My apologies for any misunderstanding on my part.

TrueNAS VM should be passed a whole disk controller. the disk-by-id passthroughs and VM disks are usually asking for trouble.
I was in such a rush to build DBOX that I used the by-id method out of complacency, thinking "it just works, okay, go." I had been using PCI passthrough everywhere else, but I didn't realize that the by-id method I used for convenience would change things so drastically.

I regret my laziness in using "by-id". I've re-set up PCI passthrough for all the disks by setting proper IOMMU groups.
zdb -l hdd10x16t still lists the labels, while zpool import -f hdd10x16t outputs the same failure.
Code:
sudo zpool import -f hdd10x16t
cannot import 'hdd10x16t': insufficient replicas
        Destroy and re-create the pool from
        a backup source.


I wonder: can I add a spare to the already exported pool hdd10x16t (RAIDZ2, no spare, no log), and might that get over the "insufficient replicas" hurdle?

And below is zdb -l hdd10x16t. It shows the devid and path are the same as /dev/disk/by-* on the current TrueNAS.
Is there anything I can check or try?
Code:
sudo zdb -l hdd10x16t
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'hdd10x16t'
    state: 0
    txg: 1268725
    pool_guid: 14448620205443767059
    errata: 0
    hostid: 2285398396
    hostname: 'recode-hetzner10x16t'
    top_guid: 3320680006278793367
    guid: 5347527874058330893
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 3320680006278793367
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 160008854568960
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 9307791802452327806
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            phys_path: 'pci-0000:25:00.0-ata-4.0'
            whole_disk: 1
            DTL: 159
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 5722856747687887882
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            phys_path: 'pci-0000:01:00.0-ata-1.0'
            whole_disk: 1
            DTL: 158
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 12515852929378121397
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            phys_path: 'pci-0000:01:00.0-ata-2.0'
            whole_disk: 1
            DTL: 155
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 5347527874058330893
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            phys_path: 'pci-0000:01:00.0-ata-3.0'
            whole_disk: 1
            DTL: 154
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 16523393813715497135
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            phys_path: 'pci-0000:01:00.0-ata-4.0'
            whole_disk: 1
            DTL: 153
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 846333880863613494
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            phys_path: 'pci-0000:02:00.1-ata-1.0'
            whole_disk: 1
            DTL: 152
            create_txg: 4
        children[6]:
            type: 'disk'
            id: 6
            guid: 5875470405935328920
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            phys_path: 'pci-0000:02:00.1-ata-2.0'
            whole_disk: 1
            DTL: 151
            create_txg: 4
        children[7]:
            type: 'disk'
            id: 7
            guid: 7378420708011026499
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            phys_path: 'pci-0000:25:00.0-ata-1.0'
            whole_disk: 1
            DTL: 29336
            create_txg: 4
        children[8]:
            type: 'disk'
            id: 8
            guid: 12934111643194929302
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            phys_path: 'pci-0000:25:00.0-ata-2.0'
            whole_disk: 1
            DTL: 25315
            create_txg: 4
        children[9]:
            type: 'disk'
            id: 9
            guid: 7616272590451776413
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            phys_path: 'pci-0000:25:00.0-ata-3.0'
            whole_disk: 1
            DTL: 150
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3 


Thank you in advance.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I wonder: can I add a spare to the already exported pool hdd10x16t (RAIDZ2, no spare, no log), and might that get over the "insufficient replicas" hurdle?
no. a spare is not a data disk. you need 8 data disks online for the pool to function; without that, all data is lost.

what's weird is that it looks like all 10 are online yet it still thinks there are insufficient replicas.

if the by-id passthrough was the problem, I wouldn't expect it to show a pool being present at all. unfortunately, I have no further ideas for you, beyond this: whether you recover it or rebuild it, you need to get a full backup and a reliable setup.
 

recode

Cadet
Joined
Mar 11, 2024
Messages
3
Thanks for your feedback, even on my panicked post.

I completely agree with you that backups are super important. Thanks for sharing your thoughts!
The current situation has made me reconsider how and where I consistently back up and organize my files.
I had accumulated around 800TB of data on Google Drive, and I was simply storing it wherever I could find storage, so I didn't have a good backup strategy.

What I've found so far while writing the damage report, and what keeps me hopeful, is that I was inconsistent in uploading to Google Drive in the first place. As a result, there were many files that I uploaded in duplicate to different locations.
Thanks to those files, I estimate that around 50% of the files that appeared to be lost on DBOX are recoverable.
I also have a borgbackup (which seemed to be a somewhat unnecessary duplicate), so I think I can recover a bit more if I pull the entire borgbackup repository from another BOX.

While I appreciate and acknowledge your feedback, I'm currently not ready to give up on this pool... I see some promising possibilities in the import command reporting ONLINE, so I'm not going to destroy it immediately; I will continue to explore other options as well. I want to assure you that I don't mean to come across as rude.

what's weird is that it looks like all 10 are online yet it still thinks there are insufficient replicas.
Yes, it is weird.. all paths, devids, and serials match..
After I tried re-importing on TrueNAS (with IOMMU passthrough), the same story continues:
  • zpool import -f shows ONLINE, which seems promising to me, but the pool is not imported when I run zpool list.
  • Only zpool import -f hdd10x16t shows insufficient replicas.
Code:
sudo zpool import -f
   pool: hdd10x16t
     id: 14448620205443767059
  state: ONLINE
status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        hdd10x16t                                    ONLINE
          raidz2-0                                   ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZESV-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZD05-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZED5-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1  ONLINE

Code:
sudo zpool list  
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
apps       29.5G  1.17G  28.3G        -         -     8%     3%  1.00x    ONLINE  /mnt
boot-pool    31G  4.76G  26.2G        -         -     4%    15%  1.00x    ONLINE  -

Code:
sudo zpool import -f hdd10x16t
cannot import 'hdd10x16t': insufficient replicas
        Destroy and re-create the pool from
        a backup source.
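To rule out one member's label disagreeing with the rest, I'm planning to dump the label txg and guids from every member and compare them; something like this, if I read zdb(8) right:
Code:
# the txg / pool_guid / top_guid / guid in each label should agree
for d in /dev/disk/by-id/ata-ST16000NM001J-2TW113_*-part1; do
    echo "== $d =="
    zdb -l "$d" | grep -E 'txg:|guid:' | head -n 4
done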


Again, I really appreciate all your feedback and the time you've given me.
I'll update this thread when I either give up or recover this pool.

Thank you.
 