TrueNAS Scale incorrectly reporting Mixed Capacity VDEVS

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I don't get any output with that command, but I'm on SCALE and those look like CORE drive labels.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Figured it out with the help of AI.

I counted up the number of drives listed below as 'Home' and I'm 6 short, or 1 vdev. I'm guessing that's due to the partitioning issue that started this whole thing. For those 6 drives I modified the command to look at partition 1 instead of partition 2 and appended their output below the first run.

Code:
root@truenas[~]# for n in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag}; do
zdb -l "/dev/sd"$n"2" | grep 'name\|txg'
done
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
            create_txg: 2241036
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'NZB-Scratch'
    txg: 2265203
    hostname: 'truenas'
        create_txg: 4
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
            create_txg: 203597
    name: 'boot-pool'
    txg: 6790201
    hostname: '(none)'
        create_txg: 4
            create_txg: 4
            create_txg: 4
    name: 'boot-pool'
    txg: 6790201
    hostname: '(none)'
        create_txg: 4
            create_txg: 4
            create_txg: 4

# 6 drives without a swap partition

root@truenas[~]# for n in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}; do                    
zdb -l "/dev/sd"$n"1" | grep 'name\|txg'
done
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
    name: 'Home'
    txg: 3080242
    hostname: 'truenas'
        create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030
            create_txg: 2994030

 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It looks like they all agree on the latest txg value of 3080242 - you could try rolling back manually to an earlier one.

Let's try a combo here:

Code:
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_data
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_metadata
zpool import -f -T 3080241 Home


The first two lines disable verification of data and metadata. Yes, this is normally a really bad thing to do, but when rewinding pools to earlier transactions, the verification can take "hours to days" on large pools especially when RAIDZ is used. The last line tries to import your pool at an earlier transaction group - the latest one is 3080242, so we just N-1 and try a step back in time.
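Not part of the advice above, but a common extra safety measure when attempting rewind imports is to add readonly=on so the attempt writes nothing to the pool. A dry-run sketch (txg value from the post above; the command is printed rather than executed):

```shell
# Sketch only: same rewind import but with readonly=on added so nothing
# is written to the pool during the attempt. The spa_load_verify_* tweaks
# above would still be applied first. Copy-paste the output to run it.
txg=3080241
echo "zpool import -f -o readonly=on -T $txg Home"
```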
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I've been trying the import, subtracting 1 each time it fails, and I might be making progress. Normally when the import fails, the system refuses to boot (hangs completely at boot) on that particular boot environment. Currently trying 3080237.
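The manual N-1 stepping can be sketched as a tiny loop that just prints the candidate txgs (last-known txg 3080242 from the zdb output above); the import itself stays manual, since a failed rewind can hang the machine:

```shell
# Print the next few rewind candidates, stepping back from the newest txg.
last_txg=3080242
for step in 1 2 3 4 5; do
  echo "candidate txg: $((last_txg - step))"
  # zpool import -f -T $((last_txg - step)) Home   # run one at a time
done
# prints candidates 3080241 down to 3080237
```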

My workflow has been to clone a working environment before running the import so I don't need to deal with a full reinstall. Now at least the machine boots. Not sure it's related, but I haven't changed anything else I feel is relevant.

Just ran zpool import without any flags and got the following output. I didn't get the alert about corrupted metadata before. Is this just a symptom of rolling back to an earlier txg?

Code:
root@truenas[~]# zpool import                   
   pool: NZB-Scratch
     id: 9983942308470126613
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        NZB-Scratch                             ONLINE
          0f5eeaa9-3419-49f0-8601-f7132129256b  ONLINE

   pool: Home
     id: 4985077989531387090
  state: FAULTED
status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        Home                                      FAULTED  corrupted data
          raidz2-0                                ONLINE
            a7d78b0d-f891-11ed-a2f8-90e2baf17bf0  ONLINE
            a7b00eef-f891-11ed-a2f8-90e2baf17bf0  ONLINE
            a7d01f81-f891-11ed-a2f8-90e2baf17bf0  ONLINE
            a7c951e3-f891-11ed-a2f8-90e2baf17bf0  ONLINE
            a7bfef1b-f891-11ed-a2f8-90e2baf17bf0  ONLINE
            e4f37ae1-f494-4baf-94e5-07db0c38cb0c  ONLINE
          raidz2-1                                ONLINE
            8cca2c8f-39ee-40a6-88e0-24ddf3485aa0  ONLINE
            74f3cc23-1b32-4faf-89cc-ba0cd72ba308  ONLINE
            4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886  ONLINE
            cde58bb6-9d8e-4cdc-a1bf-847f459b459b  ONLINE
            58c22778-521b-4e8f-aadd-6d5ad17a8f68  ONLINE
            33633f68-920b-4a40-bd4d-45e30b6872bc  ONLINE
          raidz2-2                                ONLINE
            2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb  ONLINE
            56c07fd7-6cb6-4985-9a20-2b5ff9d42631  ONLINE
            1147286d-8cd8-4025-8e5d-bbf06e2bd795  ONLINE
            7e1fa408-7565-4913-b045-49447ef9253b  ONLINE
            3d56d2fa-d505-4bea-b9a2-80c121e4e559  ONLINE
            a9906b32-2690-4f7b-8d8f-00ca915d8f3d  ONLINE
          raidz2-5                                ONLINE
            b8c63108-353b-4ed7-a927-ca3df817bd21  ONLINE
            58782264-02f1-41c6-9b91-d07144cb0ccb  ONLINE
            03df98a5-a86d-4bc8-879a-5cf611d4306c  ONLINE
            022c7ffb-0a07-45cb-b3af-ad1730a08054  ONLINE
            a5786a1f-a7ad-4a30-877a-88a03c94a774  ONLINE
            4c59238e-5cbd-428e-8a72-a018d9dae9c2  ONLINE
        logs
          mirror-6                                ONLINE
            5ba1f70b-be51-470f-94ed-777683425477  ONLINE
            f2605776-46a9-4455-a4bc-322d4cf8a688  ONLINE
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
You might check the pool's failmode property. If it is set to panic, then it may very well reboot.

Of course, checking an exported pool's property would be tricky. I don't know how to do it with zdb.

But there may be an easier way:

echo 0 >> /sys/module/zfs/parameters/spa_load_verify_data
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_metadata
zpool import -f -T 3080241 -o failmode=continue Home

If continue does not work, perhaps wait will.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
You might check the pool's failmode property. If it is set to panic, then it may very well reboot.

Of course, checking an exported pool's property would be tricky. I don't know how to do it with zdb.

But there may be an easier way:

echo 0 >> /sys/module/zfs/parameters/spa_load_verify_data
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_metadata
zpool import -f -T 3080241 -o failmode=continue Home

If continue does not work, perhaps wait will.
I'll give that a shot. I edited my last post, but should the pool be reporting "The pool metadata is corrupted" when I run 'zpool import' simply as a result of trying to roll back?
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_data
echo 0 >> /sys/module/zfs/parameters/spa_load_verify_metadata
zpool import -f -T 3080241 -o failmode=continue Home

If continue does not work, perhaps wait will.
Neither wait nor continue kept the machine from rebooting.

I also notice zfs-import-cache.service fails at boot; the output of 'service zfs-import-cache status' reads:

Code:
root@truenas[~]# service zfs-import-cache status      
× zfs-import-cache.service - Import ZFS pools by cache file
     Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Sat 2023-11-18 07:33:42 CST; 6min ago
       Docs: man:zpool(8)
   Main PID: 1897 (code=exited, status=1/FAILURE)
        CPU: 209ms

Nov 18 07:33:39 truenas zpool[1897]: cannot import 'Home': I/O error
Nov 18 07:33:42 truenas zpool[1897]: cannot import 'Home': I/O error
Nov 18 07:33:42 truenas zpool[1897]:         Destroy and re-create the pool from
Nov 18 07:33:42 truenas zpool[1897]:         a backup source.
Nov 18 07:33:42 truenas zpool[1897]: cachefile import failed, retrying
Nov 18 07:33:42 truenas zpool[1897]:         Destroy and re-create the pool from
Nov 18 07:33:42 truenas zpool[1897]:         a backup source.
Nov 18 07:33:42 truenas systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Nov 18 07:33:42 truenas systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Nov 18 07:33:42 truenas systemd[1]: Failed to start zfs-import-cache.service - Import ZFS pools by cache file.


and journalctl -xeu zfs-import-cache.service

Code:
root@truenas[~]# journalctl -xeu zfs-import-cache.service
Nov 18 07:33:31 truenas systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
░░ Subject: A start job for unit zfs-import-cache.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit zfs-import-cache.service has begun execution.
░░
░░ The job identifier is 184.
Nov 18 07:33:39 truenas zpool[1897]: cannot import 'Home': I/O error
Nov 18 07:33:42 truenas zpool[1897]: cannot import 'Home': I/O error
Nov 18 07:33:42 truenas zpool[1897]:         Destroy and re-create the pool from
Nov 18 07:33:42 truenas zpool[1897]:         a backup source.
Nov 18 07:33:42 truenas zpool[1897]: cachefile import failed, retrying
Nov 18 07:33:42 truenas zpool[1897]:         Destroy and re-create the pool from
Nov 18 07:33:42 truenas zpool[1897]:         a backup source.
Nov 18 07:33:42 truenas systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit zfs-import-cache.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Nov 18 07:33:42 truenas systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit zfs-import-cache.service has entered the 'failed' state with result 'exit-code'.
Nov 18 07:33:42 truenas systemd[1]: Failed to start zfs-import-cache.service - Import ZFS pools by cache file.
░░ Subject: A start job for unit zfs-import-cache.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit zfs-import-cache.service has finished with a failure.
░░
░░ The job identifier is 184 and the job result is failed.
Nov 18 07:41:28 truenas systemd[1]: Starting zfs-import-cache.service - Import ZFS pools by cache file...
░░ Subject: A start job for unit zfs-import-cache.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit zfs-import-cache.service has begun execution.
░░
░░ The job identifier is 911.
Nov 18 07:41:30 truenas zpool[10279]: cannot import 'Home': I/O error
Nov 18 07:41:32 truenas zpool[10279]: cannot import 'Home': I/O error
Nov 18 07:41:32 truenas zpool[10279]:         Destroy and re-create the pool from
Nov 18 07:41:32 truenas zpool[10279]:         a backup source.
Nov 18 07:41:32 truenas zpool[10279]: cachefile import failed, retrying
Nov 18 07:41:32 truenas zpool[10279]:         Destroy and re-create the pool from
Nov 18 07:41:32 truenas zpool[10279]:         a backup source.
Nov 18 07:41:32 truenas systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit zfs-import-cache.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Nov 18 07:41:32 truenas systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit zfs-import-cache.service has entered the 'failed' state with result 'exit-code'.
Nov 18 07:41:32 truenas systemd[1]: Failed to start zfs-import-cache.service - Import ZFS pools by cache file.
░░ Subject: A start job for unit zfs-import-cache.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit zfs-import-cache.service has finished with a failure.
░░
░░ The job identifier is 911 and the job result is failed.
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I have no further suggestions.


It would be really good to know why the pool corruption occurred. Early on in this thread, enabling swap seems to have triggered the corruption. I wonder if SCALE tried to use the first partition as swap, damaging the first part of the new vDev disks.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I have no further suggestions.


It would be really good to know why the pool corruption occurred. Early on in this thread, enabling swap seems to have triggered the corruption. I wonder if SCALE tried to use the first partition as swap, damaging the first part of the new vDev disks.
I really hope that's not the case. Wouldn't TrueNAS prevent the user from doing something like that, or at least show a warning?
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I made it to 3080229 and the following came up in the console at the bottom of the web UI:

Code:
Nov 18 08:07:09 truenas.local kernel: WARNING: Pool 'Home' has encountered an uncorrectable I/O failure and has been suspended.


The system also didn't reboot this time. There's no output from the import yet; the system isn't totally hung, but the web UI barely loads if I open a new browser tab. I also checked the disk shelf and don't see any drive activity lights for the drives in the pool.

Do I wait, Ctrl+C, or just reboot?
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Since there was no activity from the process or on the drives, I rebooted. Currently at zpool import -f -T 3080218 Home.

Occasionally I catch some errors in the console and copy them; it's mostly some variation of the below:

Code:
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset 59600, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset 61095, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset 60607, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset 1044, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset 261, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset for 60607, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset for 61095, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset for 1044, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset for 59600, error 5
Nov 18 09:26:54 truenas.local kernel: WARNING: can't open objset for 261, error 5


How far back do I go? Do I start going n-5 or n-10? I don't know what's reasonable.
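Rather than guessing step sizes, one option is to list the uberblocks still present on a member partition with zdb -ul and pick rewind targets from txgs that actually exist. A sketch (the device path is a placeholder, not a real path from this system):

```shell
# Dump uberblocks from one member partition into a file, e.g.:
#   zdb -ul /dev/disk/by-partuuid/<one-of-the-Home-disks> > uberblocks.txt
# Each uberblock entry carries a txg and a timestamp; pair them up and sort
# numerically so you can pick a rewind target by date:
if [ -f uberblocks.txt ]; then
  awk '/txg = /{txg=$3} /timestamp = /{print txg, $0}' uberblocks.txt | sort -n
fi
```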

If the problem is what Arwen mentioned above with the swap partition, is all lost?
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Uncharted territory: I used the command zdb -u -l Home to list uberblocks, started working backwards from uberblock 0, and made it to a block dated the day the pool started having issues. It's from earlier that morning, though, and now I see there may have been better txgs to roll back to. Now that it's imported, do I risk exporting and reimporting at a more recent txg, or do I just take what I've got and run a scrub?

Code:
root@truenas[~]# echo 0 >> /sys/module/zfs/parameters/spa_load_verify_metadata
root@truenas[~]# echo 0 >> /sys/module/zfs/parameters/spa_load_verify_data   
root@truenas[~]# zpool import -f -T 3080103 Home
cannot mount 'Home/Media': Input/output error
Import was successful, but unable to mount some datasets


I'm able to view the datasets in the Datasets section; however, Storage does not show the pool as mounted, and nothing shows up when I type ls /mnt, which is odd because I still have that throwaway pool mounted.

Below is the output of zpool status -v Home:

Code:
root@truenas[~]# zpool status -v Home
  pool: Home
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 1.32T in 02:06:45 with 0 errors on Sat Nov 11 13:45:36 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        Home                                      ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            a7d78b0d-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     1
            a7b00eef-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     1
            a7d01f81-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     1
            a7c951e3-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     0
            a7bfef1b-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     0
            e4f37ae1-f494-4baf-94e5-07db0c38cb0c  ONLINE       0     0     0
          raidz2-1                                ONLINE       0     0     0
            8cca2c8f-39ee-40a6-88e0-24ddf3485aa0  ONLINE       0     0     2
            74f3cc23-1b32-4faf-89cc-ba0cd72ba308  ONLINE       0     0    14
            4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886  ONLINE       0     0    14
            cde58bb6-9d8e-4cdc-a1bf-847f459b459b  ONLINE       0     0    14
            58c22778-521b-4e8f-aadd-6d5ad17a8f68  ONLINE       0     0     2
            33633f68-920b-4a40-bd4d-45e30b6872bc  ONLINE       0     0     2
          raidz2-2                                ONLINE       0     0     0
            2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb  ONLINE       0     0     0
            56c07fd7-6cb6-4985-9a20-2b5ff9d42631  ONLINE       0     0     0
            1147286d-8cd8-4025-8e5d-bbf06e2bd795  ONLINE       0     0    12
            7e1fa408-7565-4913-b045-49447ef9253b  ONLINE       0     0    12
            3d56d2fa-d505-4bea-b9a2-80c121e4e559  ONLINE       0     0    12
            a9906b32-2690-4f7b-8d8f-00ca915d8f3d  ONLINE       0     0     0
          raidz2-5                                ONLINE       0     0     0
            b8c63108-353b-4ed7-a927-ca3df817bd21  ONLINE       0     0     0
            58782264-02f1-41c6-9b91-d07144cb0ccb  ONLINE       0     0     0
            03df98a5-a86d-4bc8-879a-5cf611d4306c  ONLINE       0     0     0
            022c7ffb-0a07-45cb-b3af-ad1730a08054  ONLINE       0     0     0
            a5786a1f-a7ad-4a30-877a-88a03c94a774  ONLINE       0     0     0
            4c59238e-5cbd-428e-8a72-a018d9dae9c2  ONLINE       0     0     0
        logs
          mirror-6                                ONLINE       0     0     0
            5ba1f70b-be51-470f-94ed-777683425477  ONLINE       0     0     0
            f2605776-46a9-4455-a4bc-322d4cf8a688  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
 
Last edited:

mtron

Cadet
Joined
Aug 25, 2020
Messages
6
Hey all, just chiming in, I have the same issue with one of our servers. Added a third z2 vdev after the 23.10.0.1 upgrade and am now seeing the same Mixed VDEV Capacities error.

Pre 23.10.0.1 vdev partx -s /dev/

Code:
NR   START         END     SECTORS  SIZE NAME UUID
 1     128     4194304     4194177    2G  
 2 4194432 39063650270 39059455839 18.2T      


Post 23.10.0.1 vdev partx -s /dev/

Code:
NR START         END     SECTORS  SIZE NAME UUID
 1  2048 39063650270 39063648223 18.2T      
 
Last edited:

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Hey all, just chiming in, I have the same issue with one of our servers. Added a third z2 vdev after the 23.10.0.1 upgrade and am now seeing the same Mixed VDEV Capacities error.

Pre 23.10.0.1 vdev partx -s /dev/

Code:
NR   START         END     SECTORS  SIZE NAME UUID
 1     128     4194304     4194177    2G 
 2 4194432 39063650270 39059455839 18.2T      


Post 23.10.0.1 vdev partx -s /dev/

Code:
NR START         END     SECTORS  SIZE NAME UUID
 1  2048 39063650270 39063648223 18.2T      
Hopefully you're not experiencing the other issues I am?
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
@HoneyBadger based on my progress, what are my next steps?

I want to avoid further damage by blindly moving forward:
Pool is imported but no datasets are mounted
Checksum errors have increased since yesterday
Should I, or can I, perform a scrub? The pool doesn't show up under Storage, but I assume I can do it from the CLI.

Do I shell out for Klennet? Honestly it may be worth it if I can avoid losing the pool entirely. If the damage is limited to the directory I was actively rebalancing during the failure, I can deal with that.

Thanks for getting me this far; if not for the help of those in this thread, I'd have blown the whole pool away days ago in defeat.
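On the scrub question above: a scrub can indeed be started from the shell even when the pool isn't visible under Storage, as long as it's imported. A dry-run sketch (commands are printed rather than executed, so nothing happens by accident):

```shell
# Copy-paste the printed commands to actually run them on the imported pool.
pool=Home
echo "zpool scrub $pool"       # start the scrub from the CLI
echo "zpool status -v $pool"   # monitor progress and per-device error counts
```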
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The ideal situation is "back up the entire contents of the pool with a zfs send/recv on a separate set of disks", but I'm assuming that's not feasible from a capacity perspective, and trying to do it with a cloud solution might be challenging from a bandwidth or cost perspective (although a one-time expense to shuttle it to the cloud might be cheaper than Klennet).
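The send/recv approach can be sketched roughly as below. "Backup" is a placeholder name for a second pool with enough capacity, and the snapshot name "rescue" is arbitrary; this also assumes the damaged pool stays importable long enough to finish. Dry run, commands printed rather than executed:

```shell
# Placeholder names throughout; copy-paste the output to actually replicate.
src=Home; dst=Backup; snap=rescue
echo "zfs snapshot -r $src@$snap"                      # recursive snapshot first
echo "zfs send -R $src@$snap | zfs receive -duF $dst"  # replicate the whole tree
```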

Importing from the older txg and forcing a scrub will cause any data committed afterwards to be effectively "permanently unrecoverable" but barring hardware failure I wouldn't expect it to do additional damage beyond anything that was already in place.

The increasing checksum errors worry me, as ZFS shouldn't "know about" any data newer than the txg you imported at, which implies that there's potentially something amiss still. Have you checked that cables are secure (both data and power)?

I'm going to see if I can reproduce this issue. Do you happen to recall anything about the series of events regarding the swap size changes, boot device changes, reinstallations or upgrades?
 