TrueNAS Scale incorrectly reporting Mixed Capacity VDEVS

Brandito

Explorer
Joined
May 6, 2023
Messages
72
If the data is served by ZFS in a corrupt state to rsync without an error message, then probably (my guess) zfs send and receive would not behave differently. Advantage of rsync is that you can go by directory which you cannot with zfs send. Only full datasets with the latter.

My expectation is that rsync will transfer the files and, upon encountering a broken one and hence an I/O error from ZFS, it will log something like "skipped" and continue, or abort altogether. Check the manpage for options - if there are any - to control this.

If abort is the only thing it can do, you can build an exclude list that you add to as you hit broken files. Rsync will always continue from where it stopped so it's good for an iterative process.
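
(A minimal sketch of that iterative approach, with placeholder paths; the exact options should be double-checked against the rsync manpage for your version.)
Code:
# First pass: copy what can be read, keep going past unreadable files,
# and capture the I/O errors from stderr for review.
rsync -av /mnt/pool/dataset/ /mnt/backup/dataset/ 2> rsync-errors.log

# Later passes: put the broken paths into an exclude file and re-run.
# Files that already transferred are skipped, so each pass only retries what is left.
rsync -av --exclude-from=broken-files.txt /mnt/pool/dataset/ /mnt/backup/dataset/ 2>> rsync-errors.log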

Should I be worried about any of this for the purposes of backing up? I'm not sure why that one drive is not part of the vdev; it's connected to the system and shows up as a drive that can be added to a pool in the WebUI. That date for the scrub is weird too - at the TXG I imported the pool at, I know that scrub was no longer running.

I also see that despite all but one of the drives showing online in zpool status, in the WebUI the 6 drives belonging to the newest vdev show N/A instead of Home.

Here's the info I collected when I first started trying to roll back to a working TXG, and this is the one I mounted today. The timestamp is from the day the pool went down. However, it seems like I actually rolled back to Nov 10th, which I believe is prior to adding the 4th vdev.

Code:
    Uberblock[7]
        magic = 0000000000bab10c
        version = 5000
        txg = 3080103
        guid_sum = 17417729159499886982
        timestamp = 1699889223 UTC = Mon Nov 13 09:27:03 2023
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        mmp_valid = 0
        checkpoint_txg = 0
        labels = 0 1 2 3 
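
(For reference, listing uberblocks and importing read-only at a chosen TXG is typically done along these lines; the -T rewind option is only lightly documented, so treat this as a sketch rather than the exact commands used.)
Code:
# List the uberblocks (TXGs and timestamps) recorded in a member disk's label:
zdb -ul /dev/disk/by-partuuid/<member-partition>
# Import the pool read-only at the chosen TXG, e.g. the one from the uberblock above:
zpool import -o readonly=on -f -T 3080103 Home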


I was able to copy some of my media to my other pool just to see if it would work, and the file plays fine. Even with that one drive gone it's a raidz2, so I don't know if it's worth trying to reattach it to the pool; I assume a resilver would ensue, and since the pool is imported read-only I doubt it would work anyhow. Seems like a problem to tackle once the data is backed up. Maybe this isn't something to be concerned about until I see what data can be saved.

Thoughts?

Code:
root@truenas[~]# zpool status Home
  pool: Home
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub in progress since Fri Nov 10 06:01:35 2023
        0B / 135T scanned, 0B / 135T issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        Home                                      DEGRADED     0     0     0
          raidz2-0                                ONLINE       0     0     0
            a7d78b0d-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     3
            a7b00eef-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     3
            a7d01f81-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     2
            a7c951e3-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     1
            a7bfef1b-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0     3
            e4f37ae1-f494-4baf-94e5-07db0c38cb0c  ONLINE       0     0     3
          raidz2-1                                ONLINE       0     0     0
            8cca2c8f-39ee-40a6-88e0-24ddf3485aa0  ONLINE       0     0     2
            74f3cc23-1b32-4faf-89cc-ba0cd72ba308  ONLINE       0     0     5
            4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886  ONLINE       0     0     5
            cde58bb6-9d8e-4cdc-a1bf-847f459b459b  ONLINE       0     0     5
            58c22778-521b-4e8f-aadd-6d5ad17a8f68  ONLINE       0     0     2
            33633f68-920b-4a40-bd4d-45e30b6872bc  ONLINE       0     0     2
          raidz2-2                                ONLINE       0     0     0
            2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb  ONLINE       0     0     5
            56c07fd7-6cb6-4985-9a20-2b5ff9d42631  ONLINE       0     0     5
            1147286d-8cd8-4025-8e5d-bbf06e2bd795  ONLINE       0     0     6
            7e1fa408-7565-4913-b045-49447ef9253b  ONLINE       0     0     9
            3d56d2fa-d505-4bea-b9a2-80c121e4e559  ONLINE       0     0     9
            a9906b32-2690-4f7b-8d8f-00ca915d8f3d  ONLINE       0     0     8
          raidz2-5                                DEGRADED     0     0     0
            b8c63108-353b-4ed7-a927-ca3df817bd21  ONLINE       0     0     0
            58782264-02f1-41c6-9b91-d07144cb0ccb  ONLINE       0     0     0
            03df98a5-a86d-4bc8-879a-5cf611d4306c  ONLINE       0     0     0
            12991589318322434965                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3
            a5786a1f-a7ad-4a30-877a-88a03c94a774  ONLINE       0     0     0
            4c59238e-5cbd-428e-8a72-a018d9dae9c2  ONLINE       0     0     0
        logs
          mirror-6                                ONLINE       0     0     0
            5ba1f70b-be51-470f-94ed-777683425477  ONLINE       0     0     0
            f2605776-46a9-4455-a4bc-322d4cf8a688  ONLINE       0     0     0

errors: No known data errors
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
As long as ZFS reports errors: No known data errors and does not encounter any while reading your existing data, you should be fine. Checksums and stuff ...
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I agree on rsync being a good play, just mount the destination side via NFS and copy it over.
Code:
mkdir /mnt/NFS_SHARE_NAME
mount -t nfs OTHER_SERVER_FQDN_OR_IP:/path/to/nfs/share /mnt/NFS_SHARE_NAME
 
Last edited:

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I agree on rsync being a good play, just mount the destination side via NFS and copy it over.
I would always run rsync over SSH ... why NFS? root to root - authentication via key. You need root privileges to use -a (archive) mode.
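
(A hedged example of that, with placeholder hostname, key and paths.)
Code:
# Root-to-root over SSH with key authentication; -a preserves ownership,
# permissions and timestamps, which is why root is needed on both ends.
rsync -av -e "ssh -i /root/.ssh/backup_key" /mnt/Home/dataset root@backup-host:/mnt/Media-Backup/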
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I would always run rsync over SSH ... why NFS? root to root - authentication via key. You need root privileges to use -a (archive) mode.
That works too. No particular reason. Basic cp -r over NFS would work fine as well. There be options :)

Just use the file-based tools at this point and not the block-based ones. ZFS saved the day, but ZFS may be part of the problem too. There's too little info about what "happened" before his pool died. All I know for sure is that ZFS will just refuse to copy the corrupted data, so OP will know which files are broken.
 
Last edited:

Brandito

Explorer
Joined
May 6, 2023
Messages
72
I would always run rsync over SSH ... why NFS? root to root - authentication via key. You need root privileges to use -a (archive) mode.
The plan was to mount the backup pool on the same system, as I have a 45-bay disk shelf with plenty of extra bays for the 10-disk pool.

Any issues with that, beyond the obvious one that it could have been a hardware fault? I might be able to spin up another machine; I have a DL80 Gen9 I could throw TrueNAS on. I was hoping to take advantage of the speed of doing the transfer on the same machine.

I'm guessing I wouldn't saturate 10Gbit anyway, and I can achieve that over the network.

I've also been going back and forth on whether to do a 10-drive raidz2 or 2x 5-drive raidz1. This pool would stick around as long-term backup, and expanding with 5-drive vdevs is certainly easier. Being a backup, I'm less concerned about redundancy in the long run.

The raidz1 config also offers a bit more storage (based on Wintelguy's calculator), which I may need in order to get everything off my current pool without going too far over 80%.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
The plan was to mount the backup pool on the same system, as I have a 45-bay disk shelf with plenty of extra bays for the 10-disk pool.

Any issues with that, beyond the obvious one that it could have been a hardware fault? I might be able to spin up another machine; I have a DL80 Gen9 I could throw TrueNAS on. I was hoping to take advantage of the speed of doing the transfer on the same machine.

I'm guessing I wouldn't saturate 10Gbit anyway, and I can achieve that over the network.

I've also been going back and forth on whether to do a 10-drive raidz2 or 2x 5-drive raidz1. This pool would stick around as long-term backup, and expanding with 5-drive vdevs is certainly easier. Being a backup, I'm less concerned about redundancy in the long run.

The raidz1 config also offers a bit more storage (based on Wintelguy's calculator), which I may need in order to get everything off my current pool without going too far over 80%.
If you can do it locally on a different system and shuttle data between pools, that's always a good option. I didn't want to assume what hardware you had.

The best advice I can give you is to take your time and build the new pool how you want it to stay, which may limit your ability to do it locally. Don't make more work for yourself with half-step builds if you can avoid it. The fewer moving parts, the better.

Also figure out a good backup strategy that's valid and sane, where you can re-use the leftover parts when you are done. Which, again, may limit your ability to do things locally. Take the time now to save yourself headaches later.

Best of luck, but it seems like the worst is over. I'll go knock on some wood now.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The plan was to mount the backup pool on the same system, as I have a 45-bay disk shelf with plenty of extra bays for the 10-disk pool.
Then just run rsync locally.

rsync -av /path/to/source/dir /path/to/one/level/above/destination/dir/ - mind the trailing slash on the destination and its absence from the source dir. This command will create the source dir inside the destination, with everything it contains, recursively.
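
(Illustration with made-up paths, using the pool names from this thread: without a trailing slash on the source, the directory itself is created inside the destination; with one, only its contents are copied.)
Code:
rsync -av /mnt/Home/media /mnt/Media-Backup/         # creates /mnt/Media-Backup/media/... recursively
rsync -av /mnt/Home/media/ /mnt/Media-Backup/media/  # copies only the contents of media/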
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
If you can do it locally on a different system and shuttle data between pools, that's always a good option. I didn't want to assume what hardware you had.

The best advice I can give you is to take your time and build the new pool how you want it to stay, which may limit your ability to do it locally. Don't make more work for yourself with half-step builds if you can avoid it. The fewer moving parts, the better.

Also figure out a good backup strategy that's valid and sane, where you can re-use the leftover parts when you are done. Which, again, may limit your ability to do things locally. Take the time now to save yourself headaches later.

Best of luck, but it seems like the worst is over. I'll go knock on some wood now.
I wouldn't run the DL80 long term; I use enough power at home. I'd eventually move the backup pool to the same NAS. I know it's not ideal, but it's better than what I had before.
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Keep us posted.
Running rsync now, and I believe at least a couple of directories may be lost. It appears to be the stuff I ran the rebalancing script on. I don't know if it's the script being modified to work around block cloning, or the block cloning itself. The original script was unable to work with ZFS 2.2 due to block cloning; it took some time for me to discover that, so I ended up copying a lot of stuff with no data actually making it onto the newly added vdev, and in that time the machine force restarted. I don't know if they're related or just a coincidence.
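
(For context, that kind of in-place rebalancing boils down to roughly the following per file; this is a simplified sketch, not the actual script. With block cloning active, the copy step can become a clone of the existing blocks instead of a physical re-write onto the emptier vdev.)
Code:
cp -a "$file" "$file.rebalance"   # intended to re-write the data onto the new vdev
rm "$file"
mv "$file.rebalance" "$file"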

Also, any "unbalanced" data seems to be syncing fine so far. I tested with some of the stuff I know was rebalanced: rsync reports I/O errors, and the checksum errors in zpool status climb like crazy when trying to rsync that data.

I'm running everything in a tmux session; the zpool iostat and zpool status panes are shown below. I find it strange that 2 of the 3 oldest vdevs have the vast majority of checksum errors.

I'm trying to capture any failed files by running
Code:
rsync -av /source /destination 2>logfile.txt


Code:
# zpool iostat -v (left tmux pane; capture cut off at raidz1-1)
                                            capacity     operations     bandwidth
pool                                      alloc   free   read  write   read  write
----------------------------------------  -----  -----  -----  -----  -----  -----
Home                                          0   349T  3.92K      0   144M      0
  raidz2-0                                    0  87.3T  1.21K      0  44.9M      0
    a7d78b0d-f891-11ed-a2f8-90e2baf17bf0      -      -    207      0  7.49M      0
    a7b00eef-f891-11ed-a2f8-90e2baf17bf0      -      -    197      0  7.16M      0
    a7d01f81-f891-11ed-a2f8-90e2baf17bf0      -      -    207      0  7.57M      0
    a7c951e3-f891-11ed-a2f8-90e2baf17bf0      -      -    209      0  7.55M      0
    a7bfef1b-f891-11ed-a2f8-90e2baf17bf0      -      -    204      0  7.38M      0
    e4f37ae1-f494-4baf-94e5-07db0c38cb0c      -      -    213      0  7.73M      0
  raidz2-1                                    0  87.3T  1.09K      0  40.1M      0
    8cca2c8f-39ee-40a6-88e0-24ddf3485aa0      -      -    189      0  6.82M      0
    74f3cc23-1b32-4faf-89cc-ba0cd72ba308      -      -    185      0  6.57M      0
    4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886      -      -    187      0  6.63M      0
    cde58bb6-9d8e-4cdc-a1bf-847f459b459b      -      -    185      0  6.63M      0
    58c22778-521b-4e8f-aadd-6d5ad17a8f68      -      -    183      0  6.58M      0
    33633f68-920b-4a40-bd4d-45e30b6872bc      -      -    190      0  6.83M      0
  raidz2-2                                    0  87.3T  1.61K      0  58.6M      0
    2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb      -      -    277      0  9.76M      0
    56c07fd7-6cb6-4985-9a20-2b5ff9d42631      -      -    282      0  9.95M      0
    1147286d-8cd8-4025-8e5d-bbf06e2bd795      -      -    276      0  9.74M      0
    7e1fa408-7565-4913-b045-49447ef9253b      -      -    269      0  9.57M      0
    3d56d2fa-d505-4bea-b9a2-80c121e4e559      -      -    273      0  9.82M      0
    a9906b32-2690-4f7b-8d8f-00ca915d8f3d      -      -    274      0  9.81M      0
  raidz2-5                                    0  87.3T      0      0  20.5K      0
    b8c63108-353b-4ed7-a927-ca3df817bd21      -      -      0      0  4.09K      0
    58782264-02f1-41c6-9b91-d07144cb0ccb      -      -      0      0  4.12K      0
    03df98a5-a86d-4bc8-879a-5cf611d4306c      -      -      0      0  4.10K      0
    12991589318322434965                      -      -      0      0      0      0
    a5786a1f-a7ad-4a30-877a-88a03c94a774      -      -      0      0  4.11K      0
    4c59238e-5cbd-428e-8a72-a018d9dae9c2      -      -      0      0  4.09K      0
logs                                          -      -      -      -      -      -
  mirror-6                                    -      -      0      0     55      0
    5ba1f70b-be51-470f-94ed-777683425477      -      -      0      0     27      0
    f2605776-46a9-4455-a4bc-322d4cf8a688      -      -      0      0     27      0
----------------------------------------  -----  -----  -----  -----  -----  -----
Media-Backup                              2.89T   143T      0  1.21K     44   553M
  raidz1-0                                1.44T  71.3T      0    619     22   276M
    0035f846-da3e-41dd-b4f6-31bbe08914ed      -      -      0    124      4  55.2M
    b47a16ba-3327-4c93-9c93-192369b45c5b      -      -      0    124      4  55.2M
    79c0a613-738f-4b4c-84c3-2c62c36c4cda      -      -      0    123      4  55.2M
    c5c8519a-270e-4c68-ac01-c9814f3c5d9a      -      -      0    123      4  55.2M
    9167bfd1-e38a-4050-be9b-4478c2732191      -      -      0    124      4  55.2M
  raidz1-1                                1.44T  71.3T      0    618     22   277M


Code:
# zpool status Home (right tmux pane; capture cut off at the log mirror)
  pool: Home
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Nov 10 06:01:35 2023
        0B / 135T scanned, 0B / 135T issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        Home                                      DEGRADED     0     0     0
          raidz2-0                                ONLINE       0     0     0
            a7d78b0d-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 65.5K
            a7b00eef-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 65.5K
            a7d01f81-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 65.5K
            a7c951e3-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 65.5K
            a7bfef1b-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 65.3K
            e4f37ae1-f494-4baf-94e5-07db0c38cb0c  ONLINE       0     0 65.5K
          raidz2-1                                ONLINE       0     0     0
            8cca2c8f-39ee-40a6-88e0-24ddf3485aa0  ONLINE       0     0 51.8K
            74f3cc23-1b32-4faf-89cc-ba0cd72ba308  ONLINE       0     0 51.6K
            4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886  ONLINE       0     0 51.8K
            cde58bb6-9d8e-4cdc-a1bf-847f459b459b  ONLINE       0     0 51.8K
            58c22778-521b-4e8f-aadd-6d5ad17a8f68  ONLINE       0     0 51.8K
            33633f68-920b-4a40-bd4d-45e30b6872bc  ONLINE       0     0 51.8K
          raidz2-2                                ONLINE       0     0     0
            2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb  ONLINE       0     0    17
            56c07fd7-6cb6-4985-9a20-2b5ff9d42631  ONLINE       0     0     5
            1147286d-8cd8-4025-8e5d-bbf06e2bd795  ONLINE       0     0     6
            7e1fa408-7565-4913-b045-49447ef9253b  ONLINE       0     0     9
            3d56d2fa-d505-4bea-b9a2-80c121e4e559  ONLINE       0     0    21
            a9906b32-2690-4f7b-8d8f-00ca915d8f3d  ONLINE       0     0    20
          raidz2-5                                DEGRADED     0     0     0
            b8c63108-353b-4ed7-a927-ca3df817bd21  ONLINE       0     0     0
            58782264-02f1-41c6-9b91-d07144cb0ccb  ONLINE       0     0     0
            03df98a5-a86d-4bc8-879a-5cf611d4306c  ONLINE       0     0     0
            12991589318322434965                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3
            a5786a1f-a7ad-4a30-877a-88a03c94a774  ONLINE       0     0     0
            4c59238e-5cbd-428e-8a72-a018d9dae9c2  ONLINE       0     0     0
        logs
          mirror-6                                ONLINE       0     0     0
            5ba1f70b-be51-470f-94ed-777683425477
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
That may be what happened; it does make sense. In any case, I would still double-check for hardware faults and burn in your stuff before redeploying. It may have been a collision of multiple factors and not only an edge case with that script and block cloning. I am very familiar with that script and have used it several times, and given what it does, a weird interaction with block cloning certainly seems plausible.

For the state of your vdevs, check whether that script targeted the oldest data first. IIRC it logs what it does, and you can cross-reference against what you are expecting. If data was written a long time ago, it may be clustered on the same vdev. That would lend credence to your assumptions. I say all of this because your problem presents a bit differently than it did for the folks who found the block cloning bug https://github.com/openzfs/zfs/issues/15526. It may still be the same issue, but there may be other variables, or it may not be that issue at all.

We won't really know for sure without checking, is all I am saying. I just don't want to see you have the problem recur.
 
Last edited:

Brandito

Explorer
Joined
May 6, 2023
Messages
72
28 TB transferred as of right now, one error logged, could be nothing, "Invalid Exchange (52)"

Backup pool is speedier than I thought it would be, transferring at 700MB/s
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Correlation not being causation and all that but so far all my corruption is in the directories/files I rebalanced.

Then I saw this post this morning https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_block_cloning_its_a_data_corruption/

Should I do as the thread suggests and disable the features? Should we all?

Some stuff that was rebalanced gets copied via rsync, but a lot of it (hard to say how much so far, but likely more than 50%) reports I/O errors.

I'm also seeing a lot of this spammed in the console:

Code:
Nov 24 07:32:25 truenas.local systemd-journald[1004]: /var/log/journal/cf7617868f02491092abd53f9c2769a6/system.journal: Journal header limits reached or header out-of-date, rotating.
Nov 24 07:32:25 truenas.local systemd-journald[1004]: Failed to set ACL on /var/log/journal/cf7617868f02491092abd53f9c2769a6/user-3000.journal, ignoring: Operation not supported
Nov 24 07:32:41 truenas.local systemd-journald[1004]: Data hash table of /var/log/journal/cf7617868f02491092abd53f9c2769a6/system.journal has a fill level at 75.0 (8535 of 11377 items, 6553600 file size, 767 bytes per hash table item), suggesting rotation.
Nov 24 07:32:41 truenas.local systemd-journald[1004]: /var/log/journal/cf7617868f02491092abd53f9c2769a6/system.journal: Journal header limits reached or header out-of-date, rotating.
Nov 24 07:32:41 truenas.local systemd-journald[1004]: Failed to set ACL on /var/log/journal/cf7617868f02491092abd53f9c2769a6/user-3000.journal, ignoring: Operation not supported


Once this last directory is rsync'd to the best of its ability, is there anything else worth attempting to save the rest? Restore one of the two snapshots, try another TXG, run a scrub? I'm assuming that if there's corruption it's not going away at this point, and I get what I get.

Here's a current output of zpool status for the pool I'm transferring from and the one I'm transferring to.

Code:
root@truenas[~]# zpool status
  pool: Home
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Nov 10 06:01:35 2023
        0B / 135T scanned, 0B / 135T issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        Home                                      DEGRADED     0     0     0
          raidz2-0                                ONLINE       0     0     0
            a7d78b0d-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 17.8M
            a7b00eef-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 17.9M
            a7d01f81-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 17.6M
            a7c951e3-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 17.4M
            a7bfef1b-f891-11ed-a2f8-90e2baf17bf0  ONLINE       0     0 16.9M
            e4f37ae1-f494-4baf-94e5-07db0c38cb0c  ONLINE       0     0 17.3M
          raidz2-1                                ONLINE       0     0     0
            8cca2c8f-39ee-40a6-88e0-24ddf3485aa0  ONLINE       0     0 13.3M
            74f3cc23-1b32-4faf-89cc-ba0cd72ba308  ONLINE       0     0 13.1M
            4e5f5b16-6c2b-4e6b-a907-3e1b9b1c4886  ONLINE       0     0 13.1M
            cde58bb6-9d8e-4cdc-a1bf-847f459b459b  ONLINE       0     0 12.9M
            58c22778-521b-4e8f-aadd-6d5ad17a8f68  ONLINE       0     0 13.0M
            33633f68-920b-4a40-bd4d-45e30b6872bc  ONLINE       0     0 13.4M
          raidz2-2                                ONLINE       0     0     0
            2a2e5211-d4ea-4da9-8ea5-bdabdc542bdb  ONLINE       0     0 14.6M
            56c07fd7-6cb6-4985-9a20-2b5ff9d42631  ONLINE       0     0 15.1M
            1147286d-8cd8-4025-8e5d-bbf06e2bd795  ONLINE       0     0 14.4M
            7e1fa408-7565-4913-b045-49447ef9253b  ONLINE       0     0 15.0M
            3d56d2fa-d505-4bea-b9a2-80c121e4e559  ONLINE       0     0 14.8M
            a9906b32-2690-4f7b-8d8f-00ca915d8f3d  ONLINE       0     0 15.5M
          raidz2-5                                DEGRADED     0     0     0
            b8c63108-353b-4ed7-a927-ca3df817bd21  ONLINE       0     0     0
            58782264-02f1-41c6-9b91-d07144cb0ccb  ONLINE       0     0     0
            03df98a5-a86d-4bc8-879a-5cf611d4306c  ONLINE       0     0     0
            12991589318322434965                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/1a865d37-0e03-4dd8-a0f4-96f35e6fcfd3
            a5786a1f-a7ad-4a30-877a-88a03c94a774  ONLINE       0     0     0
            4c59238e-5cbd-428e-8a72-a018d9dae9c2  ONLINE       0     0     0
        logs
          mirror-6                                ONLINE       0     0     0
            5ba1f70b-be51-470f-94ed-777683425477  ONLINE       0     0     0
            f2605776-46a9-4455-a4bc-322d4cf8a688  ONLINE       0     0     0

errors: 10529808 data errors, use '-v' for a list

  pool: Media-Backup
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        Media-Backup                              ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            0035f846-da3e-41dd-b4f6-31bbe08914ed  ONLINE       0     0     0
            b47a16ba-3327-4c93-9c93-192369b45c5b  ONLINE       0     0     0
            79c0a613-738f-4b4c-84c3-2c62c36c4cda  ONLINE       0     0     0
            c5c8519a-270e-4c68-ac01-c9814f3c5d9a  ONLINE       0     0     0
            9167bfd1-e38a-4050-be9b-4478c2732191  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            daf4bc15-ec2c-4edd-a883-b95ba4364aaf  ONLINE       0     0     0
            0468e835-0b50-4e67-b731-6395aaf22e81  ONLINE       0     0     0
            f2ff2d1f-a54e-4ab4-a983-2a83a54ac8bd  ONLINE       0     0     0
            f39799df-6137-4449-b197-6ab28b714429  ONLINE       0     0     0
            4b081ffe-a028-49a3-bfb1-52a04a27e396  ONLINE       0     0     0

errors: No known data errors


Those checksum counts go up any time I try to rsync from a rebalanced directory.

Edit: I notice no activity on my newest vdev. I know there's data on it, and I feel like I could recover a lot more data than I currently am if that vdev would kick in. Not sure if it's from using too old of a TXG or what the issue might be.

Any suggestions?
 
Last edited:

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Then I saw this post this morning https://www.reddit.com/r/zfs/comments/1826lgs/psa_its_not_block_cloning_its_a_data_corruption/

Should I do as the thread suggests to disable the features, should we all?
That's not (yet?) an official recommendation, but that seems to be the direction taken in this thread:
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Little update: I was able to save around 50% of a directory I estimate was ~40TB.

During the rsync process I noticed it was a very random distribution of files that transferred. It literally looked like a coin was flipped for each file as to whether it would be corrupt or not. I also noticed my newest VDEV had no activity on any of its drives during the entire process.

Mounting the pool in any way other than read-only seems to crash TrueNAS, so no type of scrub seems possible, assuming that would do anything to begin with.

I think my next step is to install Windows and run the evaluation copy of Klennet to see if it would even be possible to salvage the rest.

I also want to try moving that vdev to different slots on my disk shelf on the off chance that there's something wrong with just those 6 slots (seems unlikely, but worth a shot).

Overall I'm pretty satisfied with what was salvaged though and what everyone here helped me accomplish. Thanks again everyone who helped me on this journey!
 

Brandito

Explorer
Joined
May 6, 2023
Messages
72
Possibly one last update. I saved everything I could and didn't bother with Klennet. I rebuilt the pool with the same drives and layout, minus the ZIL for now. I also added the suggested fix for the issues brought to light by block cloning to my pre-init, just in case that was my problem.
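
(The fix being circulated at the time was along these lines, run as a pre-init command; the exact tunable is an assumption based on the linked reports, not a quote of what was actually configured.)
Code:
# Disable the hole-detection path implicated in the openzfs/zfs#15526 corruption reports:
echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync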

I really don't know for sure what happened. Everything is working fine now. It could have been the bug discovered in ZFS 2.2 or something to do with the missing swap partitions on my newest vdev. I wish I knew for sure.

I ran a scrub on the new pool after transferring all my data via a combination of rsync and replication.

I did have a small hiccup with replication: it started destroying datasets on my newly created pool. I used it to clone 3 datasets from one of my backups, and on the third one, even though I set it up as, for example, clone /mnt/WD-/Backup > /mnt/Home/Backup, it started attempting to destroy all my datasets on /mnt/Home. It succeeded in destroying a couple that were easy enough to just re-copy from backup. I don't know if it's because I hadn't created a Backup dataset manually on my Home pool yet, but that's ultimately what I needed to do; I didn't need to do that with the other two datasets.

That could have been user error on my part, but TrueNAS didn't appear to warn me that it would destroy datasets before running the replication task, either.

Either way, I'm back up thanks to all the help I received in this thread. I'm extremely grateful to all of you!
 

bklyngaucho

Dabbler
Joined
Jan 24, 2022
Messages
13
Is it fixable - possibly only through the "remove swap from all other vdevs and expand them to match" method. RAIDZ2 means that a full vdev evacuation and removal is off the table unfortunately.

Does it require fixing - no, IMHO. The capacity mismatch is so minor (2GB out of 16TB is basically a rounding error) that it won't have any measurable impact on balancing/metaslab sizes.

Got your debug - I'll link the ticket here and @ you when it's in.
I am also facing this "issue". I added another mirror vdev to an existing pool and now get that alert about capacity mismatch. I suspect it's because I had, some time ago, moved the swap off the pool with the existing drives, but those 2G partitions still persist. Those partitions do not exist on the newly added drives in the new vdev.

I definitely don't want to screw things up just to remove the alert (as much as it bugs me to see it there). Just curious though: where is this "remove swap from all other vdevs and expand them to match" method described?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I am also facing this "issue". I added another mirror vdev to an existing pool and now get that alert about capacity mismatch. I suspect it's because I had, some time ago, moved the swap off the pool with the existing drives, but those 2G partitions still persist. Those partitions do not exist on the newly added drives in the new vdev.

I definitely don't want to screw things up just to remove the alert (as much as it bugs me to see it there). Just curious though: where is this "remove swap from all other vdevs and expand them to match" method described?

In your case, with mirrors in use, you could potentially remove the mirror vdev through the UI, set your swap size back to 2G, and then re-add it - but there's a fix coming (I believe in 23.10.2) that should only alert if the size difference is >2GB, which means that this would be hidden.

The "remove swap from all other vdevs" would be an unsupported (and not recommended, honestly) process, that involves deleting the swap partitions from each of the other disks, manually expanding the partitions, and then having ZFS expand into those new partitions. Even knowing the necessary steps myself, I'd be really hesitant to do this just to clear up the alert.
 