SOLVED Identifying permanent error/corruption - <metadata>:<0x0>

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
Current version - TrueNAS-13.0-U5.2 (haven't updated because of corruption)
Specs -
Intel Xeon E5-2650L v2 (40 threads)
383.9 GiB DDR3 ECC RAM
Network - 10GbE Mellanox Connect-X3 (MCX311A)
Pool Layout -
[screenshot of pool layout]
Disks:
[screenshots of disk details]

Long story short: a bad RAM module (since replaced, with the new RAM validated) and possibly some impatient disk upgrades (I didn't know resilvers can't run in parallel) caused some corruption in my pool.

Specifically, 2 of the 3 disks in one of my vdevs each report 2 checksum errors, and a permanent error was found in:
Code:
<metadata>:<0x0>
There was also another file with a permanent error. I deleted the file (it wasn't precious data), the snapshots that contained it, and the folder it was in.

I have cleared the error and scrubbed the pool multiple times, but so far the metadata error persists.

Are there any steps to identify the problem metadata and rebuild it? Or do I need to rebuild the pool to get rid of the error? From what I can tell the rest of the data on the pool is intact with no issues. Should I try deleting all snapshots I have on my pool and see if that clears things up?

I am running
Code:
zdb -U /data/zfs/zpool.cache -c Backup-NAS
to see if it will be able to tell me any more details about the metadata error.
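For reference, a rough breakdown of the flags as I understand them from the zdb man page (treat this as a sketch):
Code:
# -U points zdb at the pool's cache file; -c traverses the pool and verifies metadata checksums
zdb -U /data/zfs/zpool.cache -c Backup-NAS
# passing -c twice (-cc) also verifies the checksums of data blocks, which is much slower
zdb -U /data/zfs/zpool.cache -cc Backup-NAS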

Current output:
[screenshot of zdb output]


So far there are 4 metadata pointers that are unreadable.

What should I do with this information? ZDB isn't super well documented, so I am kind of lost.

Thanks for reading!

EDIT:

An update:

Exporting the pool, re-importing it, and then scrubbing it cleared the error.

This was one of the solutions I found.

Basically:
After resolving the corrupted data in the pool (rough commands sketched below):
Scrub
Export
Import
Scrub
Repeat if metadata error isn't gone.
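A minimal sketch of that sequence as shell commands, assuming a pool named Backup-NAS (on TrueNAS it may be cleaner to do the export/import from the web UI so the middleware stays in sync):
Code:
zpool scrub Backup-NAS        # let the scrub finish; watch progress with zpool status -v
zpool export Backup-NAS
zpool import Backup-NAS
zpool scrub Backup-NAS        # scrub again after the re-import
zpool status -v Backup-NAS    # check whether <metadata>:<0x0> is still listed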
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
While the pool is imported, use the following. It should show you the path to the metadata;
zpool status -v Backup-NAS

You would likely have to copy out (or have a backup of) any files in the directory referenced, then remove the entire directory. You can re-create it afterward.
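Something along these lines, using a made-up path in place of whatever zpool status -v actually reports:
Code:
cp -a /mnt/Backup-NAS/somedataset/baddir /mnt/Backup-NAS/baddir-copy     # save the contents elsewhere first
rm -rf /mnt/Backup-NAS/somedataset/baddir                                # remove the directory referencing the bad metadata
mkdir /mnt/Backup-NAS/somedataset/baddir                                 # re-create it
cp -a /mnt/Backup-NAS/baddir-copy/. /mnt/Backup-NAS/somedataset/baddir/  # copy the files back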


By default, ZFS keeps 2 copies of standard metadata and 3 copies of critical metadata. Plus, if I understand it correctly, in a pool with more than 1 vDev the extra copies are written to separate vDevs to improve reliability. However, your RAM & disk problem probably impacted some metadata.

This extra copy behavior is controlled by a dataset or zVol property, which defaults to "all";
redundant_metadata
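You can check or change it like this (the dataset name is just an example):
Code:
zfs get redundant_metadata Backup-NAS                  # show the current value; child datasets inherit it
zfs set redundant_metadata=all Backup-NAS/somedataset  # "all" keeps the extra copies; "most" drops some of them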
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
While the pool is imported, use the following. It should show you the path to the metadata;
zpool status -v Backup-NAS

Ya, sorry, I should have put the result of status -v directly in the post instead of just the errors it showed.
Code:
  pool: Backup-NAS
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 3 days 04:18:13 with 1 errors on Wed Sep  6 22:27:01 2023
config:

    NAME                                            STATE     READ WRITE CKSUM
    Backup-NAS                                      ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/f1fdd17e-38a5-11ee-87ea-246e967409d4  ONLINE       0     0     0
        gptid/ef8262b9-3a60-11ee-9b4e-246e967409d4  ONLINE       0     0     0
        gptid/97724fee-3a61-11ee-9b4e-246e967409d4  ONLINE       0     0     0
      raidz1-1                                      ONLINE       0     0     0
        gptid/ad7e4b02-3c06-11ee-9b4e-246e967409d4  ONLINE       0     0     2
        gptid/5b86f526-3e23-11ee-9b4e-246e967409d4  ONLINE       0     0     2
        gptid/7d3af88a-3e23-11ee-9b4e-246e967409d4  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/3b859f67-9ec1-11ec-ac88-1cc1de324fa7  ONLINE       0     0     0
        gptid/58000732-a2b7-11ec-b0b2-1cc1de324fa7  ONLINE       0     0     0
        gptid/3e099df4-9ec1-11ec-ac88-1cc1de324fa7  ONLINE       0     0     0
      raidz1-4                                      ONLINE       0     0     0
        gptid/4229e152-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
        gptid/42174aa8-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
        gptid/4212e36d-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
    logs 
      mirror-3                                      ONLINE       0     0     0
        gptid/e0fdbec2-a997-11ec-8595-1cc1de324fa7  ONLINE       0     0     0
        gptid/e0ffff39-a997-11ec-8595-1cc1de324fa7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>


You would likely have to copy out (or have a backup of) any files in the directory referenced, then remove the entire directory. You can re-create it afterward.

So should I just delete the whole dataset the corrupted file was in? Is the dataset possibly holding onto the bad metadata?
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry, I don't know the answer to this particular question. The Metadata listed does not have a value I recognize.

Perhaps someone else can give you a better answer than re-making the pool and restoring from backups.
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
An update:

Exporting the pool, re-importing it, and then scrubbing it cleared the error.

This was one of the solutions I found.

Basically:
After resolving the corrupted data in the pool:
Scrub
Export pool
Import pool
Scrub
Repeat if metadata error isn't gone.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Glad that worked.

Even though ZFS can be a pain to work with at times, I really like that it tells you when things are bad. And what was bad. Some file systems or RAID schemes simply can't detect problems and give you garbage back.
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
Glad that worked.

Even though ZFS can be a pain to work with at times, I really like that it tells you when things are bad. And what was bad. Some file systems or RAID schemes simply can't detect problems and give you garbage back.
Ya, being able to know what went wrong and where really is great!
 