SOLVED Identifying permanent error/corruption - <metadata>:<0x0>

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
Current version - TrueNAS-13.0-U5.2 (haven't updated because of corruption)
Specs -
Intel Xeon E5-2650L v2 (40 threads)
383.9 GiB DDR3 ECC RAM
Network - 10GbE Mellanox Connect-X3 (MCX311A)
Pool Layout -
[screenshot of pool layout]
Disks:
[screenshots of disk details]

Long story short: a bad RAM module (since replaced, with the new RAM validated) and possibly some impatient disk upgrades (I didn't know resilvers can't run in parallel) caused some corruption in my pool.

Specifically, 2 of the 3 disks in one of my vdevs each report 2 checksum errors, and a permanent error was found in:
Code:
<metadata>:<0x0>
There was also another file with a permanent error. I deleted the file (it wasn't precious data), the snapshots that contained it, and the folder it was in.

I have cleared the error and scrubbed the pool multiple times, but so far the metadata error persists.

Are there any steps to identify the problem metadata and rebuild it? Or do I need to rebuild the pool to get rid of the error? From what I can tell the rest of the data on the pool is intact with no issues. Should I try deleting all snapshots I have on my pool and see if that clears things up?

I am running
Code:
zdb -U /data/zfs/zpool.cache -c Backup-NAS
to see if it will be able to tell me any more details about the metadata error.
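For reference, a rough breakdown of the flags as I understand them from the zdb man page (treat this as a sketch):
Code:
# -U points zdb at the pool's cache file; -c traverses the pool and verifies metadata checksums
zdb -U /data/zfs/zpool.cache -c Backup-NAS
# passing -c twice (-cc) also verifies the checksums of data blocks, which is much slower
zdb -U /data/zfs/zpool.cache -cc Backup-NAS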

Current output:
[screenshot of zdb output]


So far there are 4 metadata pointers that are unreadable.

What should I do with this information? ZDB isn't super well documented, so I am kind of lost.

Thanks for reading!

EDIT:

An update:

Exporting the pool, re-importing it, and then scrubbing it cleared the error.

This was one of the solutions I found.

Basically:
After resolving the corrupted data in the pool (rough commands sketched below):
Scrub
Export
Import
Scrub
Repeat if metadata error isn't gone.
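A minimal sketch of that sequence as shell commands, assuming a pool named Backup-NAS (on TrueNAS it may be cleaner to do the export/import from the web UI so the middleware stays in sync):
Code:
zpool scrub Backup-NAS        # let the scrub finish; watch progress with zpool status -v
zpool export Backup-NAS
zpool import Backup-NAS
zpool scrub Backup-NAS        # scrub again after the re-import
zpool status -v Backup-NAS    # check whether <metadata>:<0x0> is still listed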
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
While the pool is imported, use the following. It should show you the path to the metadata;
zpool status -v Backup-NAS

You would likely have to copy out (or have a backup of) any files in the directory referenced, then remove the entire directory. You can re-create it afterward.
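Something along these lines, using a made-up path in place of whatever zpool status -v actually reports:
Code:
cp -a /mnt/Backup-NAS/somedataset/baddir /mnt/Backup-NAS/baddir-copy     # save the contents elsewhere first
rm -rf /mnt/Backup-NAS/somedataset/baddir                                # remove the directory referencing the bad metadata
mkdir /mnt/Backup-NAS/somedataset/baddir                                 # re-create it
cp -a /mnt/Backup-NAS/baddir-copy/. /mnt/Backup-NAS/somedataset/baddir/  # copy the files back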


By default, ZFS keeps 2 copies of standard metadata and 3 copies of critical metadata. Plus, if I understand it correctly, in a pool with more than 1 vDev the extra copies are written to separate vDevs to improve reliability. However, your RAM & disk problem probably impacted some metadata.

This extra copy behavior is controlled by a dataset or zVol property, which defaults to "all";
redundant_metadata
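You can check or change it like this (the dataset name is just an example):
Code:
zfs get redundant_metadata Backup-NAS                  # show the current value; child datasets inherit it
zfs set redundant_metadata=all Backup-NAS/somedataset  # "all" keeps the extra copies; "most" drops some of them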
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
While the pool is imported, use the following. It should show you the path to the metadata;
zpool status -v Backup-NAS

Ya, sorry, I should have put the result of status -v directly in the post instead of just the errors it showed.
Code:
  pool: Backup-NAS
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 3 days 04:18:13 with 1 errors on Wed Sep  6 22:27:01 2023
config:

    NAME                                            STATE     READ WRITE CKSUM
    Backup-NAS                                      ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/f1fdd17e-38a5-11ee-87ea-246e967409d4  ONLINE       0     0     0
        gptid/ef8262b9-3a60-11ee-9b4e-246e967409d4  ONLINE       0     0     0
        gptid/97724fee-3a61-11ee-9b4e-246e967409d4  ONLINE       0     0     0
      raidz1-1                                      ONLINE       0     0     0
        gptid/ad7e4b02-3c06-11ee-9b4e-246e967409d4  ONLINE       0     0     2
        gptid/5b86f526-3e23-11ee-9b4e-246e967409d4  ONLINE       0     0     2
        gptid/7d3af88a-3e23-11ee-9b4e-246e967409d4  ONLINE       0     0     0
      raidz1-2                                      ONLINE       0     0     0
        gptid/3b859f67-9ec1-11ec-ac88-1cc1de324fa7  ONLINE       0     0     0
        gptid/58000732-a2b7-11ec-b0b2-1cc1de324fa7  ONLINE       0     0     0
        gptid/3e099df4-9ec1-11ec-ac88-1cc1de324fa7  ONLINE       0     0     0
      raidz1-4                                      ONLINE       0     0     0
        gptid/4229e152-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
        gptid/42174aa8-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
        gptid/4212e36d-0fbe-11ed-a0bd-246e967409d4  ONLINE       0     0     0
    logs 
      mirror-3                                      ONLINE       0     0     0
        gptid/e0fdbec2-a997-11ec-8595-1cc1de324fa7  ONLINE       0     0     0
        gptid/e0ffff39-a997-11ec-8595-1cc1de324fa7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>


You would likely have to copy out (or have a backup of) any files in the directory referenced, then remove the entire directory. You can re-create it afterward.

So should I just delete the whole dataset the corrupted file was in? Is the dataset possibly holding onto the bad metadata?
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry, I don't know the answer to this particular question. The Metadata listed does not have a value I recognize.

Perhaps someone else can give you a better answer than re-making the pool and restoring from backups.
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
An update:

Exporting the pool, re-importing it, and then scrubbing it cleared the error.

This was one of the solutions I found.

Basically:
After resolving the corrupted data in the pool:
Scrub
Export pool
Import pool
Scrub
Repeat if metadata error isn't gone.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Glad that worked.

Even though ZFS can be a pain to work with at times, I really like that it tells you when things are bad. And what was bad. Some file systems or RAID schemes simply can't detect problems and give you garbage back.
 

tarian

Dabbler
Joined
Apr 28, 2021
Messages
12
Glad that worked.

Even though ZFS can be a pain to work with at times, I really like that it tells you when things are bad. And what was bad. Some file systems or RAID schemes simply can't detect problems and give you garbage back.
Ya, being able to know what went wrong and where really is great!
 