Permanent errors detected in files

haze_1986

Cadet
Joined
May 29, 2023
Messages
6
Hi All, I am having issues after a scrub of an existing pool. Previously I imported the pool, seemingly fine, and I know I had 3 corrupted files. After manually deleting them I tried a scrub again, and I am still getting a permanent error:

Permanent errors have been detected in the following files:

myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?
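For reference, the clear-and-rescrub sequence being asked about would look like this (pool name taken from the post; this is a sketch of the commands, not a recommendation):

```shell
# 'zpool clear' resets the error counters; it does not repair anything by itself.
# '|| true' keeps this sketch from aborting on systems where the pool is absent.
zpool clear myPool || true
zpool scrub myPool || true
zpool status -v myPool || true   # -v lists the files/objects with permanent errors
```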
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Hi All, I am having issues after a scrub of an existing pool. Previously I imported the pool, seemingly fine, and I know I had 3 corrupted files. After manually deleting them I tried a scrub again, and I am still getting a permanent error:

Permanent errors have been detected in the following files:

myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?


You should provide a hardware overview of your system and your pool layout.
For example, do you have ECC memory?

Please provide the specific error messages you are seeing.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?
Unfortunately, the response here is likely to be an unpopular one... you have metadata corruption at the root of the pool, so the only way to recover will be a rebuild and restore from backup (which may well not exist).

There is no way to "heal" a pool from that error without a rebuild.

You may find that your time is well spent in understanding how this came about, particularly if you plan to re-use any of the involved disks in the rebuild.

smartctl -a for all the disks would be a good start.
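A sketch of that check; the device names here are FreeBSD-style guesses (ada0..ada2), so list yours first with `geom disk list` (FreeBSD) or `lsblk` (Linux):

```shell
# Dump full SMART data for each disk in turn.
for disk in /dev/ada0 /dev/ada1 /dev/ada2; do
    echo "=== SMART report for $disk ==="
    smartctl -a "$disk" || true   # '|| true' keeps the loop going past missing devices
done
```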
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, a zpool status would be helpful.
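The command in question would be (pool name taken from the original post):

```shell
# -v adds the list of files/objects with permanent errors;
# the myPool:<0xdf48> line in the OP came from output like this.
zpool status -v myPool || true   # '|| true': sketch only; the pool won't exist everywhere
```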


In general, ZFS does a MUCH better job of protecting metadata than other file systems. Most of the metadata corruption we see here is from non-redundant pools, or from neglected pools that had multiple disk failures.

On a non-redundant pool, ZFS normally keeps 2 copies of metadata:
Code:
$ zfs get redundant_metadata,copies rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  redundant_metadata    all                    default
rpool  copies                1                      default

The "copies" option above changes the configuration a bit. Using 1, (the default), is 1 copy of data, 2 copies of metadata, (and 3 copies of critical metadata). If set to 2, that is 2 copies of data, 3 copies of metadata.

In your case, both metadata copies would have to be corrupt. That is rare in redundant pools, outside of hardware faults.

On a mirror pool with 2 disks, ZFS keeps 4 copies of metadata: 2 on the file system, which is then mirrored to the other disk for 2 more copies. Something similar occurs in RAID-Zx pools.
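As a toy illustration of the copy counts described above (my arithmetic based on this post, not taken from the ZFS docs):

```shell
# Each device holds (copies + 1) metadata copies, and a mirror
# multiplies that by its width.
copies=1        # the dataset's 'copies' property
mirror_width=2  # 2-way mirror
meta_per_device=$((copies + 1))
total=$((meta_per_device * mirror_width))
echo "metadata copies on a ${mirror_width}-way mirror with copies=${copies}: ${total}"
```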
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
In your case, both metadata copies would have to be corrupt. That is rare in redundant pools, outside of hardware faults.
From my understanding a lack of ECC RAM could also result in metadata corruption, but it's a very improbable event. Is my understanding right?

Do note that this is likely unrelated to the OP issue.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
From my understanding a lack of ECC RAM could also result in metadata corruption, but it's a very improbable event. Is my understanding right?
...
Yes, non-ECC memory can result in metadata corruption on redundant pools.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
but it's a very improbable event.

People who very desperately want to avoid ECC memory typically want to portray memory bitflips as a "very improbable event", and from a certain perspective this is even somewhat true-ish. The problem is that once you start to conflate issues, people then fall back on the fact that ZFS typically stores multiple copies of critical metadata, as if through some magic that is supposed to save the day. But this data is not stored as multiple copies in ARC. If you have a block of metadata in ARC, and something cosmic-rays your memory, your in-ARC unchecksummed copy of the data is corrupted without warning; then when ZFS goes to update that block, it stores a few new bytes in the block and pushes copies out to the pool, quite probably after calculating a new checksum.

The amount of improbability is related to the number of bitflip events.
 