Permanent errors detected in files

haze_1986

Cadet
Joined
May 29, 2023
Messages
6
Hi All, I am having issues after a scrub of an existing pool. Previously I imported the pool, seemingly fine, and I know I had 3 corrupted files. After manually deleting them I tried a scrub again, and I am still getting a permanent error:

Permanent errors have been detected in the following files:

myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?
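For reference, the clear-and-rescrub sequence being asked about would look like this (pool name taken from the post; this is a sketch of the commands, not a recommendation):

```shell
# 'zpool clear' resets the error counters; it does not repair anything by itself.
# '|| true' keeps this sketch from aborting on systems where the pool is absent.
zpool clear myPool || true
zpool scrub myPool || true
zpool status -v myPool || true   # -v lists the files/objects with permanent errors
```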
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Hi All, I am having issues after a scrub of an existing pool. Previously I imported the pool, seemingly fine, and I know I had 3 corrupted files. After manually deleting them I tried a scrub again, and I am still getting a permanent error:

Permanent errors have been detected in the following files:

myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?


You should provide a hardware overview of your system and your pool layout.
For example, do you have ECC memory?

Please provide the specific error messages you are seeing.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
myPool:<0xdf48>

Should I try to clear this and rescrub again or any other recommendations to fix this?
Unfortunately, the response here is likely to be an unpopular one... you have metadata corruption at the root of the pool, so the only way to recover will be a rebuild and restore from backup (which may well not exist).

There is no way to "heal" a pool from that error without a rebuild.

You may find that your time is well spent in understanding how this came about, particularly if you plan to re-use any of the involved disks in the rebuild.

smartctl -a for all the disks would be a good start.
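A sketch of that check; the device names here are FreeBSD-style guesses (ada0..ada2), so list yours first with `geom disk list` (FreeBSD) or `lsblk` (Linux):

```shell
# Dump full SMART data for each disk in turn.
for disk in /dev/ada0 /dev/ada1 /dev/ada2; do
    echo "=== SMART report for $disk ==="
    smartctl -a "$disk" || true   # '|| true' keeps the loop going past missing devices
done
```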
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, a zpool status would be helpful.
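The command in question would be (pool name taken from the original post):

```shell
# -v adds the list of files/objects with permanent errors;
# the myPool:<0xdf48> line in the OP came from output like this.
zpool status -v myPool || true   # '|| true': sketch only; the pool won't exist everywhere
```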


In general, ZFS does a MUCH better job of protecting metadata than other file systems. Most of the metadata corruption we see here is from non-redundant pools, or from neglected pools that had multiple disk failures.

On a non-redundant pool, ZFS normally keeps 2 copies of metadata:
Code:
$ zfs get redundant_metadata,copies rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  redundant_metadata    all                    default
rpool  copies                1                      default

The "copies" option above changes the configuration a bit. Using 1, (the default), is 1 copy of data, 2 copies of metadata, (and 3 copies of critical metadata). If set to 2, that is 2 copies of data, 3 copies of metadata.

In your case, both metadata copies would have to be corrupt. That is rare in redundant pools, outside of hardware faults.

On a mirror pool with 2 disks, ZFS keeps 4 copies of metadata: 2 on the file system, which is then mirrored to the other disk for 2 more copies. Something similar occurs in RAID-Zx pools.
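As a toy illustration of the copy counts described above (my arithmetic based on this post, not taken from the ZFS docs):

```shell
# Each device holds (copies + 1) metadata copies, and a mirror
# multiplies that by its width.
copies=1        # the dataset's 'copies' property
mirror_width=2  # 2-way mirror
meta_per_device=$((copies + 1))
total=$((meta_per_device * mirror_width))
echo "metadata copies on a ${mirror_width}-way mirror with copies=${copies}: ${total}"
```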
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
In your case, both metadata copies would have to be corrupt. That is rare in redundant pools, outside of hardware faults.
From my understanding a lack of ECC RAM could also result in metadata corruption, but it's a very improbable event. Is my understanding right?

Do note that this is likely unrelated to the OP issue.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
From my understanding a lack of ECC RAM could also result in metadata corruption, but it's a very improbable event. Is my understanding right?
...
Yes, non-ECC memory can result in metadata corruption on redundant pools.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
but it's a very improbable event.

People who very desperately want to avoid ECC memory typically want to portray memory bitflips as a "very improbable event", and from a certain perspective this is even somewhat true-ish. The problem is that once you start to conflate issues, people then fall back on the fact that ZFS typically stores multiple copies of critical metadata, as if through some magic that is supposed to save the day. But this data is not stored as multiple copies in ARC. If you have a block of metadata in ARC, and something cosmic-rays your memory, your in-ARC unchecksummed copy of the data is corrupted without warning; then when ZFS goes to update that block, it stores a few new bytes in the block and pushes copies out to the pool, quite probably after calculating a new checksum.

The amount of improbability is related to the number of bitflip events.
 