Corrupted files

Blaster6

Cadet
Joined
May 16, 2020
Messages
3
I apparently have some corrupted files that I do not recognize. This leads me to believe this is bad.


FreeBSD 12.2-RELEASE-p9 2ee62d665f0(HEAD) TRUENAS

TrueNAS (c) 2009-2021, iXsystems, Inc.
All rights reserved.
TrueNAS code is released under the modified BSD license with some
files copyrighted by (c) iXsystems, Inc.

For more information, documentation, help or support, go here:
http://truenas.com
Welcome to FreeNAS

Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.

Code:
root@freenas[~]# zpool status -xv
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 2.78T in 05:10:05 with 3 errors on Wed Sep 8 12:12:58 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/44dd29c6-1043-11ec-ad24-2cf05d07cc05  ONLINE       0     0     0
            gptid/3b3787c5-0ab8-11ec-bdf6-2cf05d07cc05  ONLINE       0     0     0
            gptid/0429ad3f-09e3-11ec-99ac-2cf05d07cc05  ONLINE       0     0     0
            gptid/4f234214-1094-11ec-8d7f-2cf05d07cc05  ONLINE       0     0     0
        cache
          gptid/5a7dcf62-8a4f-11ea-b7d4-2cf05d07cc05    ONLINE       0     0     0
          gptid/5c4a7c95-8a4f-11ea-b7d4-2cf05d07cc05    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xd7>:<0x9b5b>
        <0xd7>:<0x10084>
        <0xdb>:<0x45c22>
root@freenas[~]#


Some additional info:
I have had this error for a couple of months and have not found anything corrupted or missing. I started seeing errors when my storage space was almost full. I recently replaced all 4 drives with larger ones and had some difficulty due to "no valid replicas" errors, but everything was backed up, so I just pulled the drives out one by one and replaced them without taking them offline. Everything seems to be fine with the data I access. Are these some sort of system files? Is there a way to repair them?
 
Joined
Oct 22, 2019
Messages
3,641
Run a full scrub again, then check the status output after it finishes. (Do this overnight, and preferably when there will be no major disk I/O.)

I had one "error", but it was due to some underlying bug with syncoid + native encryption. A full scrub + a clear got me back to a healthy pool.

For the sake of safety, do you have a backup of this data?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Those are metadata objects that got corrupted. The pool is toast; back up what you can, because you will be rebuilding this.
 

Blaster6

Cadet
Joined
May 16, 2020
Messages
3
These errors have survived several scrubs. I have a complete backup. How do I rebuild? Do I delete the pool and start over?
I don't mind reloading the data, but do I have to lose all the configuration?
 
Joined
Oct 22, 2019
Messages
3,641
As a last-ditch attempt, since you're going to destroy and rebuild, by chance do you have any residual directories that remained from previously destroyed datasets?

For example,
  1. You create a dataset named pool/sandbox
  2. You do stuff with it
  3. You later decide to destroy it
  4. Little do you realize, there's a phantom folder that remains at /mnt/pool/sandbox

This is what happened when I played around with syncoid and got "permanent" metadata errors that coincided with the phantom folders. (I think I manually removed the phantom folders after destroying the datasets? Either way, the pool started to complain about permanent errors that supposedly cannot be fixed.)
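
If you want to check for those leftovers, something like this works (a rough sketch, assuming the pool is named "pool" and mounted at /mnt/pool):

Code:
# Every real dataset in the pool
zfs list -r -o name pool

# Every directory actually sitting at the pool's mountpoint
ls /mnt/pool

# Any directory in the second listing with no matching dataset in
# the first is a leftover "phantom" folder from a destroyed dataset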

Running the scrub correctly fixed this, and everything went back to normal: the pool is back to a HEALTHY state with zero errors, permanent or otherwise, and all four drives passed the extended SMART tests.
 
Joined
Oct 22, 2019
Messages
3,641
You might find this article useful, as it shares similarities with your issue, especially if your drives are fine.

Expand the quote and take note of the part to stop the scrub "within a minute".

The full article goes into more detail, which seems to reflect the same issue I faced with "phantom" folders and associated "errors".

IceSquare said:
https://icesquare.com/wordpress/zfs...rs-have-been-detected-in-the-following-files/

First, make sure that you have no checksum error and the pool is healthy, i.e., all hard drives are online, and all counts are zero.

Next, try to scrub the pool again:

Code:
sudo zpool scrub mypool

Within a minute, try to stop the process:

Code:
sudo zpool scrub -s mypool

Check the status again. The error should be gone:

Code:
sudo zpool status -v

  pool: mypool
 state: ONLINE
  scan: scrub canceled on Sun Feb  3 12:18:06 2019
errors: No known data errors


If the error is still present, you may need to scrub the pool again.
 

Blaster6

Cadet
Joined
May 16, 2020
Messages
3
I don't know what may have been there before, because I did not build this system. I do know that for the last couple of years there were no errors, until the pool started to get full. With the new drives I am at 30% capacity.

I have now run a long SMART test on all drives and got no errors. I still see errors on the same 3 files.
I am now running a scrub on the pool. It looks like it will take a while.
You might find this article useful, as it shares similarities to your issue, especially if your drives are fine.

Expand the quote and take note of the part to stop the scrub "within a minute".

The full article goes into more detail, which seems to reflect the same issue I faced with "phantom" folders and associated "errors".
The article looks exactly like my problem.
How do I make sure the checksums are all 0? What if they aren't?
[Screenshot: zpool status output showing nonzero checksum counts on three of the drives]


The article implies this won't work if the counts are not all 0, but it does not explain how to get them there.

I don't know if it makes a difference, but all files are added and deleted through a Windows SMB share.
 
Joined
Oct 22, 2019
Messages
3,641
The article looks exactly like my problem.
How do I make sure the checksums are all 0? What if they aren't?
Your first output from your original post shows 0 checksum errors.

However, your second screenshot shows 4 errors (across 3 different drives).

You may in fact be facing a different issue then.
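
As for getting the counters back to zero: zpool clear resets the tallies, though it only clears the counts; it does not fix whatever caused the errors in the first place. A sketch, using the pool name from your original post:

Code:
# Reset the read/write/checksum error counters for all devices
zpool clear pool

# Confirm the counters read zero again
zpool status -v pool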

I am now running a scrub on the pool. It looks like it will take a while.

Did you do what the article said and cancel the scrub within a minute? If not, you can try canceling a new scrub within a minute of starting it, once this full scrub completes.

One minute after starting the scrub, issue this command to cancel it:

Code:
zpool scrub -s pool
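
Putting it together, the sequence would look something like this once the current scrub completes (a rough sketch, again assuming the pool is named "pool"):

Code:
# Kick off a fresh scrub
zpool scrub pool

# Give it about a minute, then cancel it
sleep 60
zpool scrub -s pool

# Check whether the permanent errors are gone
zpool status -v pool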


But since you do have checksum errors, it's probably more than just phantom files/folders that don't exist anymore.

Are you using native encryption by any chance?

You might have to resort to starting from a backup if the "cancel within 1 minute trick" doesn't clear it.

I have now run a long SMART test on all drives and got no errors. I still see errors on the same 3 files.
That's one reassuring thing. Not as thorough as badblocks, but at least you can rule out read errors for now.
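
If you ever do want the more thorough check, badblocks can do a read-only pass or a destructive write test. A sketch, with assumptions: badblocks is installed (on FreeBSD it ships with the e2fsprogs package), and /dev/ada0 stands in for whichever drive you test:

Code:
# Non-destructive read-only scan, with progress and verbose output
badblocks -b 4096 -sv /dev/ada0

# Destructive write-mode test -- this wipes the drive, so only run
# it on a disk that holds nothing you need
badblocks -b 4096 -wsv /dev/ada0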
 