Permanent Errors in RAIDZ2 pool, scrubs can't clean it. Thoughts on how to fix it?

Status
Not open for further replies.
Joined
Feb 9, 2017
Messages
8
I will admit up front that I have some older drives. They have generally been healthy and they pass their SMART tests. I also do not have ECC RAM in the box. I do have snapshots running on each volume, and I am serving some light iSCSI from the box for utility machines running under VMware.

With all that said, the bulk of the box is files and related storage.

I recently received a warning in the console about a degradation in the pool:
Code:
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2017.04.05 10:52:29 =~=~=~=~=~=~=~=~=~=~=~=
zpool status -xv DeepPool

  pool: DeepPool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Apr  5 10:36:46 2017
        5.89G scanned out of 23.8T at 6.37M/s, (scan is slow, no estimated time)
        0 repaired, 0.02% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        DeepPool                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f90e499e-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/f9d182fb-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fa8d66fd-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fb4e2db9-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fc115f3e-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fcdbdd0b-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fd9bd3ec-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/fe5f4821-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/ff13b7d2-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/ffb398bf-6c73-11e6-a2df-60a44c62874e  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/006105c6-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/01022727-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/01ce519e-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/0292a50f-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/0359965c-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/0416c60a-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/04c825c5-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/0574ab94-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/0618d25c-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0
            gptid/06c68e1f-6c74-11e6-a2df-60a44c62874e  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xf11>:<0x1>

I am not able to locate that file, and I am still working on backing up all the data externally.
I have attached a dump of the drives' SMART data, and the zpool status output is above.

The snapshots I have were set up after I had a similar issue, which the scrubs reported as cleared. However, with the most recent scrub this same file location came back. (About a month passed in between. I run scrubs every other day, so I went through a couple of cycles before today's scrub caught it again.) I am hoping to get some guidance on how to find and pull the file that keeps coming back.
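
For anyone reading an entry like the one above: as far as I understand it, when ZFS cannot resolve a damaged object to a path, it prints the entry as two hexadecimal numbers, the dataset ID and the object number inside that dataset. The small Python sketch below only converts those numbers to decimal. The function name decode_error_entry is made up for illustration and is not part of any ZFS tooling.

```python
import re

# Matches an unresolved entry from the "errors:" section of `zpool status -v`,
# e.g. "<0xf11>:<0x1>" (assumed layout: <dataset id>:<object number>, both hex).
ENTRY_RE = re.compile(r"<(0x[0-9a-fA-F]+)>:<(0x[0-9a-fA-F]+)>")


def decode_error_entry(entry):
    """Return (dataset_id, object_id) as integers for an unresolved entry."""
    match = ENTRY_RE.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"not an unresolved dataset:object entry: {entry!r}")
    dataset_hex, object_hex = match.groups()
    return int(dataset_hex, 16), int(object_hex, 16)


dataset_id, object_id = decode_error_entry("<0xf11>:<0x1>")
print(f"dataset id {dataset_id}, object {object_id}")
```

With the decimal dataset ID in hand, one could look for a matching ID in the output of `zdb -d DeepPool`, assuming zdb can read the pool. If no dataset or snapshot has that ID any more, there may be no file left to pull.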
 

Attachments

  • DriveDump.txt
    175.8 KB · Views: 344

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Your only option is to back up, destroy your pool, and rebuild it. There is metadata corruption that isn't fixable. Monitor your system better; you either have 2 bad disks or bad memory.

 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I agree with @SweetAndLow.

Almost all your drives are running hot (40C or above).

da2 has far too many reallocations for comfort.

da3, da4, da5, da6, da15, da16, da17 and da18 look flaky, in that they have logged errors at some point. Some of those errors may be due to outside factors and not the drives themselves.

Overall, it looks like a system that hasn't had the TLC it deserves from day 1.
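
The temperature and reallocation checks above could be automated along these lines. This is only a rough Python sketch: it assumes the attribute table layout printed by `smartctl -A`, the sample rows are invented for illustration and are not taken from DriveDump.txt, and the function names and the 40 C threshold parameter are hypothetical.

```python
# Hypothetical sample in the layout of `smartctl -A` output (values are made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       1520
194 Temperature_Celsius     0x0022   041   055   000    Old_age   Always       -       41 (0 18 0 0 0)
"""


def parse_attributes(text):
    """Map attribute name to the leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def drive_warnings(attrs, max_temp_c=40):
    """Return human-readable warnings for reallocations and high temperature."""
    found = []
    reallocated = attrs.get("Reallocated_Sector_Ct", 0)
    if reallocated > 0:
        found.append(f"{reallocated} reallocated sectors")
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp >= max_temp_c:
        found.append(f"running hot at {temp} C")
    return found


for warning in drive_warnings(parse_attributes(SAMPLE)):
    print(warning)
```

Run per drive on a schedule, something like this would flag a drive before a scrub turns up unrecoverable errors. Attribute names vary between drive vendors, so the two names used here are an assumption that needs checking against the actual output.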
 