Bibi40k
Maybe this helps someone; there may well be a more efficient solution.
TrueNAS-SCALE-22.02.3
Pool Status: Unhealthy
Disks with Errors: 5
Total Disks: 6 (data)
Suddenly I got this pool error:
All disks are healthy according to both the SHORT and LONG S.M.A.R.T. tests.
I have scrubbed the pool twice, with the same result each time:
Code:
root@nas1-truenas[~]# zpool status -v Vol1-Z2
  pool: Vol1-Z2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Sep 20 06:31:16 2022
    7.60T scanned at 4.27G/s, 6.86T issued at 0B/s, 8.17T total
    0B repaired, 83.94% done, no estimated completion time
config:
    NAME                                      STATE     READ WRITE CKSUM
    Vol1-Z2                                   ONLINE       0     0     0
      raidz2-0                                ONLINE       0     0     0
        b68e0c13-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
        b97fc375-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
        bb5f70ba-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
        bc51eff8-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
        bd3fb2a4-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
        c00de708-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        Vol1-Z2/Family:<0x0>
The only way I could find more details about the corrupted files was this:
Code:
root@nas1-truenas[~]# du -sh /mnt/Vol1-Z2/Family/
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4616.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4618.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4623.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4624.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4625.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4622.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4619.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4617.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4621.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4615.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4620.JPG': Invalid exchange
504G    /mnt/Vol1-Z2/Family/
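If the list of unreadable files is needed later (for example to skip them while copying), the paths can be pulled out of the `du` error messages. This is only a minimal sketch: it assumes GNU `du` with the English message format shown above, and `parse_du_errors` is a made-up helper name, not part of any tool. Paths containing quotes or other unusual characters may be printed differently by `du` and would not be matched.

```shell
#!/bin/sh
# parse_du_errors (hypothetical helper): reads du's error output on stdin
# and prints one unreadable path per line.
# Assumes messages of the form:
#   du: cannot access '<path>': <reason>
parse_du_errors() {
    # -n plus the trailing p prints only lines where the substitution matched,
    # so the final size summary and any other output are dropped.
    sed -n "s/^du: cannot access '\(.*\)': .*\$/\1/p"
}
```

Possible usage, sending only stderr into the filter (LC_ALL=C keeps the messages in plain ASCII English):

    LC_ALL=C du -s /mnt/Vol1-Z2/Family/ 2>&1 >/dev/null | parse_du_errors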
After all these steps, I created a new Family dataset and moved everything over from the original Family dataset except the corrupted files, which I copied from an old snapshot that I had restored to a clone dataset.
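Roughly, those steps could look like the sketch below. It is not the exact set of commands I ran: the snapshot name, the clone and new dataset names, and the list file are all placeholders, the `/mnt/<dataset>` mount paths are assumed, and it copies with `rsync` rather than moving. By default it only prints the commands (DRY_RUN=1), so nothing is changed until that is switched off.

```shell
#!/bin/sh
# Sketch of the recovery steps. All names below are assumptions.
POOL="Vol1-Z2"
SRC="$POOL/Family"
SNAP="$SRC@known-good"                 # hypothetical snapshot name
CLONE="$POOL/Family-clone"             # hypothetical clone dataset
NEW="$POOL/Family-new"                 # hypothetical new dataset
# Hypothetical file listing the corrupted paths, one per line, relative to
# the dataset root (e.g. Bruxelles/IMG_4616.JPG).
LIST="/tmp/corrupted-files.txt"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover() {
    # 1. Expose the old snapshot as a writable clone dataset.
    run zfs clone "$SNAP" "$CLONE"
    # 2. Create the new dataset that will replace the damaged one.
    run zfs create "$NEW"
    # 3. Copy everything except the listed (corrupted) files.
    run rsync -a --exclude-from="$LIST" "/mnt/$SRC/" "/mnt/$NEW/"
    # 4. Copy only the listed files from the clone of the old snapshot.
    run rsync -a --files-from="$LIST" "/mnt/$CLONE/" "/mnt/$NEW/"
}
```

Calling `recover` prints the four commands for review. Note that rsync treats the lines of an exclude file as patterns, so file names containing `*`, `?` or `[` would need escaping.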
I also see some hardware errors from time to time:
Code:
[63959.864290] {13}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[63959.864296] {13}[Hardware Error]: It has been corrected by h/w and requires no further action
[63959.864307] {13}[Hardware Error]: event severity: corrected
[63959.864309] {13}[Hardware Error]:  Error 0, type: corrected
[63959.864311] {13}[Hardware Error]:  fru_text: CorrectedErr
[63959.864312] {13}[Hardware Error]:   section_type: memory error
[63959.864314] {13}[Hardware Error]:   node: 60840 device: 12343
[63959.864325] {13}[Hardware Error]:   error_type: 2, single-bit ECC
[64021.303808] {14}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[64021.303814] {14}[Hardware Error]: It has been corrected by h/w and requires no further action
[64021.303817] {14}[Hardware Error]: event severity: corrected
[64021.303819] {14}[Hardware Error]:  Error 0, type: corrected
[64021.303820] {14}[Hardware Error]:  fru_text: CorrectedErr
[64021.303822] {14}[Hardware Error]:   section_type: memory error
[64021.303824] {14}[Hardware Error]:   node: 60840 device: 12343
[64021.303827] {14}[Hardware Error]:   error_type: 2, single-bit ECC
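To see whether the corrected errors keep hitting the same location, the kernel log can be summarized per node/device pair. A minimal sketch, assuming the log lines look like the ones above; `summarize_mem_errors` is a made-up name, and how the reported node and device numbers map to a physical DIMM depends on the platform firmware, which I have not verified.

```shell
#!/bin/sh
# summarize_mem_errors (hypothetical helper): reads kernel log text on stdin
# and prints "<count> node: <n> device: <d>" for each location reported in
# "[Hardware Error]" records.
summarize_mem_errors() {
    awk '
        /\[Hardware Error\]:[ ]+node: / {
            # Drop the timestamp and tag, keeping e.g. "node: 60840 device: 12343".
            sub(/^.*\[Hardware Error\]:[ ]+/, "")
            count[$0]++
        }
        END { for (k in count) printf "%d %s\n", count[k], k }
    '
}
```

Possible usage on the running system:

    dmesg | summarize_mem_errors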