Bibi40k
Maybe this helps someone. There may well be a more efficient solution.
TrueNAS-SCALE-22.02.3
Pool Status: Unhealthy
Disks with Errors: 5
Total Disks: 6 (data)
Suddenly I got this pool error:
All disks are healthy according to both SHORT and LONG S.M.A.R.T. tests.
I have scrubbed the pool twice with the same result:
Code:
root@nas1-truenas[~]# zpool status -v Vol1-Z2
  pool: Vol1-Z2
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Sep 20 06:31:16 2022
        7.60T scanned at 4.27G/s, 6.86T issued at 0B/s, 8.17T total
        0B repaired, 83.94% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        Vol1-Z2                                   ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            b68e0c13-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
            b97fc375-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
            bb5f70ba-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
            bc51eff8-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
            bd3fb2a4-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0
            c00de708-0f2a-11e8-812e-b8ca3abd3f7a  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Vol1-Z2/Family:<0x0>
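For anyone who wants to check this from a script rather than by eye, here is a minimal sketch that pulls the entries listed under "Permanent errors" out of the text printed by `zpool status -v`. The function names `permanent_errors` and `zpool_status` are made up for this example, and the parsing assumes the output layout shown above, which may differ between OpenZFS versions.

```python
import subprocess
from typing import List

# Marker text copied from the zpool status output above.
MARKER = "errors: Permanent errors have been detected in the following files:"


def permanent_errors(status_text: str) -> List[str]:
    """Return the entries listed after the 'Permanent errors' line, if any."""
    _, found, tail = status_text.partition(MARKER)
    if not found:
        return []
    # Each affected file or object is printed on its own indented line.
    return [line.strip() for line in tail.splitlines() if line.strip()]


def zpool_status(pool: str) -> str:
    """Run 'zpool status -v <pool>' and return its text output (needs ZFS installed)."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Calling `permanent_errors(zpool_status("Vol1-Z2"))` on the NAS would return the list of affected entries. An entry such as `Vol1-Z2/Family:<0x0>` names an object inside the dataset rather than a file path, which is why a separate step is needed to find the actual files.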
The only way I could find more details about the corrupted files was this:
Code:
root@nas1-truenas[~]# du -sh /mnt/Vol1-Z2/Family/
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4616.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4618.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4623.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4624.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4625.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4622.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4619.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4617.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4621.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4615.JPG': Invalid exchange
du: cannot access '/mnt/Vol1-Z2/Family/Bruxelles/IMG_4620.JPG': Invalid exchange
504G    /mnt/Vol1-Z2/Family/
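The `du` run works because it has to stat every file and prints the ones that fail. A rough Python equivalent is sketched below. It goes one step further and also reads each file to the end, so it should catch files whose metadata is intact but whose data blocks are not. The name `find_unreadable` is invented for this example, and the sketch assumes it is acceptable to read the whole dataset once, which can take a long time on a large pool.

```python
import os
import stat


def find_unreadable(root, chunk_size=1 << 20):
    """Return (path, error text) for every file under root that cannot be fully read."""
    failures = []

    def note_dir_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        failures.append((err.filename, err.strerror or str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=note_dir_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)  # this is the step that fails in the du output
                if not stat.S_ISREG(info.st_mode):
                    continue  # skip FIFOs, sockets and device nodes
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # discard the data, only read errors matter
            except OSError as err:
                failures.append((path, err.strerror or str(err)))
    return failures
```

Running `find_unreadable("/mnt/Vol1-Z2/Family")` should list the same files `du` complained about, plus any others that only fail when their contents are read.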
After all these steps, I created a new Family dataset and moved everything over from the original Family dataset except the corrupted files, which I copied from an old snapshot that I had restored to a clone dataset.
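The recovery described above could be scripted roughly as follows. This is only a sketch under some assumptions: the new dataset and the snapshot clone are already created and mounted, the clone has the same directory layout as the damaged dataset, and files are copied rather than moved so the original stays untouched until the result has been checked. The function name `salvage` and all three paths passed to it are hypothetical.

```python
import os
import shutil


def salvage(src, dst, fallback):
    """Copy the tree at src into dst, taking files that fail to copy from fallback.

    Returns the relative paths that could not be copied from either location.
    """
    lost = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        os.makedirs(os.path.normpath(os.path.join(dst, rel_dir)), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            target = os.path.join(dst, rel)
            try:
                # First choice: the current (partly damaged) dataset.
                shutil.copy2(os.path.join(src, rel), target)
            except OSError:
                try:
                    # Second choice: the same path inside the snapshot clone.
                    shutil.copy2(os.path.join(fallback, rel), target)
                except OSError:
                    lost.append(rel)
    return lost
```

A call might look like `salvage("/mnt/Vol1-Z2/Family", "/mnt/Vol1-Z2/Family-new", "/mnt/Vol1-Z2/Family-clone")`, where the last two mount points are placeholders. Anything in the returned list exists in neither place and would have to come from another backup.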
There are also some hardware errors from time to time:
Code:
[63959.864290] {13}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[63959.864296] {13}[Hardware Error]: It has been corrected by h/w and requires no further action
[63959.864307] {13}[Hardware Error]: event severity: corrected
[63959.864309] {13}[Hardware Error]: Error 0, type: corrected
[63959.864311] {13}[Hardware Error]: fru_text: CorrectedErr
[63959.864312] {13}[Hardware Error]: section_type: memory error
[63959.864314] {13}[Hardware Error]: node: 60840 device: 12343
[63959.864325] {13}[Hardware Error]: error_type: 2, single-bit ECC
[64021.303808] {14}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[64021.303814] {14}[Hardware Error]: It has been corrected by h/w and requires no further action
[64021.303817] {14}[Hardware Error]: event severity: corrected
[64021.303819] {14}[Hardware Error]: Error 0, type: corrected
[64021.303820] {14}[Hardware Error]: fru_text: CorrectedErr
[64021.303822] {14}[Hardware Error]: section_type: memory error
[64021.303824] {14}[Hardware Error]: node: 60840 device: 12343
[64021.303827] {14}[Hardware Error]: error_type: 2, single-bit ECC
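To keep an eye on how often these memory errors occur, a small parser over the kernel log can group the lines by their `{N}` event number and count events per severity and error type. The sketch below assumes the exact line format shown above; the name `summarize_hardware_errors` is invented here, and the log text would come from something like `dmesg` output saved to a file.

```python
import re
from collections import Counter

# Matches lines such as:
# [63959.864307] {13}[Hardware Error]: event severity: corrected
LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \{(?P<seq>\d+)\}\[Hardware Error\]: (?P<msg>.*)"
)

# Fields worth keeping from each event.
FIELDS = ("event severity", "section_type", "error_type")


def summarize_hardware_errors(log_text):
    """Count hardware error events by (event severity, error_type)."""
    events = {}
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        # All lines of one event share the same {N} sequence number.
        event = events.setdefault(match["seq"], {})
        key, sep, value = match["msg"].partition(": ")
        if sep and key in FIELDS:
            event[key] = value.strip()
    return Counter(
        (e.get("event severity"), e.get("error_type")) for e in events.values()
    )
```

Fed with the log excerpt above, this groups the lines into two events, both with severity `corrected` and error type `2, single-bit ECC`. A count that grows over time would be a reason to test or replace the memory module.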