SOLVED Resilver every reboot

Status
Not open for further replies.

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Here is today's pool status. The only thing deleted was the snapshot from 20170424, so no other files were removed (the actual files are good; only the snapshots seem to be affected).
Code:
  pool: Data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 937G in 4h15m with 13 errors on Fri May 26 15:37:05 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0    34
          raidz2-0                                      DEGRADED     0     0    68
            gptid/a6a6f8ac-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/a6c19d32-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/a7937aca-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/41a9a23d-3f33-11e7-a87b-000c29bfa44f  ONLINE       0     0     0
            gptid/aaff7d0f-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/ad0ecf4a-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/ac78268c-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/acf30023-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        Data/Videos@Data-auto-20170425.2030:/Series/MacGyver/Macgyver 2x02.avi
        Data/Videos@Data-auto-20170425.2030:/KinderFilms/Suske en Wiske & De Texas Rakkers (2009).mkv
        Data/Videos@Data-auto-20170425.2030:/Series/MacGyver/Macgyver 7x07.avi
        Data/Videos@Data-auto-20170425.2030:/Films/Once Upon a Time in the West (1968).mkv
        Data/Videos@Data-auto-20170425.2030:/KinderFilms/Mary Poppins (1964) 1080p.mkv
        Data/Videos@Data-auto-20170425.2030:/KinderFilms/Pinocchio (1940).mkv
        Data/Videos@Data-auto-20170425.2030:/KinderFilms/Planes (2013).mkv
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Movies/Romeo + Juliet/Romeo + Juliet (HD).m4v
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Movies/42/42 (1080p HD).m4v
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Music/Stotijn/33613/05_Rota-Divertimetno Concertanto-Allegro.flac
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Movies/Horrible Bosses/Horrible Bosses (1080p HD).m4v
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Movies/Ice Age_ het mysterie van de eieren/Ice Age_ het mysterie van de eieren (1080p HD).m4v
        Data/Music@Data-auto-20170425.2030:/iTunes/iTunes Media/Movies/La crème de la crème/La crème de la crème (1080p HD).m4v

The same files are listed in the snapshots, while the real files are not affected (so it seems).
But why are all disks degraded? This is not going well...
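
For anyone who wants to keep an eye on the counters without reading the table by hand, here is a minimal Python sketch that pulls the READ/WRITE/CKSUM columns out of `zpool status` text. The names `parse_device_rows`, `only_checksum_errors` and `SAMPLE` are made up for this illustration, the sample is a shortened copy of the output above, and the regex assumes plain integer counters (large counts can be abbreviated, e.g. with a K suffix, which this sketch does not handle).

```python
import re

# One device row: name, state, then the READ / WRITE / CKSUM counters.
# Assumption: counters are plain integers (no "1.2K"-style abbreviations).
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)

def parse_device_rows(status_text):
    """Return (name, state, read, write, cksum) tuples found in the text."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            name, state, read, write, cksum = m.groups()
            rows.append((name, state, int(read), int(write), int(cksum)))
    return rows

def only_checksum_errors(rows):
    """True if no row has READ/WRITE errors but at least one has CKSUM errors."""
    no_io_errors = all(r[2] == 0 and r[3] == 0 for r in rows)
    return no_io_errors and any(r[4] > 0 for r in rows)

# Shortened copy of the status output posted above.
SAMPLE = """
        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0    34
          raidz2-0                                      DEGRADED     0     0    68
            gptid/a6a6f8ac-ba6f-11e6-a6a7-6805ca0cfed6  DEGRADED     0     0     0  too many errors
            gptid/41a9a23d-3f33-11e7-a87b-000c29bfa44f  ONLINE       0     0     0
"""

rows = parse_device_rows(SAMPLE)
for name, state, read, write, cksum in rows:
    print(name, state, read, write, cksum)
print("checksum errors only:", only_checksum_errors(rows))
```

On the sample this reports checksum errors at the pool and vdev level only, with zero READ and WRITE errors everywhere.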

When you delete a snapshot the faulty blocks will perhaps still be referenced by another snapshot, and then will become part of that snapshot.

The strange thing is that you weren't able to fix the error with 2 drives of redundancy.

A triple failure? Or very bad RAM, like you said.
 

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
When you delete a snapshot the faulty blocks will perhaps still be referenced by another snapshot, and then will become part of that snapshot.

The strange thing is that you weren't able to fix the error with 2 drives of redundancy.

A triple failure? Or very bad RAM, like you said.
Well, the situation was:
A ReadyNAS with 6TB was copied to FreeNAS (back then an AMD A8 with 32GB DDR3, where one 8GB module was faulty). I changed the memory and later on switched to a Xeon E5-2558v4 system, but kept the disks and pools.
So I think it went wrong when copying to FreeNAS; I never set up a scrub in FN10/Corral (stupid).
One drive got a SMART error (seek time error). I replaced it, and then FreeNAS found checksum errors (which could be caused by bad memory).
I could also play and scrub through all the affected movies, but got checksum errors during a scrub or resilver. After the deletion, no more errors were found.

For now I am keeping an eye on this. I have scheduled a scrub every 2 weeks and will run another scrub in a couple of days just to be sure. Bad memory is the only explanation I have for these errors.
There were no read or write errors, only checksum errors, which may also explain why I could play the movies but could not copy them.

So the only explanation is a memory malfunction when the files were first put on FreeNAS, because then the redundant data is corrupt as well.
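
To make that theory concrete, here is a small Python sketch of why redundancy cannot repair data that was already damaged in RAM before the parity was written. It is a deliberate simplification and not how ZFS is implemented: it uses single XOR parity instead of RAIDZ2, SHA-256 stands in for whatever checksum the pool uses, and `xor_blocks` and `flip_bit` are hypothetical helper names.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together (single parity, RAID-5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def flip_bit(data, bit_index):
    """Return a copy of data with one bit inverted, imitating a RAM fault."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

# The record as it should be, and its checksum taken while it was still good.
record = b"AAAABBBBCCCC"
checksum = hashlib.sha256(record).digest()

# A bit flips in RAM after the checksum was taken but before the write.
damaged = flip_bit(record, 5)
chunks = [damaged[i:i + 4] for i in range(0, 12, 4)]  # three data "disks"
parity = xor_blocks(chunks)                           # parity of damaged data

# Read path: what is on disk no longer matches the stored checksum.
on_disk = b"".join(chunks)
checksum_ok = hashlib.sha256(on_disk).digest() == checksum

# Attempted repair: rebuild chunk 0 from the other chunks plus parity.
rebuilt0 = xor_blocks([chunks[1], chunks[2], parity])
repaired = rebuilt0 + chunks[1] + chunks[2]
repair_ok = hashlib.sha256(repaired).digest() == checksum

print("checksum matches on read:", checksum_ok)
print("checksum matches after rebuild:", repair_ok)
```

Both lines print `False`: the parity faithfully reproduces the damaged data, so every reconstruction fails the checksum in the same way. That would be consistent with checksum errors and no read or write errors, though it does not prove that this is what happened here.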
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Which of course is exactly why ECC is recommended :)
 

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
Which of course is exactly why ECC is recommended :)
Sorry, but bad ECC memory gives more problems! If memory is bad, it is bad; whether it is normal RAM or ECC RAM, broken is broken.
I have ECC RAM but I still don't believe in it, but we will see...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If memory is bad, it is bad; whether it is normal RAM or ECC RAM, broken is broken.
Yes, but if it is ECC RAM, your system will know it is broken--there's an error log in the BIOS setup utility, and if you have IPMI support you can see it there as well. If not, your system will carry on, fat, dumb, and happy, until that bad data breaks something.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Sorry, but bad ECC memory gives more problems! If memory is bad, it is bad; whether it is normal RAM or ECC RAM, broken is broken.
I have ECC RAM but I still don't believe in it, but we will see...
What? How so? If ECC RAM is bad, it will fix single-bit errors and log them. Things keep on working like normal. If it's worse than that, it will panic and reboot the system.
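
The "fix one bit, detect two" behaviour can be shown with a toy SECDED (single error correct, double error detect) code in Python. This is a 4-bit extended Hamming code chosen for readability; real ECC memory protects much wider words in hardware, so treat it as a sketch of the principle only. The `encode` and `decode` names are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8          # index 0: overall parity, indexes 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid word
    overall_bad = sum(code) % 2 == 1
    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad:
        # Odd number of flips: assume one, located at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detectable, not fixable.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble

word = encode(0b1011)
one_flip = list(word)
one_flip[6] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[6] ^= 1

print(decode(word))       # ('ok', 11)
print(decode(one_flip))   # ('corrected', 11)
print(decode(two_flips))  # ('uncorrectable', None)
```

The single flip is repaired silently and can be logged, while the double flip is only detected. In a real system that detected-but-unfixable case is what would lead to the panic described above instead of carrying on with bad data.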
 

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
What? How so? If ECC RAM is bad, it will fix single-bit errors and log them. Things keep on working like normal. If it's worse than that, it will panic and reboot the system.
Yes, I should make things clear: if the ECC chip itself is broken, it will flip bits that are OK, so it creates problems. But like I said, I have ECC RAM, and this is another discussion ;-)
 