Possible silent corruption?

Status
Not open for further replies.

Meekolai

Dabbler
Joined
Nov 7, 2012
Messages
10
I am running FreeNAS as a media server for my home. I use it only for storing and sharing video files to any devices on my LAN. My server is about 6 months old and I believe I am starting to see evidence of silent corruption.

I have the system set to auto scrub every month, and I scrub before uploading new videos.

My server:
FreeNAS-8.3.1-RELEASE-x64 (r13452)
AMD Athlon II x2 270
Gigabyte 990xa-ud3
8G ECC Ram

6x 2TB WD Red in Raidz1
The 6 drives are connected to the onboard sata controllers.

I have one ZFS volume and 2 datasets; one for movies and one for shows.

This is the only example I have noticed so far:
corrupt.jpg


Volume status reports no errors across the board. I am running a 72+ hour memtest at the moment, but am not sure what else to try if/when it comes back without errors.

Nothing on the server is irreplaceable, but the time invested to rip, encode and organize is.

Could I be looking at a problem that is only just starting, or at a bit of corruption that slipped past the scrub/checksum check?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The odds of corruption slipping past ZFS are astronomically low. We're talking 1 bit in 10000 years or something. I'd bet it's more likely that the encoder or decoder for your video is causing the glitch. I'm not sure what program/codec you were using, but you may want to try a different one. If you didn't use VLC you might want to try it. If you are a Windows user, just because you used Windows Media Player and then Media Player Classic doesn't mean you used a different codec. Most likely both used the same codec. VLC uses its own codec system that is strictly internal.
 

Meekolai

Dabbler
Joined
Nov 7, 2012
Messages
10
I use handbrake to do my encodes using x264 and the mkv container. I use VLC or XBMC for playback.

I have watched this video file before and don't remember seeing this error. As this is the first error/problem I have had in 6 months I'm hoping this is a fluke. It's possible there was a problem with this file from the beginning. I'm hoping there isn't any issue server side.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I use Handbrake also. I also use VLC and Plex mostly myself, and I know Plex recently had a problem that affected a large number of my videos. The issue seemed to resolve itself through a few updates.

As a safety check you could do a RAM test. If your RAM isn't ECC you may be seeing corruption from bad RAM. This applies to both the server and the desktop. Basically the entire "data path" from the data being on the disks to the data being on your screen is suspect until proven innocent.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Howdy there.

The ECC functionality of consumer-level motherboards is always a little bit in question for me. Yes, they will support it in that they will run with ECC RAM installed, but they may not actually use that ECC functionality, despite the flags being in the BIOS in some cases. See this thread at HardOCP where users are discussing this point, specifically the latter portion of the thread where a GB tech support rep says something to the effect of "ECC will work, but won't actually be ECC."

Keep us posted on the memtest status.
 

joelmusicman

Patron
Joined
Feb 20, 2014
Messages
249
Wow, that's pretty lame. The mfg can still claim "ECC compatible" because the system won't barf when presented with ECC ram, but we're certainly not getting what we think we are. Glad I didn't cheap out on my build and went with Intel. That said, I would get more of a warm fuzzy if the BIOS or POST screens were more clear about the ECC status.

I know there's a command to simulate a memory error, but where do I go to see if the ECC caught it? IPMI?
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Have you opened the video directly from the server, or downloaded it to your PC and opened it there? There are cases of Intel NICs causing bit flips. My guess is that when the video was new and you played it with an older video codec it looked clean, and that "later" this year you got a newer video codec which does things just like that (I have seen it myself). To be sure whether the file is corrupted, check the SFV or MD5 of that file (if you don't have MD5s for your files, now is perhaps a good time to start making them). PS: a memtest on your PC might not hurt either.
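
For anyone wanting to try the checksum comparison suggested here, the following is a minimal Python sketch, not a tool anyone in the thread actually used. It hashes a file in chunks so large video files do not need to fit in memory, then compares the copy on the server share with a copy downloaded to the PC. The function names `file_digest` and `copies_match` are made up for illustration, and MD5 is used only because it was mentioned above; any algorithm supported by `hashlib` would do.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copies_match(path_a, path_b, algorithm="md5"):
    """True if both files hash to the same digest (hypothetical helper)."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

If the two copies match but playback still glitches, the file itself is the same in both places and the player or codec becomes the more likely suspect. If they differ, something between the disks and the desktop changed the data.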
 

Meekolai

Dabbler
Joined
Nov 7, 2012
Messages
10
Very interesting point about consumer desktop grade motherboards not having true ECC support; I would not be surprised if this is the culprit. I will look into upgrading to a server grade motherboard in the future. I had to stop my memtest at 5 passes due to a power outage and the UPS not having enough charge to last. I have restarted it, but am not expecting errors.

As for checking hashes, the thought had crossed my mind. I have done SHA256 hashes of my larger backups for periodic testing later on. I am also planning on doing extended SMART tests outside of FreeNAS when memtest is done.

That thing about ECC ram on consumer grade boards sounds like the issue the more I think about it (and read through that thread).
 

Meekolai

Dabbler
Joined
Nov 7, 2012
Messages
10
Memtest ran for 25 passes without any errors. The extended SMART tests produced nothing out of the ordinary.

It could be possible that this file was corrupt before being copied to this dataset. I am going to put this on the back burner for now but I am going to periodically check those SHA hashes.
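
A periodic hash check like this could be sketched roughly as follows. This is an illustrative example only, not the procedure actually used here: it assumes the dataset is reachable as an ordinary directory, and the names `sha256_of`, `build_manifest`, and `verify_manifest` are hypothetical. The idea is to record a SHA-256 digest for every file once, then re-hash later and report anything missing or changed.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify_manifest(root, manifest):
    """Re-hash the files in the manifest; return {path: "missing" | "changed"}."""
    problems = {}
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            problems[rel] = "missing"
        elif sha256_of(full) != expected:
            problems[rel] = "changed"
    return problems
```

The manifest is a plain dictionary, so it can be saved with `json.dump` and kept somewhere other than the pool being checked. An empty result from `verify_manifest` means every recorded file still hashes to the value stored earlier; it says nothing about files added after the manifest was built.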

Thanks for all the information and ideas everyone.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
To be honest, I really do consider that to be the most likely cause for your symptom.
 