ZFS Checksum Mismatch?

Status
Not open for further replies.

JellyBean

Cadet
Joined
Mar 10, 2012
Messages
5
Hi, I'm new to this forum so forgive me if this has been asked many times.
I ran a scrub and it came up with a bunch of checksum mismatch errors. I am running ZFS as RAID-Z1 (3 data + 1 parity), and all the drives are new.
For example:
Apr 3 02:30:08 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=10630094848 size=45056
Apr 3 02:30:58 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8442142720 size=45056
Apr 3 02:30:58 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8442097664 size=45056
Apr 3 02:37:24 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8457822208 size=45056

I checked the status afterwards and it all seems to be okay, but if I run the scrub again I get more error messages.

[root@freenas] ~# zpool status
pool: zpool1
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 2h4m with 0 errors on Tue Apr 3 03:29:47 2012
config:

NAME        STATE     READ WRITE CKSUM
zpool1      ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    ada0p2  ONLINE       0     0    65  26.6M repaired
    ada1p2  ONLINE       0     0     0
    ada2p2  ONLINE       0     0     0
    ada3p2  ONLINE       0     0     0

errors: No known data errors

What would cause all these mismatch errors? The drives are used for my media center and are mostly used for reads.
Any insight would be appreciated. Thanks.
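
A rough way to check whether the mismatches keep landing on the same device and the same offsets is to tally the log lines from several scrubs. The following is only a sketch: the line format is copied from the log excerpt in this post, and the function names (`tally_mismatches`, `repeated_offsets`) are made up for illustration. It takes the lines as input, so where the log file actually lives on a given system is left open.

```python
import re
from collections import Counter, defaultdict

# Pattern based on the log lines quoted in this post (an assumption about
# the general format, not a documented specification).
LINE_RE = re.compile(
    r"ZFS: checksum mismatch, zpool=(?P<pool>\S+) path=(?P<path>\S+) "
    r"offset=(?P<offset>\d+) size=(?P<size>\d+)"
)


def tally_mismatches(lines):
    """Count mismatches per device and per (device, offset)."""
    per_device = Counter()
    offsets = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a checksum-mismatch line
        per_device[m["path"]] += 1
        offsets[m["path"]][int(m["offset"])] += 1
    return per_device, offsets


def repeated_offsets(offsets):
    """Offsets that were reported more than once, per device."""
    return {
        path: sorted(off for off, n in counts.items() if n > 1)
        for path, counts in offsets.items()
    }


SAMPLE = [
    "Apr 3 02:30:08 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=10630094848 size=45056",
    "Apr 3 02:30:58 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8442142720 size=45056",
    "Apr 3 02:30:58 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8442097664 size=45056",
    "Apr 3 02:37:24 freenas root: ZFS: checksum mismatch, zpool=zpool1 path=/dev/ada0p2 offset=8457822208 size=45056",
]

if __name__ == "__main__":
    per_device, offsets = tally_mismatches(SAMPLE)
    print(per_device)
    print(repeated_offsets(offsets))
```

Fed with the log lines from more than one scrub, an offset that shows up repeatedly would hint at a fixed bad region on the disk, while offsets that move around would point elsewhere.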
 

delphij

FreeNAS Core Team
Joined
Jan 10, 2012
Messages
37
These errors mean you have "silent" data corruption on your hard drive, and ZFS has repaired it because it checksums all data. The scrub is what found it (luckily, you use RAID-Z, which provides the redundancy needed to recover from this type of error).

These errors *may* be an indication that you have a bad hard drive or a bad disk controller. Please watch whether you continue to see them, and replace the hardware in question if they come back.
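
To illustrate the mechanism described above (a checksum detects the bad data, redundancy repairs it), here is a toy sketch in Python. It is not how ZFS is implemented: SHA-256 stands in for whatever checksum the pool uses, a single XOR parity column stands in for RAID-Z1, and all function names are made up for illustration.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(columns):
    """Checksum of the whole logical block (all data columns joined)."""
    return hashlib.sha256(b"".join(columns)).hexdigest()


def write_stripe(data_columns):
    """Return (data columns, parity column, checksum) for one block."""
    columns = list(data_columns)
    return columns, xor_blocks(columns), checksum(columns)


def read_stripe(columns, parity, expected):
    """Return (data, repaired_index); repaired_index is None if clean.

    On a checksum mismatch, rebuild each column in turn from the other
    columns plus parity, and keep the first rebuild whose checksum matches.
    """
    if checksum(columns) == expected:
        return columns, None
    for i in range(len(columns)):
        others = columns[:i] + columns[i + 1:]
        candidate = xor_blocks(others + [parity])
        trial = columns[:i] + [candidate] + columns[i + 1:]
        if checksum(trial) == expected:
            return trial, i
    raise ValueError("more damage than single parity can repair")


if __name__ == "__main__":
    cols, parity, ck = write_stripe([b"aaaa", b"bbbb", b"cccc"])
    damaged = list(cols)
    damaged[1] = b"bXbb"  # simulate silent corruption on one disk
    fixed, idx = read_stripe(damaged, parity, ck)
    print("repaired column", idx, "->", fixed)
```

The point of the sketch is that the disk itself never reports a read error: only the checksum comparison reveals that something is wrong, and only the parity makes a repair possible.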
 

JellyBean

Cadet
Joined
Mar 10, 2012
Messages
5
I'll keep an eye on it and keep scrubbing away, but I've run it four times in two days and I haven't seen it run clean yet.
 

JellyBean

Cadet
Joined
Mar 10, 2012
Messages
5
Is there a way to prevent these locations from being used?
Sucks to have to replace a drive already.
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
I also get CKSUM errors quite often, and on nearly every disk.

Here are some stats.
Scrubs are done weekly.

2011-12-05 03:00:00: Starting scrub on zepool
2011-12-05 08:42:09: Finished scrub on zepool
NAME STATE READ WRITE CKSUM
zepool ONLINE 0 0 0
raidz2 ONLINE 0 0 0
gptid/083e4e29-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/089b9ddd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/090aa30f-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/095871cd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 20K repaired
gptid/09973360-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/09d1e8a0-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a256e97-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a639b32-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 2 48K repaired

2012-02-20 03:00:00: Starting scrub on zepool
2012-02-20 10:03:10: Finished scrub on zepool
NAME STATE READ WRITE CKSUM
zepool ONLINE 0 0 0
raidz2 ONLINE 0 0 0
gptid/083e4e29-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/089b9ddd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/090aa30f-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 20K repaired
gptid/095871cd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/09973360-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/09d1e8a0-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 24K repaired
gptid/0a256e97-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a639b32-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0


2012-03-05 03:00:00: Starting scrub on zepool
2012-03-05 10:52:11: Finished scrub on zepool
NAME STATE READ WRITE CKSUM
zepool ONLINE 0 0 0
raidz2 ONLINE 0 0 0
gptid/083e4e29-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/089b9ddd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 4 92K repaired
gptid/090aa30f-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/095871cd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 24K repaired
gptid/09973360-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/09d1e8a0-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a256e97-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a639b32-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0


2012-03-12 03:00:00: Starting scrub on zepool
2012-03-12 11:03:10: Finished scrub on zepool
NAME STATE READ WRITE CKSUM
zepool ONLINE 0 0 0
raidz2 ONLINE 0 0 0
gptid/083e4e29-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 20K repaired
gptid/089b9ddd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 5 24K repaired
gptid/090aa30f-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/095871cd-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1
gptid/09973360-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/09d1e8a0-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 1 24K repaired
gptid/0a256e97-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
gptid/0a639b32-f75c-11e0-a0ad-f46d045353d3 ONLINE 0 0 0
 

JellyBean

Cadet
Joined
Mar 10, 2012
Messages
5
Are they the same offsets if you run the test again?

This is my latest. Always on the same drive, and for around the same amount of data. Almost 20 MB is quite a bit :(

NAME        STATE     READ WRITE CKSUM
zpool1      ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    ada0p2  ONLINE       0     0    24  19.8M repaired
    ada1p2  ONLINE       0     0     0
    ada2p2  ONLINE       0     0     0
    ada3p2  ONLINE       0     0     0
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
As you can see, no, they are not at the same offsets.
Sometimes I get scrubs with no errors, sometimes I get some errors, not always on the same disks, and only a few KB.
I don't know if I should worry about that or not...
 

peterh

Patron
Joined
Oct 19, 2011
Messages
315
A (single) disk that reports errors after a scrub is corrupting data and should be replaced.
If several disks report errors, you have a problem with the motherboard/disk controller or memory.
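
The rule of thumb above can be sketched as a small script that reads the config table from `zpool status` output and counts how many disks show a non-zero CKSUM value. This is only an illustration under assumptions: the row layout is taken from the outputs pasted in this thread, the counters are assumed to be plain integers, and the function names are hypothetical.

```python
import re

# name, STATE, then the READ / WRITE / CKSUM counters as plain integers
ROW_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
)

# Rows that are groupings rather than disks (assumption for this sketch).
NON_LEAF_PREFIXES = ("raidz", "mirror")


def devices_with_cksum_errors(status_text, pool_name):
    """Map each disk with a non-zero CKSUM count to that count."""
    bad = {}
    for line in status_text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # header, blank line, or other text
        name = m["name"]
        if name == pool_name or name.startswith(NON_LEAF_PREFIXES):
            continue
        cksum = int(m["cksum"])
        if cksum:
            bad[name] = cksum
    return bad


def rule_of_thumb(bad):
    """Paraphrase of the advice in the reply above, not a diagnosis."""
    if not bad:
        return "no checksum errors reported"
    if len(bad) == 1:
        return "single disk: suspect that disk"
    return "several disks: suspect controller, motherboard, or memory"


SAMPLE = """NAME STATE READ WRITE CKSUM
zpool1 ONLINE 0 0 0
raidz1 ONLINE 0 0 0
ada0p2 ONLINE 0 0 65 26.6M repaired
ada1p2 ONLINE 0 0 0
ada2p2 ONLINE 0 0 0
ada3p2 ONLINE 0 0 0
"""

if __name__ == "__main__":
    bad = devices_with_cksum_errors(SAMPLE, "zpool1")
    print(bad, "->", rule_of_thumb(bad))
```

Running it against the output of several scrubs in a row would show whether the errors stay on one disk or wander across the pool.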
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
The 8 disks are on 2 different SATA controllers (both integrated, though).

Do you know what I could do to troubleshoot the problem?
Thanks in advance
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi djoole,

The first thing I would do is memtest the system. Grab a copy from memtest.org and let it run for at least one full pass; I usually let it run overnight.

-Will
 

djoole

Contributor
Joined
Oct 3, 2011
Messages
158
OK, thanks, I'll do this. My server is headless... I assume I will need a screen to launch the test, or is it possible to test the memory with FreeNAS running?
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Yes, you will need a head on it to see the results.

-Will
 

JellyBean

Cadet
Joined
Mar 10, 2012
Messages
5
Is there a recommended program that can thoroughly test the drive and produce a decent report?
If the drive is still under warranty I would like to have it replaced.
 

Hexland

Contributor
Joined
Jan 17, 2012
Messages
110
Search for Hiren's Boot DVD. It is loaded with an invaluable set of diagnostic tools for checking HDDs, memory, etc. I've used HDDGuru in the past (I believe it's on the Hiren DVD) and it did a bunch of low-level checks for me (it was a while ago, so it may have been superseded by something else by now).
 