Error Report with Checksum Errors

Status
Not open for further replies.

eichof

Dabbler
Joined
Apr 22, 2015
Messages
13
Hello everyone,

I need some help. I got this error report from my FreeNAS:
" One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected."
I've found checksum errors on all 3 disks (3 × WD Red 3TB). What does this mean for my ZFS system? Is it corrupted? What do I have to do now? Can the system heal itself?
I don't use ECC RAM, but I have tested the RAM with ramtest and it is fine.
Thanks for your help.

Code:
[root@freenas ~]# zpool status                                                                                                     
  pool: freenas-boot                                                                                                               
state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Aug  8 03:47:06 2016                                                         
config:                                                                                                                             
                                                                                                                                   
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                   
errors: No known data errors                                                                                                       
                                                                                                                                   
  pool: volume1                                                                                                                     
state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                             
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                             
   see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: scrub repaired 2.87M in 2h56m with 0 errors on Sun Aug  7 02:56:09 2016                                                     
config:                                                                                                                             
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        volume1                                         ONLINE       0     0     0                                                 
          raidz1-0                                      ONLINE       0     0     0                                                 
            gptid/2a9442d5-a35e-11e4-9d35-d050995092f9  ONLINE       0     0    20                                                 
            gptid/2af34812-a35e-11e4-9d35-d050995092f9  ONLINE       0     0    12                                                 
            ada0                                        ONLINE       0     0    16                                                 
                                                                                                                                   
errors: No known data errors                                     
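To see at a glance which disks are collecting checksum errors, the CKSUM column of zpool status can be pulled out with a short script. This is a minimal sketch, not a FreeNAS tool: the function names and file name are made up for illustration, and it assumes the config-table layout shown above with plain integer counts (very large counts may be printed in abbreviated form and would need extra handling). Fed the volume1 output above, it reports 20, 12 and 16 errors on the three raidz1 members, 48 in total. Errors spread across every member of a vdev tend to point at a shared component such as RAM or the disk controller rather than at three disks failing at once.

```python
import sys

# Column header of the config table printed by "zpool status" (classic layout assumed).
HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def config_rows(status_text):
    """Return (indent, name, cksum) for each row of every config table.

    Assumes plain integer error counts, as in the output above.
    """
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True
            continue
        if in_table and not fields:  # a blank line closes the table
            in_table = False
            continue
        if in_table and len(fields) >= 5:
            indent = len(line) - len(line.lstrip())
            rows.append((indent, fields[0], int(fields[4])))
    return rows


def leaf_cksum(status_text):
    """Map each leaf device (a disk or partition) to its CKSUM count."""
    rows = config_rows(status_text)
    leaves = {}
    for i, (indent, name, cksum) in enumerate(rows):
        next_indent = rows[i + 1][0] if i + 1 < len(rows) else -1
        if next_indent <= indent:  # nothing is nested under this row
            leaves[name] = cksum
    return leaves


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): zpool status > status.txt; python3 cksum.py status.txt
    with open(sys.argv[1]) as f:
        errors = {d: n for d, n in leaf_cksum(f.read()).items() if n > 0}
    for device, count in errors.items():
        print(f"{device}: {count} checksum errors")
    print("total:", sum(errors.values()))
```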
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Have you run long SMART tests on these drives?
I don't use ECC Ram, but I have tested the RAM with ramtest and it is fine
How many hours did you test your memory?
AND what the heck is "ramtest"???
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Are you shutting down the system periodically or rebooting it?

Have you run long SMART tests on these drives?
Agreed. If you are not doing this, please do so, and report the results.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Verify that you have SMART tests scheduled, and provide the output of smartctl -a /dev/adaX, where X is the number of the drive you want to look at.
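A long test can be started with smartctl -t long /dev/adaX and its result read later from smartctl -a /dev/adaX. As a rough sketch, the attribute table of that output can be scanned for raw values that should normally stay at zero. The function names are made up, the watch list is a common rule of thumb rather than anything stated in this thread, and the parser assumes the default attribute-table format of smartctl -a.

```python
import sys

# Attributes whose raw value is usually expected to stay at 0 on a healthy disk.
# This watch list is a rule-of-thumb assumption, not an official threshold.
WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",  # non-zero often points at cabling rather than the disk itself
)


def raw_values(smartctl_text):
    """Return {ATTRIBUTE_NAME: RAW_VALUE} from the ATA attribute table of 'smartctl -a'."""
    values = {}
    in_table = False
    for line in smartctl_text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True
            continue
        if in_table:
            if not fields or not fields[0].isdigit():
                in_table = False  # the first non-attribute line ends the table
                continue
            if len(fields) >= 10:
                # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
                values[fields[1]] = fields[9]
    return values


def suspicious(smartctl_text):
    """Return the watched attributes whose raw value is a positive integer."""
    vals = raw_values(smartctl_text)
    return {
        name: int(vals[name])
        for name in WATCH
        if name in vals and vals[name].isdigit() and int(vals[name]) > 0
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): smartctl -a /dev/ada0 > ada0.txt; python3 smart_check.py ada0.txt
    with open(sys.argv[1]) as f:
        print(suspicious(f.read()) or "no watched attribute is above zero")
```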

Also, you replaced a drive using the CLI (it shows up as ada0 instead of a gptid), which should be fixed.
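In the first zpool status output, two raidz1 members are listed by gptid label and one by the raw device name ada0. FreeNAS normally adds disks through the GUI, which partitions them and references the data partition by its gptid; a bare device name usually means the disk was added from the command line. A small sketch to spot such members follows; the function names and file name are made up, and it assumes the config-table layout shown in this thread.

```python
import sys

# Column header of the config table printed by "zpool status" (classic layout assumed).
HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def pool_leaves(status_text):
    """Map each pool name to the leaf devices listed in its config table."""
    rows = {}  # pool name -> [(indent, device name)]
    pool, in_table = None, False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:1] == ["pool:"] and len(fields) > 1:
            pool = fields[1]
            rows[pool] = []
        elif fields[:5] == HEADER:
            in_table = True
        elif in_table and not fields:  # a blank line closes the table
            in_table = False
        elif in_table and pool is not None:
            rows[pool].append((len(line) - len(line.lstrip()), fields[0]))
    leaves = {}
    for name, entries in rows.items():
        leaves[name] = [
            dev
            for i, (indent, dev) in enumerate(entries)
            # a row is a leaf when the next row is not nested under it
            if i + 1 == len(entries) or entries[i + 1][0] <= indent
        ]
    return leaves


def raw_device_members(status_text, pool):
    """Leaves of 'pool' that are not referenced through a gptid/ label."""
    return [d for d in pool_leaves(status_text).get(pool, []) if not d.startswith("gptid/")]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical file name): python3 gptid_check.py status.txt volume1
    with open(sys.argv[1]) as f:
        print(raw_device_members(f.read(), sys.argv[2]) or "all members use gptid labels")
```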
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You have so many problems with your pool it can't be trusted. Every post you make is about some random thing that no one has ever seen.
The only post I've made recently about "some random thing that no one has ever seen" is the one I just linked above, and that drive has since been replaced. How does that equate to a pool that "can't be trusted"?
 

eichof

Dabbler
Joined
Apr 22, 2015
Messages
13
OK, the problem is the RAM.
I did another memtest86 run of about 12 hours and got errors.
So I'm thinking about building another system, this time with ECC RAM. My only problem is finding a Mini-ITX board that doesn't cost too much.

So now I have several checksum errors because of the RAM. How do I deal with them?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Turn the system off now and hope nothing is serious. Don't boot it again until you have replaced the RAM.

 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

eichof

Dabbler
Joined
Apr 22, 2015
Messages
13
Thank you guys, the problem is solved.

After replacing the RAM, the system looks fine. I also fixed the gptid through the GUI, and the pool is now resilvering...
Code:
[root@freenas ~]# zpool status                                                                                                     
  pool: freenas-boot                                                                                                               
state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Aug  8 03:47:06 2016                                                         
config:                                                                                                                             
                                                                                                                                   
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                   
errors: No known data errors                                                                                                       
                                                                                                                                   
  pool: volume1                                                                                                                     
state: ONLINE                                                                                                                     
status: One or more devices is currently being resilvered.  The pool will                                                           
        continue to function, possibly in a degraded state.                                                                         
action: Wait for the resilver to complete.                                                                                         
  scan: resilver in progress since Sat Sep  3 04:49:59 2016                                                                         
        145G scanned out of 3.02T at 317M/s, 2h38m to go                                                                           
        48.3G resilvered, 4.68% done                                                                                               
config:                                                                                                                             
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        volume1                                         ONLINE       0     0     0                                                 
          raidz1-0                                      ONLINE       0     0     0                                                 
            gptid/2a9442d5-a35e-11e4-9d35-d050995092f9  ONLINE       0     0     0                                                 
            gptid/2af34812-a35e-11e4-9d35-d050995092f9  ONLINE       0     0     0                                                 
            gptid/0a249f81-7181-11e6-841d-d050995092f9  ONLINE       0     0     0  (resilvering)                                   
                                                                                                                                   
errors: No known data errors     
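The progress lines can be checked by hand: zpool prints sizes in binary (1024-based) units, and the time to go is roughly the unscanned amount divided by the current rate. The sketch below reproduces the figures above; the helper names are made up for illustration, and the size strings are copied from the output.

```python
# Binary (1024-based) multipliers for the unit suffixes zpool prints.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a size string such as '3.02T' or '317M' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def time_to_go(scanned, total, rate_per_s):
    """Estimate remaining time as (total - scanned) / rate, in the same XhYm form."""
    seconds = (to_bytes(total) - to_bytes(scanned)) / to_bytes(rate_per_s)
    hours, minutes = divmod(int(seconds) // 60, 60)
    return f"{hours}h{minutes}m"


def percent_done(scanned, total):
    """Share of the pool already scanned, in percent."""
    return 100 * to_bytes(scanned) / to_bytes(total)


# Figures from the resilver status above: 145G scanned out of 3.02T at 317M/s.
print(time_to_go("145G", "3.02T", "317M"))          # 2h38m
print(f"{percent_done('145G', '3.02T'):.2f}% done")  # 4.69% done
```

Because the displayed sizes are rounded, recomputing the percentage from them gives about 4.69% rather than the reported 4.68%.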
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
A scrub would be redundant here: with only one vdev, the resilver has to read all the data on the other disks, and in doing so it inherently verifies the checksums.
 